Shape-Based Classification of Partially Observed Curves, With Applications to Anthropology

We consider the problem of classifying curves when they are observed only partially on their parameter domains. We propose computational methods for (i) completion of partially observed curves; (ii) assessment of completion variability through a nonparametric multiple imputation procedure; (iii) development of nearest neighbor classifiers compatible with the completion techniques. Our contributions are founded on exploiting the geometric notion of shape of a curve, defined as those aspects of a curve that remain unchanged under translations, rotations and reparameterizations. Explicit incorporation of shape information into the computational methods plays the dual role of limiting the set of all possible completions of a curve to those with similar shape while simultaneously enabling more efficient use of training data in the classifier through shape-informed neighborhoods. Our methods are then used for taxonomic classification of partially observed curves arising from images of fossilized Bovidae teeth, obtained from a novel anthropological application concerning paleoenvironmental reconstruction.


INTRODUCTION
Modern functional and curve data come in all shapes and sizes (pun intended!). A particular type of functional data that is starting to receive attention in recent years consists of univariate functions that are only observed in sub-intervals of their interval domains. Names for such data objects abound: censored functional data [1]; functional fragments [2,3]; functional snippets [4]; partially observed functional data [5]. Similar work with multivariate functional data or parametric curves in R d (d ≥ 2) are conspicuous in their absence. The methodological focus of this paper, consequently, is twofold: develop easily implementable computational algorithms for completion of partially observed planar curves and assess completion variability; incorporate the completion procedure into a procedure to classify partially observed curves. An equally important objective relates to taxonomic classification of partial curves representing outlines of fossilized teeth of extant, southern African bovids (antelopes and buffaloes) extracted from a novel anthropological imaging dataset.
The leitmotif of our approach lies in the explicit use of shapes of curves as a mechanism to not only counter the ill-posed nature of the problem of "sensibly" imputing or completing the missing piece of a partially observed curve, but also to use the metric geometry of the shape space of curves profitably when developing a suitable classifier. The rationale behind using shapes of curves can be explained quite simply. Fundamental to the routine task of comparing and identifying objects by humans or a computer is an implicit understanding of a set of symmetries or transformations pertaining to its shape: those properties or features of the object that are unaffected by nuisance information (e.g., orientation of the camera under which the object is imaged). Such an understanding assumes added importance when the object is only partially observed (e.g., identifying a chair hidden behind a table based on the backrest only) since it eliminates the need to consider a substantially large class of possible completions of the object. In the context of partially observed curves, working with their shapes leads to completions that are compatible with the shapes of fully observed curves in a training dataset. Relatedly, from an operational perspective, any formulation of completion of a missing piece of a curve based on an endpoints-constrained curve, either through deterministic or probabilistic model-based techniques, suffers from having too many degrees-of-freedom. As a result, the parameter space of missing pieces to search over needs to be constrained to obtain meaningful curve completions; we propose to impose such a shape-related constraint.
For example, in the anthropological application, any sensible completion of a bovid tooth should assume the shape of a tooth. We can constrain the parameter space comprising open curves, with endpoints constrained to match that of the partially observed curve, while determining a sensible completion based on the requirement that the completion should be tooth-like. Figure 1 shows an example of using shape information to complete a bovid tooth using Algorithm 1 (Section 3) and compares it to an arbitrary completion devoid of explicit shape information.
An important consideration when considering shape of a curve is its scale. Strictly speaking, scaling a curve does not alter its shape, and it is hence a nuisance transformation. However, in our motivating application from anthropology, the size of bovid teeth is known to have important taxonomical information and can hence potentially improve discriminatory power in the classification problem [6]. We will therefore accord due consideration to scale information when comparing shapes of curves; in shape analysis vernacular, this is referred to as size-and-shape analysis. For simplicity, we will continue to chraracterize our approach as shape-informed.
Research in geometry-based statistical analysis of shapes of arbitrarily parameterized planar curves is quite mature; see, for example, [7,8] for foundational details and the R package fdasrvf for computational tools. Leveraging this, our main contributions are as follows.
1) We develop a gradient-based algorithm (Algorithm 1) for shape-informed partial matching and completion with respect to a complete template/donor curve. 2) In order to assess and visualize variability of completions from Algorithm 1, given a training dataset of fully observed curves, we propose an adaptation of the hot-deck imputation method used on traditional Euclidean data to generate several imputations or completions (Algorithm 2). 3) We propose two nearest neighbor classification procedures for partially observed curves based on shape distances by utlizing completions obtained from any of the above two algorithms.

Related Work
Partially observed curves arise as data in several applications. In medical imaging, the appearance of anatomies in images of various modalities is often summarized through the shapes of their outlines. Partial curves arise due to (i) low resolution and contrast of many medical imaging modalities (e.g., PET or CT); (ii) a boundary of an organ being obscured by other organs or hard to identify due to similar appearance of neighboring tissues [9]. In handwriting analysis, a key task is the segmentation of samples of handwriting (curves) into letters or syllables, followed by imputation of incomplete curves [10]. Shapes of occluded objects, such as tanks, are also routinely used in military applications, where only part of the object's boundary is reliable and the rest must be imputed based on prior shape knowledge [9].
There is a substantial literature on missing data and shape analysis, however, most of the work is restricted to data obtained as multivariate morphological measurements. For example, [11] examines missing data in the morphology of crocodile skulls based on linear morphometric measurements of the skulls, in contrast to using landmarks or entire outlines (curves). Missing data methods for landmark-based shape data have been developed in [12,13,14,15]. By defining landmarks on each shape, the problem can be framed in a more traditional statistical setting where each landmark can be thought of as a covariate and more traditional missing data techniques, such as the EM algorithm and multiple imputation, can be used. [16] look at four different methods for dealing with missing landmark data: Bayesian Principal Component Analysis (BPCA), least-squares regression, thin-plate splines (TPS), and mean substitution. Additional work on missing data can be found in [17], which focuses on missing data in Procrustes analysis, and [18], which considers occluded landmark data.
The work most closely related to this current study can be found in [19], which studies the problem of matching a partially observed shape to a full shape. [19] performs partial matching using the Square-Root Velocity framework, and this is the framework we use as well in the sequel. Our work in a certain sense goes beyond their work and incorporates missing data techniques into the completion procedure, and additionally is tailored towards the classification task. [9] incorporates prior shape information in Bayesian active contours that can be used to estimate object boundaries in images when the class of the object of interest is known; they demonstrate the usefulness of this approach when the object boundary is partially obscured. [20] considers the problem of identifying shapes in cluttered point clouds. They formulate a Bayesian classification model that also heavily relies on prior shape information. Finally, there is some recent work on missing data techniques for functional data analysis [21,22,23].

SHAPES OF PARAMETERIZED CURVES
The main objects of interest in this work are parameterized curves and their shapes. Defining a suitable distance metric to compare their shapes is of fundamental importance in order to suitably formalize the notion of a "best completion of a partial curve". From several available in the literature, we consider two distances that are suitable for our needs. We provide a description of the mathematical formulation for these two distances in the following, and refer the interested reader to [24] for more details. As discussed in the Introduction, the size of bovid teeth contains potentially taxon (class)-distinguishing information, and we hence consider the notion of size-andshape of a curve. Throughout, for ease of exposition, we simply say shape to mean size-and-shape.
Denote by S 1 the unit circle on the plane, and let β : S 1 → R 2 be an absolutely continuous, simple, parameterized closed curve representing the full outline of a bovid tooth. We will identify S 1 with the unit interval [0, 1] ⊂ R and enforce the endpoint constraint β(0) β(1) to represent a closed curve. Denote by B the space of all such β. If β 1 and β 2 are assumed to be parameterized according to arc-length, then β 1 − β 2 [ 1 0 |β 1 (t) − β 2 (t)| 2 dt] 1/2 is a viable distance between them, where |·| is the standard Euclidean norm in R 2 . In order to account for nuisance information that does not alter the shape of β 1 and β 2 , one must further remove variability due to translation and rotation. The two variabilities are accounted for by defining equivalence classes [β] {Oβ + T|O ∈ SO(2), T ∈ R 2 }, where SO(2) is the group of 2 × 2 rotation matrices, i.e., orthogonal matrices with determinant equal to 1. Note that the L 2 distance between β 1 and β 2 is unchanged if both curves are translated and rotated by the same T ∈ R 2 or O ∈ SO(2). Thus to compare the shapes of two curves β 1 and β 2 in B, we can use the non-elastic shape distance This optimization problem can be solved in a straightforward fashion through Procrustes analysis [25]. The distance is termed non-elastic as it requires one to fix curve parameterizations to arc-length. Note that while d NE is defined on B, it is in fact a distance on the shape space S β {[β]: β ∈ B} of arc-length parameterized closed curves consisting of equivalence classes as points. This ensures that d NE (β 1 , β 2 ) 0 if there exists (T, O) ∈ SO(2) × R 2 such that β 2 Oβ 1 + T; in other words, the distance measured with d NE is zero for two curves having the same shape.
If one desires to allow flexible parameterizations for shape analysis, the L 2 metric is no longer a feasible choice as it is not invariant to re-parameterizations: where c: S 1 → S 1 belongs to the class Γ of orientation-preserving diffeomorphisms of S 1 that represent re-parameterizations of curves in B. When S 1 is identified with [0, 1], elements of the group Γ can be viewed in the following manner. Consider the class of {c: R → R:c(t + 1) c(t) + 1, continuous and increasing}. Each function in the class is such thatc(t) − t is periodic with period 1. Moreover, each function of the class induces a reparameterization c s : S 1 → S 1 with c s (e 2πit ) e 2πic(t) , wherec is referred to as the lift of c s , which is then orientation-preserving. Operationally, the construction implies thatc can be expressed as c(t) c(t) + c for some c : [0, 1] → [0, 1], which is a diffeomorphism of [0, 1], except at t 1, and c ∈ (0, 1]. We thus construct a diffeomorphism of S 1 by "unwrapping" S 1 at some point s, and generating such a c by identifying s with 0 (and 1). Henceforth, we will refer to such a c as an orientiationpreserving reparameterization of S 1 , and carry out computations with [0, 1] as the parameterization domain.
Since re-parameterization completely preserves the image of a curve β, a distance based on a Riemannian metric that captures infinitesimal bending and stretching can be used. Several families of such metrics, termed as elastic have been considered [26,27]; however, almost all of them are difficult to compute in practice and require non-trivial approximations.
A solution to this key issue was proposed in [7]. Specifically, a particular elastic metric is related to the usual L 2 one when a curve is transformed bijectively to its Square-Root Velocity Function (SRVF): B ∋ β1Q(β) : q _ β| _ β| −1/2 ∈ L 2 , where _ β is the time-derivative. Under this transformation, q 1 − q 2 (q 1 , c is the reparameterization action on the SRVF. Translations are automatically removed by the use of the derivative. Let  [7,24] for more details.
The corresponding elastic distance d E between two curves β 1 , β 2 ∈ B is defined using their SRVFs, wherein in addition to rotations, re-parameterizations are also now optimized over: where the equivalence class (2), c ∈ Γ} now represents an elastic shape, i.e., an equivalence of q under the action of SO(2) and Γ, which can be applied in any order. The optimization over SO (2) is solved via Procrustes analysis as before, while the one over Γ is addressed using Dynamic Programming or a gradient descent algorithm. This process is referred to as registration: it provides an optimal, under the elastic metric, correspondence between the shapes of q 1 and q 2 . Details of computing d E can be found in [24]. Correspondingly, define the shape space of SRVF-transformed closed curves as S q {[q]: q ∈ Q}.
In summary, if closed planar curves representing outlines of bovid teeth are assumed to be arc-length parameterized, we can use the non-elastic distance d NE on the shape space S β to compare their shapes. On the other hand, if the curves are allowed to have arbitrary parameterizations, it is more appropriate to consider their SRVF transforms and the shape space S q , equipped with the elastic distance d E . Moreover, it is clear that the distances d NE and d E are valid for open curves as well, i.e., in the case β (0) ≠ β (1). We will use the distances for both open and closed curves without qualification; context will disambiguate their usage.

PARTIAL SHAPE MATCHING AND COMPLETION
We first focus on how a single partially observed curve can be completed. Indeed, this requires a template or donor curve that is fully observed, so that the partially observed one can be matched and compared to different pieces of the fully observed one. Once a match has been established, a completion can be subsequently determined. In principle, the two tasks can be carried out sequentially or in parallel; in this paper, we adopt the former approach and leave the latter for future work.
Accordingly, the key tasks are to (i) match the observed partial curve to a piece of the donor curve; (ii) impute or complete the observed curve by finding the closest match to the residual piece of the donor curve from a set of curves with fixed endpoints. These are non-trivial tasks since the set of curves B (and Q) is infinite-dimensional. The problem is made tractable by considering equivalence classes of curves that share the same shape and size, as defined earlier. Specifically, we propose to leverage the shape distances d NE and d E in Eqs 1, 2, and develop an optimization-based framework to carry-out completion/ imputation and classification sequentially. We first define some important quantities.
• A curve β is viewed as being composed of two pieces β o and β m , where the subscripts o and m identify the observed and missing portions of β, such that 1] , for some 0 < τ < 1. The corresponding SRVF q similarly decomposes into (q o , q m ) for the same τ.
• β β o + β m denotes the concatenation of β o and β m , i.e., the complete curve. Throughout, β o will denote a partially observed curve, which conceptually is understood to be the observed portion of a curve β; in similar fashion, β m will throughout represent the missing piece of β.
and L, then In line with our intention to use shape information of curves, we note that completion of β o with respect to a donor curve β donor can be broken down into the following two steps.
(i) Determine the piece of β donor that best matches the shape of β o . (ii) Determine an open β m curve that then best matches the shape of the residual piece of the donor in (i); the required completion is then β o + β m .
Two points are worth considering here. First, the optimal β m is constrained to share the same endpoints as the determined piece of β donor . Second, by virtue of its definition, the completion β o + β m exactly matches the partially observed curve β o when restricted to a suitable subset of the parameter domain. The latter is motivated by the quality of image data of bovid teeth, under which it is reasonable to assume that the partially observed curve is obtained under negligible measurement error.
The key consideration for developing an algorithm for the two steps is the choice of an objective function that quantifies the quality of matches, informed by either of the shape distances d NE and d E in Eqs 1, 2, respectively. Repeated computation of the elastic distance d E is computationally expensive (due to the additional optimization over Γ), and hence time-consuming inside an iterative algorithm (the potential donor set has large sample size). Since our main objective in this paper is classification of the partially observed curves, we use the more convenient non-elastic distance d NE in order to carry out partial matching and completion. However, we will employ the elastic distance d E when designing a classifier in order to better access pure shape features of curves that are potentially classdistinguishing. We consider a two-step algorithm based on optimizing an objective function over two parameter spaces: Ω P for the partial match in step (i), and Ω C for the completion step (ii), defined as: The parameter set Ω P consists of shape-preserving transformations for arc-length parameterized curves β and the length of β o , whereas Ω C represents the subset of endpoint constrained curves within B. In the partial matching step, a piece on the donor β donor of optimal length L* starting at s 1 * , rotation O* and translation T* is determined resulting in β donor . The domains [s 1 , s 2 ] of an arbitrary restriction β (s 1 ,s 2 ) donor and [0, τ] of β o are always rescaled to [0, 1] to ensure that they have the same domain. For a fixed s 1 , the optimal translation T* and rotation O* are given explicitly via Procrustes analysis (see, for example, [28]). The search for the optimal s * 1 and L* can then be performed exhaustively on [0, 1].
Note that β (s * 2 ,s * 1 ) donor refers to the residual piece on β donor once β donor is removed owing to the circular ordering on the parameter domain S 1 of β donor . The main challenge lies in carrying out step 3 of the algorithm in which the missing data β m is determined by searching over Ω C ⊂ B for the optimal curve that best matches the residual piece β (s * 2 ,s * 1 ) of β from the partial matching step. The challenge relates to the fact that Ω C is a nonlinear subset of B.
We propose to optimize over Ω C with a gradient-descent algorithm. First, rescale [s * 2 , s * 1 ] to [0, 1]. Then, consider an orthonormal basis {b i : [0, 1] → R 2 , i 1, 2, ...} with b i (0) 0 and b i (1) 0, which enforce the endpoints constraint on the curve. In particular, we use a modification of the Fourier basis for each of the two coordinate functions, given by sin(2πjt) Then, the gradient of E (β m ) at a current estimate β curr m can be approximated using directional derivatives along basis directions b i as where 〈·, ·〉 is the usual L 2 inner-product on B. A single gradient update in the completion algorithm is then given by where ϵ > 0 is a small step size. This is repeated until convergence. In practice, we reduce the dimension of the problem by truncating the basis at a finite number N; this additionally ensures that the optimal completion β * m is relatively smooth. Two preliminary results from this two-step algorithm for bovid teeth are shown in Figure 2. The black curve is the donor β donor , the red curve is β o , and the optimal completion β * m , after a set number of iterations, is in blue.

ASSESSING VARIABILITY IN COMPLETION THROUGH MULTIPLE IMPUTATION
Algorithm 1 describes how a partially observed curve β o can be completed given a donor curve β donor . The completion is deterministic and uncertainty estimates are unavailable. Further, the application of this procedure is only possible when a training dataset consisting of several curves is available. An attractive way to examine completion variability is to consider a multiple imputation framework for missing data. There are numerous multiple imputation methods to handle missing data in traditional multivariate settings; see [29,30] for a broad overview and details on missing data techniques. Our choice is a nonparametric hot-deck multiple imputation procedure. We describe this technique in a regression setting with response y ∈ R n and n × p design matrix X, where each case j 1, . . . , n is defined as the response-predictor pair (y j , x j ).
(i) Replace a missing value y miss of y with randomly selected observed values in y, chosen from a donor pool of fixed size K < n comprising fully observed cases that are"similar" to the incomplete case. (ii) Repeat step (i) M times to create M completed datasets. (iii) Analyze the M completed datasets independently (e.g., mean estimation) and combine results using Rubin's combining rules [29].
As a first step towards carrying out this program for partially observed curves, we propose an adaptation of the hot-deck imputation procedure for a classification task. However, the development of methods to combine classification results across M datasets, similar to Rubin's rules, is an interesting research problem in its own right, and we leave that for future work (Section 7). Once steps (i) and (ii) are completed, it is possible to visualize variability associated with completion using Algorithm 1 by plotting the completions. Moreover, variability in classification results can also be computed as a function of a donor set of size K and number of completed datasets M. Algorithm 2 outlines our adapted hot-deck imputation procedure for generating M completions of a partially observed curve β o given a training dataset D {β 1 , . . . , β n } consisting of n fully observed curves from B. The main Algorithm 1: Partial match and completion. 1 In practice, the basis is truncated to some finite number. We have found that between 40 and 80 total basis elements per coordinate function are sufficient, although this depends on the geometric complexity of the observed curves. In the application considered in Section 6, we used 80 basis elements. change to the classic hot-deck imputation procedure described above lies in how "similarity" between cases (here curves) is assessed in (i). In particular, steps 3-8 in Algorithm 2 are used to compute the (shape) similarity between a partially observed curve and each curve in the training dataset D. Then, the rest of (i) is carried out in step 9. Finally, (ii) is carried out in steps 10-13.
Note that once s * 1 and s * 2 are rescaled to 0 and 1, respectively, in line 5, the optimal partial match of β i to β o then corresponds to the piece β (0,1) The key feature of Algorithm 2 is the use of the elastic distance, albeit not exactly in the form defined in (2) since an optimal rotation O* ∈ SO(2) has already been determined in line 5-we hence refer to this distance as partial elastic distance. The rationale for this is as follows. Once a piece of the donor β i of length L* corresponding to parameter values s * 1 and s * 2 has been extracted, the lengths of β (s * 1 ,s * 2 ) i and β o can be quite different. Thus, computing the non-elastic distance between the two open curves under the assumption of arc-length parameterization in order to compare their shapes might not be appropriate. In contrast, in Algorithm 1, the distances themselves were not of chief interest. Our approach hence is to assume at this stage that β (s * 1 ,s * 2 ) i and β o are arbitrarily parameterized and hence use the partial elastic distance d E to compare their shapes using their corresponding SRVFs; this also explains why we ignore using the optimal translation T* resulting from Algorithm 1. To see that δ i in line 8 of Algorithm 2 is indeed the partial shape distance, note that when a particular rotation O* is fixed, from line 6 inf c∈Γ,O∈SO (2) Figure 3 provides an illustration of the hot-deck imputation procedure with M 10 and K 10 for two partially observed bovid teeth using Algorithm 2; see Section 6 for details on the bovid dataset.  Consider a training dataset D train d{(y i , β i )} n i 1 consisting of fully observed curves β i ∈ B and corresponding class labels y i ∈ {1, . . . , G}. The goal is to classify a partially observed curve β o to one of the G classes using training data D train .
A distance-based classification procedure is a natural choice, compatible with how completion and imputation is achieved. Accordingly, we consider the k n -nearest neighbor classification technique. A neighborhood of a curve in B can be defined with respect to both non-elastic and elastic shape distances d NE and d E . Effectively, although fully observed curves in D train assume values in B, the classification procedure will be defined on their shapes assuming values in the shape space S β (or S q under the SRVF transform).
The advantage of using the shape space lies in the fact that S β is made up of equivalence classes of B under the equivalence relation characterized by shape-preserving transformations. For a fixed radius r, neighborhoods as balls of radius r around a fixed point β* constructed on B using shape distances d NE are necessarily larger than corresponding ones on B using the usual L 2 distance, since, by virtue of its definition, for every r > 0, In a k n -nearest neighbor setting, the radius r is distance, say r kn , of the k th n closest curve to β*, and changes with the training data. However, r kn computed using the distance d NE will be smaller than one computed using the L 2 distance; thus one is able to find k n neighbors at a smaller distance from β*. This leads to better performance of the classifier for large sample size n and improves rates of convergence of the predicted class probabilities (see, for example, [31]). Similar comments apply to the elastic distance d E , albeit under subtler conditions since it is induced by the elastic Riemannian metric on B, which is not directly related to the L 2 metric on B.
We will use the elastic distance d E again to accommodate for the possibility that curves in D train and β o + β m can have arbitrary parameterizations following completion of β o with any β m . Let N k n (β o +β m )d{β i1 , . . . , β i kn } ⊂ D train be k n nearest curves to a particular completion β o + β m of β o in the training data. We consider two nearest neighbor classifiers.
The class probability is thus obtained by averaging over all completions obtained by sampling M donor curves with replacement from the donor set B donor in Algorithm 2. The knn-imp classifier is a novel extension of the knn classifier to accommodate variability in completions through the hot-deck multiple imputation procedure. However, at the outset, it is not clear if it will generally outperform the knn classifier, since performance will heavily depend on quality of the completion step in Algorithm 1 and shape variability in the training dataset.

TRIBE AND SPECIES CLASSIFICATION OF BOVID TEETH
We examine performance of Algorithm 1 and Algorithm 2 with respect to classification of images of fully and partially observed bovid teeth using the two nearest neighbor methods under two settings.
(i) A simulated setting, where curves pertaining to partially observed teeth are created from fully observed ones with known class labels. (ii) A real data setting comprising "true" partially observed teeth with unknown class labels.
There are numerous measures of classification performance. We use the log-loss measure to asses performance: let n p denote the number of partially observed curves β 1 o , . . . , β n p o to be classified in G classes {1, . . . , G} with unknown class labels y 1 , . . . , y n p and let D train denote the training dataset of fully observed curves. Then where the class probability is defined as earlier depending on whether the knn or knn-imp classifier is used. Evidently, a low Log-loss is indicative of good classification. Note that the Log-loss is positive with no upper bound. The Log-loss can be used only when the class labels are known. In the real data setting with unknown labels, the Log-loss is not used; instead, classification accuracy is assessed relative to classification done by an expert (co-author JKB). All computations are performed using routines available in the R [32] package fdasrvf [33] on a 16-node Intel Xeon-based computational cluster in the Computer Science Department at Loyola University Chicago. Full code for the analyses can be found on Github [34].
Our motivating application stems from anthropology, where fossil bovid teeth associated with our human ancestors are used to reconstruct past environments. Bovids are useful because they are ecologically sensitive to their environment and typically dominate the South African faunal assemblages [35,36,37,38].
The tooth images for our study were obtained from four institutions in South Africa: National Museum, Bloemfontein; Ditsong Museum, Pretoria; and Amathole Museum, King William's Town. Images were also taken at the Field Museum, Chicago, United States. The complete methodology as to how the teeth images were collected is outlined in [6]. Briefly, the occlusal surface of each of the three molars from the upper and lower dentitions for each extant bovid specimen were photographed separately. The specimen and the camera were levelled using a bubble level. A scale was placed next to the occlusal surface for every image.
Each tribe has unique dental characteristics that are shared by its members. Further, the complexity of the occlusal surface outline varies across the tribes. As such, considering shape for tribe classification is a natural endeavor in this application. Generally, classification at the species level is more difficult since the variability of shapes of occlusal surface outlines across species within the same tribe is not as large.

Simulated Setting
In this setting, a partially observed tooth was created from a fully observed one with known class label in D train , and a class probability is calculated using both nearest neighbor methods; the procedure is repeated for each tooth in D train and the Log-loss is computed for the knn and knn-imp classifiers for choices: A partially observed tooth was created in the following manner. The raw representation of each tooth in D train comprised of 60 points around the occlusal surface of the tooth that were obtained from the program MLmetrics [6,39]. For each tooth, the 60 points were split into two sets roughly divided by a line connecting the mesostyle to the entostyle in maxillary teeth and metastylid to the ectostylid in mandibular teeth. This type of cut was chosen as this break point is commonly observed in fossilized bovid teeth. Figure 4 provides an illustrative example of the procedure. Figure 5 shows Log-loss curves associated with the knn-imp classifier as a function of k n (number of neighbors) for the above-mentioned choices of M and K, and also the corresponding curve for the knn classifier. In all cases, for smaller values of the number of nearest neighbors chosen, the knn-imp classifier outperforms the knn classifier. As the number of nearest neighbors increases, not performing imputation performs as well or better than imputation in most cases. In fact, for some teeth there are certain combinations of M and K that are better in terms of Log-loss for imputation regardless of the choice k n of nearest neighbors. Figure 6 shows Log-loss curves for species classification and paints a fairly similar picture to Figure 5: when the number of Frontiers in Applied Mathematics and Statistics | www.frontiersin.org

Species classification
October 2021 | Volume 7 | Article 759622 8 nearest neighbors is chosen to be small (i.e., fewer than 5), there is always at least one imputation setting that has lower Log-loss than no imputation. However, in all cases as the number of nearest neighbors chosen gets close to 20, no imputation performs better in terms of Log-loss than doing imputation.

Real Data Setting
A small set n p 7 of real, partially observed fossil bovid teeth from the site of Gladysvale, South Africa, extracted from images, with unknown class labels were used. We use the image numbers to label them: IMG4825, IMG4980, IMG4983, IMG4990, IMG5139, IMG9973 and IMG5514. Training data D train is the same as the one used in the simulated setting. Recall that in the simulation setting, partially observed teeth had roughly half of the number of sampled points as the fully observed ones. In the real data setting, this is not the case: in four partially observed teeth (IMG4825, IMG4980, IMG4983, IMG9973), more than half of the tooth is observed; in one (IMG4990) less than half of the tooth is observed; and, in the remaining two (IMG5139, IMG5514), approximately half of the tooth is observed. This impacts the number of points chosen to represent (and parametrize) the open, partially observed curves. Since we cannot know the length of the missing piece for real partially observed curves, numbers of points to sample along the curves were determined based on expert advice from co-author JKB.
For both knn-imp and knn classifiers, we set the number of neighbors k n 10; for the knn-imp classifier we used K 10 and M 10. These choices were based on performances in the simulated setting. Table 2 shows predicted class probabilities associated with the knn classifier. Each row has an entry in bold indicating the "true" (according to an expert) class of these teeth. We observe that the classifications without imputation are highly accurate for the 7 teeth. One can see that 6 out of 7 of these teeth are classified to the correct tribe. In addition, the probability of belonging to the correct tribe in the 6 correctly classified teeth was 1. However, in the one case where the classification is wrong, the predicted probability was 0. Table 3 shows similar results for the knn-imp classifier. The same 6 teeth that were correctly classified before are again correctly classified, however, with probabilities that are all lower than 1. Again, the tooth that was incorrectly classified previously is again incorrectly classified and the probability predicted of belonging to the correct class is again 0. Notably, the partially observed tooth from IMG9973 was difficult to classify when using either classifier; interestingly, in both cases it was classified with high probability to the Neotragini class. Table 4 shows predicted class probabilities associated with the knn classifier. We saw in Tables 2, 3 that each of the 5 teeth from the Alcelaphini tribe was correctly classified, with probability at least 0.5. However, at the species level, only 2 of these 5 teeth (IMG4983, IMG4990) have high probability associated with the correct species; the three other teeth (IMG4825, IMG4980 and IMG5139) have a probability of belonging to the correct species of 0.4. In addition, for the two remaining teeth that belong to the Tragelaphini and Antilopini tribes, the predicted probability for the correct species was 0.1 and 0, respectively.

DISCUSSION
We have presented a computational approach for classifying partially observed curves. In particular, we presented two algorithms to complete and classify partially observed planar curves and simultaneously assess variability involved with the completion through a multiple imputation procedure. To our knowledge, this is the first work in literature to explicitly use the notion of shapes of parameterized curves in addressing the problem considered from the missing data perspective; coarsening the parameter space of suitable open curves, from which the partially observed curves are completed, through the notion of shape equivalence results in sensible completions. Moreover, shape-based distances used to define classifiers deliver satisfactory classification performance. The results from application of the algorithms to the dataset of images of bovid teeth are quite promising and are deserving of further, extensive, investigation involving several different classifiers.
Through the application of the proposed framework on real data, we have found that hot-deck imputation can sometimes deteriorate classification performance; there is an intuitive explanation for these findings. Classification performance is greatly affected by the "amount of information" contained in the observed partial curve. By "amount of information", we specifically mean the ability to discriminate between different classes. In particular, if the observed partial curve contains a lot of information about its class membership compared to the missing portion, then imputation injects additional variability into the problem, which has a negative effect on classification performance. On the other hand, if the observed partial curve is not easily distinguishable across the different classes in the training data, then the variability coming from the imputation procedure provides valuable information, thus improving classification performance. Knowledge about information content in an observed partial curve for classification can be obtained either from a training dataset consisting of fully observed curves with class labels or from a subject matter expert. In such cases, a Bayesian classification model with a judicious choice of prior on class-specific templates can be developed; such an approach will extend the one recently proposed for univariate functional data [23] to the curve setting, and constitutes ongoing work.
As with any methodological development that represents a first foray into tackling a challenging problem, our approach suffers from a few shortcomings, which inevitably present many possible avenues for future research. Algorithm 1 can be improved. Ideally, the partial match and completion steps are carried out jointly. Moreover, assuming curves to be arc-length parameterized, while convenient, can sometimes be unrealistic in practice, especially when data curves are extracted as part of an elaborate pre-   processing procedure. This points towards developing a version of Algorithm 1 based on the corresponding SRVFs q o and q m ; the main challenge here is how to handle the interplay between points {s * 1 , s * 2 } and their images {c(s * 1 ), c(s * 2 )} under arbitrary reparameterizations. In the current work, an explicit statistical model to handle the several sources of variation (e.g., measurement error in extracting curves from images) that can profoundly affect both completion and classification is conspicuous in its absence; without such a model, it is difficult to quantify uncertainty about the completions, which quite naturally percolates down to the classification task. An attractive model-based approach is to not just estimate the missing piece of the partially observed curve, but instead estimate an entire template that has a portion that is very similar in shape to the partially observed curve. Such an approach has recently been used for traditional univariate functional data under a Bayesian formulation [23] and appears promising.
Our primary task in this paper is classification. However, it is unclear how one can use the proposed algorithms if interest was in computing statistical summaries in the presence of partially observed curves, such as the mean shape or PCA on the space of shapes. For example, output of Algorithm 2 is a set of M closed curves β o +β l m , l 1, . . . , M with the property that each β o +β l m exactly matches β o on a subset of the parameter domain; it is not clear how the M completions can be combined (e.g., a Karcher mean of closed curves) to construct a representative summary completion. This is related to how estimates from imputations can be combined with a handle on within and across sample variabilities using formal rules (e.g., Rubin's rules). Development of such general rules in the present setting is far from straightforward.
More generally, while the hot-deck imputation procedure worked reasonably well when combined with the completion task, there is a pressing need to systematically develop missing data concepts and imputation methods to better address the special structure of missingness in the context of shapes of curves. The following challenges naturally arise: (i) Is the notion of Missing Completely at Random (MCAR), so profitably used in traditional settings, ever a reasonable assumption for shapes of curves? It is almost impossible to disentangle measurement error from reasons for why a piece of a curve is missing. (ii) Conditional probability measures associated with random functions when conditioned on its values in a sub-domain are notoriously difficult, and rarely exist. Given this, how does one adapt, or perhaps circumvent, the traditional notion of sampling from the conditional distribution of the missing values conditioned on the observed values to the present setting? Much remains to be done in this direction.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.