A Variational Model for Data Fitting on Manifolds by Minimizing the Acceleration of a Bézier Curve

We derive a variational model to fit a composite B\'ezier curve to a set of data points on a Riemannian manifold. The resulting curve is obtained in such a way that its mean squared acceleration is minimal in addition to remaining close the data points. We approximate the acceleration by discretizing the squared second order derivative along the curve. We derive a closed-form, numerically stable and efficient algorithm to compute the gradient of a B\'ezier curve on manifolds with respect to its control points, expressed as a concatenation of so-called adjoint Jacobi fields. Several examples illustrate the capabilites and validity of this approach both for interpolation and approximation. The examples also illustrate that the approach outperforms previous works tackling this problem.


Introduction
This papers addresses the problem of fitting a smooth curve to data points d 0 , . . ., d n lying on a Riemannian manifold M and associated with real-valued parameters t 0 , . . ., t n .The curve strikes a balance between a data proximity constraint and a smoothing regularization constraint.
Several applications motivate this problem in engineering and the sciences.For instance, curve fitting is of high interest in projection-based model order reduction of one-dimensional dynamical systems [PA16].In that application, the dynamical system depends on the Reynolds number and the model reduction is obtained by computing suitable projectors as points on a Grassmann manifold.Finding a projector is however a time-and memoryconsuming task.Based on projectors precomputed for given parameter values, fitting is used to approximate the projector associated with a new parameter value.In [GMM + 17], the same strategy is used to approximate wind field orientations represented as points on the manifold of positive semidefinite covariance matrices of size p and rank r.Further applications are involving rigid-body motions on SE(3), like Cosserat rods applications [San10], or orientation tracking [Par10].Sphere-valued data are also of interest in many applications of data analysis, for storm tracking and prediction, or the study of bird migration [SKK + 14].
There exists different approaches to tackle the curve fitting problem.Among others, we name here the subdivision schemes approach [Dyn09; WNG07] or the Lie-algebraic methods [Shi08].However, the most popular approach nowadays is probably to encapsulate the two above-mentioned constraints into an optimization problem where Γ is an admissible space of curves γ : [t 0 , t n ] → M, t → γ(t), D 2 dt 2 denotes the (Levi-Civita) second covariant derivative, • γ(t) denotes the Riemannian metric at γ(t) from which we can define a Riemannian distance d(•, •).Finally, λ ∈ R is a parameter that strikes the balance between the regularizer dt and the fitting term This approach leads to the remarkable property that, when the manifold M reduces to the Euclidean space and Γ is the Sobolev space H 2 (t 0 , t n ), the solution to (1) is the natural cubic spline [GS93].
Optimization on manifolds has gained a lot of interest this last decade, starting with the textbook [AMS08] that summarizes several optimization methods on matrix manifolds.Recently, toolboxes have emerged, providing easy access to such optimization methods, e.g., Manopt [BMA + 14] and MVIRT [Ber17].The former received a very positive return in many different topics of research, with applications for example in low-rank modelling in image analysis [ZYZ + 15], dimensionality reduction [CG15], phase retrieval [SQW17] or even 5G-like MIMO systems [YSZ + 16].The latter stems from recent interest in manifoldvalued image and data processing, phrased as variational models on the product manifold M N , where N is the number of pixels.Starting with total variation (TV) regularization of phase-valued data [SC11;SC13], different methods for TV on manifolds [LSK + 13; WDS14] have been developed as well as second order methods [BBS + 16; BLS + 14] up to infimal convolution [BFP + 17] and total generalized variation [BFP + 17; BHS + 18].Furthermore, different algorithms have been generalized to manifold-valued data, besides the previous works using gradient descent or cyclic proximal point methods, a Douglas-Rachford splitting [BPS16], iteratively reweighted least squares [GS16] and more general half-quadratic minimization [BCH + 16] have been introduced.
The curve fitting problem (1), has been tackled differently the past few years.Samir et al. [SAS + 12] considered the case where Γ is an infinite dimensional Sobolev space of curves, and used the Palais-metric to design a gradient descent algorithm for (1) (see, e.g., [SDK + 12] for an application of this approach).Another method consists in discretizing the curve γ in N points, and therefore consider Γ = M N (see, e.g., [BA11] for a result on SO(3)).Finally, the limit case where λ → 0 is already well studied and known as the geodesic regression [Fle13;KDL18;Ren11].
A recent topic concerns curve fitting by means of Bézier curves.In that approach, the search space Γ is reduced to as set of composite Bézier curves.Those are a very versatile tool to model smooth curves and surfaces for real-and vector-valued discrete data points (see [Far02] for a comprehensive textbook), but they can also be used to model smooth curves and surfaces for manifold-valued data [AGS + 16; PN07].The advantage to work with such objects, compared to classical approaches, are that (i) the search space is drastically reduced to the so-called control points of the Bézier curves (and this leads to better time and memory performances) and (ii) it is very simple to impose differentiability for the optimal curve, which is appreciated in several of the above-mentioned applications.However, while obtaining such an optimal curve reduces directly to solving a linear system of equations for data given on a Euclidean space, there is up to now no known closed form of the optimal Bézier curve for manifold valued data.
In this work, we derive a gradient descent algorithm to compute a differentiable composite Bézier curve B : [t 0 , t n ] → M that satisfies (1), i.e., such that B(t) has a minimal mean squared acceleration, and fits the set of n+1 manifold-valued data points at their associated time-parameters.We consider the manifold-valued generalization of Bézier curves [PN07] in the same setting as in Arnould et al. [AGS + 15], or more recently in [GMA18].We employ the (squared) second order absolute differences introduced in [BBS + 16] to obtain a discrete approximation of the regularizer from (1).The quality of the approximation depends only on the number of sampling points.We exploit the recursive structure of the De Casteljau algorithm [PN07] to derive the gradient of the objective function with respect to the control points of B(t).The gradient is built as a recursion of Jacobi fields that, for numerical reasons, are implemented as a concatenation of so-called adjoint Jacobi fields.Furthermore, the corresponding variational model only depends on the number of control points of the composite Bézier curve, and not on the number of sampling points.We finally obtain an approximating model to (1) that we solve with an algorithm only based on three tools on the manifold: the exponential map, the logarithmic map, and a certain Jacobi field along geodesics.
The paper is organized as follows.We introduce the necessary preliminaries -Bézier curves, Riemannian manifolds and Riemannian second order finite differences-in Section 2. In Section 3 we derive the gradient of the discretized mean squared acceleration of the composite Bézier curve with respect to its control points, and thus of the regularizer of (1).In Section 4, we present the corresponding gradient descent algorithm, as well as an efficient gradient evaluation method, to solve (1) for different values of λ.The limit case where λ → ∞ is studied as well.Finally, in Section 5, we validate, analyze and illustrate the performance of the algorithm for several numerical examples on the sphere S 2 and on the special orthogonal group SO(3).We also compare our solution to existing Bézier fitting methods.A conclusion is given in Section 6.

Bézier functions and composite Bézier spline
Consider the Euclidean space R m .A Bézier curve of degree K ∈ N is a function β K : [0, 1] → R m parametrized by control points b 0 , . . ., b K ∈ R m and taking the form where B j,K (t) = K j t j (1 − t) K−j are the Bernstein basis polynomials [Far02].For example the linear Bézier curve β 1 is just the line segment (1 − t)b 0 + tb 1 connecting the two control points b 0 and b 1 .The explicit formulae of the quadratic and cubic Bézier curves read A composite Bézier curve is a function B : [0, n] → R m composed of n Bézier curves and defined as where K i ∈ N denotes the degree of the i th Bézier curve β K i of B and b i j , j = 0, . . ., K i , are its control points.Furthermore, B(t) is continuous if the last and first control points of two consecutive segments coincide [Far02].We introduce p i as the point at the junction of two consecutive Bézier segments, i.e., p i : Differentiability is obtained if the control points of two consecutive segments are aligned.We introduce further b .
Example 1 (C 1 conditions for composite cubic Bézier curves).We consider the case where the Bézier segments are all cubic, as represented in Fig. 1.The composite Bézier curve

Riemannian Manifolds
We consider a complete m-dimensional Riemannian manifold M. We refer to [dCar92;ONe66] for an introduction to Riemannian manifolds and to [AMS08] for optimization thereon.We will use the following terminology and notations.
We denote by T a M the (Euclidean) tangent space to M at a ∈ M; T M := ∪ a∈M T a M is the tangent bundle to M; •, • a denotes the inner product in the tangent space T a M at a and from which we deduce the norm of v ∈ T a M denoted by v a = v, v a .For a (not necessarily unique) shortest geodesic between a and b ∈ M, we write g(•; a, b) : R → M, t → g(t; a, b), parametrized such that g(0; a, b) = a and g(1; a, b) = b.This choice of parametrization also means that the covariant derivative D dt of g with respect to time is called the Riemannian logarithm which is (locally) the inverse of the exponential.A Riemannian manifold is called symmetric in x ∈ M if the geodesic reflection s x at x ∈ M given by the mapping γ(t) → γ(−t) is an isometry at least locally near x, for all geodesics through γ(0) = x.If M is symmetric in every x ∈ M, the manifold is called (Riemannian) symmetric space or symmetric manifold.
In the following we assume, that both the exponential and the logarithmic map are available for the manifold and that they are computationally not too expensive to evaluate.Furthermore, we assume that the manifold is symmetric.

Composite Bézier curves on manifolds
One well-known way to generalize Bézier curves to a Riemannian manifold M is via the De Casteljau algorithm [PN07, Sect.2].This algorithm only requires the Riemannian exponential and logarithm and conserves the interpolation property of the first and last control points.Some examples on interpolation and fitting with Bézier curves or Bézier surfaces, i.e., generalizations of tensor product Bézier curves, can be found in Absil et al.
, the manifold-valued Bézier curve of order K driven by K + 1 control points b 0 , . . ., b K ∈ M. We introduce the points x [0] i = b i and iterate the construction of further points.For i = 0, . . ., K − k, k = 1, . . ., K, we define as the i th point of the k th step of the De Casteljau algorithm, and obtain The De Casteljau algorithm is illustrated on Fig. 2 for a Euclidean cubic Bézier curve β 3 (t; b 0 , b 1 , b 2 , b 3 ).The general cubic Bézier curve can be explicitly expressed on a manifold M as 3 ) = g t; g t; g(t; x The conditions of continuity and differentiability are generalized to manifolds in [PN07]. Lemma 2 (Differentiability conditions [PN07]).Consider the composite Bézier curve The composite Bézier curve B(t) is continuous and continuously differentiable if the two following conditions hold: The composite Bézier curve B : [0, n] → M is then defined completely analogously to the Euclidean case from Eq. (2).The differentiability conditions (4) of Lemma 2 have to be satisfied at each junction point p i , i = 1, . . ., n − 1. Example 3 (Composite cubic Bézier curves).The composite cubic Bézier curve See Fig. 1 for an example on M = R with n = 5 segments and Fig. 3 for an example on M = S 2 with n = 3 segments.

Discrete approximation of the mean squared acceleration
We discretize the mean squared acceleration (MSA) of a curve γ : [0, 1] → M, by discretizing the corresponding integral i.e., the regularizer from (1).We approximate the squared norm of the second (covariant) derivative by the second order absolute finite difference introduced by Bačák et al.
Consider three points x, y, z ∈ M and the set of mid-points of x and z for all (not necessarily shorest) geodesics g connecting x and z.The manifold-valued second order absolute finite differences is defined by This definition is equivalent, on the Euclidean space, to

and obtain by the trapezoidal rule
For Bézier curves γ(t) = B(t) we obtain for the regularizer in (1) the discretized MSA A(b) that depends on the control points b and reads 3 The gradient of the discretized mean squared acceleration In order to minimize the discretized MSA A(b), we aim to employ a gradient descent algorithm on the product manifold M M , where M is the number of elements in b.In the following, we derive a closed form of the gradient ∇ b A(b) of the discretized MSA (7).This gradient is obtained by means of a recursion and the chain rule.In fact, the derivative of ( 6) is already known [BBS + 16], such that it only remains to compute the derivative of the composite Bézier curve.We first introduce the following notation.We denote by M the directional derivative of f : M → M evaluated at x 0 , with respect to its argument x and in the direction η ∈ T x M. We use the short hand whenever this directional derivative is evaluated afterwards again at x.
We now state the two following definitions, which are crucial for the rest of this section.
x is defined as the tangent vector that fulfills For multivariate functions f (x, y), we denote the gradient of f with respect to x at (x 0 , y 0 ) by writing ∇ M,x f (x 0 , y 0 ).We shorten this notation as ∇ M,x f = ∇ M,x f (x, y) when this gradient is seen as a function of x (and y).
Definition 5 (Chain rule on manifolds [AMS08,p. 195]).Let f : M → M, h : M → M be two functions on a manifold M and where The remainder of this section is organized in four parts.We first recall the theory on Jacobi fields in Section 3.1 and their relation to the differential of geodesics (with respect to start and end point).In Section 3.2, we apply the chain rule to the composition of two geodesics, which appears within the De Casteljau algorithm.We use this result to build an algorithmic derivation of the differential of a general Bézier curve on manifolds with respect to its control points (Section 3.3).We extend the result to composite Bézier curves in Section 3.4, including their constraints on junction points p i to enforce the C 1 condition (4), and finally gather these results to state the gradient ∇ M A(b) of the discretized MSA (7) with respect to the control points.
Figure 4: Schematic representation of the variation Γ g,ξ (s, t) of a geodesic g w.r.t. the direction ξ ∈ T x M. The corresponding Jacobi field along g and in the direction ξ is the vector field

Jacobi fields as derivative of a geodesic
In the following, we introduce a closed form of the differential D x g(t; •, y) of a geodesic g(t; x, y), t ∈ [0, 1], with respect to its start point x ∈ M. The differential with respect to the end point y ∈ M can be obtained by taking the geodesic g(t, y, x) = g(1 − t; x, y).
As represented in Fig. 4, we denote by γ x,ξ , the geodesic starting in γ x,ξ (0) = x and with direction D dt γ x,ξ (0) = ξ ∈ T x M. We introduce ζ(s) := log γ x,ξ (s) y, the tangential vector in T γ x,ξ (s) M pointing towards y.Then, the geodesic variation Γ g,ξ (s, t) of the geodesic g(•; x, y) with respect to the tangential direction ξ ∈ T x M is given by where ε > 0. The corresponding Jacobi field J g,ξ along g is then given by the vector field that represents the direction of the displacement of g if x is perturbed in a direction ξ.
We directly obtain J g,ξ (0) = ξ, and J g,ξ (1) = 0 as well as J g,ξ (t) ∈ T g(t;x,y) M. Since Γ g,ξ (s, t) = g(t; γ x,ξ (s), y) we obtain by the chain rule Remark.This relation between the derivative of geodesics and Jacobi fields is of high interest on symmetric spaces, where Jacobi fields can be computed in closed form, as summarized in the following Lemma.
where the Jacobi field J g,ξ : R → T g(t;x,y) M along g and in the direction ξ is given by with d g = d(x, y) denoting the length of the geodesic g(t; x, y), t ∈ [0, 1].
The Jacobi field of the reversed geodesic ḡ(t) := g(t; y, x) = g(1 − t; x, y) is obtained using the same orthonormal basis and transported frame but evaluated at s = 1 − t.We thus obtain D y g(t; x, y)

Derivative of coupled geodesics
Let M be a symmetric Riemannian manifold.We use the result of Lemma 6 to directly compute the derivative of coupled geodesics, i.e., a function composed of g 1 (t) := g(t; x, y) and g 2 (t) := g(t; g 1 (t), z).By Definition 5, we have and by (10), we obtain where the variation direction used in the Jacobi field is now the derivative of g 1 (t) in direction η.Similarily, we compute the derivative of a reversed coupled geodesic g 3 (t) := g(t; z, g 1 (t)) as Note that the Jacobi field is here reversed, but that its variation direction is the same as the one of the Jacobi field introduced for g 2 (t).In a computational perspective, it means that we can use the same ONB for the derivatives of both g 3 and ḡ3 Furthermore, in this case, the variation direction is also computed by a Jacobi field since Finally the derivative of g 2 (resp.g 3 ) on symmetric spaces is obtained as follows.Let {ξ m } be an ONB of T x M for the inner Jacobi field along g 1 , and {ξ m } be an ONB of T g 1 (t) M for the outer Jacobi field along g 2 (resp.
and stating l ∈ T g 1 (t) M, the derivative of g 2 (resp.g 3 ) with respect to x in the direction η ∈ T x M reads and accordingly for g 3 .

Derivative of a Bézier curve
Sections 3.1 and 3.2 introduced the necessary concepts to compute the derivative of a general Bézier curve β K (t; b 0 , . . ., b K ), as described in Equation (3), with respect to its control points b j .For readability of the recursive structure investigated in the following, we introduce a slightly simpler notation and the following setting.
Let K be the degree of a Bézier curve β K (t; b 0 , . . ., b K ) with the control points b 0 , . . ., b K ∈ M. We fix k ∈ {1, . . ., K}, i ∈ {0, . . ., K − k} and t ∈ [0, 1].We introduce for the i th Bézier curve of degree k in the De Casteljau algorithm, and g its derivative with respect to one of its control points x in the direction η ∈ T x M.

Remark. Clearly any other derivative of g [k]
i with respect to x = b j , j < i or j > i + k is zero.In addition we have η i [η] = η for x = b i and zero otherwise.
Theorem 7 (Derivative of a Bézier curve).Let k ∈ {1, . . ., K} and i ∈ {0, . . ., K − k} be given.The derivative η with respect to its control point x := b j , i ≤ j ≤ i + k, and in the direction η ∈ T x M is given by For readability we set a := g i+1 (t), and f := g We prove the claim by induction.For k = 1 the function g i is just a geodesic.The case i < j < i + 1 does not occur and the remaining first and third cases follow by the notation introduced for k = 0 and Lemma 6.
For k > 1 we apply the chain rule (9) to D x f [η] and obtain i , while the dashed one represents a reversed Jacobi field.
Consider the first term D a f D x a[η] and j < i + k.By (10) and the notation from ( 14), one directly has We proof the second term similarily.For j > i, by applying the chain rule and using the reversed Jacobi field formulation of Lemma 6 for the derivative of a geodesic with respect to its end point, we obtain  Using the notations (13), we have The derivative of β 2 at t with respect to b 0 in the direction η ∈ T bo M, is given by 0 ,η (t).The derivative of β 2 at t with respect to b 2 in the direction η ∈ T b 2 M can be seen as deriving by the first point after inverting the Bezier curve, i.e. looking at β2 (t) = β 2 (1 − t), hence we have analogously to the first term The case D b 1 β 2 [η], η ∈ T b 1 M, involves a chain rule, where b 1 appears in both g [1] 0 (as its starting point) and g 1 (as its end point).Using the two intermediate results (or Jacobi fields of geodesics) η [1] Example 9 (Cubic Bézier curve).Consider the cubic Bézier curve As in Example 8, we use the notations (13) and define j+1 ), j = 0, 1, and 1 ).
The derivation of β 3 with respect to b 0 or b 3 follows the same structure as in Example 8.The case of D b 1 β 3 [η], however, requires two chain rules.The needed Jacobi fields follow the tree structure shown in Fig. 6b: given η ∈ T b 1 M, we define at the first recursion step 1 ,η (t), and at the second recursion step 1 ,η [1] 1

(t).
Note that both η The case of D b 2 β 3 [η] is obtained symmetrically, with arguments similar to b 1 .

Joining segments and deriving the gradient
In this subsection we derive the differential of a composite Bézier curve B(t) consisting of n segments and take the C 1 conditions into account.We simplify the notations from Section 2.1 and set the degree fixed to K i = K for all segments, e.g., K = 3 for a cubic composite Bézier curve.Then the control points are b i j , j = 0, . . ., K, i = 0, . . ., n − 1.We further denote by p i = b i−1 K = b i 0 , i = 1, . . ., n − 1 the common junction point of the segments and p 0 and p n the start and end points, respecively.For ease of notation we denote by b − i = b i K−1 and b + i = b i 1 , i = 1, . . ., n − 1, the two points needed for differentiability (C 1 ) condition investigation, cf.Fig. 1 for an illustration of the framework on M = R, with K = 3.
Tree-representation of the construction of a cubic Bézier curve.The thick line tracks the propagation of b 1 within the tree.
(b) Tree-representation of the recursive construction of η  One possibility to enforce the C 1 condition (4) is to include it into the composite Bézier curve by replacing b + i with This way both the directional derivatives of B(t) with respect to b + i and p i change due to a further (most inner) chain rule.
Lemma 10 (Derivative of a composite Bézier curve with C 1 condition).Let B be a composite Bézier curve and p i , b + i , b − i introduced as above.Replacing b + i = g(2; b − i , p i ) eliminates that variable from the composite Bezier curve and keeps the remaining differentials unchanged, despite the following which now read and In both cases, the first interval includes i − 1 = 0, when i = 1.
Proof.Both first cases are from the derivation of a bezier curve as before, for both second cases replacing b + i = g(2; b − i , p i ) yields one (for p i additional) term.
We now derive the gradient of the objective function (7).We introduce the abbreviation Theorem 11.Let M be a m-dimensional manifold, x be one of the control points of a composite Bézier curve B, and {ξ 1 , . . ., ξ m } be a corresponding orthonormal basis (ONB) of T x M. The gradient ∇ M,x A(b) of the discretized mean squared acceleration A(b) of B, w.r.t.x, discretized at N + 1 equispaced times t 0 , . . ., t N is given by Proof.As ∇ M,x A(b) ∈ T x M, we seek for the coefficients a := a (x) such that Therefore, for any tangential vector η := m =1 η ξ ∈ T x M, we have, By definition of A and Definition 4 this yields We compute D x d 2 2,i [η] using the chain rule (Definition 5) as which, by Definition 4, again becomes The term on the left of the inner product is given in [BBS + 16, Sec.3] and the right term is given in Section 3.3.While the former can be computed using Jacobi fields and a logarithmic map, the latter is the iteratively coupling of Jacobi fields.Furthermore, the differential D x B j [η] can be written as Hence, we obtain and by ( 17), ( 18) and ( 19), it follows which yields the assertion (16).

Application to the fitting problem
The fitting problem has been tackled different ways this last decades.The approach with Bézier curves is more recent, and we refer to [AGS + 16; AGS + 15; GMA18] for a detailed overview of these methods.
In this section, we present the numerical framework we use in order to fit a composite Bézier curve B to a set of data points d 0 , . . ., d n ∈ M associated with time-parameters t 0 , . . ., t n , such that we meet (1).For the sake of simplicity, we limit the study to the case where t i = i, i = 0, . . ., n.Therefore, the fitting problem (1) becomes where Γ B ∈ M M is the set of the M control points of B. Remark that, compared to (1), the optimization is now done on the product manifold M M .Furthermore, the fitting term now implies a distance between d i and p i , as The section is divided in three parts: the product manifold M M is defined in Section 4.1, where the contribution of the fitting term in the gradient of E is also presented.Then, we propose an efficient algorithm to compute the gradient of the discretized MSA, based on so-called adjoint Jacobi fields.We finally shortly mention the gradient descent algorithm we use as well as the involved Armijo rule.

Fitting and interpolation
Let us clarify the set Γ B ∈ M M from (20).We will explicitly present the vector b and state its size M .The set Γ B is the set of the M remaining free control points to optimize, when the C 1 continuity constraints are imposed.We distinguish two cases: (i) the fitting case, that corresponds to formulae presented in Section 3, and (ii) the interpolation case (λ → ∞) where the constraint d i = p i is imposed as well.
For a given composite Bézier curve B : [0, n] → M consisting of a Bézier curve of degree K on each segment, and given the C 1 conditions (4), the segments are determined by the points We investigate the length of M. First, b + i is given by p i−1 and b − i via (4).Second, for the segments i = 2, . . ., n we can further omit p i−1 since the value also occurs in the segment i as last entry.Finally, the first segment contains the additional value of b + 0 that is not fixed by C 1 constraints.The first segment is thus composed of K + 1 control points, while the n − 1 remaining segments are determined by K − 1 points.In total we obtain M = n(K − 1) + 2 control points to optimize.
Minimizing A(b) alone leads to the trivial solution, for any set of control points b = (x, . . ., x), x ∈ M, and this is why the fitting term from (20) is important.
Fitting (0 < λ < ∞).If the segment start and end points are obstructed by noise or allowed to move, we employ a fitting scheme to balance the importance given to the data points d 0 , . . ., d n .Equation (20) reads where λ ∈ R + sets the priority to either the data term (large λ) or the mean squared acceleration (small λ) within the minimization.The gradient of the data term is given in [Kar77], and the gradient of Ã is given by Interpolation (λ → ∞).For interpolation we assume that the start point p i−1 and end point p i of the segments i = 1, . . ., n are fixed to given data d i−1 and d i ∈ M, respectively.The optimization of the discrete mean squared acceleration A(b) reads Since the p i are fixed by constraint, they can be omitted from the vector b.We obtain Since there are n + 1 additional constraints, the minimization is hence performed on the product manifold

Adjoint Jacobi fields
In the Euclidean space R m , the adjoint operator T * of a linear bounded operator T : R m → R q is the operator fulfilling The same can be defined for a linear operator S : T x M → T y M, x, y ∈ M, on a mdimensional Riemannian manifold M. The adjoint operator S * : T y M → T x M satisfies S(η), ν y = η, S * (ν) x , for all η ∈ T x M, ν ∈ T y M.
We are interested in the case where S is the differential operator D x of a geodesic F (x) = g(t; x, y) for some fixed t ∈ R and y ∈ M. The differential D x F : T x M → T F (x) M can be written as where α are the coefficients of the Jacobi field (11), and ξ , Ξ (t) are given as in Lemma 6.To derive the adjoint differential (D x F ) * : T F (x) M → T x M we observe that for any ν ∈ T F (x) M we have Hence the adjoint differential is given by We introduce the adjoint Jacobi field J * F,ν : R → T x M, ν ∈ T F (x) M as Note that evaluating the adjoint Jacobi field J * involves the same transported frame {Ξ 1 (t), . . .Ξ m (t)} and the same coefficients α as the Jacobi field J, which means that the evaluation of the adjoint is in no way inferior to the Jacobi field itself.
The adjoint D * of the differential is useful in particular, when computing the gradient Especially for the evaluation of the gradient of the composite function h • F we obtain The main advantage of this technique appears in the case of composite functions, i.e., of the form h • F 1 • F 2 (the generalization to composition with K functions is straightforward).
The gradient The recursive computation of η [3] = ∇ M,x h(x) is then given by the following algorithm Example 12.For h = d 2 2 : M 3 → R we know ∇ M 3 h by Lemma 3.1 and Lemma 3.2 from [BBS + 16].Let t 1 , t 2 , t 3 ∈ [0, 1] be time points, and b ∈ M M be a given (sub)set of the control points of a (composite) Bézier curve B. We define F : )) as the evaluations of B at the three given time points.The composition h • F hence consists of (in order of evaluation) the geodesic evaluations of the De Casteljau algorithm, the mid point function for the first and third time point and a distance function.
The recursive evaluation of the gradient starts with the gradient of the distance function.Then, for the first and third arguments, a mid point Jacobi field is applied.The result is plugged into the last geodesic evaluated within the De Casteljau "tree" of geodesics.At each geodesic, a tangent vector at a point g(t j ; a, b) is the input for two adjoint Jacobi fields, one mapping to T a M, the other to T b M.This information is available throughout the recursion steps anyways.After traversing this tree backwards, one obtains the required gradient of h • F .
Note also that even the differentiability constraint (4) yields only two further (most outer) adjoint Jacobi fields, namely (2).They correspond to variation of the start point b − i and the end point p i , respectively as stated in (10).
For completeness the algorithm is given in Algorithm 1.The step sizes are given by the Armijo line search condition presented in [AMS08, Def.4.2.2].Let N be a Riemannian manifold, x = x (k) ∈ N be an iterate of the gradient descent, and β, σ ∈ (0, 1), α > 0. Let m be the smallest positive integer such that We set the step size to s k := β m α in Algorithm 1.
As a stopping criterion we use a maximal number k max of iterations or a minimal change per iteration d N (x k , x k+1 ) < ε.In practice, this last criterion is matched first.
The gradient descent algorithm converges to a critical point if the function F is convex [AMS08, Sec.4.3 and 4.4].The mid-points model (6) posesses two advantages: (i) the complete discretized MSA (7) consists of (chained) evaluations of geodesics and a distance function, and (ii) it reduces to the classical second order differences on the Euclidean space.However, this model is not convex on general manifolds.An example is given in the arXiv preprint (version 3) of [BBS + 16]1 , Remark 4.6.Another possibility (also reducing to classical second order differences in Euclidean space) is the so-called Log model (see, e.g., [Bou13]) The (merely technical) disadvantage of the Log model is that the computation of the gradient involves further Jacobi fields than the one presented above, namely to compute the differentials of the logarithmic map both with respect to its argument D x log y x as well as its base point D y log y x.Still, these can be given in closed form for symmetric Riemannian manifolds [BFP + 17, Th. 7.2][Per18, Lem.2.3].To the best of our knowledge, the joint convexity of the Log model in x, y and z is still an open question.

Examples
In this section, we provide several examples of our algorithm applied to the fitting problem (20).
We validate it first on the Euclidean space and verify that it retrieves the natural cubic smoothing spline.We then present examples on the sphere S 2 and the special orthogonal group SO(3).We compare our results with the fast algorithm of Arnould et al. [AGS + 15], generalized to fitting, and the so-called blended cubic splines from [GMA18].The control points of the former are obtained by generalizing the optimal Euclidean conditions of (20) (in R m , this is a linear system of equations) to the manifold setting; the curve is afterwards reconstructed by a classical De Casteljau algorithm.In the latter, the curve is obtained as a blending of solutions computed on carefully chosen tangent spaces, i.e., Euclidean spaces.
The following examples were implemented in MVIRT [Ber17] 2 , and the comparison implementations from [AGS + 15] and [GMA18] use Manopt [BMA + 14] 3 .Note that both toolboxes use a very similar matrix structure and are implemented in Matlab, such that the results can directly be compared.

Validation on the Euclidean space
As a first example, we perform a minimization on the Euclidean space M = R 3 .A classical result on R m is that the curve γ minimizing (1) is the natural (C 2 ) cubic spline when Γ is a Sobolev space H 2 (t 0 , t n ).We compare our approach to the tangential linear system approach derived in [AGS + 15] in order to validate our model.We use the following data points as well as the control points p i = d i , and  where the exponential map and geodesic on R 3 are actually the addition and line segments, respectively.Note that, by construction, the initial curve is continuously differentiable but (obviously) does not minimize (1).The parameter λ is set to 50.The MSA of this initial curve Ã(b) is approximately 18.8828.The data points are given to the algorithm of [AGS + 15] to reconstruct the optimal Bézier curve B(t), which is the natural C 2 cubic spline.The result is shown in Fig. 7a together with the first order differences along the curve in Fig. 7b as a dotted curve.
The set of control points (p 0 , b + 0 , . . ., b − 3 , p 3 ) is optimized with our proposed method.We discretize the second order difference using N = 1600 points.The resulting curve and the first order difference plots are also obtained using these sampling values and a first order forward difference.The gradient descent algorithm from Algorithm 1 employs the Armijo rule (24) setting β = 1 2 , σ = 10 −4 , and α = 1.The stopping criteria are i ) < = 10 −15 or ∇ M n Ã 2 < 10 −9 , and the algorithm stops when one of the two is met.For this example, the first criterion stopped the algorithm, while the norm of the gradient was of magnitude 10 −6 .
Both methods improve the initial functional value of Ã(b) ≈ 18.8828 to a value of Ã(b min ) ≈ 4.981218.Both the linear system approach and the gradient descent perform equally.The difference of objective value is 2.4524 × 10 −11 smaller for the gradient descent, and the maximal distance of any sampling point of the resulting curves is of size 4.3 × 10 −7 .Hence, in the Euclidean space, the proposed gradient descent yields the natural cubic spline, as one would expect.This can also be seen on the first order differences (c).

Examples on the sphere S 2
Validation on a geodesic.As a second example with a known minimizer, we consider the manifold M = S 2 , i.e., the two-dimensional unit sphere embedded in R 3 , where geodesics are great arcs.We use the data points aligned on the geodesic connecting the north pole p 0 and the south pole p 2 , and running through a point p 1 on the equator.We define the control points of the cubic Bézier curve as follows: where x 0 and x 2 are temporary points and p i = d i .We obtain two segments smoothly connected since log p 1 b − 1 = − log p 1 b + 1 .The original curve is shown in Fig. 8a, where especially the tangent vectors illustrate the different speed at p i .
The control points are optimized with our interpolating model, i.e., we fix the start and end points p 0 , p 1 , p 2 and minimize A(b).
The curve, as well as the second and first order differences, is sampled with N = 201 equispaced points.The parameters of the Armijo rule are again set to β = 1 2 , σ = 10 −4 , and α = 1.The stopping criteria are slightly relaxed to 10 −7 for the distance and 10 −5 for the gradient, because of the sines and cosines involved in the exponential map.
The result is shown in Fig. 8b.Since the minimizer is the geodesic running from p 0 to p 2 through p 1 , we measure the perfomance first by looking at the resulting first order difference, which is constant, as can be seen in Fig. 8c.As a second validation, we observe that the maximal distance of the resulting curve to the geodesic is of 2.2357 × 10 −6 .These evaluations again validate the quality of the gradient descent.
Effect of the data term.As a third example we investigate the effect of λ in the fitting model.We consider the data points as well as the control points p i = d i , and The remaining control points b − 1 and b − 2 are given by the C 1 conditions (4).The corresponding cuve B(t), t ∈ [0, 3], is shown in Fig. 9 in dashed black.When computing a minimal MSA curve that interpolates B(t) at the points p i , i = 0, . . ., 3, the acceleration is not that much reduced, see the blue curve in Fig. 9a.
For fitting, we consider different values of λ and the same parameters as for the last example.The optimized curve fits the data points closer and closer as λ grows, and the limit λ → ∞ yields the interpolation case.On the other hand smaller values of λ yield less fitting, but also a smaller value of the mean squared acceleration.In the limit case, i.e., λ = 0, the curve (more precisely the control points of the Bézier curve) just follows the gradient flow to a geodesic.
The results are collected in Fig. 9.In Fig. 9a, the original curve (dashed) is shown together with the solution of the interpolating model (solid blue) and the gradient flow result, i.e., the solution of the fitting model (solid black) with λ = 0. Fig. 9b illustrates the effect of λ even further with a continuous variation of λ from a large value of λ = 10 to a small value of λ = 0.01.The image is generated by sampling this range with 1000 equidistant values colored in the colormap viridis.Furthermore, the control points are also shown in the same color.
Several corresponding functional values, see Tab. 1, further illustrate that with smaller values of λ the discretized MSA also reduces more and more to a geodesic.For λ = 0 there is no coupling to the data points and hence the algorithm does not restrict the position of the geodesic.In other words, any choice for the control points that yields a geodesic, is a solution.Note that the gradient flow still chooses a reasonably near geodesic to the initial data (Fig. 9a, solid black).
Comparison with the tangential solution.In the Euclidean space, the method introduced in [AGS + 15; GMA18] and the gradient descent proposed here yield the same curve, i.e., the natural cubic spline.On Riemannian manifolds, however, all approaches approximate the optimal solution.Indeed, their method provides a solution by working in different tangent spaces, and ours minimizes a discretization of the objective functional.In this example we take the same data points d i as in (25) now interpreted as points on M = S 2 .Note that the control points constructed in (26) still fit to the sphere since each b ± i is built with a vector in T p i M and using the corresponding exponential map.The result is shown in Fig. 10a.The norm of the first order differences is given in Fig. 10b.The initial curve (dashed black) has an objective value of 10.9103.The tangent version (dotted) reduces this value to 7.3293.The proposed method (solid blue) yields a value of 2.7908, regardless of whether the starting curve is the initial one, or the one already computed by [AGS + 15].Note that, when p 3 = [0, 0, −1] T , the tangent version is even not able to compute any result, since the construction is performed in the tangent space of p 0 and since log p 0 p 3 is not defined.

An example of orientations
Finally we compare our method with the blended splines introduced in [GMA18] for orientations, i.e., data given on SO(3).Let We further compare both curves by looking at their absolute first order differences.In the third line of Fig. 11, we display 17 orientations along the initial curve with its first order difference as height.The line without samples is the absolute first order differences of the minimizer b, scaled to the same magnitude.The line is straightened especially for the first Bézier segment.Nevertheless, the line is still bent a little bit and hence not a geodesic.This can be seen in the fourth line which represents 17 samples of the minimizing curve compared to its control points in the last line, which are drawn on a straight line.

Conclusion and future work
In this paper, we introduced a method to solve the curve fitting problem to data points on a manifold, using composite Bézier curves.We approximate the mean squared acceleration of the curve by a suitable second order difference and a trapezoidal rule, and derive the corresponding gradient with respect to its control points.The gradient is computed in closed form by exploiting the recursive structure of the De Casteljau algorithm.Therefore, we obtain a formula that reduces to a concatenation of adjoint Jacobi fields.
The evaluation of Jacobi fields is the only additional requirement compared to previous methods, which are solely evaluating exponential and logarithmic maps.For these, closed forms are available on symmetric manifolds.
On the Euclidean space our solution reduces to the natural smoothing spline, the unique acceleration minimizing polynomial curve.The numerical experiments further confirm that the method presented in this paper outperforms the tangent space(s)-based approaches with respect to the functional value.
It is still an open question whether there exists a second order absolute finite difference on a manifold, that is jointly convex.Then convergence would follow by standard arguments of the gradient descent.For such a model, another interesting point for future work is to find out whether only an approximate evaluation of the Jacobi fields suffices for convergence.This would mean that, on manifolds where the Jacobi field can only be evaluated approximately by solving an ODE, the presented approach would still converge.

Figure 1 :
Figure 1: Schematic representation of the composite cubic Bézier curve B : [0, 5] → M, for M = R (black), and its control points (green circles).Continuous differentiability is reached at the junction of the segments (the blue arrows draw the first derivative of B).
satisfies D dt g(t; a, b) g(t) = d M (a, b), for all t ∈ [0, 1], where d M (a, b) is the geodesic distance between a and b ∈ M. The Riemannian exponential reads exp a : T a M → M, v → b = exp a (v) and we denote by r a ∈ R the maximal radius such that the exponential map is bijective on D

Figure 2 :
Figure 2: Construction of a cubic Bézier curve via the De Casteljau algorithm.

i
(t) = g(t; a, b).Note that while f depends on the control points b i , . . ., b i+k and is a Bézier curve of degree k, both a and b are Bézier curves of degree k − 1.The former does not depend on b i+k , and the latter is independent of b i .

Figure 5 :
Figure 5: Schematic representation of the cases where elements compose the chained derivative of the i th Bézier curve of order k in the De Casteljau algorithm.The solid line represents a Jacobi field along g [k]

Finally
, as D x b[η] = D b f [D x b[η]] = 0 for x = b i , the assumption follows.

Figure 6 :
Figure 6: Construction and derivation tree of a Bézier curve β 3 (t; b 0 , b 1 , b 2 , b 3 ).The derivative with respect to a variable b i is obtained by a recursion of Jacobi fields added at each leaf of the tree.

Figure 7 :
Figure7: The initial interpolating Bézier curve in R 3 (dashed, a) with an MSA of 18.8828 is optimized by the linear system method (LS) from [AGS + 15] (dotted) and by the proposed gradient descent (GD, solid blue).As expected, both curve coincide, with an MSA of 4.981218.The correspondance is further illustrated with their first order differences in (b).
Absolute first order difference.

Figure 8 :
Figure 8: The initial curve (dashed, a) results in a geodesic (solid, b) when minimizing the discreteacceleration.This can also be seen on the first order differences (c).

Figure 9 :
Figure 9: The composite Bézier curves are composed of three segments.The initial curve (dashed, a) is optimized with the interpolating model (solid blue, a) as well as with the fitting model for a continuum of values of λ, from λ = 10 (violet, b) to λ = 0.01 (yellow, b).The limit case where λ = 0 yields an unconstrained geodesic (solid black, b).
First order differences.

Figure 10 :
Figure 10: The initial composite Bézier curve on S 2 (dashed, a) has an MSA of 10.9103.The curve obtained with the linear system (LS, dotted) of [AGS + 15] has an MSA of 7.3293.The solution returned by our proposed method (GD, solid blue) outperforms them all with an MSA of 2.7908.This is further illustrated by the first order derivative approximated by first order differences in (b).

Figure 11 :
Figure 11: From top row to bottom row: (1) the initial data points (cyan), (2) the control points computed by the blended Bézier approach from [GMA18], (3) 17 points on the corresponding curve, where the height represents the absolute first order difference, and the comparing curve is the first order differences from our model, (4) 17 points along the resulting curve from gradient descent, and (5) the resulting control points of the curve in (4).

Table 1 :
Functional values for the three-segment composite Bézier curve on the sphere and different values of λ.Note that the case λ = ∞ corresponds to interpolation.