- 1Clinical Hospital of Chengdu Brain Science Institute, University of Electronic Science and Technology of China, Chengdu, China
- 2Neuroinformatics, Cuban Neurosciences Center, Havana, Cuba
- 3China-Cuba Belt and Road Joint Laboratory on Neurotechnology and Brain–Apparatus Communication, University of Electronic Science and Technology of China, Chengdu, China
- 4Faculty of Mathematics and Computer Sciences, University of Havana, Havana, Cuba
- 5Center for Mind/Brain Science (CIMeC), University of Trento, Trento, Italy
Introduction: We extend intrinsic local polynomial regression (ILPR) to multivariate covariates on arbitrary Riemannian manifolds, providing a general framework to study the action of isometric maps on manifold-valued regression estimators.
Methods: We derive a closed-form expression for the ILPR estimator on manifolds endowed with an Euclidean pullback metric (EPM), a broad class that encompasses infinitely many Riemannian metrics. Pseudocode is provided along with an explicit upper bound on computational complexity. For the multivariate local linear case, we characterize the asymptotic properties of the ILPR–EPM estimator, including a closed-form expression for its bias, establishing statistical consistency under EPM.
Results: Simulation studies on the manifold of symmetric positive definite (SPD) matrices—benchmarking diffusion tensor imaging (DTI) data and high-dimensional covariance models—show that the Log–Cholesky metric achieves a favorable trade-off between estimation accuracy and computational efficiency. Using real neurophysiological data from the Harmonized-MultiNational quantitative electroencephalography (HarMNqEEG) dataset (N = 1,965), we apply ILPR to symmetric positive definite cross-spectral covariance matrices summarized via a 360-region cortical parcellation.
Discussion: We show that Log–Euclidean and Log–Cholesky estimators yield statistically and neurophysiologically indistinguishable population-level results, while the Log–Cholesky formulation substantially reduces computational cost. These results establish ILPR–Log–Cholesky as a statistically sound and computationally efficient intrinsic regression method for large-scale, high-dimensional manifold-valued data.
1 Introduction
1.1 Regression on Euclidean space
Regression models the relationship between a dependent variable Yi∈𝒴 and an independent variable Xi∈𝒳. This relationship is inferred from a sample Σ = {(Xi, Yi)∈𝒳×𝒴 : i = 1, 2, ⋯, N}. The goal is to find the best-fitting curve, surface, or hypersurface that describes the relationship between the variables. Non-parametric regression makes minimal assumptions about the functional form of this relationship. In Euclidean space, 𝒳×𝒴 = ℝp×ℝn, if Σ corresponds to a set of independent and identically distributed realizations of a sample population (X, Y), non-parametric regression aims to determine a smooth regression function m(·) that satisfies

m(x) = 𝔼[Y|X = x],
without specifying the shape of m(·). Unlike parametric regression, which assumes a specific mathematical equation (e.g., m(X) = α0+α1X), non-parametric regression allows for a more flexible approach where the shape of m(·) is inferred from the data Σ [1–3].
Local polynomial regression (LPR) is a non-parametric regression method in Euclidean space. Rather than fitting a single global polynomial to the entire sample, it fits a local polynomial model to the data in a neighborhood of each evaluation point, which makes the model adaptive enough to handle complex nonlinear relationships in the data. In precise terms, LPR of order k is a non-parametric method that estimates a k+1 times differentiable regression function m(x) together with its partial derivatives of degree j with respect to X evaluated at x, for 0 < j ≤ k. In contrast to global polynomial regression, LPR effectively handles cases where the relationship between the variables is heterogeneous across the dataset. Other non-parametric regression methods in Euclidean space include kernel regression and spline regression; for brevity, they are not considered in this study [1–4].
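To make the weighted least squares mechanics of LPR concrete, the following minimal sketch (Python/NumPy; function names are illustrative and not part of any package cited here) fits a local linear model (k = 1) with a Gaussian kernel at a single evaluation point.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def local_linear_fit(X, Y, x0, h):
    """Local linear regression (LPR with k = 1) at the evaluation point x0.

    Solves the weighted least squares problem
        min_{a0, a1} sum_i K_h(X_i - x0) (Y_i - a0 - a1 (X_i - x0))^2
    and returns (m_hat(x0), m_hat'(x0)) = (a0, a1)."""
    d = X - x0
    w = gaussian_kernel(d / h) / h              # kernel weights K_h(X_i - x0)
    Z = np.column_stack([np.ones_like(d), d])   # local design matrix [1, X_i - x0]
    WZ = Z * w[:, None]                         # weighted design matrix
    beta = np.linalg.solve(Z.T @ WZ, WZ.T @ Y)  # normal equations of the WLS fit
    return beta[0], beta[1]

# Toy example: recover a smooth nonlinear trend from noisy observations
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 2.0 * np.pi, 200))
Y = np.sin(X) + 0.2 * rng.standard_normal(X.size)
grid = np.linspace(0.5, 2.0 * np.pi - 0.5, 50)
m_hat = np.array([local_linear_fit(X, Y, x0, h=0.4)[0] for x0 in grid])
```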
1.2 Why Riemannian Regression?
In various fields, including Medical Imaging, Computational Anatomy, Machine Learning, Computer Vision, Brain–Computer Interfaces, and Finance, we increasingly encounter data whose responses lie on Riemannian Manifolds—referred to as Riemannian-valued data. Unlike traditional Euclidean data, Riemannian-valued data lie within nonlinear spaces that exhibit a globally curved geometry. A prime example, and the case studied in the simulation section of this article, is the manifold of Symmetric Positive Definite (SPD) matrices, which is central to modeling and analyzing similarity or diffusion measures, such as the covariance matrices used to estimate functional or effective connectivity, or the diffusion of water molecules, in neuroimaging [5–9]. However, not all similarity measures used in neuroscience belong to the SPD manifold, as shown in Table 1, which summarizes the main similarity measures and their respective natural manifolds [10, 11].
On Riemannian Manifolds, traditional estimators, like the Euclidean mean (Ȳ = (1/N)∑i Yi), fail to accommodate the intrinsic curved geometry. For instance, the straightforward computation of the mean in Euclidean terms often results in a value that does not lie within the manifold. To address this, the Karcher mean or Fréchet mean—methods that account for the manifold's geometry—are used to compute the mean of Riemannian-valued data more appropriately [12].
As in the case of the mean, there is increasing evidence in the literature that Euclidean-based regression methods, like local polynomial regression (LPR) and similar approaches, are inadequate for Riemannian-valued data and can lead to poor performance on classification tasks compared to Riemannian-based regression techniques [5, 12–21]. For example, Figure 1 shows a qualitative comparison between traditional LPR (blue points in Figures 1A, B) and the Riemannian ILPR method (green points in Figures 1C, D) introduced in this study on a synthetic regression problem on the Unit Circle (UC), {(Y1, Y2)∈ℝ2 : Y1² + Y2² = 1}. From this simulation, we can infer that, for the same smoothing parameter, ILPR yields a better reconstruction of the TRUE DATA from the noisy data (black points) than LPR, while at the same time respecting the geometric boundaries of the UC [22, 23].
Figure 1. Contrasts Riemannian Regression [intrinsic local polynomial regression (ILPR)] with Euclidean Regression [local polynomial regression (LPR)] through simulated data on the Unit Circle Manifold (UC). Red indicates TRUE DATA points, black marks noisy observations, blue represents LPR, and green signifies ILPR. The scalar covariate bears the notation X, and the formula (Y1, Y2) = (cos(X), sin(X)) parametrically represents the points on the circle. (A) Displays a 3D scatter plot of the LPR estimates on the UC, highlighting the deviation from the manifold. (C) Shows the ILPR estimates in a similar 3D plot, demonstrating alignment with the circle. The top view in (B) presents the Euclidean LPR estimates, which incorrectly fall off the manifold, while (D) shows the top view for the Riemannian ILPR estimates, maintaining fidelity to the circular manifold structure.
In general, Riemannian regression addresses the limitations of Euclidean regression by reformulating traditional Euclidean regression techniques in terms of Riemannian operations (such as the exponential and logarithm maps, parallel transport, and intrinsic distance), thereby ensuring that the estimation remains consistent with the geometric structure of the data [12].
1.3 State of the art of Riemannian Regression
There are two main classes of Riemannian regression methods: extrinsic and intrinsic. The extrinsic approach uses embeddings to map responses into high-dimensional Euclidean spaces, where traditional regression techniques can be applied directly. As a consequence, the intrinsic geometric structure of the responses is distorted, and some essential information is lost. The intrinsic approach, on the other hand, yields estimators that adhere to the manifold geometry by reformulating regression problems in terms of intrinsic Riemannian operations. However, although the intrinsic approach offers, from a theoretical point of view, a more accurate estimation of Riemannian-valued data, in practice it often involves solving complex optimization problems without exact solutions, using computationally expensive approximate methods that may hurt its performance relative to extrinsic counterparts [19, 21]. Consequently, theoretical tools are needed to overcome the current limitations of intrinsic regression.
Across the literature, there are many examples of intrinsic regression; among them, we have Multivariate General Linear Models (MGLM) that extend the traditional parametric linear regression in Euclidean space to Riemannian manifolds [24]. Additive Regression Models (ARM) on the Tangent Space extend Additive Regression to a Riemannian Manifold with a Lie structure [1, 25]. Fréchet Regression (FR) extends non-parametric regression methods from Euclidean Space to Metric Spaces. This type of regression employs only the distance between elements and is valid in general Metric Spaces, where Taylor expansions may not be possible [19]. Intrinsic local polynomial regression (ILPR) is a non-parametric regression method extending the classical Euclidean-based LPR to Riemannian Manifolds [21]. Determining a suitable intrinsic Riemannian regression methodology depends on the specific research aims and the available computational resources.
1.4 Research focus
The focus of this paper is to address deficiencies of the ILPR formulation, originally introduced by Yuan et al. [21], that limit both its theoretical and practical applications. These deficiencies are (1) the lack of a closed-form analytical formula, (2) the lack of theoretical results about its general statistical properties, and (3) practical computational barriers. We found that by establishing a rigorous definition of ILPR on an arbitrary Riemannian Manifold and studying the action of an isometric map, we can establish general theoretical properties of ILPR that have practical applications in partially addressing the deficiencies mentioned earlier, especially on the SPD manifold.
1.5 Structure of the article
We present the following key contributions:
(a) A rigorous definition of multivariate ILPR, which generalizes the original formulation to include responses on arbitrary Riemannian manifolds with multivariate descriptors. This definition enables the study of the action of isometric maps on the ILPR estimator, as formalized in Theorem (3.3).
(b) Theorem (3.4) establishes a closed-form expression for multivariate ILPR on Riemannian manifolds equipped with an Euclidean pullback metric (EPM), covering an infinite family of metrics beyond those previously considered in the literature.
(c) Theorem (3.5) establishes the statistical consistency of the ILPR estimator for the EPM family.
(d) We derive an explicit upper bound on the computational complexity of the ILPR algorithm under EPM metrics.
(e) Lemma (2.4) shows that the Log–Cholesky metric constitutes an EPM on the manifold of symmetric positive definite matrices.
(f) Simulation benchmarks on the SPD manifold, including diffusion tensor imaging (DTI) and high-dimensional covariance models, demonstrate that the Log–Cholesky metric yields a favorable trade-off between estimation accuracy and computational efficiency compared to established approaches.
(g) An application to real neurophysiological data demonstrates the practical utility of the proposed framework: ILPR is applied to lifespan EEG source functional connectivity estimated as symmetric positive definite cross-spectral covariance matrices from the HarMNqEEG dataset (N = 1,965), summarized via a 360-region cortical parcellation. This analysis demonstrates that the Log–Euclidean and Log-Cholesky estimators yield statistically indistinguishable population-level connectivity trajectories, while the Log–Cholesky formulation substantially reduces computational costs.
2 Background
2.1 Riemannian manifolds
2.1.1 Basic elements
A manifold ℳ is a topological space that resembles an Euclidean space locally. Specifically, for each element Y∈ℳ, it is possible to associate a Euclidean space called a tangent space, denoted as TYℳ. Suppose we define in the tangent space a map gY:TYℳ×TYℳ → ℝ such that gY is an inner product that varies smoothly concerning Y∈ℳ. In that case, the pair (ℳ, g) constitutes a Riemannian manifold, with g being a Riemannian metric [20, 26].
On a manifold, a curve is a differentiable map γ:[t0, t1] → ℳ such that for all t∈[t0, t1], γ′(t)∈Tγ(t)ℳ. The geometric structure defined on a Riemannian manifold (ℳ, g), along with the concept of a curve, allows us to define the geodesic distance between two elements Y1 and Y2 in ℳ, denoted by distg(Y1, Y2), as the length of the shortest path between Y1 and Y2

distg(Y1, Y2) = inf_{γ∈Γ(Y1, Y2)} ∫_{t0}^{t1} √( g_{γ(t)}(γ′(t), γ′(t)) ) dt,

where Γ(Y1, Y2) is the set of all curves γ such that γ(t0) = Y1 and γ(t1) = Y2. A geodesic is a smooth curve on ℳ whose tangent vector γ′ does not change length or direction as one moves along the curve. For V∈TYℳ there is a unique geodesic, denoted by γY,V(t), whose domain contains [0, 1] such that γY,V(0) = Y and γ′Y,V(0) = V.
In general, a Riemannian manifold is a nonlinear space, and unlike a Euclidean space, linear combinations may not remain in the manifold. However, three essential operations can be defined to manipulate and study manifolds: the exponential map, the logarithmic map, and the parallel transport. The exponential map, denoted by expY:TYℳ→ℳ, is defined as expY(V) = γY,V(1), where γY,V is the unique geodesic with initial conditions γY,V(0) = Y and γ′Y,V(0) = V. The logarithmic map, denoted by logY:ℳ→TYℳ, is defined as the inverse of the exponential map, that is, logY(expY(V)) = V. Finally, the parallel transport, denoted by πY1→Y2:TY1ℳ→TY2ℳ, is a linear map on a Riemannian manifold that moves a tangent vector V∈TY1ℳ along a curve without changing its direction or orientation.
2.1.2 Symmetric positive definite manifold (SPD)
The SPD manifold consists of a collection of symmetric, positive-definite matrices, which are commonly used to represent covariance or similarity measures in data sets or, more precisely [12, 14, 27–30]:
SPDn = {Y∈Mn(ℝ) : Y = Y⊤ and v⊤Yv > 0 for all v∈ℝn, v ≠ 0},

where Mn(ℝ) is the space of matrices of size n with real components. By itself, the set SPDn does not define a Riemannian manifold; to do so, it is necessary to define a scalar product on the tangent space, that is, a Riemannian metric. This can be done in an infinite number of ways, and Table 2 summarizes a subset of the most commonly used Riemannian metrics for SPD-valued data. The selection of one over another depends on the specific problem at hand. For example, in many cases it is necessary to analyze a large sample, where a balance between accuracy and computational efficiency matters more than accuracy alone; in such cases, the Log–Cholesky and Log–Euclidean metrics are ideal [27, 29]. In other problems, it is crucial to preserve congruence invariance, also called the affine-invariant property, which guarantees that distances remain unchanged under the action of the general linear group; in such cases, it is necessary to use the affine-invariant metric, which allows accurate estimation at the expense of a higher computational cost due to the heavy evaluation of matrix inverses and logarithms [12, 31].
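As a concrete illustration, the following sketch (Python/NumPy; all function names are illustrative) computes the Log–Euclidean and Log–Cholesky distances between two SPD matrices, using the standard closed forms of these metrics reported in the references above.

```python
import numpy as np

def spd_logm(Y):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(Y)
    return (V * np.log(w)) @ V.T

def log_euclidean_dist(Y1, Y2):
    """Log-Euclidean distance: Frobenius distance between matrix logarithms."""
    return np.linalg.norm(spd_logm(Y1) - spd_logm(Y2), "fro")

def log_cholesky_coords(Y):
    """Log-Cholesky coordinates: strictly lower part of the Cholesky factor
    plus the logarithm of its diagonal."""
    L = np.linalg.cholesky(Y)
    return np.tril(L, -1) + np.diag(np.log(np.diag(L)))

def log_cholesky_dist(Y1, Y2):
    """Log-Cholesky distance: Euclidean (Frobenius) distance in those coordinates."""
    return np.linalg.norm(log_cholesky_coords(Y1) - log_cholesky_coords(Y2), "fro")

# Quick check on two random SPD(3) matrices
rng = np.random.default_rng(1)
A, B = rng.standard_normal((2, 3, 3))
Y1, Y2 = A @ A.T + 3.0 * np.eye(3), B @ B.T + 3.0 * np.eye(3)
print(log_euclidean_dist(Y1, Y2), log_cholesky_dist(Y1, Y2))
```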
2.1.3 Isometric Riemannian manifolds (IRM) and Euclidean pullback metrics (EPM)
Smooth maps, or differentiable functions, allow us to represent one Riemannian manifold within another while conserving its fundamental geometrical properties. This is useful in our context because it allows us to express intrinsic geometrical problems on a Riemannian manifold, such as the computation of the ILPR estimator, in terms of Riemannian operations on a new manifold that, if constructed appropriately, has a simpler geometrical structure and therefore makes the problem more tractable. An important concept associated with smooth maps is the pullback metric, which relates the geometrical structures on the domain and image manifolds. Formally, the pullback metric is defined as:
Definition 2.1 (pullback metric and differential). Let f:ℳ→𝒩 be a smooth map between manifolds. The directional derivative dYf:TYℳ→Tf(Y)𝒩 in the direction of V∈TYℳ is defined as

dYf(V) = (d/dt) f(γ(t)) |_{t = 0},

where γ is any smooth curve on ℳ with γ(0) = Y and γ′(0) = V.
On the other hand, if f:(ℳ, gf) → (𝒩, g) is a smooth map between Riemannian manifolds with an injective differential dYf (immersion), then gf is a pullback metric of the metric g by f if for all V∈TYℳ:

(gf)Y(V, V) = g_{f(Y)}(dYf(V), dYf(V)).
When a pullback metric connects the geometrical structure of two Riemannian manifolds, all the corresponding Riemannian operations (exponential map, logarithm map, parallel transport, etc.) are related (see Table 3 and Figure 2). To illustrate this concept, we can consider the SPD manifold with the Log–Euclidean metric. Using Definition (2.1), we can quickly check that the Log–Euclidean metric (gLE) is the pullback metric of the Euclidean metric (gE), that is, gLE = log*gE [27].
Table 3. Relationships between Riemannian operations in a manifold equipped with a pullback metric [32].
Figure 2. Visual representation of the relationships between Riemannian operations in a manifold equipped with a pullback metric.
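To illustrate how a pullback metric transfers Riemannian operations to the image manifold (Table 3), the following minimal sketch (Python/NumPy; an illustration assuming the standard Log–Euclidean geodesic formula) computes a Log–Euclidean geodesic by working in the image of f = logm and mapping back with expm; the resulting points remain on the SPD manifold.

```python
import numpy as np

def spd_logm(Y):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(Y)
    return (V * np.log(w)) @ V.T

def sym_expm(S):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def log_euclidean_geodesic(Y1, Y2, t):
    """Geodesic between Y1 and Y2 under the Log-Euclidean metric: a straight
    line in the image of f = logm, mapped back with f^{-1} = expm."""
    return sym_expm((1.0 - t) * spd_logm(Y1) + t * spd_logm(Y2))

# The Log-Euclidean geodesic mid-point of two SPD matrices stays on the manifold
rng = np.random.default_rng(2)
A, B = rng.standard_normal((2, 4, 4))
Y1, Y2 = A @ A.T + np.eye(4), B @ B.T + np.eye(4)
midpoint = log_euclidean_geodesic(Y1, Y2, 0.5)
assert np.all(np.linalg.eigvalsh(midpoint) > 0)
```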
A distinct class of smooth mappings is constituted by those that facilitate the transition of a geometric structure from one Riemannian manifold to another while conserving the manifold's inherent attributes, notably including geometric properties such as the angles between curves and the distances between points. These specific mappings are known in the literature as isometric maps and are of paramount significance in differential geometry.
Definition 2.2. (Isometric maps and isometric Riemannian manifold)
Let f:(ℳ, gf) → (𝒩, g) be a diffeomorphism onto its image (ℳ≃f f(ℳ)) such that gf = f*g (see Definition 2.1); then f is an isometric map. In addition, if an isometric map f exists between the Riemannian manifolds (ℳ, gf) and (f(ℳ), g), we say that they are isometric Riemannian manifolds (IRM) by f.
An example of isometric Riemannian manifolds is the pair (SPDn, gLE) and (Symn, gE). In this case, the logarithm function establishes an isometric map between the Riemannian manifolds. This means that Symn≃log(SPDn), and the Riemannian metric gLE is obtained from gE by pulling it back using the log function, that is, gLE = log*gE [27]. In other words, the two manifolds possess the same underlying geometry and can transform into each other without altering the distances between points or angles between curves.
Another key concept in this research is the Euclidean pullback metric (EPM). An EPM is a special type of pullback metric, obtained as the pullback of the Euclidean metric by an isometric map, and it has a wide range of applications.
Definition 2.3 (Euclidean pullback metrics (EPM)). Consider the Riemannian manifold (ℳ, gf). We say that gf is a Euclidean pullback metric if there exists an isometric map f:(ℳ, gf) → (𝒩, gE), where gE represents the Euclidean metric, such that gf = f*gE. In other words, the metric gf on ℳ is a Euclidean pullback metric when it can be obtained as a pullback of the Euclidean metric via the isometric map f.
Suppose we define a Riemannian metric gf on a space ℳn as the pullback of the Euclidean metric through an isometric map f. In that case, the manifolds (ℳn, gf) and (f(ℳn), gE) are isometric Riemannian manifolds (IRM): the map f:ℳn→f(ℳn) is a diffeomorphism and gf is the pullback of the Euclidean metric gE via f. The class of EPMs is broad and analytically convenient, as many relevant Riemannian metrics can be expressed or locally approximated in this form around a suitable reference point, preserving the geometric information necessary for statistical inference. Moreover, the Nash embedding theorem ensures that any smooth Riemannian manifold can be isometrically embedded into a sufficiently high-dimensional Euclidean space [33], providing the theoretical foundation for extending EPM-based results to more general settings. Under suitable regularity conditions, this opens the way to establishing the consistency and asymptotic properties of the ILPR estimator on arbitrary smooth manifolds. Within SPD, the affine-invariant metric—although not an EPM—can be locally approximated by one, while the Mixed Euclidean Metric offers a direct extension of the current framework by relaxing the isometric constraint yet retaining analytical tractability [34].
Lemma (2.4) encapsulates several crucial EPMs on the SPD manifold. This Lemma presents a proof establishing, for the first time in the literature, that the Log–Cholesky metric is an EPM. The proof is provided in Supplementary Appendix B.
Lemma 2.4 (Log–Cholesky metric as EPM). All Riemannian metrics g in Table 4 are EPMs, by the isometric maps f:SPDn→f(SPDn).
2.2 Multivariate analysis
2.2.1 The unbalanced block Kronecker product
To accommodate multivariate covariates within our theoretical framework, we adopt the classical Kronecker product, denoted by ⊗ [35], along with its generalization—the Unbalanced Block Kronecker product, denoted ⊠. This operator, a specific case of the Tracy–Singh product [36], plays a central role in our analysis. We begin with the definition of block-partitioned matrices:
Definition 2.5 (Block Matrix). A matrix B = (Bij) is said to be block-partitioned if each element Bij is a matrix of size u×v, for i = 1, …, s and j = 1, …, q. We denote this structure as B∈Ms, q(Mu, v(ℝ)). If no such partition is imposed, we write B∈Ms, q(ℝ).
A block-partitioned matrix admits the representation:

B = ∑_{i=1}^{s} ∑_{j=1}^{q} Eij ⊗ Bij,

where Eij is the elementary matrix with a one at position (i, j) and zeros elsewhere [36]. We now define operators that extract the j-th block column or i-th block row of a matrix:
Definition 2.6 (Block Column and Row Operators). Let B∈Ms, q(Mu, v(ℝ)). The block-column and block-row operators are defined by
where and are vectors with a one in component (i, j) and zeros elsewhere.
We are now ready to define the Unbalanced Block Kronecker product:
Definition 2.7 (Unbalanced Block Kronecker Product). Let A∈Mn, ℓ(ℝ) and B∈Ms, q(Mu, v(ℝ)). The Unbalanced Block Kronecker product is defined as

A⊠B = ∑_{i=1}^{s} ∑_{j=1}^{q} Eij ⊗ (A⊗Bij).

If B∈Ms, q(ℝ), then A⊠B = B⊗A.
This operator is especially useful for expressions of the form [A⊗B A⊗C], which can be compactly rewritten as

[A⊗B A⊗C] = A⊠[B C],

where [B C] is viewed as a 1×2 block-partitioned matrix.
This formulation simplifies algebraic manipulations while preserving key properties of the standard Kronecker product, such as compatibility with transposition, inversion, and mixed multiplication [36]. Moreover, the Unbalanced Block Kronecker product allows us to express multivariate quadratic forms in a canonical matrix form. Specifically, expressions such as
can be reduced to standard forms like XWX⊤, facilitating the derivation of Theorem 3.4. We conclude this subsection with a useful identity linking the ⊠ operator with block column and row sums:
Lemma 2.8 (Block Column and Row Sum Decomposition). Assume the following:
(a) N divides s;
(b) A∈M1, N(Mn, s/N(ℝ));
(c) B∈MN, 1(Ms/N, q(ℝ));
(d) Λ = Diag(λ1, …, λN)∈MN, N(ℝ).
Then the identity
holds, where N, n, s, q are positive integers.
The proof is provided in Supplementary Appendix C.
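The following sketch (Python/NumPy) implements the block-wise form of the Unbalanced Block Kronecker product given in Definition 2.7 and checks the scalar-block identity A⊠B = B⊗A; the array layout chosen to store the blocks is an implementation assumption.

```python
import numpy as np

def unbalanced_block_kron(A, B_blocks):
    """Unbalanced Block Kronecker product A ⊠ B (block-wise sketch).

    A is an ordinary matrix and B_blocks stores the blocks B_ij of B as an
    array of shape (s, q, u, v).  Every block is replaced by A ⊗ B_ij and the
    blocks are reassembled in the same s-by-q layout."""
    s, q, _, _ = B_blocks.shape
    return np.block([[np.kron(A, B_blocks[i, j]) for j in range(q)]
                     for i in range(s)])

# Sanity check: with scalar blocks (u = v = 1), A ⊠ B reduces to B ⊗ A
rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 5))
lhs = unbalanced_block_kron(A, B[:, :, None, None])
assert np.allclose(lhs, np.kron(B, A))
```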
2.2.2 Taylor expansion for multivariate regression
A foundational result in nonparametric regression is that expressing local regression estimators as solutions to weighted least squares problems requires a Taylor expansion of the regression function [1–3]. In this study, we focus on regression settings where the covariates lie in an Euclidean space, while the response variables take values in a Riemannian matrix manifold—an intrinsically nonlinear subset of Mn, q(ℝ). The regression function m(·) in this context is a matrix-valued function of a vector argument and therefore requires an appropriately defined multivariate Taylor expansion [37, 38]. Several formulations for differentiating matrix-valued functions with respect to vector inputs have been proposed in the literature [39–41]. Among these, we adopt the formalism introduced by van den Boom and colleagues [38], which provides a convenient and generalizable framework for our purposes.
Definition 2.9 (Partial Derivative of Matrix-Valued Functions). Let X, x∈ℝp, and consider a matrix-valued function A:ℝp→Mn, q(ℝ), which can be expressed as

A(X) = ∑_{i=1}^{n} ∑_{j=1}^{q} aij(X) Ei, j,

where each aij:ℝp→ℝ is a scalar function, and Ei, j∈Mn, q(ℝ) denotes the matrix with a 1 in position (i, j) and zeros elsewhere. Then the first-order partial derivative of A(X) with respect to X, evaluated at x, is given by:

∂A(x)/∂X = ∑_{i=1}^{n} ∑_{j=1}^{q} Ei, j ⊗ ∇ᵀx aij(x),

where ∇ᵀx = (∂/∂x1, ⋯, ∂/∂xp) is the transpose gradient operator in ℝp.
Using this definition, any matrix-valued function m(X)∈Mn, q(ℝ) that is k+1 times differentiable in a neighborhood of x admits a Taylor expansion of the form:
Here, each coefficient denotes the j-th order partial derivative of m(X) with respect to X, evaluated at x. The notation (X−x)⊗j refers to the j-fold Kronecker product:

(X−x)⊗j = (X−x)⊗(X−x)⊗⋯⊗(X−x) (j factors), with the convention (X−x)⊗0 = 1.
A key result that facilitates handling these derivatives, especially for products of matrix-valued functions, is the following lemma:
Lemma 2.10 (Product Rule for Matrix-Valued Functions). Suppose the following conditions hold:
(a) X, x∈ℝp, where X is the variable and x is the evaluation point.
(b) .
Then, the derivative of the product satisfies the following:
In particular, if B(X) = B is constant, then:
This result, originally established in [38, 41], plays an essential role in the derivation of our main theoretical contributions.
2.3 Extrinsic regression
2.3.1 Multivariate local polynomial regression on Euclidean space (LPR)
Statistical textbooks typically introduce the LPR method in the vector Euclidean space. To expose a more explicit connection between this regression technique and ILPR, we extend LPR to the matrix manifold Mn(ℝ). With the standard Euclidean distance, this manifold is an Euclidean space, equivalent to ℝn². This equivalence arises because the vectorization operator establishes an isomorphism between the two spaces. Given these inherent similarities, regression methods in ℝn² can be seamlessly extended to Mn(ℝ). Consequently, the classical Definition of multivariate LPR in ℝn² can be directly applied to Mn(ℝ) [1–3].
Here is the formal Definition of classical multivariate LPR of degree k in the Euclidean space at x, given the set Σ of independent and identically distributed sample points from a population (X, Y):
Definition 2.11 (Multivariate local polynomial regression (LPR)). Let Σ be as defined above. Classical multivariate local polynomial regression (LPR) of degree k in Mn(ℝ) at x consists of estimating the regression function m(x) such that E[Y−m(x)|X = x] = 0n×n, together with its partial derivatives of degree j with respect to X evaluated at x for j = 1, 2, ⋯, k, under the assumption that m is a k+1 times differentiable function at X = x.
The assumptions of Definition (2.11) imply that, in a neighborhood of x, the matrix-valued function t(X) = m(X)−m(x) is k+1 times differentiable. Thus, we can use the Taylor expansion introduced in Subsection 2.2.2 to obtain:
where the coefficients αj are the Taylor coefficients of t(X), that is, the partial derivatives of degree j of the function t(X) with respect to X evaluated at x. This polynomial is fitted locally by a weighted least squares regression problem:
where α0 = m(x), the coefficients αj for j = 1, 2, ⋯, k correspond to the derivatives of m at x, and Kh(·) is a Kernel function with bandwidth h. It is important to remark that the resulting coefficient estimator is a block matrix; more precisely
Looking at the Taylor expansion (Equation 18), it is clear that each estimated coefficient is an estimator of the corresponding derivative of m at x. For scalar covariates and responses Yi, the optimization problem (Equation 19) reduces to the classical univariate LPR problem [1, 3].
2.3.2 Extrinsic local polynomial regression (ELPR)
As discussed in the introduction, Euclidean regression methods such as local polynomial regression (LPR) are not adequate for estimation on samples with responses on Riemannian manifolds [5, 13, 15–18, 20]. Extrinsic regression is a variant of Riemannian regression that extends traditional Euclidean-based methods by locally embedding the response manifold into a higher-dimensional Euclidean space (Figure 3). Within this embedding space, standard Euclidean smoothing conditions are satisfied, so Euclidean methods can be applied directly. Via the inverse of the local embedding, an estimator on the original Riemannian manifold is then obtained. Key to this approach is ensuring that the local embedding preserves as much as possible of the intrinsic geometry of the manifold and the sample's true variability, which can never be fully achieved:
Figure 3. Extrinsic regression framework. Data points on a Riemannian manifold (black dots) are mapped into a higher dimensional Euclidean space via an embedding function f, enabling conventional regression techniques. The regression estimates are subsequently projected back to the manifold using the inverse mapping.
Following Lin et al. [17], these embeddings are constructed using equivariant or locally isometric maps, such as the Riemannian logarithmic map centered at a convenient reference point within the manifold, which effectively captures the sample's true geometrical variability. In the literature, a standard way to select a reference point for constructing the local isometric map is the Karcher mean of the responses on the Riemannian manifold, or the identity element after parallel transporting the data to it [5, 12, 28, 42]. Standard smoothing is then performed in Euclidean space, and the inverse map is used to project the estimates back onto the original Riemannian manifold [17, 20].
Extrinsic local polynomial regression (ELPR) is an instance of this approach. The logarithmic map of the Riemannian manifold, centered at the Karcher mean, defines an equivariant map that sends the sample responses into the tangent space, which is an Euclidean space; LPR is then performed on the tangent space. In the simulation section below, we use benchmarks to compare the accuracy and speed of ELPR with those of ILPR.
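A minimal sketch of this ELPR pipeline is given below (Python/NumPy), assuming the affine-invariant exponential and logarithm maps, a fixed-point approximation of the Karcher mean, scalar covariates, and a local constant (Nadaraya–Watson) smoother in the tangent space for brevity; all names are illustrative rather than the implementation used in the experiments.

```python
import numpy as np

def _sym_pow(Y, p):
    """Y**p for an SPD matrix Y via eigendecomposition."""
    w, V = np.linalg.eigh(Y)
    return (V * w ** p) @ V.T

def _sym_fun(Y, fun):
    # Apply a scalar function to the eigenvalues of a symmetric matrix
    w, V = np.linalg.eigh(Y)
    return (V * fun(w)) @ V.T

def ai_log(P, Y):
    """Affine-invariant logarithm map log_P(Y)."""
    Ph, Pmh = _sym_pow(P, 0.5), _sym_pow(P, -0.5)
    return Ph @ _sym_fun(Pmh @ Y @ Pmh, np.log) @ Ph

def ai_exp(P, V):
    """Affine-invariant exponential map exp_P(V)."""
    Ph, Pmh = _sym_pow(P, 0.5), _sym_pow(P, -0.5)
    return Ph @ _sym_fun(Pmh @ V @ Pmh, np.exp) @ Ph

def karcher_mean(Ys, iters=20):
    """Fixed-point iteration for the Karcher (Frechet) mean of SPD matrices."""
    P = np.mean(Ys, axis=0)                       # Euclidean mean as initial guess
    for _ in range(iters):
        P = ai_exp(P, np.mean([ai_log(P, Y) for Y in Ys], axis=0))
    return P

def elpr_local_constant(Xs, Ys, x0, h):
    """ELPR sketch: map responses to the tangent space at the Karcher mean,
    apply Euclidean kernel smoothing, and map the estimate back."""
    P = karcher_mean(Ys)
    V = np.array([ai_log(P, Y) for Y in Ys])      # embedded (tangent-space) responses
    w = np.exp(-0.5 * ((Xs - x0) / h) ** 2)       # Gaussian kernel weights
    V_hat = np.tensordot(w / w.sum(), V, axes=1)  # weighted tangent-space average
    return ai_exp(P, V_hat)                       # project back onto the manifold
```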
3 Multivariate intrinsic local polynomial regression on isometric Riemannian manifolds (ILPR-IRM)
3.1 Multivariate intrinsic local polynomial regression on an arbitrary Riemannian matrix manifold (ILPR)
As previously discussed, ILPR extends LPR to handle data with responses on Riemannian manifolds [21]. While initially formulated on the SPD manifold, the ILPR definition can be adapted to a general Riemannian manifold. Accordingly, we propose a revised Definition of ILPR capable of accommodating more general settings involving multivariate covariates and responses on IRMs.
Definition 3.1 (Intrinsic local polynomial regression (ILPR)). Let Σ = {(Xi, Yi)∈ℝp×ℳn : i = 1, 2, ⋯, N} be a set of independent and identically distributed samples from a population (X, Y), and let ω∈ℳn. ILPR of degree k on the Riemannian manifold (ℳn, g) at x∈ℝp is a statistical methodology used to estimate the regression function m(x) such that 𝔼[logm(X)Y|X = x] = 0n×n, together with its derivatives of degree j for j = 1, 2, ⋯, k, under the assumption that m is a k+1 times differentiable function at X = x. Here, π and log represent the parallel transport and Logarithm Map induced by the metric g.
The flexibility of Definition (3.1), in comparison to earlier formulations, stems from its elimination of the need for parallel transport from the matrix identity In and from a clear specification of the role of multivariate covariates. In Yuan et al. [21], the ILPR estimator was originally defined for univariate covariates and responses on the SPD manifold, relying on an expansion in the tangent space parallel-transported to the identity. Although this formulation represented an important advance for the statistical analysis of SPD data, it imposed structural limitations: the estimator was inherently tied to the SPD geometry, its consistency depended on transport to In, and it could not naturally account for the action of isometries, a crucial consideration given that many Riemannian metrics can be constructed as isometric or deformative transformations of simpler geometries [34]. Definition (3.1) removes these restrictions by introducing a manifold-agnostic and intrinsically multivariate formulation. Specifically, it extends ILPR to arbitrary smooth finite-dimensional Riemannian manifolds (ℳn, g) and to multivariate covariates X∈ℝp. This intrinsic and coordinate-free construction eliminates dependence on the SPD identity and ensures compatibility with any smooth finite-dimensional Riemannian manifold. Furthermore, by centering the parallel transport at the local or Fréchet mean rather than the identity, the estimator gains both theoretical generality and numerical stability, in line with the improvements reported by [28, 42] for covariance-based manifold learning.
In the case of the Riemannian matrix manifold, the function t(X) = πm(x) → ω(logm(x)m(X)) has an image on the tangent space Tωℳn, which is Euclidean and a vector subspace of Mn(ℝ). Therefore, it is a matrix-valued function and admits a Taylor expansion. Consequently, the assumptions in Definition (3.1) imply that, in a neighborhood of x, we can approximate the function t(X) = πm(x) → ω(logm(x)m(X)) by a multivariate Taylor polynomial
where the coefficients are the Taylor coefficients of t(X), that is, its partial derivatives of degree j with respect to X evaluated at x. As in the Euclidean case, this series expansion allows rewriting the ILPR in Definition (3.1) as a weighted regression problem that determines the intrinsic local estimator (ILE).
Definition 3.2 (Intrinsic local estimator (ILE)). Let (ℳn, g) be a Riemannian manifold and Σ the sample defined above. Then the ILPR estimator of degree k in (ℳn, g) at x∈ℝp for the data Σ is the solution of the following optimization problem:
where Kh(·) is a Kernel function with smoothing parameter h, while distg, exp, and π are the geodesic distance, exponential map, and parallel transport induced by the metric g. We denote the solution by (α̂0, α̂1, ⋯, α̂k); then α̂j is an estimator of the j-th order derivative of t(X) at x for j = 1, 2, ⋯, k, and α̂0 determines the estimator of m(x).
In Definition (3.2), the estimator's notation is abbreviated for simplicity; each instance of the ILE in this article is determined by the variables x, ω, k, h, Σ, and g. While this article employs the Leave-One-Out Cross-Validation (LOOCV) method for selecting smoothing parameters, alternative variants may be preferable in practice [1–4]. The optimal bandwidth corresponds to the minimum of the LOOCV score CV(h)
When we choose to disregard the geometric characteristics of the data represented by Σ, effectively treating the Riemannian manifold of the responses (ℳn) as a Euclidean subspace within Mn(ℝ), the ILPR of Definition (3.2) simplifies to the LPR described in Definition (2.11).
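Before turning to the action of isometric maps, the following generic sketch (Python/NumPy) illustrates the LOOCV bandwidth selection described above; the callables fit_at and dist are placeholders for any manifold-valued estimator and geodesic distance, and the normalization of the exact CV(h) criterion in Equation 23 may differ.

```python
import numpy as np

def loocv_score(h, Xs, Ys, fit_at, dist):
    """Leave-one-out cross-validation score for a candidate bandwidth h.

    fit_at(X_train, Y_train, x0, h) must return the manifold-valued estimate
    m_hat(x0) computed from the training subsample, and dist(Y1, Y2) must be
    the geodesic distance induced by the chosen metric."""
    N = len(Xs)
    score = 0.0
    for i in range(N):
        keep = np.arange(N) != i                      # leave observation i out
        m_hat = fit_at(Xs[keep], Ys[keep], Xs[i], h)
        score += dist(Ys[i], m_hat) ** 2
    return score

def select_bandwidth(h_grid, Xs, Ys, fit_at, dist):
    """Return the bandwidth in h_grid that minimizes the LOOCV score."""
    scores = [loocv_score(h, Xs, Ys, fit_at, dist) for h in h_grid]
    return h_grid[int(np.argmin(scores))]
```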
3.2 Action of an isometric map on ILPR
This section explores the action of isometric maps on ILPR estimation. This study is relevant because, as highlighted earlier, the ILPR estimation relies on solving a weighted least squares optimization problem, whose complexity depends on the Riemannian manifold to which the data responses belong. An isometric map allows for connecting the original ILPR estimation with one on an induced Riemannian manifold, which has a more straightforward geometrical structure than the original, thereby requiring a simpler optimization problem, making the theoretical exploration more tractable (Figure 4).
Figure 4. (ℳn, gf) and (f(ℳn), g) are IRMs by the isometry f. The dots on the manifolds ℳn and f(ℳn) represent the sample Σ and f(Σ), respectively. The dashed lines represent the ILEs on their respective Riemannian manifolds. The ILEs are related by the isometric map f.
In precise mathematical terms, this section addresses the following problem. Suppose the ILE is computed for the sample Σ with responses on a Riemannian manifold (ℳn, gf), and the induced ILE is computed for the induced sample f(Σ) with responses on the isometric Riemannian manifold (f(ℳn), g), where f is an isometric map. What is the relationship between the ILE and the induced ILE? The following theorem provides the answer to this question.
Theorem 3.3 (Action of isometric maps on ILPR). Under the following conditions:
(a) The smooth Riemannian manifolds (ℳn, gf) and (f(ℳn), g) are related by the isometric embedding f.
(b) The embedding satisfies f(ℳn)⊆Ms(ℝ) where s is an integer multiple of n.
(c) For all x∈ℝp, both the intrinsic local estimator and the induced estimator exist.
Then there exists a transformation matrix A∈GLs(ℝ) such that, for all V∈Tωℳn, the differential dωf(V) admits a representation in terms of A. Consequently, the relationship between the ILEs is given by:
for j = 0, 1, 2, …, k, where δj0 is the Kronecker delta and Ip⊗j denotes the j-fold Kronecker product of the p×p identity matrix.
Condition (c) in Theorem 3.3 is a geometric existence assumption ensuring that the intrinsic optimization problems defining both and are well posed. More precisely, it requires that the Kernel support be contained in a geodesic neighborhood centered at ω whose radius is strictly smaller than the injectivity radius of the manifold. Within such a neighborhood, geodesics are unique, the logarithm map is single-valued, and curvature is sufficiently controlled to guarantee local convexity of the intrinsic least squares objective. Under these conditions, the intrinsic local estimator exists and is uniquely defined. Since the mapping f is an isometry, injectivity radius, geodesic convexity, and curvature bounds are preserved under the embedding, implying that existence of the intrinsic local estimator on is equivalent to existence on its isometric image (f(ℳn), g). For the manifold of symmetric positive definite matrices SPD(n) endowed with the Log–Euclidean or Log–Cholesky metrics, these geometric conditions hold globally, providing a concrete and practically relevant class of manifolds for which condition (c) is automatically satisfied.
The proof of Theorem 3.3 is provided in Supplementary Appendix D. It relies on the fact that isometric maps between Riemannian manifolds allow the introduction of an explicit change of variables between the weighted least squares optimization problems defining the intrinsic local estimator and its induced counterpart. After constructing this change of variables, the stated relationship between the estimators follows directly. These results provide a key theoretical tool for analyzing and solving ILPR problems on isometric Riemannian manifolds and serve as the foundation for deriving closed-form expressions for intrinsic local estimators on manifolds equipped with Euclidean pullback metrics, ultimately leading to consistency results for the ILPR framework.
3.3 The exact solution of the ILPR–EPM
In this section, we present the closed-form analytical expression for the multivariate ILPR equipped with a metric in the EPM family, as defined in Definition (2.3). The key outcome of this section is Theorem (3.4), which is a consequence of Theorem (3.3) and Lemma (2.8).
Theorem 3.4 (ILPR–EPM closed expression). Under the following conditions
(a) (ℳn, gf) is a smooth Riemannian manifold equipped with an Euclidean pullback metric (EPM) gf induced by the isometry f.
(b) f(ℳn)⊆Ms(ℝ) where s is an integer multiple of n.
(c) For all x∈ℝp, the intrinsic local estimators (ILEs) on (ℳn, gf) and (f(ℳn), gE) exist.
Then the closed analytical expressions of the ILEs are:
where
The proof of Theorem (3.4) is elaborated in Supplementary Section E. An important observation from Equations 25, 26 in Theorem (3.4) is that
In short, this theorem provides a closed analytical expression for the intrinsic local estimator (ILE) on an arbitrary Riemannian manifold with an Euclidean pullback metric (EPM). To illustrate this with an example, let us consider the case of local constant regression (k = 0), where the metric gf is induced by an isometric map f. Theorem 3.4 shows that the ILE can be evaluated by instead evaluating the corresponding estimator on the transformed sample f(Σ), which often involves the simpler geometry of the target manifold. Moreover, this estimator admits a closed-form solution given by Equation 26, with the terms f(Y) and W defined in Equation 27. For the local constant setting, explicit formulas for e0 and X are also provided, completing the analytical characterization.
From this expression we can compute
Now we can compute the local constant estimator on the Euclidean space in the following manner
Finally, by Equation 25, we can compute the estimator on the original manifold as follows
It is interesting to note that the expression for the induced local estimator on the Euclidean space resembles that of the Nadaraya-Watson smoothing, which is what we would anticipate. In general, the estimator on the Euclidean space is similar to the LPR. Specifically, when we are dealing with a univariate covariate (where p = 1), the vectorization of the equation in Equation 26 provides us with the exact expression for LPR in the Euclidean space.
The SPD manifold equipped with an EPM is a specific case covered by Theorem (3.4). This fact is particularly beneficial since the theorem provides the exact analytical expression of the ILE for any Riemannian Metric on the EPM family; this includes metrics used in research like the Log–Cholesky, Log–Euclidean, Power-Euclidean, and Cholesky, which are discussed in Section 2.1.3 and known to be EPMs, as established by Lemma (2.4). For example, for the Log–Cholesky metric, we demonstrate in Lemma (2.4) that an isometric map exists, allowing the Log–Cholesky metric to be rewritten as the pullback of the Euclidean metric. Then, from Theorem (3.4) and the expression of the isometric maps, we conclude that the Local Constant Estimator of the Log–Cholesky metric has the following closed expression:
where ℒ and ℒ−1 are the Cholesky and inverse Cholesky maps respectively.
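A minimal sketch of this local constant ILPR–Log–Cholesky estimator is shown below (Python/NumPy); it assumes a Gaussian kernel and multivariate covariates stored row-wise, and simply averages the mapped responses f(Yi) before applying the inverse map, in line with the Nadaraya–Watson form discussed above. All names are illustrative.

```python
import numpy as np

def chol_log(Y):
    """Isometric map for the Log-Cholesky metric: Cholesky factor with the
    logarithm applied to its diagonal entries."""
    L = np.linalg.cholesky(Y)
    return np.tril(L, -1) + np.diag(np.log(np.diag(L)))

def chol_log_inv(Z):
    """Inverse map: exponentiate the diagonal, rebuild L, and return L L^T."""
    L = np.tril(Z, -1) + np.diag(np.exp(np.diag(Z)))
    return L @ L.T

def ilpr_log_cholesky_nw(Xs, Ys, x0, h):
    """Local constant (k = 0) ILPR estimate at x0 under the Log-Cholesky metric.

    Xs has shape (N, p) (covariates stored row-wise), Ys has shape (N, n, n).
    The estimate is the kernel-weighted average of the mapped responses f(Y_i),
    pushed back to the SPD manifold with the inverse map."""
    w = np.exp(-0.5 * np.sum((Xs - x0) ** 2, axis=1) / h ** 2)  # Gaussian kernel
    Z = np.array([chol_log(Y) for Y in Ys])
    Z_hat = np.tensordot(w / w.sum(), Z, axes=1)                # weighted average of f(Y_i)
    return chol_log_inv(Z_hat)
```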
Several Riemannian metrics commonly used in applications—particularly those defined on the SPD manifold—can be expressed as EPMs through smooth isometric embeddings. Prominent examples include the Log–Euclidean and Log–Cholesky metrics [16, 27], both of which are exact EPMs (Lemma 2.4). Other widely used geometries, such as the affine-invariant metric [12] or, more generally, the mixed-Euclidean family [32], are not globally expressible as pullbacks of the Euclidean metric. Nevertheless, their metric tensors can be locally approximated by an EPM through a second-order expansion around a convenient reference point. Furthermore, by the Nash Embedding Theorem [33], any smooth Riemannian manifold admits a global isometric embedding into a sufficiently high-dimensional Euclidean space, implying that every Riemannian geometry can, at least in principle, be represented as a pullback of the Euclidean metric under an appropriate embedding. Within this broader geometric context, the EPM family provides a unified theoretical framework that encompasses both exact and locally approximate realizations of the metrics most frequently encountered in practice. Consequently, Theorem 3.4—which establishes a closed analytical expression for the ILPR under an EPM—provides a crucial step toward generalizing the statistical theory of intrinsic local smoothing. Since any smooth Riemannian metric can be embedded or locally approximated by an EPM, this result forms a rigorous foundation for establishing asymptotic and consistency properties of the ILPR estimator under broad and geometrically general conditions.
3.4 Consistency of the ILPR estimator
Yuan et al. [21] established the consistency of the ILPR estimator on the SPD manifold equipped with the Log–Euclidean and Affine–Invariant metrics in the univariate case. However, the general statistical properties—particularly the consistency of the ILPR estimator for multivariate descriptors and over arbitrary Riemannian manifolds—remain largely unexplored. Here, we address this gap by establishing the statistical consistency of the ILPR estimator on a Riemannian manifold (ℳ, g) endowed with an EPM in the local linear case k = 1.
The derivation of the explicit bias formula builds on the geometric property that, under an EPM, the manifold (ℳ, g) can be viewed as the image of an isometric embedding f:ℳ → ℝn. This isometry allows us to push forward the regression problem from the curved manifold to an equivalent one in Euclidean space, where the local geometry is flat, and the analytical tools of classical multivariate local linear regression apply. The idea of the proof is therefore twofold. First, we use the embedding f to map both the response and the estimator from the manifold to the Euclidean space. In this Euclidean setting, the ILPR estimator reduces to its Euclidean counterpart, for which Theorem 3.4 provides a closed-form expression. This enables a direct computation of the asymptotic bias term, since the expectation of the estimator can be expanded in powers of the bandwidth h, and the leading term depends on the curvature (second derivatives) of the regression map in Euclidean coordinates. Second, by applying the continuous mapping theorem and using the smoothness of the isometry f, we pull back this result to the manifold, obtaining the bias behavior of the ILPR estimator in the original Riemannian setting.
Conceptually, the bias formula reveals how the local curvature of the regression function—expressed in the embedded Euclidean coordinates—governs the small-sample deviation of the estimator. Thus, even in finite samples, the explicit form of the bias quantifies how the geometry of the manifold and the bandwidth choice jointly determine the accuracy of the ILPR–EPM estimator. The full proof of the theorem and all supporting lemmas are presented in Supplementary Appendix G.
Theorem 3.5 (Statistical Consistency of the ILPR–EPM). Assume that:
(a) A sample Σ consists of independent and identically distributed observations from a population (X, Y) with responses on the smooth Riemannian manifold (ℳn, gf), where gf is an EPM induced by the isometric map f.
(b) The embedding satisfies f(ℳn)⊆Ms(ℝ), with s an integer multiple of n.
(c) For all x∈ℝp, the intrinsic local estimators (ILEs) on (ℳn, gf) and (f(ℳn), gE) exist for the given configurations.
(d) The point x lies in the interior of the support of the distribution FX.
(e) Regularity conditions (R1)–(R3) in Supplementary Section F are satisfied.
Then, the ILEs exhibit the following asymptotic behavior for fixed n and p as N → ∞:
where oP denotes a term of smaller order in probability (convergence in probability).
Conditions (a–c) ensure the existence of the intrinsic local estimators (ILEs). Conditions (d) and (e) correspond to a standard interior-point assumption in the local polynomial regression literature (e.g., Ruppert and Wand [2]). Their primary role is to ensure that the local design matrix is asymptotically invertible, that the Kernel moments behave in the standard manner, and that the ILPR estimator admits the asymptotic expansions used in Equations (69–70) of the Supplementary material. In the absence of these conditions, boundary effects would arise, and boundary-corrected estimators or an alternative asymptotic theory would be required.
Theorem 3.5 establishes the statistical consistency of the ILPR estimator in the local linear setting for responses on Riemannian manifolds endowed with an EPM, including the Log–Euclidean and Log–Cholesky metrics as special cases. This result broadens the class of geometries for which ILPR consistency can be rigorously justified, thereby linking classical Euclidean local regression theory with intrinsic statistical analysis on curved manifolds.
3.5 Implementation of ILPR–EPM and its computational complexity upper bound
Algorithm 1 provides pseudocode illustrating how to evaluate the multivariate ILPR estimator on a Riemannian manifold equipped with an EPM. This incorporates a step in which we apply Tikhonov Regularization to enhance numerical stability during the regression analysis. An implementation of it using Octave or MATLAB is available on GitHub [43]. From the theoretical point of view, this pseudocode allows us to establish an upper bound on the computational complexity of performing ILPR given an isometric map f as:
where Cf(n) and Cf−1(n) denote the computational complexity of evaluating the isometric map and its inverse, respectively, on a single matrix Yi of the sample. The complete derivation of this bound can be found in Supplementary Appendix H. As we can conclude from this expression, the computational complexity depends on the isometric map used. For the case of local linear estimation, when the embedded manifold dimension matches that of the embedding manifold and there is a single covariate, the complexity simplifies to:
Specifically in the case of the SPD manifold, the Log–Cholesky and Log–Euclidean metrics are EPMs through their isometric maps:
where fLC and fLE are the isometric maps of the Log–Cholesky and Log–Euclidean metrics. The computational complexities of these embeddings are:
resulting in a complexity of O((4/3)Nn³) for the Log–Cholesky metric and O(2Nn³) for the Log–Euclidean metric when N is large. Thus, in the worst-case scenario, the Log–Cholesky metric is expected to be approximately 33.3% faster than the Log–Euclidean metric. However, this efficiency does not necessarily imply superior regression accuracy. Confirming these theoretical predictions and assessing accuracy requires conducting simulations on SPD-valued data with different manifold and covariate dimensions, which is the focus of the next section.
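Before moving to the simulations, the following rough timing sketch (Python/NumPy) compares the two isometric maps on random SPD matrices; the measured ratio depends on the linear algebra backend and the matrix size, so it only illustrates the qualitative gap predicted by the bound above.

```python
import time
import numpy as np

def f_log_cholesky(Y):
    """Log-Cholesky isometric map: one Cholesky factorization (~ n^3 / 3 flops)."""
    L = np.linalg.cholesky(Y)
    return np.tril(L, -1) + np.diag(np.log(np.diag(L)))

def f_log_euclidean(Y):
    """Log-Euclidean isometric map: a full symmetric eigendecomposition."""
    w, V = np.linalg.eigh(Y)
    return (V * np.log(w)) @ V.T

def total_time(f, Ys):
    t0 = time.perf_counter()
    for Y in Ys:
        f(Y)
    return time.perf_counter() - t0

# Rough empirical comparison on N random SPD(n) matrices
rng = np.random.default_rng(4)
n, N = 64, 200
Ys = []
for _ in range(N):
    A = rng.standard_normal((n, n))
    Ys.append(A @ A.T + n * np.eye(n))
print("Log-Cholesky :", total_time(f_log_cholesky, Ys))
print("Log-Euclidean:", total_time(f_log_euclidean, Ys))
```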
4 Simulations on the SPD manifold
We derive a closed-form expression for the ILPR estimator on Riemannian manifolds equipped with an EPM. Additionally, for the SPD manifold, this closed expression applies to the Log–Cholesky and Log–Euclidean metrics, since they are also EPMs. These metrics are known to require fewer matrix logarithm computations, making them more computationally efficient than the affine-invariant metric [14, 29]. To evaluate their performance on simulated data, we compare the ILPR estimator for the Log–Cholesky and Log–Euclidean metrics against ELPR using the affine-invariant metric, as the latter is a standard method in the literature. We aim to assess the trade-off between regression error and computational cost for each method as we vary the manifold and covariate dimensions.
We employ a Monte Carlo simulation to evaluate the performance of each regression method. Here, the dimensions of both the manifold and covariate are constrained to (p, n)∈([1, 6] × [3, 18])∩ℕ2. A specific combination of covariate and manifold dimensions (p, n) characterizes each simulation instance. We generate 100 realizations of simulated regression problems. These include scenarios with TRUE DATA, data with added noise, and ESTIMATED DATA, with responses residing in the manifold SPD(n) and covariates in ℝp. Subsequently, we compute two performance metrics for each realization and regression method: the root mean square error (RMSE) to assess the regression approach's accuracy, and the computational time (CPU Time) to measure its computational cost. We performed the simulation on a computer equipped with an AMD Ryzen 7 6800H processor, Radeon Graphics, 40.0 GB of RAM, and a 64-bit operating system. The regression methods under consideration are ILPR Log–Cholesky, ILPR Log–Euclidean, and ELPR affine-invariant.
To generate the TRUE DATA responses, we adopt a model that generalizes the one proposed by Yuan [21], where the logarithm of the function m(X, p, n) forms a symmetric matrix whose elements depend linearly on the covariate vector X. This selection enables an investigation into the performance of the local linear ILPR across different regression methods. The coefficients governing the linear relationship between the matrix elements and the covariate are randomly selected uniform vectors, each with absolute values less than or equal to one. This TRUE DATA model extends the model used in Yuan [21]. The following function mathematically defines the model:
Here, δij is the Kronecker delta, 1p is a vector with all components equal to one, and Up(0, 1) is a random variable with a uniform distribution on (0, 1). The covariate vectors Xi are generated from this uniform model for i = 1, ⋯, N, where N is the sample size per simulation. Therefore, the TRUE DATA in a simulation with parameters (p, n) is given by {(Xi, m(Xi, p, n)) : i = 1, ⋯, N}. Here, m(Xi, p, n) denotes the true covariance matrix corresponding to the covariate Xi of dimension p and manifold dimension n.
To generate the DATA + NOISE samples, we introduce noise to the responses of the TRUE DATA using the Riemannian Log–Normal noise model [21, 44]. Although various approaches to contaminating a sample on the Symmetric Positive Definite (SPD) manifold with noise, such as the Log–Normal and Rician noise models, have been proposed in the literature [21, 45], this study exclusively employs the Riemannian Log–Normal model for simplicity. However, researchers can extend the methodology outlined here to accommodate any arbitrary noise model. Let us describe the Riemannian Log–Normal model as follows:
where vech−1 represents the inverse half vectorization operator [36]. Consequently, the DATA + NOISE samples are denoted as {(Xi, Yi) : i = 1, ⋯, N}.
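A sketch of this data-generating pipeline is given below (Python/NumPy). It is an approximation of Equations 38, 39: the coefficient layout, the covariate distribution, and the way the symmetric noise matrix is drawn are illustrative assumptions rather than the exact construction used in the experiments.

```python
import numpy as np

def sym_expm(S):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def make_true_model(p, n, rng):
    """TRUE-DATA model: log m(X) is a symmetric matrix depending linearly on X,
    with uniformly drawn coefficients of absolute value at most one."""
    A0 = rng.uniform(0.0, 1.0, size=(n, n))
    A0 = (A0 + A0.T) / 2.0
    A1 = rng.uniform(0.0, 1.0, size=(p, n, n))
    A1 = (A1 + np.swapaxes(A1, 1, 2)) / 2.0

    def m(X):
        S = A0 + np.tensordot(X, A1, axes=1)   # symmetric and linear in X
        return sym_expm(S)                     # m(X) = expm(S) lies on SPD(n)
    return m

def add_log_normal_noise(Y, sigma, rng):
    """Riemannian Log-Normal noise: perturb Y with a random symmetric matrix
    added in the log-domain."""
    n = Y.shape[0]
    E = rng.normal(0.0, sigma, size=(n, n))
    E = (E + E.T) / 2.0
    w, V = np.linalg.eigh(Y)
    S = (V * np.log(w)) @ V.T                  # logm(Y)
    return sym_expm(S + E)

# One simulated regression problem with p covariates on SPD(n)
rng = np.random.default_rng(5)
p, n, N = 2, 3, 100
m = make_true_model(p, n, rng)
Xs = rng.uniform(0.0, 1.0, size=(N, p))
true_data = np.array([m(x) for x in Xs])
noisy_data = np.array([add_log_normal_noise(Y, 0.1, rng) for Y in true_data])
```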
As previously discussed, we derive the ESTIMATED DATA from the noisy data using only the local linear approach across three distinct methods: ILPR Log–Cholesky, ILPR Log–Euclidean, and ELPR affine-invariant. We compute the ILPR estimator for the Log–Cholesky and Log–Euclidean metrics and evaluate it at Xi using Algorithm 1 with each metric's corresponding isometry f, as delineated in Table 4. The optimal smoothing parameter is determined for each regression method using Leave-One-Out Cross-Validation (LOOCV), employing Equation 23, the intrinsic Riemannian distance induced by each considered metric, and the Gaussian Kernel function [1, 3]. The simulations focus exclusively on the local linear case (k = 1). Consequently, the ESTIMATED DATA for each metric gf (Log–Cholesky, Log–Euclidean, affine-invariant) is obtained by evaluating the corresponding estimator at each covariate Xi, i = 1, ⋯, N.
4.1 Simulation I. DTI type data
To assess the performance of each regression method for each simulation, we compute the root mean squared error (RMSE) between the TRUE DATA and the ESTIMATED DATA for each metric gf. The formula used to compute this metric is given by:
where distAI represents the Riemannian distance induced by the affine-invariant metric. The CPU time is recorded for each realization and regression method as the time required to compute the corresponding ESTIMATED DATA.
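For completeness, a minimal sketch of the affine-invariant distance and the resulting RMSE is shown below (Python/NumPy); the normalization (a mean over the sample before taking the square root) is an assumption consistent with the description above.

```python
import numpy as np

def dist_affine_invariant(Y1, Y2):
    """Affine-invariant distance: || logm(Y1^{-1/2} Y2 Y1^{-1/2}) ||_F."""
    w, V = np.linalg.eigh(Y1)
    Y1_inv_sqrt = (V / np.sqrt(w)) @ V.T
    M = Y1_inv_sqrt @ Y2 @ Y1_inv_sqrt
    lam = np.linalg.eigvalsh(M)                 # eigenvalues of the SPD matrix M
    return np.sqrt(np.sum(np.log(lam) ** 2))

def rmse_affine_invariant(true_data, estimated_data):
    """RMSE between TRUE DATA and ESTIMATED DATA measured with dist_AI."""
    d2 = [dist_affine_invariant(Y, Y_hat) ** 2
          for Y, Y_hat in zip(true_data, estimated_data)]
    return float(np.sqrt(np.mean(d2)))
```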
We present the Monte Carlo simulation results in two independent subsections. The first subsection focuses on the manifold SPD(3), representing the manifold of diffusion tensor fields in DTI [12, 27]. Here, we visually represent the regression problems using the ellipsoidal representation of the data [46, 47]. In the second subsection, we study the quantitative and qualitative behavior of the different regression methods as the manifold and covariate dimensions vary. In this case, the ellipsoidal method commonly used in DTI is no longer sufficient to visualize high-dimensional data [SPD(n), n≫3]. To address this issue, we employ two complementary dimensionality reduction techniques, Riemannian Stochastic Neighborhood Embedding (Rie-SNE) and linearized principal geodesic analysis (PGA), for visualizing high-dimensional regression problems [48, 49]. We expect a suitable regression method to place the ESTIMATED DATA close to the TRUE DATA in both the Rie t-SNE and linearized PGA visualizations, so that the DATA+NOISE is isolated and the regression method can be considered successful in capturing the underlying structure of the high-dimensional data.
We assess the performance of the different non-parametric regression methods described in the previous section on the space of 3 × 3 SPD matrices, a manifold of particular relevance in diffusion tensor imaging (DTI). In DTI, water diffusion in brain tissue is modeled by SPD tensors, whose symmetry reflects the bidirectional nature of diffusion, and positive definiteness ensures physically meaningful, non-negative measurements. Riemannian geometry provides the necessary framework for statistical analysis on this non-Euclidean manifold, enabling robust inference of brain structure and function [12, 50, 51].
To benchmark the proposed regression models, we performed Monte Carlo simulations on the SPD(3) manifold using a scalar covariate. A total of N = 100 realizations of synthetic data were generated from a ground-truth model (Equation 38), with noise added according to Equation 39. Root mean square error (RMSE) and CPU time were computed for each method.
Figure 5A visualizes a representative realization using ellipsoidal glyphs: the TRUE DATA, noisy observations (DATA + NOISE), and the ESTIMATED DATA obtained via three non-parametric regressors—intrinsic local linear estimators using the Log–Cholesky and Log–Euclidean metrics, and an extrinsic local linear estimator using the affine-invariant metric.
Figure 5. Comparison of three non-parametric regression methods on the SPD(3) manifold using Monte Carlo simulation. (A) Ellipsoidal representation of a single realization of TRUE DATA and DATA+NOISE; all methods perform similarly in dealing with noise. The horizontal axis (not shown) represents a scalar covariate. (B) RMSE and CPU time for each method, shown as boxplots with their corresponding probability distributions. For DTI-type simulated data, the Log–Euclidean method provides the best trade-off between error and time for nonlinear regression.
All estimators effectively denoise the observations and recover the underlying nonlinear structure. As shown in Figure 5B, the affine-invariant estimator achieves the lowest RMSE, followed closely by the Log–Euclidean method. The Log–Cholesky estimator shows slightly higher RMSE but remains competitive. In terms of computational efficiency, the Log–Euclidean method is the fastest, while the affine-invariant method incurs the highest cost. The Log–Euclidean estimator thus offers the most favorable trade-off, combining low estimation error with reduced computational burden. These findings highlight its suitability for practical applications involving SPD-valued data.
4.2 Simulation II: high-dimensional regimes
We investigate the scalability of the nonparametric regression methods considered as manifold and covariate dimensions increase, using Monte Carlo simulations on SPD manifolds. Simulations were conducted for all integer pairs (p, n), with p∈[1, 6] denoting the number of covariates and n∈[3, 18] the SPD matrix size.
Figure 6A shows a scatter plot comparing the performance of the intrinsic local linear estimators with the Log–Cholesky (blue) and Log–Euclidean (red) metrics against the extrinsic affine-invariant estimator (yellow). Each dot shows the RMSE and CPU time of one simulation setting, each normalized by the maximum value observed for that quantity. As the manifold dimension increases, computational cost rises across all methods, with the affine-invariant estimator exhibiting the highest time complexity, while the Log–Cholesky method remains the most scalable. The RMSE is comparable across all methods: the affine-invariant estimator yields a lower error, but all methods provide consistent results across dimensional settings.
Figure 6. (A) Ellipsoidal visualizations of TRUE DATA, DATA+NOISE, and ESTIMATED DATA under three regression methods: Log–Cholesky (blue), Log–Euclidean (red), and affine-invariant (yellow). (B) Performance trade-off shown via heatmaps: The first column shows RMSE, and the second column shows CPU time, both on a log scale.
Figure 6B complements this analysis by showing log-scaled heatmaps of RMSE and CPU time. The RMSE grows modestly with the manifold dimension and more markedly with the number of covariates. CPU time increases for all methods with increasing dimensionality, with Log–Cholesky showing minimal growth and affine-invariant exhibiting sharp escalation; the Log–Euclidean method lies between these extremes. These trends reflect the computational demands of the matrix logarithm operations inherent to each metric [27, 29].
To further illustrate estimator behavior in high-dimensional settings, Figure 7 shows visualizations for SPD(18) using Log–Cholesky Rie-SNE (top row) and linearized PGA (bottom row). In all cases, the ESTIMATED DATA (green) aligns closely with the TRUE DATA (red) and diverges from the DATA+NOISE (black), confirming the effectiveness of the regression methods. While both techniques reveal meaningful structure, Log–Cholesky Rie-SNE more effectively captures nonlinear patterns, including spiral trajectories, even in noisy conditions.
Figure 7. Riemannian visualizations in high dimensions. Log–Cholesky Rie-SNE (top) and linearized PGA (bottom) applied to a single Monte Carlo realization on SPD(18) with a scalar covariate. TRUE DATA (red) lies near ESTIMATED DATA (green) and is distant from DATA+NOISE (black), indicating successful denoising. Log–Cholesky Rie-SNE reveals spiral patterns in the data, highlighting its ability to capture nonlinearity.
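The linearized PGA embedding can be sketched as tangent-space PCA at the Log–Euclidean mean; the snippet below is one plausible reading of the technique in [49], not necessarily the authors' exact implementation:

```python
import numpy as np

def sym_logm(P):
    w, V = np.linalg.eigh(P)
    return V @ np.diag(np.log(w)) @ V.T

def linearized_pga(spd_list, n_components=2):
    """Tangent-space PCA at the Log-Euclidean mean of a list of SPD matrices."""
    logs = np.array([sym_logm(P).ravel() for P in spd_list])   # map all points to log coordinates
    centered = logs - logs.mean(axis=0)                        # center at the Log-Euclidean mean
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)    # principal tangent directions
    return centered @ Vt[:n_components].T                      # low-dimensional scores

# Example usage: embed TRUE, NOISY, and ESTIMATED samples jointly for visual comparison
# coords = linearized_pga(list_of_true + list_of_noisy + list_of_estimated)
```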
5 Intrinsic local polynomial regression of lifespan EEG source connectivity
The aim of this section is to demonstrate a practical application of the ILPR framework developed in this study using real neurophysiological data. Our objective is to estimate developmental trajectories of the cortical cross-spectrum across the human lifespan. In neuroscience, the cross-spectrum provides a frequency-resolved covariance representation of cortical activity and is widely used to quantify functional connectivity between distributed neural generators. It also forms the foundation for more advanced descriptions of neural interactions, including effective connectivity and directed causal flows obtained through Spectral Granger–Geweke causality and related frameworks [5, 8].
Mathematically, for each frequency ω, the cross-spectrum S(ω) is the covariance matrix of the Fourier transform of cortical activity. Collecting these matrices across frequencies yields a cross-spectral tensor of Nc × Nc slices {S(ωk): k = 1, …, Nω}, where Nc denotes the number of cortical regions and Nω the number of sampled frequencies. Each slice S(ωk) is symmetric positive definite (SPD), and the full tensor resides in a direct product of SPD manifolds after appropriate regularization. Band-specific cross-spectra in the alpha (7–13 Hz) and delta (0.3–4 Hz) ranges reflect the functional organization of the oscillatory generators that produce these rhythms. Alpha rhythms are among the dominant signatures of human cortical dynamics, associated with attention, sensory gating, and thalamocortical coordination. In contrast, delta rhythms reflect slower, large-scale integrative processes that are highly relevant to development, aging, and neuropathology [8].
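As a purely generic illustration of this construction (a Welch-style estimator on multichannel data, not the Xi-AlphaNET/HarMNqEEG pipeline), a cross-spectral tensor can be estimated as follows:

```python
import numpy as np

def cross_spectral_tensor(x, fs, n_seg=256):
    """Welch-style cross-spectral matrices from multichannel data x of shape (channels, samples).
    Returns frequencies and an array of shape (n_freqs, n_channels, n_channels);
    spectral normalization constants are omitted for brevity."""
    nc, nt = x.shape
    n_windows = nt // n_seg
    taper = np.hanning(n_seg)
    freqs = np.fft.rfftfreq(n_seg, d=1.0 / fs)
    S = np.zeros((freqs.size, nc, nc), dtype=complex)
    for w in range(n_windows):
        seg = x[:, w * n_seg:(w + 1) * n_seg] * taper   # tapered segment
        Xf = np.fft.rfft(seg, axis=1)                   # (channels, freqs)
        for k in range(freqs.size):
            v = Xf[:, k][:, None]
            S[k] += v @ v.conj().T                      # outer product at frequency k
    return freqs, S / max(n_windows, 1)
```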
For the present analysis, we use the Xi-AlphaNET dataset (~60 GB), which contains precomputed source-level inverse solutions and cross-spectral tensors derived from the HarMNqEEG project [5, 8]. The dataset comprises N = 1,965 subjects aged 5–100 years, recorded in more than nine countries using heterogeneous EEG systems. The Xi-AlphaNET inverse solution integrates anatomical connectivity, F-TRACT conduction delays, and sparse hierarchical priors, producing physiologically constrained reconstructions of ξ- and α-processes at the cortical level [8]. These source estimates are summarized using the HCP–MMP1 parcellation [52], yielding Nc = 360 cortical regions and SPD cross-spectral tensors evaluated at Nω = 49 frequencies spanning 0–19 Hz. For each subject, the delta and alpha cross-spectra are selected at the frequency of maximal spectral power within their respective bands.
In the next step, we apply ILPR in its local linear form, treating each subject's cross-spectrum Sj as the SPD-valued response variable and age aj as the scalar covariate. This yields smooth lifespan trajectories of alpha- and delta-band covariance structures under both the Log–Euclidean and Log–Cholesky metrics. For electrophysiological interpretability and visualization, we compute partial coherence (Pcoh, the coherence matrix of the precision matrix of the cross-spectrum) from the estimated trajectories and apply a spatial permutation null test to retain only connections with statistically significant values. The selected connections are rendered on the ICBM152 cortical surface generated using Brainstorm. To ensure statistical rigor, both permutation procedures (the spatial null used for connection selection and the estimator-wise comparison between the Log–Euclidean and Log–Cholesky estimators) were performed at a significance level of α = 0.05 with N = 5,000 permutations to generate the null distributions. Finally, to quantitatively assess whether the two intrinsic estimators differ, we conduct an additional permutation test on the estimator-wise lifespan trajectories, enabling us to determine whether any observed Log–Euclidean versus Log–Cholesky differences are statistically significant.
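The partial coherence step described above can be written compactly; the small ridge term is an illustrative regularizer, not a parameter reported in the study:

```python
import numpy as np

def partial_coherence(S, eps=1e-10):
    """Partial coherence from a (regularized) cross-spectral matrix S:
    Pcoh_ij = |P_ij| / sqrt(P_ii * P_jj), with P = inv(S + eps * I)."""
    P = np.linalg.inv(S + eps * np.eye(S.shape[0]))   # precision matrix with a small ridge
    d = np.sqrt(np.real(np.diag(P)))
    pcoh = np.abs(P) / np.outer(d, d)
    np.fill_diagonal(pcoh, 0.0)                       # discard self-connections
    return pcoh
```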
The developmental trajectories obtained using ILPR for both estimators are summarized in Figure 8. This analysis simultaneously characterizes lifespan changes in cortical connectivity and benchmarks the new ILPR Log–Cholesky estimator introduced in this study against the established ILPR Log–Euclidean estimator [21]. The comparison enables us to evaluate whether the proposed LC formulation, which operates directly on Cholesky factors, provides advantages without compromising statistical or neurophysiological fidelity. Panel A shows the lifespan trajectory of alpha-band partial coherence. Both LE and LC estimators recover the same dominant posterior architecture: alpha connectivity is tightly concentrated in the occipital cortex, linking primary and associative visual regions. This posterior–sensory topology is consistent with the canonical generators of alpha rhythms and remains stable across age bins (0–20, 20–40, 40–60, 60–80, 80–100 years). The trajectories estimated by LC are visually indistinguishable from those of LE, demonstrating that the new estimator faithfully reproduces the expected alpha connectivity profile across the lifespan. Panel B displays the results for the delta band. Delta connectivity exhibits a much broader spatial distribution, with a marked posterior–anterior gradient and strong frontal involvement, consistent with the role of slow rhythms in large-scale integrative processes, particularly during development and aging. Again, the LC and LE trajectories overlap nearly perfectly, underscoring the robustness of the LC estimator. Panel C quantitatively evaluates differences between the two estimators using permutation tests. For both alpha and delta bands, the empirical statistic T falls well within the null distribution generated by 5,000 permutations at significance level α = 0.05, yielding p = 1.0. These results demonstrate that LC and LE are statistically indistinguishable at the population level. Panel D further confirms this equivalence by showing that the average absolute difference |PcohLE−PcohLC| across all 360 × 360 parcel pairs is uniformly minimal and exhibits no anatomical bias.
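The exact permutation scheme used for the estimator-wise comparison is not detailed here; the following generic paired sign-flip test with a mean absolute difference statistic T is a hypothetical sketch of that logic only, not the authors' procedure:

```python
import numpy as np

def estimator_permutation_test(pcoh_le, pcoh_lc, n_perm=5000, seed=0):
    """Generic paired permutation test (hypothetical scheme): the observed statistic is the
    mean absolute LE-LC difference across connections; the null randomly swaps LE/LC labels."""
    rng = np.random.default_rng(seed)
    diffs = pcoh_le - pcoh_lc                              # shape: (samples, connections)
    T_obs = np.abs(diffs.mean(axis=0)).mean()
    null = np.empty(n_perm)
    for b in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=diffs.shape[0])[:, None]   # random label swaps
        null[b] = np.abs((signs * diffs).mean(axis=0)).mean()
    p_value = (1 + np.sum(null >= T_obs)) / (n_perm + 1)
    return T_obs, p_value
```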
Figure 8. Lifespan trajectories of alpha- and delta-band partial coherence estimated from Xi–AlphaNET. (a, b) We analyzed N = 1, 965 participants (ages 5–100 years) from the HarMNqEEG dataset using Xi–AlphaNET to estimate cortical sources. The resulting cross-spectrum tensors were summarized using the HCP–MMP1 parcellation (NROI = 360). For each subject, alpha- and delta-band matrices were extracted at their peak spectral frequencies and converted into partial coherence (Pcoh). Lifespan trajectories for the Log–Euclidean (LE) and Log–Cholesky (LC) estimators show only connections that survive a spatial permutation null test (α = 0.05). Across the lifespan, LE and LC yielded highly consistent Pcoh estimates. Alpha-band connectivity remained strongly concentrated in occipital regions, whereas delta-band connectivity displayed a broader distribution with a posterior–anterior gradient toward the frontal cortex. (c, d) Permutation testing showed no systematic LE–LC differences, and the mean contrast maps (|PcohLE|−|PcohLC|) confirmed minimal divergence across the parcellation.
Taken together, these findings demonstrate that the LC estimator is neurophysiologically indistinguishable from the classical LE estimator introduced by Yuan et al. [21]: both recover the expected posterior dominance of alpha rhythms, the widespread frontal-to-posterior structure of delta rhythms, and the gradual changes in connectivity across the lifespan. Crucially, despite their statistical and biological equivalence, the LC estimator is approximately 33.3% faster to compute than the LE estimator in our implementation. This gain in computational efficiency, combined with full preservation of statistical behavior, establishes ILPR–LC as a preferred intrinsic regression method for large-scale SPD-valued neurophysiological datasets.
6 Conclusion and discussion
Our research addresses several gaps in the state of the art of ILPR, as outlined in the introduction, concerning both the theoretical properties and the practical applicability of this smoothing method [21]. To achieve this, we first offer a rigorous definition of ILPR that accommodates multivariate covariates on arbitrary Riemannian manifolds, thereby extending the original formulation to more general settings [21]. We then study the action of an isometric map on ILPR, which allows us to explore the geometric structure of the weighted least squares optimization problem that defines this smoothing method. We derive a closed analytical expression for the multivariate ILPR estimator on Riemannian manifolds equipped with an EPM, and we provide a pseudocode algorithm for its implementation together with an upper bound on its computational complexity. We prove that several Riemannian metrics used to study the SPD manifold are, in fact, EPMs, notably the Log–Cholesky and Log–Euclidean metrics. For these metrics, the complexity bound for ILPR-EPM establishes the Log–Cholesky metric as the theoretically most efficient choice.
To demonstrate the practical implications of our theoretical results, we first conducted extensive empirical tests on simulated Riemannian-valued data with responses on the manifold of symmetric positive definite (SPD) matrices across different dimensions. These simulations were designed to reproduce and generalize key settings previously considered by Yuan et al. [21]. Using the closed-form expression derived for the ILPR estimator under the Euclidean Pullback Metric (EPM) framework, we computed multivariate ILPR estimators under both the Log–Cholesky and Log–Euclidean geometries. Performance was evaluated by explicitly quantifying the trade-off between estimation accuracy and computational cost, and by comparing against Euclidean local polynomial regression (ELPR) under the affine-invariant metric, which is commonly regarded as a reference geometry on the SPD manifold due to its congruence invariance properties [12, 21, 27, 42]. Monte Carlo simulations spanning increasing manifold and covariate dimensions show that the Log–Cholesky formulation consistently achieves an optimal balance between accuracy and computational efficiency as dimensionality grows. Consistent with previous findings, we observe that the Log–Euclidean metric performs slightly better on low-dimensional manifolds such as SPD(3), which are commonly used to model diffusion tensor imaging (DTI) data [27]. However, this advantage does not persist as the dimensionality of either the manifold or the covariates increases, at which point the Log–Cholesky metric exhibits superior scalability and numerical stability.
Crucially, these simulation-based conclusions are corroborated by our analysis of real neurophysiological data. Applying ILPR to source-level EEG cross-spectral covariance matrices estimated from the HarMNqEEG dataset (N = 1,965, [5]) and summarized via a 360-region cortical parcellation, we find that Log–Cholesky and Log–Euclidean estimators yield statistically indistinguishable lifespan trajectories of cortical functional connectivity at the population level. Despite this equivalence in statistical and neurophysiological outcomes, the Log–Cholesky formulation achieves a substantial reduction in computational cost, confirming the advantages observed in simulation in a realistic, high-dimensional setting. Taken together, these results suggest that, for large-scale normative studies of brain connectivity across the lifespan—such as those exemplified by the HarMNqEEG project but conducted in source space and at high spatial resolution—the Log–Cholesky metric provides an especially suitable geometric choice for intrinsic regression [5]. Its favorable balance between computational efficiency and statistical fidelity makes it well-suited for future normative modeling efforts on high-dimensional SPD-valued neuroimaging data, and motivates further research into its use for population-level inference, longitudinal analyses, and large cohort studies in computational neuroscience.
We do, however, acknowledge several limitations in our simulations: (1) the analysis is restricted to a limited set of metrics on the SPD manifold; many other important metrics exist, such as the Wasserstein and Bogoliubov–Kubo–Mori metrics, as well as Stein divergences, all of which are relevant in various areas of signal processing and physics [34, 53]. (2) The generative model used to produce synthetic data on the SPD manifold does not fully capture real-world complexities, and further research is needed to assess the consistency of these results under more realistic noise scenarios. (3) Due to computational limitations, we restrict the SPD matrix size to below 20, a common regime for covariance matrices in signal processing; it remains important to establish how the different regression methods scale to dimensions beyond the range covered by our simulations.
In summary, our study explores ILPR within the framework of isometric maps, unveiling new theoretical results for arbitrary Riemannian manifolds and, in particular, a closed-form expression for the estimator and its consistency under the family of EPMs. As a proof of concept on the SPD manifold, we derive the exact analytical expression of the ILPR estimator for the Log–Cholesky metric using the mathematical formalism introduced here, and we show through simulation benchmarks that this estimator is computationally efficient, making it well suited to large studies on SPD-valued data.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary material, further inquiries can be directed to the corresponding author.
Author contributions
RR: Conceptualization, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. YW: Investigation, Software, Validation, Writing – review & editing. ML: Investigation, Software, Validation, Writing – review & editing. GF: Investigation, Validation, Writing – review & editing. LM: Conceptualization, Formal analysis, Funding acquisition, Investigation, Project administration, Resources, Supervision, Validation, Visualization, Writing – review & editing. PV: Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Visualization, Writing – original draft, Writing – review & editing.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This study was supported in part by the STI 2030-Major Projects under Grant 2022ZD0208500; in part by the National Natural Science Foundation of China under Grant W2411084; in part by the Key Research and Development Projects of the Science and Technology Department of Chengdu under Grant 2024-YF08-00072-GX; in part by the University of Electronic Science and Technology of China (UESTC) under Grant Y0301902610100201; and in part by the Chengdu Science and Technology Bureau Program under Grant 2022-GH02-00042-HZ. Chinese National Science Program of UESTC: No. Y0301902610100201. LM gratefully acknowledges personal support from the Hundred Talents Program of the University of Electronic Science and Technology of China, the Outstanding Young Talents Program (Overseas) of the National Natural Science Foundation of China, and talent programs of Sichuan Province and Chengdu Municipality.
Acknowledgments
The authors thank the reviewers for their valuable suggestions. Thanks also to Maria Luisa Bringas-Vega and Lidice Galan for all the support.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The author PV declared that they were an editorial board member of Frontiers at the time of submission. This had no impact on the peer review process and the final decision.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fams.2025.1690557/full#supplementary-material
References
1. Fan J, Gijbels I. Local Polynomial Modelling and Its Applications. London: Routledge (2018). doi: 10.1201/9780203748725
2. Ruppert D, Wand MP. Multivariate locally weighted least squares regression. Ann Stat. (1994) 22:1346–70. doi: 10.1214/aos/1176325632
4. Kent JT. New directions in shape analysis. In: The Art of Statistical Science, Vol. 115. New York, NY: Wiley (1992).
5. Li M, Wang Y, Lopez-Naranjo C, Hu S, Reyes RCG, Paz-Linares D, et al. Harmonized-multinational qEEG norms (HarMNqEEG). Neuroimage. (2022) 256:119190. doi: 10.1016/j.neuroimage.2022.119190
6. Lopez Naranjo C, Razzaq FA, Li M, Wang Y, Bosch-Bayard JF, Lindquist MA, et al. EEG functional connectivity as a Riemannian mediator: an application to malnutrition and cognition. Hum Brain Mapp. (2024) 45:e26698. doi: 10.1002/hbm.26698
7. Reyes RG, Valdes-Sosa PA. Mapping effective connectivity. In: Grafman JH, editor. Encyclopedia of the Human Brain. Amsterdam: Elsevier (2025). p. 589–611. doi: 10.1016/B978-0-12-820480-1.00184-4
8. Reyes RG, Gonzales AA, Wang Y, Jin Y, Minati L, Valdes-Sosa PA. Lifespan mapping of EEG source spectral dynamics with ξ-αNET. bioRxiv. (2025). doi: 10.1101/2025.02.21.639413
9. Wang Y, Ephremidze L, Reyes RG, Valdes-Sosa P. Exponential speedup of the Janashia-Lagvilava matrix spectral factorization algorithm. arXiv [preprint]. arXiv:2503.02553. (2025). doi: 10.48550/arXiv.2503.02553
10. Aydore S, Pantazis D, Leahy RM. A note on the phase locking value and its properties. Neuroimage. (2013) 74:231–44. doi: 10.1016/j.neuroimage.2013.02.008
11. Tuzel O, Porikli F, Meer P. Pedestrian detection via classification on Riemannian manifolds. IEEE Trans Pattern Anal Mach Intell. (2008) 30:1713–27. doi: 10.1109/TPAMI.2008.75
12. Pennec X, Fillard P, Ayache N. A Riemannian framework for tensor computing. Int J Comput Vis. (2006) 66:41–66. doi: 10.1007/s11263-005-3222-z
13. Dai X, Müller HG. Principal component analysis for functional data on Riemannian manifolds and spheres. Ann Stat. (2018) 46(6B):3334–61. doi: 10.1214/17-AOS1660
14. Dryden IL, Koloydenko A, Zhou D. Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging. Ann Appl Stat. (2009) 3:1102–23. doi: 10.1214/09-AOAS249
15. Lee H. Robust extrinsic regression analysis for manifold valued data. arXiv [preprint]. (2021). arXiv:2101.11872. doi: 10.48550/arXiv.2101.11872
16. Lin L, Niu M, Cheung P, Dunson D. Extrinsic Gaussian processes for regression and classification on manifolds. Bayesian Anal. (2019) 14:887–906. doi: 10.1214/18-BA1135
17. Lin L, St Thomas B, Zhu H, Dunson DB. Extrinsic local regression on manifold-valued data. J Am Stat Assoc. (2017) 112:1261–73. doi: 10.1080/01621459.2016.1208615
18. Lin Z, Yao F. Intrinsic Riemannian functional data analysis. Ann Stat. (2019) 47:3533–77. doi: 10.1214/18-AOS1787
19. Petersen A, Müller HG. Fréchet regression for random objects with Euclidean predictors. Ann Stat. (2019) 47:691–719. doi: 10.1214/17-AOS1624
20. You K, Park HJ. Re-visiting Riemannian geometry of symmetric positive definite matrices for the analysis of functional connectivity. Neuroimage. (2021) 225:117464. doi: 10.1016/j.neuroimage.2020.117464
21. Yuan Y, Zhu H, Lin W, Marron JS. Local polynomial regression for symmetric positive definite matrices. J R Stat Soc B (Stat Methodol). (2012) 74:697–719. doi: 10.1111/j.1467-9868.2011.01022.x
22. Kent JT. New directions in shape analysis. In: Mardia KV, editor. The Art of Statistical Science. Chichester: Wiley (1992). p. 115–27.
23. Kent JT, Ganeiber AM, Mardia KV. A new unified approach for the simulation of a wide class of directional distributions. J Comput Graph Stat. (2018) 27:291–301. doi: 10.1080/10618600.2017.1390468
24. Kim HJ, Adluru N, Collins MD, Chung MK, Bendlin BB, Johnson SC, et al. Multivariate General Linear Models (MGLM) on Riemannian manifolds with applications to statistical analysis of diffusion weighted images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2014). p. 2705–12. doi: 10.1109/CVPR.2014.352
25. Lin Z, Müller HG, Park BU. Additive models for symmetric positive-definite matrices and Lie groups. Biometrika. (2022) 110:361–79. doi: 10.1093/biomet/asac055
26. Lee JM. Introduction to Riemannian Manifolds, Vol. 176. Cham: Springer (2018). doi: 10.1007/978-3-319-91755-9
27. Arsigny V, Fillard P, Pennec X, Ayache N. Log-Euclidean metrics for fast and simple calculus on diffusion tensors. Magn Reson Med. (2006) 56:411–21. doi: 10.1002/mrm.20965
28. Barachant A, Bonnet S, Congedo M, Jutten C. Riemannian geometry applied to BCI classification. In: International Conference on Latent Variable Analysis and Signal Separation. Cham: Springer (2010). p. 629–36. doi: 10.1007/978-3-642-15995-4_78
29. Lin Z. Riemannian geometry of symmetric positive definite matrices via Cholesky decomposition. SIAM J Matrix Anal Appl. (2019) 40:1353–70. doi: 10.1137/18M1221084
30. Cáceres JLH, Comas LD, Cordovés JMA, Ortega MO, Reyes RCG, Ruíz LC, et al. Assessing LBICO, a bicoherence-derived multi-frequency index measured from EEG signals. Rev Cubana Inf Méd. (2024) 16:721.
31. Andrews DW. Asymptotic optimality of generalized CL, cross-validation, and generalized cross-validation in regression with heteroskedastic errors. J Econom. (1991) 47:359–77. doi: 10.1016/0304-4076(91)90107-O
32. Thanwerdas Y, Pennec X. The geometry of mixed-Euclidean metrics on symmetric positive definite matrices. arXiv [preprint]. arXiv:2111.02990. (2021). doi: 10.48550/arXiv.2111.02990
33. Nash J. The imbedding problem for Riemannian manifolds. Ann Math. (1956) 63:20–63. doi: 10.2307/1969989
34. Thanwerdas Y, Pennec X. Theoretically and computationally convenient geometries on full-rank correlation matrices. SIAM J Matrix Anal Appl. (2022) 43:1851–72. doi: 10.1137/22M1471729
35. Van Loan CF. The ubiquitous Kronecker product. J Comput Appl Math. (2000) 123:85–100. doi: 10.1016/S0377-0427(00)00393-9
36. Koning RH, Neudecker H, Wansbeek T. Block Kronecker products and the VECB operator. Linear Algebra Appl. (1991) 149:165–84. doi: 10.1016/0024-3795(91)90332-Q
37. Chacón JE, Duong T. Higher order differential analysis with vectorized derivatives. arXiv [preprint]. arXiv:2011.01833. (2020). doi: 10.48550/arXiv.2011.01833
38. Van Khang N, Dat DC, Tuan NTM. Taylor expansion for matrix functions of vector variable using the Kronecker product. Vietnam J Mech. (2019) 41:337–48.
39. Magnus JR, Neudecker H. Matrix differential calculus with applications to simple, Hadamard, and Kronecker products. J Math Psychol. (1985) 29:474–92. doi: 10.1016/0022-2496(85)90006-9
40. Magnus JR. On the concept of matrix derivative. J Multivar Anal. (2010) 101:2200–6. doi: 10.1016/j.jmva.2010.05.005
41. Van Khang N. Consistent definition of partial derivatives of matrix functions in dynamics of mechanical systems. Mech Mach Theory. (2010) 45:981–8. doi: 10.1016/j.mechmachtheory.2010.03.005
42. Barachant A, Bonnet S, Congedo M, Jutten C. Classification of covariance matrices using a Riemannian-based kernel for BCI applications. Neurocomputing. (2013) 112:172–8. doi: 10.1016/j.neucom.2012.12.039
43. Reyes RG. Multivariate Intrinsic Local Polynomial Regression on Isometric Riemannian Manifolds. GitHub Repository (2023). Available online at: https://github.com/ronald1129/Multivariate-Intrinsic-Local-Polynomial-Regression-on-Isometric-Riemannian-Manifolds.git
44. Schwartzman A. Riemannian normality and random rotation-invariant statistics for high angular resolution diffusion imaging. Ann Stat. (2006) 34:1996–2015.
45. Zhu H, Zhang H, Ibrahim JG, Peterson BG. Statistical analysis of diffusion tensors in diffusion-weighted magnetic resonance image data (with discussion). J Am Statist Ass. (2007) 102:1085–102. doi: 10.1198/016214507000000581
46. Masutani Y, Aoki S, Abe O, Hayashi N, Otomo K. MR diffusion tensor imaging: recent advance and new techniques for diffusion tensor visualization. Eur J Radiol. (2003) 46:53–66. doi: 10.1016/S0720-048X(02)00328-5
47. Westin CF, Maier SE, Mamata H, Nabavi A, Jolesz FA, Kikinis R. Processing and visualization for diffusion tensor MRI. Med Image Anal. (2002) 6:93–108. doi: 10.1016/S1361-8415(02)00053-1
48. Bergsson A, Hauberg S. Visualizing Riemannian data with Rie-SNE. arXiv [preprint]. arXiv:2203.09253. (2022). doi: 10.48550/arXiv.2203.09253
49. Fletcher PT, Lu C, Pizer SM, Joshi S. Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Trans Med Imaging. (2004) 23:995–1005. doi: 10.1109/TMI.2004.831793
50. Fletcher PT, Joshi S. Riemannian geometry for the statistical analysis of diffusion tensor data. Signal Process. (2007) 87:250–62. doi: 10.1016/j.sigpro.2005.12.018
51. Minati L, Węglarz WP. Physical foundations, models, and methods of diffusion magnetic resonance imaging of the brain: a review. Concepts Magn Reson A. (2007) 30:278–307. doi: 10.1002/cmr.a.20094
52. Glasser MF, Coalson TS, Robinson EC, Hacker CD, Harwell J, Yacoub E, et al. A multi-modal parcellation of human cerebral cortex. Nature. (2016) 536:171–8. doi: 10.1038/nature18933
53. Sra S. Positive definite matrices and the S-divergence. Proc Am Math Soc. (2016) 144:2787–97. doi: 10.1090/proc/12953
54. Duan P, Yang F, Chen T, Shah SL. Direct causality detection via the transfer entropy approach. IEEE Trans Control Syst Technol. (2013) 21:2052–66.
55. Reyes RG, Martinez Montes E. Modeling neural activity in neurodegenerative diseases through a neural field model with variable density of neurons. bioRxiv [preprint]. (2022).
56. Benkarim O, Paquola C, Park B-y, Royer J, Rodríguez-Cruces R, de Wael RV, et al. A Riemannian approach to predicting brain function from the structural connectome. NeuroImage. (2022) 257:119299.
Keywords: Euclidean pullback metric, isometric Riemannian manifold, Log-Cholesky metric, multivariate intrinsic local polynomial regression, positive definite manifold
Citation: Reyes RG, Wang Y, Li M, Fernández GE, Minati L and Valdes Sosa PA (2026) Multivariate intrinsic local polynomial regression on isometric Riemannian manifolds. Applications to symmetric positive data. Front. Appl. Math. Stat. 11:1690557. doi: 10.3389/fams.2025.1690557
Received: 22 August 2025; Revised: 18 December 2025;
Accepted: 29 December 2025; Published: 22 January 2026.
Edited by:
Qiwei Li, The University of Texas at Dallas, United States
Reviewed by:
Zakariya Yahya Algamal, University of Mosul, Iraq
Pankaj Tiwari, University of Kalyani, India
Mingao Yuan, The University of Texas at El Paso, United States
Copyright © 2026 Reyes, Wang, Li, Fernández, Minati and Valdes Sosa. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Pedro Antonio Valdes Sosa, pedro.valdes@neuroinformatics-collaboratory.org
†ORCID: Ronald García Reyes orcid.org/0000-0001-6206-5852
Ying Wang orcid.org/0000-0002-1947-3313
Ludovico Minati orcid.org/0000-0002-2532-1674
Pedro Antonio Valdes Sosa orcid.org/0000-0001-5485-2661
Min Li