# Least Square Approach to Out-of-Sample Extensions of Diffusion Maps

- Department of Mathematics and Statistics, Sam Houston State University, Huntsville, TX, United States

Let *X* = **X** ∪ **Z** be a data set in ℝ^{D}, where **X** is the training set and **Z** the testing one. Assume that a kernel method produces a dimensionality reduction (DR) mapping 𝔉: **X** → ℝ^{d} (*d* ≪ *D*) that maps the high-dimensional data **X** to its low-dimensional representation **Y** = 𝔉(**X**). The out-of-sample extension problem of dimensionality reduction is to find the dimensionality reduction of *X* by extending 𝔉 instead of retraining on the whole data set *X*. In this paper, utilizing the framework of reproducing kernel Hilbert space theory, we introduce a least-square approach to extensions of the popular DR mappings called Diffusion maps (Dmaps). We establish a theoretical analysis for the out-of-sample DR extension of Dmaps. This analysis also provides a uniform treatment of many popular out-of-sample algorithms based on kernel methods. We illustrate the validity of the developed out-of-sample DR algorithms in several examples.

## 1. Introduction

Recently, in many scientific and technological areas, we need to analyze and process high-dimensional data, such as speech signals, images and videos, text documents, stock trade records, and others. Due to the curse of dimensionality [1, 2], directly analyzing and processing high-dimensional data is often infeasible. Therefore, *dimensionality reduction* (DR) (see the books [3, 4]) becomes a critical step in high-dimensional data processing. DR maps high-dimensional data into a low-dimensional space so that the data processing can be carried out on its low-dimensional representation. There exist many DR methods in the literature. The best-known linear method is *principal component analysis* (PCA) [5]. However, PCA cannot effectively reduce the dimension of a data set that essentially resides on a nonlinear manifold. Therefore, to reduce the dimensions of such data sets, people employ non-linear DR methods [6–12], among which the method of Diffusion Maps (Dmaps) introduced by Coifman and his research group [13, 14] has proved attractive and effective. Adopting the ideas of spectral clustering [15, 16] and Laplacian eigenmaps [17], Dmaps integrates them into a more conceptual framework, the geometric harmonics.

As a spectral method, Dmaps employs the diffusion kernel to define the similarity on a given data set **X** ⊂ ℝ^{D}. The principal *d*-dimensional eigenspace (*d* ≪ *D*) of the kernel provides the feature space of **X**, so that a diffusing mapping 𝔉 maps **X** to the set **Y** = 𝔉(**X**), which is called a DR of **X**.

Note that the mapping 𝔉 is constructed by the spectral decomposition of the kernel, which is data-dependent. If the set **X** is enlarged to *X* = **X** ∪ **Z** and we want to make a DR of *X* by Dmaps, we have to retrain on the set *X* in order to construct a new diffusing mapping. The retraining approach is often impractical if the cardinality of *X* becomes very large, or the new data set **Z** comes as a time-stream.

The out-of-sample DR extension method finds the DR of *X* by extending the diffusing mapping 𝔉 onto *X*. In most cases, we can assume that the new data set **Z** has features similar to those of **X**. Therefore, instead of retraining on the whole set *X*, we realize the DR of *X* by extending the mapping 𝔉 from **X** to *X* only.

Many papers have introduced various out-of-sample extension algorithms (see [14, 18, 19] and their references). However, the mathematical analysis of out-of-sample extension has not been studied sufficiently.

The main purpose of this paper is to give a mathematical analysis of the out-of-sample DR extension of Dmaps. In Wang [20], we preliminarily studied out-of-sample DR extensions for kernel PCA. Since the structure of the kernels for Dmaps is different from that of kernel PCA, it needs a special analysis. In this paper we deal with the DR extensions of Dmaps in the framework of reproducing kernel Hilbert space (RKHS), in which the Dmaps extension can be classified as a least-square one.

The paper is organized as follows. In section 2, we introduce the general out-of-sample extensions in the RKHS framework. In section 3, we establish the least-square out-of-sample DR extensions of Dmaps, give their mathematical analysis, and present the algorithms. In section 4, we give several examples of the extension.

## 2. Preliminary

We first introduce some notions and notations. Let μ be a finite (positive) measure on a data set *X* ⊂ ℝ^{D}. We denote by *L*^{2}(*X*, μ) the (real) Hilbert space on *X*, equipped with the inner product

$${\langle f,g\rangle}_{{L}^{2}(X,\mu )}={\int}_{X}f(x)g(x)d\mu (x).$$

Then, ${\|f\|}_{{L}^{2}(X,\mu )}=\sqrt{{\langle f,f\rangle}_{{L}^{2}(X,\mu )}}$. Later, we will abbreviate *L*^{2}(*X*, μ) to *L*^{2}(*X*) (or *L*^{2}) if the measure μ (and the set *X*) is (are) not stressed.

**Definition 1** *A function k*: *X*^{2} → ℝ *is called a Mercer's kernel if it satisfies the following conditions:*

1. *k is symmetric: k*(*x, y*) = *k*(*y, x*)*;*

2. *k is positive semi-definite;*

3. *k is bounded on X*^{2}*, that is, there is an M* > 0 *such that* |*k*(*x, y*)| ≤ *M*, (*x, y*) ∈ *X*^{2}.
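As a quick numerical illustration of Definition 1, the following sketch checks the three conditions on the Gram matrix of a Gaussian kernel over a finite sample. The helper name `gaussian_kernel`, the random sample, and the tolerances are assumptions made for this illustration only; points are stored as rows, which is a convention of the sketch rather than of the paper.

```python
import numpy as np

def gaussian_kernel(A, B, eps=1.0):
    """Matrix with entries exp(-||a_i - b_j||^2 / eps); rows of A and B are points."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / eps)   # clip tiny negative round-off

rng = np.random.default_rng(0)
pts = rng.uniform(size=(50, 3))                 # hypothetical sample in R^3
K = gaussian_kernel(pts, pts)

# Condition 1: symmetry.
is_symmetric = np.allclose(K, K.T)
# Condition 2: positive semi-definiteness (up to floating-point round-off).
min_eig = np.linalg.eigvalsh((K + K.T) / 2.0).min()
is_psd = min_eig > -1e-10
# Condition 3: boundedness; for the Gaussian kernel one may take M = 1.
is_bounded = np.abs(K).max() <= 1.0 + 1e-12

print(is_symmetric, is_psd, is_bounded)
```

A finite Gram matrix can only check the conditions on the sampled points; it does not prove them on all of *X*^{2}.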

In this paper, we only consider Mercer's kernels. Hence, the term *kernel* will stand for Mercer's one. The kernel distance (associated with *k*) between two points *x, y* ∈ *X* is defined by

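A kernel *k* defines an RKHS *H*_{k}, in which the inner product satisfies the reproducing property [21]

$${\langle f,k(x,\cdot )\rangle}_{{H}_{k}}=f(x),\quad f\in {H}_{k},\ x\in X.\tag{1}$$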

Later, we will use *H* instead of *H*_{k} if the kernel *k* is not stressed. Recall that *k* plays a dual role. It derives the identity operator on *H*, as shown in 1, and also derives the following compact operator *K* on *L*^{2}(*X*):

$$(Kf)(x)={\int}_{X}k(x,y)f(y)d\mu (y),\quad f\in {L}^{2}(X).$$

In Wang [20], we proved that if

$$k(x,y)=\sum _{j=1}^{m}{\varphi}_{j}(x){\varphi}_{j}(y),$$

where the set {ϕ_{1}, ··· , ϕ_{m}} is linearly independent, then the set is an o.n. basis of *H*. Therefore, for *f, g* ∈ *H* with $f=\sum _{j}{c}_{j}{\varphi}_{j}$ and $g=\sum _{j}{d}_{j}{\varphi}_{j}$, we have ${\langle f,g\rangle}_{{H}_{k}}=\sum _{j}{c}_{j}{d}_{j}$.

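Let the spectral decomposition of *k* be the following:

$$k(x,y)=\sum _{j=1}^{m}{\lambda}_{j}{v}_{j}(x){v}_{j}(y),$$

where the eigenvalues are arranged decreasingly, λ_{1} ≥ ··· ≥ λ_{m} > 0, and the eigenfunctions *v*_{1}, *v*_{2}, ··· , *v*_{m} are normalized to satisfy

$${\langle {v}_{i},{v}_{j}\rangle}_{{L}^{2}(X)}={\delta}_{i,j}.$$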

Write ${\gamma}_{i}(x)=\sqrt{{\lambda}_{i}}{v}_{i}(x)$. Then, {γ_{1}, ··· , γ_{m}} is an o.n. basis of *H*, which is called the *canonic basis* of *H*. We also call $k(x,y)=\sum _{j=1}^{m}{\gamma}_{j}(x){\gamma}_{j}(y)$ the *canonic decomposition* of *k*. By 2, we have

Thus, if *f* ∈ *H* has the canonic representation $f=\sum _{j=1}^{m}{c}_{j}{\gamma}_{j}$, then, for any *g* ∈ *H*, the inner product 〈*f, g*〉_{H} has the following integral form:

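To investigate the out-of-sample DR extension, we first recall some general results on function extensions. Let *X* = **X** ∪ **Z**. To stress that a point *x* ∈ *X* is also in **X**, we use **x** instead of *x*. Similarly, we denote by **k**(**x, y**) the restriction of *k*(*x, y*) on **X**^{2}. That is,

$$\mathbf{k}(\mathbf{x},\mathbf{y})=k(\mathbf{x},\mathbf{y}),\quad (\mathbf{x},\mathbf{y})\in {\mathbf{X}}^{2}.$$

We also denote by **H** the RKHS associated with **k**. Then a continuous map **E** : **H** → *H* is called an extension if

$$\mathbf{E}(\mathbf{f})(\mathbf{x})=\mathbf{f}(\mathbf{x}),\quad \mathbf{x}\in \mathbf{X},\ \mathbf{f}\in \mathbf{H}.$$

Correspondingly, we define the restriction *R*: *H* → **H** by

$$R(f)(\mathbf{x})=f(\mathbf{x}),\quad \mathbf{x}\in \mathbf{X},\ f\in H.$$

It is obvious that the extensions from **X** to *X* are not unique if **Z** is not empty. So, we define the set of all extensions of **f** ∈ **H** by

$${A}_{\mathbf{f}}=\{f\in H:\ R(f)=\mathbf{f}\},$$

and call $\widehat{f}\in {A}_{\mathbf{f}}$ the *least-square extension* of **f** if

$${\|\widehat{f}\|}_{H}=\underset{f\in {A}_{\mathbf{f}}}{\min }{\|f\|}_{H}.$$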

It is evident that the least-square extension of a function is unique. We denote by **T** : **H** → *H* the operator of the least-square extension.

In Wang [20], we proved the following:

1. Let {**v**_{1}, ··· , **v**_{d}} be the canonic basis of **H** and σ_{1} ≥ σ_{2} ≥ ··· ≥ σ_{d} > 0 be the eigenvalues of the kernel **k**(**x, y**). Then the least-square extension of **v**_{j} is

Therefore, for any $\text{f}=\sum _{j=1}^{d}{c}_{j}{\text{v}}_{j}\in \text{H}$,

2. Let Ĥ = **T**(**H**) and **T*** : *H* → **H** be the adjoint operator of **T**. Then *P* = **TT*** is an orthogonal projection from *H* to Ĥ.

3. Let $\widehat{k}(x,y)$ be the kernel of the RKHS Ĥ. Then ${k}_{0}(x,y)=k(x,y)-\widehat{k}(x,y)$ is a Mercer's kernel such that ${k}_{0}(x,y)=0,(x,y)\in {X}^{2}\backslash {\text{X}}^{2}$. Denote by *H*_{0} the RKHS associated with *k*_{0}. Then, *H* = Ĥ ⊕ *H*_{0} and Ĥ ⊥ *H*_{0}.

4. If *k*(*x, y*) is a Gramian-type DR kernel [20], and ${\left[{\mathbf{v}}_{1}(\mathbf{X}),\cdots ,{\mathbf{v}}_{d}(\mathbf{X})\right]}^{T}$ gives the DR of **X**, then ${\left[{\widehat{v}}_{1}(X),\cdots ,{\widehat{v}}_{d}(X)\right]}^{T}$ provides the least-square out-of-sample DR extension on *X*.
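The results above can be tried out on a finite sample. The sketch below reads the least-square extension as the minimum-norm interpolant in the RKHS and takes μ to be the counting measure on the sample, so that an eigenvector **v**_{j} of the training kernel matrix with eigenvalue σ_{j} is extended by (1/σ_{j}) Σ_{i} *k*(*x*, **x**_{i}) **v**_{j}(**x**_{i}). This discrete form, the Gaussian kernel, and the names `gaussian_kernel` and `least_square_extension` are assumptions of the illustration, not formulas quoted from [20].

```python
import numpy as np

def gaussian_kernel(A, B, eps=1.0):
    """Matrix with entries exp(-||a_i - b_j||^2 / eps); rows of A and B are points."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.sum(diff**2, axis=2) / eps)

def least_square_extension(K_train, K_cross, d):
    """Extend the top-d eigenvectors of K_train to the rows of K_cross.

    K_train : (N, N) kernel matrix on the training set.
    K_cross : (P, N) kernel values k(x, x_i) between target and training points.
    """
    sigma, V = np.linalg.eigh(K_train)        # eigenvalues in ascending order
    order = np.argsort(sigma)[::-1][:d]       # indices of the d largest
    sigma, V = sigma[order], V[:, order]
    V_ext = (K_cross @ V) / sigma             # (1/sigma_j) * sum_i k(x, x_i) v_j(x_i)
    return sigma, V, V_ext

rng = np.random.default_rng(1)
X_train = rng.uniform(size=(40, 3))           # hypothetical training set
Z_new = rng.uniform(size=(10, 3))             # hypothetical new points
X_all = np.vstack([X_train, Z_new])

K_train = gaussian_kernel(X_train, X_train)
K_cross = gaussian_kernel(X_all, X_train)
sigma, V, V_ext = least_square_extension(K_train, K_cross, d=3)

# On the training points the extension reproduces the original eigenvectors.
print(np.allclose(V_ext[:40], V, atol=1e-8))
```

Only the rows of `V_ext` that belong to **Z** carry new information; the rows that belong to **X** are a consistency check that the map is indeed an extension.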

## 3. Least-Square Out-of-Sample DR Extensions for Dmaps

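The kernels of Dmaps are constructed based on the Gaussian kernel

$$w(x,y)=\exp \left(-\frac{{\|x-y\|}^{2}}{\epsilon}\right),\quad \epsilon >0.$$

The function

$$S(x)={\int}_{X}w(x,y)d\mu (y)$$

defines a mass density on *X*, and $M={\int}_{X}S(x)d\mu (x)$ is the total mass of *X*.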

There are two important forms of the kernels of Dmaps: The *Graph-Laplacian* diffusion kernel and the *Laplace-Beltrami* one.

### 3.1. Dmaps With the Graph-Laplacian Kernel

We first discuss the least-square out-of-sample DR extensions for the Dmaps with the Graph-Laplacian (GL) kernel. Normalizing the Gaussian kernel by *S*(*x*), we obtain the following Graph-Laplacian diffusion kernel [4, 13]:

$$g(x,y)=\frac{w(x,y)}{\sqrt{S(x)S(y)}}.$$

This kernel relates to the data set *X* equipped with an undirected (weighted) graph. It is known that 1 is the greatest eigenvalue of *g*(*x, y*) and its corresponding normalized eigenfunction is $\sqrt{\frac{S(x)}{M}}$.
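The statement about the greatest eigenvalue can be checked numerically. The sketch below builds the discrete Graph-Laplacian kernel *g*_{i,j} = *w*_{i,j}/√(*S*_{i}*S*_{j}) with *S*_{i} = Σ_{j} *w*_{i,j} (counting measure), and compares its leading eigenpair with 1 and √(*S*/*M*). The sample, the value ϵ = 1, and the function name `graph_laplacian_kernel` are assumptions of the illustration.

```python
import numpy as np

def graph_laplacian_kernel(W):
    """Return g_ij = w_ij / sqrt(S_i S_j) and the mass density S_i = sum_j w_ij."""
    S = W.sum(axis=1)
    G = W / np.sqrt(np.outer(S, S))
    return G, S

rng = np.random.default_rng(2)
pts = rng.uniform(size=(60, 3))                     # hypothetical sample
diff = pts[:, None, :] - pts[None, :, :]
W = np.exp(-np.sum(diff**2, axis=2) / 1.0)          # Gaussian weights, eps = 1

G, S = graph_laplacian_kernel(W)
lam, V = np.linalg.eigh(G)                          # ascending eigenvalues
top_value = lam[-1]
top_vector = V[:, -1] * np.sign(V[:, -1].sum())     # remove the sign ambiguity
expected = np.sqrt(S / S.sum())                     # sqrt(S(x) / M)

print(abs(top_value - 1.0) < 1e-10, np.allclose(top_vector, expected, atol=1e-8))
```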

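Let *H*_{g} be the RKHS associated with the kernel *g* and {ϕ_{0}, ··· , ϕ_{m}} be its canonic basis, which gives the following spectral decomposition of *g*(*x, y*):

$$g(x,y)=\sum _{j=0}^{m}{\varphi}_{j}(x){\varphi}_{j}(y)=\sum _{j=0}^{m}{\lambda}_{j}{v}_{j}(x){v}_{j}(y),$$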

where 1 = λ_{0} ≥ λ_{1} ≥ ··· ≥ λ_{m} > 0 and ${v}_{j}(x)={\varphi}_{j}(x)/\sqrt{{\lambda}_{j}}$. Because ${\varphi}_{0}=\sqrt{\frac{S(x)}{M}}$ provides only the mass information of the data set, it should not reside on the feature space. Hence, we define the feature space as the RKHS associated with the kernel $\sum _{j=1}^{m}{\varphi}_{j}(x){\varphi}_{j}(y)$, where ϕ_{0} is removed.

**Definition 2**. *The mapping* $\Phi :X\to {\mathbb{R}}^{m}:\Phi (x)={\left[{\varphi}_{1}(x),\cdots ,{\varphi}_{m}(x)\right]}^{T}$ *is called the diffusion mapping and the data set* Φ(*X*) ⊂ ℝ^{m} *is called a DR of X*.

**Remark**. In Wang [20], we already pointed out that each orthogonal transformation of the set Φ(*X*) can also be considered as a DR of *X*. Hence, any non-canonical o.n. basis of the feature space also provides a DR mapping.

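To study the out-of-sample extension, as was done in the preceding section, we assume *X* = **X** ∪ **Z** and denote by **g**(**x, y**) the Graph-Laplacian kernel on **X**, that is,

$$\mathbf{g}(\mathbf{x},\mathbf{y})=\frac{\mathbf{w}(\mathbf{x},\mathbf{y})}{\sqrt{\mathbf{S}(\mathbf{x})\mathbf{S}(\mathbf{y})}},$$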

where **S**(**x**) is the mass density on **X**, and

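Assume that the spectral decomposition of **g** is given by

$$\mathbf{g}(\mathbf{x},\mathbf{y})=\sum _{j=0}^{d}{\sigma}_{j}{\mathbf{v}}_{j}(\mathbf{x}){\mathbf{v}}_{j}(\mathbf{y}).$$

Then the RKHS $H_{\mathbf{g}}$ associated with **g** has the canonic basis {φ_{0}, φ_{1}, ··· , φ_{d}}:

$$\mathbf{g}(\mathbf{x},\mathbf{y})=\sum _{j=0}^{d}{\phi}_{j}(\mathbf{x}){\phi}_{j}(\mathbf{y}),\tag{4}$$

where ${\phi}_{j}=\sqrt{{\sigma}_{j}}{\mathbf{v}}_{j}$. Because **S**(**x**) ≠ *S*(**x**), in general,

$$\mathbf{g}(\mathbf{x},\mathbf{y})\ne g(\mathbf{x},\mathbf{y}),\quad (\mathbf{x},\mathbf{y})\in {\mathbf{X}}^{2}.$$

Hence, we cannot directly apply the extension technique in the preceding section to **g**. Our main purpose in this subsection is to introduce the extension from $H_{\mathbf{g}}$ to *H*_{g}.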

Denote by *H*_{w} and $H_{\mathbf{w}}$ the RKHSs associated with the kernels *w* and **w**, respectively. Because *w*(**x, y**) = **w**(**x, y**) for (**x, y**) ∈ **X**^{2}, the extension technique in the preceding section can be applied.

Let ${\text{u}}_{j}(\text{x})=\sqrt{\text{S}(\text{x})}{\phi}_{j}(\text{x})$ and ${u}_{j}(x)=\sqrt{S(x)}{\varphi}_{j}(x)$. Then we have

**Lemma 3** *The least-square extension operator* **T** : $H_{\mathbf{w}}$ → *H*_{w} *has the following representation:*

**Proof**. Because ${\left\{{\mathbf{u}}_{j}\right\}}_{j=0}^{d}$ is not a canonic o.n. basis of $H_{\mathbf{w}}$, we cannot directly apply the extension formula 3. Recall that the formula 3 can also be written as **T**(**f**)(*x*) = 〈**f**, *k*(*x*, ·)〉_{**H**}. (In the considered case, the kernel *w* replaces *k*.) Note that

which implies that, for any **f** ∈ $H_{\mathbf{w}}$, we have

Therefore, the formula $\mathbf{T}({\mathbf{u}}_{j})(x)={\langle w(x,\cdot ),{\mathbf{u}}_{j}\rangle}_{{H}_{\mathbf{w}}}$ yields 5. ■

We now write û_{j} = **T**(**u**_{j}) and define

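Then the RKHS *H*_{ŵ} associated with the kernel ŵ is the extension of $H_{\mathbf{w}}$.

The function *S*(*x*) induces the following multiplicator from *H*_{g} to *H*_{w}:

$${\mathfrak{S}}_{S}(f)(x)=\sqrt{S(x)}f(x),\quad f\in {H}_{g}.$$

Similarly, the function **S**(**x**) induces the following multiplicator from $H_{\mathbf{g}}$ to $H_{\mathbf{w}}$:

$${\mathfrak{S}}_{\mathbf{S}}(\mathbf{f})(\mathbf{x})=\sqrt{\mathbf{S}(\mathbf{x})}\mathbf{f}(\mathbf{x}),\quad \mathbf{f}\in {H}_{\mathbf{g}}.$$

It is clear that each of the operators ${\mathfrak{S}}_{S}$ and ${\mathfrak{S}}_{\mathbf{S}}$ is an isometric mapping. With the aid of ${\mathfrak{S}}_{S}$ and ${\mathfrak{S}}_{\mathbf{S}}$, we define the *least-square extension* ${T}$ from $H_{\mathbf{g}}$ to *H*_{g} by

$${T}={\mathfrak{S}}_{S}^{-1}\,\mathbf{T}\,{\mathfrak{S}}_{\mathbf{S}}.\tag{6}$$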

The following diagram shows the strategy of the out-of-sample extension using Graph-Laplacian diffusion mapping.

We now derive the integral representation of the operator ${T}$.

**Lemma 4** *Let the canonic decomposition of* **g** *be given by 4 and* $\text{f}=\sum _{j=0}^{d}{c}_{j}{\phi}_{j}\in {H}_{\text{g}}$. *Then*

*Its adjoint operator* ${{T}}^{*}:{H}_{g}\to {H}_{\text{g}}$ *is given by*

**Proof**. Write ${\widehat{\varphi}}_{j}={T}({\phi}_{j})$. By 6, we have ${\widehat{\varphi}}_{j}(x)=\frac{\text{T}({\text{u}}_{j})(x)}{\sqrt{S(x)}}.$ By Lemma 3, we obtain

which yields 7. Recall that $\frac{w(x,\text{y})}{\sqrt{S(x)\text{S}(\text{y})}}=g(x,\text{y})\sqrt{\frac{S(\text{y})}{\text{S}(\text{y})}}$. For any *h* ∈ *H*_{g}, by 〈*h, g*(·, **y**)〉_{Hg} = *h*(**y**), we have

which yields 8. ■

We now give the main theorem in this subsection.

**Theorem 5** *Let* ${T}$ *be the operator defined in 6*. *Define* $\widehat{g}(x,y)=\sum _{j=0}^{d}{\widehat{\varphi}}_{j}(x){\widehat{\varphi}}_{j}(y)$*, where* ${\widehat{\varphi}}_{j}={T}({\phi}_{j})$*, and let H*_{ĝ} *be the RKHS associated with* ĝ. *Then*,

(i) ${{T}}^{*}{T}=I$ *on* $H_{\mathbf{g}}$.

(ii) $\left\{{\widehat{\varphi}}_{0},\cdots ,{\widehat{\varphi}}_{d}\right\}$ *is an orthonormal system in H*_{g}*, so that H*_{ĝ} *is a subspace of H*_{g} *and* ${P}={T}{{T}}^{*}$ *is an orthogonal projection from H*_{g} *to H*_{ĝ}. *Therefore, we have* ${P}({\widehat{\varphi}}_{j})={\widehat{\varphi}}_{j}$ *and* ${{T}}^{*}({\widehat{\varphi}}_{j})={\phi}_{j}$.

(iii) *The function g*_{0}(*x, y*) = *g*(*x, y*) − ĝ(*x, y*) *is a Mercer's kernel. The RKHS H*_{g0} *associated with g*_{0} *is* (*m* − *d*) *dimensional. Besides*, *H*_{g} = *H*_{ĝ} ⊕ *H*_{g0} *and H*_{ĝ} ⊥ *H*_{g0}.

(iv) *For any function f* ∈ *H*_{g0}, *f*(**x**) = 0, **x** ∈ **X**.

**Proof**. Recall that {φ_{0}, φ_{1}, ··· , φ_{d}} is an o.n. basis of $H_{\mathbf{g}}$. By 8 and 9, we have

which yields ${{T}}^{*}{T}({\phi}_{j})={\phi}_{j}$. Hence, ${{T}}^{*}{T}=I$ on $H_{\mathbf{g}}$. The proof of (i) is completed.

Note that

which indicates that $\left\{{\widehat{\varphi}}_{0}(x),\cdots ,{\widehat{\varphi}}_{d}(x)\right\}$ is an orthonormal system in *H*_{g} and *H*_{ĝ} is a subspace of *H*_{g}. Because ${{P}}^{2}={P}$ and ${P}({\widehat{\varphi}}_{j})={\widehat{\varphi}}_{j},j=0,1,\cdots ,d$, ${P}$ is an orthogonal projection from *H*_{g} to *H*_{ĝ}, which proves (ii).

It is clear that (iii) is a direct consequence of (ii). Finally, we have ${P}(f)=0$ for *f* ∈ *H*_{g0}, which yields ${{T}}^{*}(f)=0$. Therefore, *f*(**x**) = 0, **x** ∈ **X**. The proof of (iv) is completed. ■

By Definition 2, the mapping $\mathbf{\Phi}:\mathbf{\Phi}(\mathbf{x})={\left[{\phi}_{1}(\mathbf{x}),\cdots ,{\phi}_{d}(\mathbf{x})\right]}^{T}$ is a diffusion mapping from **X** to ℝ^{d} and the set **Φ**(**X**) is a DR of **X**. We now give the following definition.

**Definition 6** *Let* ${T}$ *be the operator defined in 6 and* ${\widehat{\varphi}}_{j}={T}({\phi}_{j})$. *Then the set* $\widehat{\Phi}(X)={\left[{\widehat{\varphi}}_{1}(X),\cdots ,{\widehat{\varphi}}_{d}(X)\right]}^{T}\subset {\mathbb{R}}^{d}$ *is called the least-square out-of-sample DR extension of the Dmaps with the Graph-Laplacian kernel*.
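A discrete sketch of Definition 6 may help fix ideas. It assumes the counting measure on the sample and uses the relation from the proof of Lemma 4, namely that the extended function is **T**(**u**_{j})(*x*)/√*S*(*x*) with **u**_{j} = √**S** φ_{j}; in matrix form this becomes (1/σ_{j}) Σ_{**y**} *w*(*x*, **y**) φ_{j}(**y**)/√**S**(**y**), divided by √*S*(*x*). The function names, the dense kernel, and the random data are assumptions of the illustration.

```python
import numpy as np

def sq_dists(A, B):
    """Squared Euclidean distances between the rows of A and the rows of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sum(diff**2, axis=2)

def gl_dmaps_extension(X_train, X_all, d, eps=1.0):
    """Graph-Laplacian Dmaps on X_train and its extension to X_all (sketch)."""
    # Training kernel g = w / sqrt(S S) and its canonic basis phi_0, ..., phi_d.
    W_train = np.exp(-sq_dists(X_train, X_train) / eps)
    S_train = W_train.sum(axis=1)
    G_train = W_train / np.sqrt(np.outer(S_train, S_train))
    sigma, V = np.linalg.eigh(G_train)
    order = np.argsort(sigma)[::-1][:d + 1]
    sigma, V = sigma[order], V[:, order]
    Phi_train = V * np.sqrt(sigma)                    # phi_j = sqrt(sigma_j) v_j

    # Mass density S(x) on the whole set and cross weights w(x, y), y in training.
    S_all = np.exp(-sq_dists(X_all, X_all) / eps).sum(axis=1)
    W_cross = np.exp(-sq_dists(X_all, X_train) / eps)

    # T(u_j)(x) with u_j = sqrt(S_train) * phi_j, then division by sqrt(S(x)).
    U_ext = (W_cross @ (Phi_train / np.sqrt(S_train)[:, None])) / sigma
    Phi_ext = U_ext / np.sqrt(S_all)[:, None]
    return Phi_train[:, 1:], Phi_ext[:, 1:]           # drop phi_0 (mass only)

rng = np.random.default_rng(3)
X_train = rng.uniform(size=(60, 3))                   # hypothetical training set
Z_test = rng.uniform(size=(20, 3))                    # hypothetical testing set
X_all = np.vstack([X_train, Z_test])
Phi_train, Phi_ext = gl_dmaps_extension(X_train, X_all, d=2)
print(Phi_train.shape, Phi_ext.shape)
```

On a training point the extended coordinate equals the training coordinate multiplied by √(**S**(**x**)/*S*(**x**)), because the mass density changes once **Z** is added; this is the discrete counterpart of **S**(**x**) ≠ *S*(**x**).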

A DR extension on *X* is called *exact* if it is equal to a DR of *X* as defined in Definition 2 (see [20]). The following corollary is a direct consequence of Theorem 5.

**Corollary 7** The least-square out-of-sample DR extension given by ${T}$ from $H_{\mathbf{g}}$ to *H*_{g} is exact if and only if dim($H_{\mathbf{g}}$) = dim(*H*_{g}), or equivalently, *H*_{g0} = {0}.

### 3.2. Dmaps With the Laplace-Beltrami Kernel

The discussion on the out-of-sample DR extension of Dmaps with the Laplace-Beltrami (LB) kernel is similar to that in the previous subsection. Hence, in this subsection, we only outline the main results, skipping the details. We start the discussion from the asymmetrically normalized kernel

$$m(x,y)=\frac{w(x,y)}{S(x)},$$

which defines a random walk on the data set *X* such that *m*(*x, y*) is the probability of the walk from the node *x* to the node *y* after a unit time. From the viewpoint of the random walk, we naturally modify the Gaussian kernel *w*(*x, y*) to the following:

Then, we normalize it to

where

We call *b*(*x, y*) the *Laplace-Beltrami* kernel of Dmaps, which relates to the data set *X* sampled from a manifold in ℝ^{D}. The greatest eigenvalue of *b*(*x, y*) is also 1, which corresponds to the normalized eigenfunction $\sqrt{\frac{P(x)}{L}}$, where

Let *H*_{b} be the RKHS associated with *b* and assume that the spectral decomposition of *b* is

where 1 = β_{0} ≥ β_{1} ≥ ··· ≥ β_{m} > 0 and ${\psi}_{j}(x)=\sqrt{{\beta}_{j}}{q}_{j}(x)$. Similar to the discussion in the previous subsection, since ${\psi}_{0}=\sqrt{\frac{P(x)}{L}}$ does not contain any feature of the data set, we exclude it from the feature space.
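The leading eigenpair of the Laplace-Beltrami kernel can be checked in the same way as for the Graph-Laplacian kernel. The sketch below assumes the standard construction from the diffusion-maps literature: the Gaussian weights are first modified to *w*(*x, y*)/(*S*(*x*)*S*(*y*)), then *P*(*x*) is the row sum of the modified kernel, *L* is the sum of *P*, and *b* is the modified kernel divided by √(*P*(*x*)*P*(*y*)). The intermediate name `A`, the function name, and the data are assumptions of the illustration; only *P*, *L*, and *b* are symbols taken from the text.

```python
import numpy as np

def laplace_beltrami_kernel(W):
    """Return the LB kernel b and the density P under the assumed normalization."""
    S = W.sum(axis=1)
    A = W / np.outer(S, S)               # assumed modified kernel w / (S(x) S(y))
    P = A.sum(axis=1)                    # P(x): row sums of the modified kernel
    B = A / np.sqrt(np.outer(P, P))      # b = A / sqrt(P(x) P(y))
    return B, P

rng = np.random.default_rng(4)
pts = rng.uniform(size=(60, 3))                       # hypothetical sample
diff = pts[:, None, :] - pts[None, :, :]
W = np.exp(-np.sum(diff**2, axis=2) / 1.0)            # Gaussian weights, eps = 1

B, P = laplace_beltrami_kernel(W)
beta, Q = np.linalg.eigh(B)                           # ascending eigenvalues
top_value = beta[-1]
top_vector = Q[:, -1] * np.sign(Q[:, -1].sum())       # remove the sign ambiguity
expected = np.sqrt(P / P.sum())                       # sqrt(P(x) / L)

print(abs(top_value - 1.0) < 1e-10, np.allclose(top_vector, expected, atol=1e-8))
```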

**Definition 8** *The mapping* $\Psi :X\to {\mathbb{R}}^{m}:\Psi (x)={\left[{\psi}_{1}(x),\cdots ,{\psi}_{m}(x)\right]}^{T}$ *is called the Laplace-Beltrami diffusion mapping and the data set* Ψ(*X*) ⊂ ℝ^{m} *is called a DR of X associated with Laplace-Beltrami Dmaps*.

We now assume again *X* = **X** ∪ **Z** and denote by **b**(**x, y**) the Laplace-Beltrami kernel on **X**. Assume that the spectral decomposition of **b** is

Then the RKHS $H_{\mathbf{b}}$ associated with **b** has the canonic basis {ω_{0}, ω_{1}, ··· , ω_{d}}, where ${\omega}_{j}=\sqrt{{\gamma}_{j}}{\mathbf{q}}_{j}$.

Define the multiplicator from *H*_{b} to *H*_{w} by

and the multiplicator from $H_{\mathbf{b}}$ to $H_{\mathbf{w}}$ by

Each of the operators ${\mathfrak{S}}_{R}$ and ${\mathfrak{S}}_{\mathbf{R}}$ is an isometric mapping. We now define the *least-square extension* ${M}$ from $H_{\mathbf{b}}$ to *H*_{b} by

The integral representation of ${M}$ is given by the following lemma:

**Lemma 9** *Let* {ω_{0}, ω_{1}, ··· , ω_{d}} *be the canonic basis of* **b**. *Write* ${\widehat{\psi}}_{j}={M}({\omega}_{j})$. *Then*

*Particularly, for* $\text{f}=\sum _{j=0}^{d}{c}_{j}{\omega}_{j}\in {H}_{\text{b}}$*, we have*

*Its adjoint operator* ${{M}}^{*}:{H}_{b}\to {H}_{\text{b}}$ *is given by*

Since the proof is similar to that for Lemma 4, we skip it here.

**Theorem 10** *Let* ${M}$ *be the operator defined in 11*. *Define* $\widehat{b}(x,y)=\sum _{j=0}^{d}{\widehat{\psi}}_{j}(x){\widehat{\psi}}_{j}(y)$*, where* ${\widehat{\psi}}_{j}={M}({\omega}_{j})$*, and let* ${H}_{\widehat{b}}$ *be the RKHS associated with* $\widehat{b}$. *Then*,

1. ${{M}}^{*}{M}=I$ *on* $H_{\mathbf{b}}$.

2. $\left\{{\widehat{\psi}}_{0},\cdots ,{\widehat{\psi}}_{d}\right\}$ *is an orthonormal system in H*_{b}*, so that* ${H}_{\widehat{b}}$ *is a subspace of H*_{b} *and* ${Q}={M}{{M}}^{*}$ *is an orthogonal projection from H*_{b} *to* ${H}_{\widehat{b}}$. *Therefore, we have* ${Q}({\widehat{\psi}}_{j})={\widehat{\psi}}_{j}$ *and* ${{M}}^{*}({\widehat{\psi}}_{j})={\omega}_{j}$.

3. *The function* ${b}_{0}(x,y)=b(x,y)-\widehat{b}(x,y)$ *is a Mercer's kernel. The RKHS H*_{b0} *associated with b*_{0} *is* (*m*−*d*) *dimensional. Besides*, ${H}_{b}={H}_{\widehat{b}}\oplus {H}_{{b}_{0}}$ *and* ${H}_{\widehat{b}}\perp {H}_{{b}_{0}}$.

4. *For any function f* ∈ *H*_{b0}, *f*(**x**) = 0, **x** ∈ **X**.

We skip the proof of Theorem 10 because it is similar to that for Theorem 5. We now give the following definition:

**Definition 11** *Let* ${M}$ *be the operator defined in 11 and* ${\widehat{\psi}}_{j}={M}({\omega}_{j})$. *Then the set* $\widehat{\Psi}(X)={\left[{\widehat{\psi}}_{1}(X),\cdots ,{\widehat{\psi}}_{d}(X)\right]}^{T}\subset {\mathbb{R}}^{d}$ *is called the least-square out-of-sample DR extension of the Dmaps with the Laplace-Beltrami kernel*.

**Corollary 12** The least-square out-of-sample DR extension given by ${M}$ from $H_{\mathbf{b}}$ to *H*_{b} is exact if and only if dim($H_{\mathbf{b}}$) = dim(*H*_{b}), or equivalently, *H*_{b0} = {0}.

### 3.3. Algorithms for Out-of-Sample DR Extension of Dmaps

In this subsection, we present the algorithm for out-of-sample DR extension of Dmaps. The algorithm contains two parts. In the first part, we construct the DR for **X** by 4 and 10. In the second part, we extend the DR to the set *X*, by 9 and 12.

In the algorithm, we represent the data sets **X, Z**, and *X* = **X** ∪ **Z** as the *D*×*N*, *D*×*M*, and *D*×(*N*+*M*) matrices, respectively, so that *X* = [**X, Z**]. We assume the measure *dμ*(*x*) = *dx*. Write **X** = [**x**_{1}, ··· , **x**_{N}], **Z** = [**z**_{1}, ··· , **z**_{M}], and *X* = [*x*_{1}, ··· , *x*_{(N+M)}], where *x*_{j} = **x**_{j}, 1 ≤ *j* ≤ *N* and *x*_{j} = **z**_{j−N}, *N* + 1 ≤ *j* ≤ *N* + *M*. Then we represent all kernels by matrices and all functions by vectors. For example, **w** is now represented by the *N* × *N* matrix with ${\mathbf{w}}_{i,j}=\exp (-{\|{\mathbf{x}}_{i}-{\mathbf{x}}_{j}\|}^{2}/\epsilon )$. To treat GL-map and LB-map in a uniform way, we write ${\mathbf{S}}_{i}=\sum _{j}{\mathbf{w}}_{i,j}$ and define

Then we set either kernel on **X** as the *N* × *N* matrix **k** with

The pseudo-code is given in Algorithm 1.
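The two-part procedure described in this subsection can be sketched as follows. This is not a transcription of Algorithm 1; it is a minimal dense-matrix illustration under stated assumptions: points are stored as rows, the measure is the counting measure, both kernels are written as *w*_{i,j}/(*R*_{i}*R*_{j}) with *R* = √*S* for the GL-map and *R* = *S*√*P* for the LB-map (the latter using the assumed modification *w*/(*S*(*x*)*S*(*y*))), and the extension is (1/σ_{j}) Σ_{**y**} *w*(*x*, **y**) φ_{j}(**y**)/**R**(**y**), divided by *R*(*x*). The names `normalizer` and `dmaps_out_of_sample` are hypothetical.

```python
import numpy as np

def sq_dists(A, B):
    """Squared Euclidean distances between the rows of A and the rows of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sum(diff**2, axis=2)

def normalizer(W, kind):
    """Weights R such that the diffusion kernel is W / (R_i R_j)."""
    S = W.sum(axis=1)
    if kind == "gl":                                   # Graph-Laplacian kernel
        return np.sqrt(S)
    if kind == "lb":                                   # Laplace-Beltrami kernel (assumed form)
        P = (W / np.outer(S, S)).sum(axis=1)
        return S * np.sqrt(P)
    raise ValueError("kind must be 'gl' or 'lb'")

def dmaps_out_of_sample(X_train, Z_test, d, eps=1.0, kind="gl"):
    # Part 1: DR of the training set from the spectral decomposition of its kernel.
    W_train = np.exp(-sq_dists(X_train, X_train) / eps)
    R_train = normalizer(W_train, kind)
    K_train = W_train / np.outer(R_train, R_train)
    sigma, V = np.linalg.eigh(K_train)
    order = np.argsort(sigma)[::-1][:d + 1]
    sigma, V = sigma[order], V[:, order]
    Y_train = V * np.sqrt(sigma)                       # canonic basis, column 0 is trivial

    # Part 2: least-square extension to the whole set X = [X_train; Z_test].
    X_all = np.vstack([X_train, Z_test])
    R_all = normalizer(np.exp(-sq_dists(X_all, X_all) / eps), kind)
    W_cross = np.exp(-sq_dists(X_all, X_train) / eps)
    Y_all = (W_cross @ (Y_train / R_train[:, None])) / sigma / R_all[:, None]
    return Y_train[:, 1:], Y_all[:, 1:]                # drop the trivial coordinate

rng = np.random.default_rng(5)
X_train = rng.uniform(size=(60, 3))                    # hypothetical training set
Z_test = rng.uniform(size=(20, 3))                     # hypothetical testing set
for kind in ("gl", "lb"):
    Y_train, Y_all = dmaps_out_of_sample(X_train, Z_test, d=2, kind=kind)
    print(kind, Y_train.shape, Y_all.shape)
```

The sketch uses dense kernels for brevity; the experiments in section 4 use a sparse kernel built from nearest neighbors, which this illustration does not reproduce.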

## 4. Illustrative Examples

In this section, we give several illustrative examples to show the validity of the Dmaps out-of-sample extensions. We employ four benchmark artificial data sets, S-curve, Swiss roll, punched sphere, and 3D cluster, in our examples. The graphs of these four data sets are given in Figure 1.

### 4.1. Out-of-Sample Extension by Graph-Laplacian Mapping

We first show the examples for the out-of-sample extensions provided by the Graph-Laplacian mapping for the four benchmark data sets. We set the size of each of these data sets to |*X*| = 2,048. When the out-of-sample algorithm is applied, we choose the size of the training data set to be |**X**| = 1,843, which is 90% of all samples, and the size of the testing set to be |**Z**| = 205, which is 10% of all samples. The parameters for the Graph-Laplacian kernel are set as follows: to obtain a sparse kernel, we choose the 25 nearest neighbors for every node, and assign the diffusion parameter ϵ = 1 for S-curve, Punched Sphere, and 3D Cluster, and ϵ = ∞ for Swiss Roll. We compare the DR result of the whole set *X* obtained by out-of-sample extension with that obtained without out-of-sample extension in Figures 2–5. The figures show that the DRs obtained by out-of-sample extensions are satisfactory.
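The sizes quoted above can be reproduced with a few lines. The sketch below draws 2,048 points from a commonly used S-curve parametrization and splits them 90%/10% at random; the parametrization, the random split, the seed, and the name `make_s_curve` are assumptions of the illustration, since the text does not state how the samples were generated or selected.

```python
import numpy as np

def make_s_curve(n, rng):
    """Sample n points from an S-shaped surface in R^3 (assumed parametrization)."""
    t = 3.0 * np.pi * (rng.uniform(size=n) - 0.5)
    x = np.sin(t)
    y = 2.0 * rng.uniform(size=n)
    z = np.sign(t) * (np.cos(t) - 1.0)
    return np.column_stack([x, y, z])

rng = np.random.default_rng(0)
n_total = 2048
data = make_s_curve(n_total, rng)

n_train = int(0.9 * n_total)           # 90% of the samples for training
perm = rng.permutation(n_total)        # random split (an assumption)
X_train = data[perm[:n_train]]
Z_test = data[perm[n_train:]]

print(X_train.shape, Z_test.shape)
```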

### 4.2. Out-of-Sample Extension by Laplace-Beltrami Mapping

We now show the examples for the out-of-sample extensions provided by the Laplace-Beltrami mapping for the same four benchmark data sets. We set the same sizes for |*X*|, |**X**|, and |**Z**|, respectively. The parameters for the Laplace-Beltrami kernel are also set the same as for the Graph-Laplacian kernel. The results of the comparisons are given in Figures 6–9.

To give more detailed comparisons, in Figures 10–13, we show the DRs of the training data and the testing data obtained with out-of-sample extensions and without extensions, respectively, for the LB mapping.

**Figure 11**. Comparisons of DRs of training data and the testing data, respectively, for Punched Sphere.

It is common sense that if we reduce the size of the training set while increasing the size of the testing set, the out-of-sample extension will introduce larger errors in the DR. Figures 14–15 show that, over a relatively large range, say, when the size of the testing set is no greater than the size of the training set, the out-of-sample extension still produces acceptable results.

## Data Availability

No datasets were generated or analyzed for this study.

## Author Contributions

The author confirms being the sole contributor of this work and has approved it for publication.

## Conflict of Interest Statement

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

## References

1. Bellman R. *Adaptive Control Processes: A Guided Tour.* Princeton, NJ: Princeton University Press (1961).

2. Scott DW, Thompson JR. Probability density estimation in higher dimensions. In: Gentle JE, editor. *Computer Science and Statistics: Proceedings of the Fifteenth Symposium on the Interface.* Amsterdam; New York, NY; Oxford: North Holland-Elsevier Science Publishers (1983). p. 173–9.

4. Wang JZ. *Geometric Structure of High-Dimensional Data and Dimensionality Reduction.* Beijing; Berlin; Heidelberg: Higher Education Press; Springer (2012).

5. Jolliffe IT. *Principal Component Analysis*. Springer Series in Statistics. Berlin: Springer-Verlag (1986).

6. Zhang ZY, Zha HY. Principal manifolds and nonlinear dimensionality reduction via local tangent space alignment. *SIAM J Sci Comput.* (2004) **26**:313–38.

7. Schölkopf B, Smola A, Müller K-R. Nonlinear component analysis as a kernel eigenvalue problem. *Neural Comput.* (1998) **10**:1299–319.

8. Roweis ST, Saul LK. Nonlinear dimensionality reduction by locally linear embedding. *Science.* (2000) **290**:2323–6. doi: 10.1126/science.290.5500.2323

9. Donoho DL, Grimes C. Hessian eigenmaps: new locally linear embedding techniques for high-dimensional data. *Proc Natl Acad Sci USA.* (2003) **100**:5591–6. doi: 10.1073/pnas.1031596100

10. Shmueli Y, Sipola T, Shabat G, Averbuch A. Using affinity perturbations to detect web traffic anomalies. In: *The 11th International Conference on Sampling Theory and Applications* (Bremen) (2013).

11. Shmueli Y, Wolf G, Averbuch A. Updating kernel methods in spectral decomposition by affinity perturbations. *Linear Algebra Appl.* (2012) **437**:1356–65. doi: 10.1016/j.laa.2012.04.035

12. Balasubramanian M, Schwartz E, Tenenbaum J, de Silva V, Langford J. The isomap algorithm and topological stability. *Science* (2002) **295**:7. doi: 10.1126/science.295.5552.7a

13. Coifman RR, Lafon S. Diffusion maps. *Appl Comput Harmon Anal.* (2006) **21**:5–30. doi: 10.1016/j.acha.2006.04.006

14. Coifman RR, Lafon S. Geometric harmonics: a novel tool for multiscale out-of-sample extension of empirical functions. *Appl Comput Harmon Anal.* (2006) **21**:31–52. doi: 10.1016/j.acha.2005.07.005

15. Ng A, Jordan M, Weiss Y. On spectral clustering: analysis and an algorithm. *Adv Neural Inform Process Syst.* (2001) **14**:849–56.

16. Shi J, Malik J. Normalized cuts and image segmentation. *IEEE Trans Pattern Anal Mach Intell.* (2000) **22**:888–905. doi: 10.1109/34.868688

17. Belkin M, Niyogi P. Laplacian eigenmaps for dimensionality reduction and data representation. *Neural Comput.* (2003) **15**:1373–96. doi: 10.1162/089976603321780317

18. Aizenbud Y, Bermanis A, Averbuch A. PCA-based out-of-sample extension for dimensionality reduction. *arXiv: 1511.00831* (2015).

19. Bengio Y, Paiement J, Vincent P, Delalleau O, Le Roux N, Ouimet M. Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. In: Thrun S, Saul L, Schölkopf B, editors. *Advances in Neural Information Processing Systems*. Cambridge, MA: MIT Press (2004).

20. Wang JZ. Mathematical analysis on out-of-sample extensions. *Int J Wavelets Multiresol Inform Process.* (2018) **16**:1850042. doi: 10.1142/S021969131850042X

Keywords: out-of-sample extension, dimensionality reduction, reproducing kernel Hilbert space, least-square method, diffusion maps

AMS Subject Classification: 62-07, 42B35, 47A58, 30C40, 35P15

Citation: Wang J (2019) Least Square Approach to Out-of-Sample Extensions of Diffusion Maps. *Front. Appl. Math. Stat.* 5:24. doi: 10.3389/fams.2019.00024

Received: 02 March 2019; Accepted: 25 April 2019;

Published: 16 May 2019.

Edited by:

Ding-Xuan Zhou, City University of Hong Kong, Hong Kong

Reviewed by:

Bo Zhang, Hong Kong Baptist University, Hong Kong

Shao-Bo Lin, Wenzhou University, China

Sui Tang, Johns Hopkins University, United States

Copyright © 2019 Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jianzhong Wang, jzwang@shsu.edu