Abstract
It has been hypothesized that the ventral stream processing for object recognition is based on a mechanism called cortically local subspace untangling. A mathematical abstraction of object recognition by the visual cortex is the problem of untangling the manifolds associated with different object categories. Such a manifold untangling problem is closely related to the celebrated kernel trick in metric space. In this paper, we conjecture that there is a more general solution to manifold untangling in topological space that does not require artificially defining any distance metric. Geometrically, we can either embed a manifold in a higher-dimensional space to promote selectivity or flatten a manifold to promote tolerance. General strategies of both global manifold embedding and local manifold flattening are presented and connected with existing work on the untangling of image, audio, and language data. We also discuss the implications of manifold untangling for motor control and internal representations.
1. Introduction
Is dimensionality a curse or a blessing? The term “curse of dimensionality” was coined by Richard Bellman when studying dynamic programming (Bellman, 1966). It refers to various phenomena that arise in the analysis and organization of data in high-dimensional spaces. Specifically, data points tend to become sparse and dissimilar in many ways as the dimensionality increases, which prevents common data organization strategies from being efficient. To overcome the curse of dimensionality, various non-linear dimensionality reduction techniques, such as IsoMAP (Tenenbaum et al., 2000) and locally linear embedding (LLE) (Roweis and Saul, 2000), have been developed to reveal the low-dimensional structure embedded in high-dimensional observation data.
The blessing of dimensionality (Donoho, 2000) is a more counter-intuitive concept. To illustrate it, consider the classical toy example of the XOR decision for the linear perceptron (Rosenblatt, 1958). No linear classifier in 2D can separate the two classes of the XOR decision. However, with an additional dimension z = x ⊕ y, it is straightforward to linearly separate the two classes in the 3D space (x, y, z) (e.g., any plane z = c with 0 < c < 1 will do). Another example is the so-called two-circle data, consisting of two concentric circles, each representing a different class. Again, no linear classifier can separate the two classes in 2D, whereas linear separability is easily achieved in 3D by taking a third, redundant dimension into account.
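Both toy examples can be checked in a few lines. The sketch below lifts each point with the redundant coordinate described above: z = x ⊕ y for XOR, and the squared radius a² + b² for the two circles. It then checks that a single plane separates the lifted classes. The radii 1 and 2, the helper name `separated_by_plane`, and the particular planes are our illustrative assumptions; they are not part of the original examples.

```python
import math
from itertools import product

def separated_by_plane(points, labels, w, bias):
    """True if sign(w . p + bias) matches every label in {+1, -1}."""
    for p, label in zip(points, labels):
        score = sum(wi * pi for wi, pi in zip(w, p)) + bias
        if score * label <= 0:
            return False
    return True

# XOR: lift (x, y) -> (x, y, x XOR y); label +1 when x != y.
xor_inputs = list(product([0, 1], repeat=2))
xor_points = [(x, y, x ^ y) for x, y in xor_inputs]
xor_labels = [1 if x != y else -1 for x, y in xor_inputs]
# The plane z = 0.5 (w = (0, 0, 1), bias = -0.5) separates the lifted classes.
xor_ok = separated_by_plane(xor_points, xor_labels, (0, 0, 1), -0.5)

# Two concentric circles (radii 1 and 2, an illustrative choice): lift (a, b) -> (a, b, a^2 + b^2).
angles = [2 * math.pi * k / 16 for k in range(16)]
circle_points, circle_labels = [], []
for radius, label in ((1.0, -1), (2.0, 1)):
    for t in angles:
        a, b = radius * math.cos(t), radius * math.sin(t)
        circle_points.append((a, b, a * a + b * b))
        circle_labels.append(label)
# The plane a^2 + b^2 = 2.5 lies between the lifted circles (heights 1 and 4).
circles_ok = separated_by_plane(circle_points, circle_labels, (0, 0, 1), -2.5)

print(xor_ok, circles_ok)  # prints: True True
```

In both cases the separating plane depends only on the added coordinate. The extra dimension does all of the work.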
We note that the issue of dimensionality is often tangled with that of linearity. For example, the kernel trick (Schölkopf, 2000) in support vector machines (SVMs), which allows linear learning algorithms to learn a non-linear function or decision boundary, can be interpreted as a special class of techniques exploiting the blessing of dimensionality. In face verification (Chen et al., 2013), linear features with dimensions as large as 100K have been reported to improve performance owing to the blessing of dimensionality. More recently, convolutional neural networks equipped with non-linear rectified linear units (ReLUs) have shown excellent performance in various vision tasks, from image classification to object recognition. Between non-linearity and dimensionality, which plays a more fundamental role?
In this paper, we advocate for the blessing of dimensionality from a manifold untangling perspective (Chung and Abbott, 2021). The problem of manifold untangling (a.k.a. disentanglement, Brahma et al., 2015) can be formulated as an extension of the manifold embedding and knotting problem (Skopenkov, 2008) in differential topology. Originating from Whitney's work in the 1930s (Whitney, 1936), results related to the blessing of dimensionality include the embedding of an n-manifold in $\mathbb{R}^{2n}$ and its unknotting in $\mathbb{R}^{2n+1}$ (Wu, 2008). These classical results in differential topology inspire us to tackle the problem of manifold untangling by iteratively constructing overparameterized direct-fit models (Hasson et al., 2020) in a higher-dimensional space. The main contributions of this paper are summarized below.
Manifold untangling without a distance metric. In topological space, we show how to improve the manifold capacity by a unified untangling approach.
Two general strategies for untangling manifolds: global embedding vs. local flattening. We show how embedding and flattening jointly improve manifold capacity by promoting selectivity and tolerance.
Model-agnostic untangling of multimodal data. We apply the theory of manifold untangling to several recent works on multiview image recognition, invariant audio recognition, and perceptual video straightening.
Biological connection with the hypothesis of cortically local subspace untangling in ventral stream processing and trajectory untangling in motor control.
2. Manifold untangling: what and why?
2.1. Problem formulation
The problem of manifold untangling originated from the modeling of ventral stream processing in neuroscience (DiCarlo and Cox, 2007) (see Figure 1). To explain how object recognition works, a major challenge is understanding the form of high-dimensional visual representations. An object manifold (e.g., the set of images an object projects onto the retina) is characterized by variations in pose, position, and size. It can be mathematically abstracted as a low-dimensional curved surface inside the retinal image space. It follows that different objects, such as different face identities, correspond to different manifolds. According to Chung and Abbott (2021), the term “object manifold” specifically refers to the low-dimensional subspaces underlying population activities embedded in a high-dimensional neural state space. The manifolds embedded in the ambient neural state space (called the neural population geometry in Chung and Abbott, 2021) arise in both sensory/motor and cognitive regions of the brain.
Figure 1
To illustrate the problem of manifold untangling more vividly, we can use an analogy with tangled shoelaces in our familiar 3D Euclidean space. The task of object recognition is analogous to untangling these shoelaces, but in a higher-dimensional space of visual representations. In the literature, manifold untangling (a.k.a. disentanglement, Brahma et al.,
2.2. Motivation: topological space does not require a distance metric
One of the long-standing open problems in manifold discovery is how to calculate the geodesic distance between two points on a manifold. Unlike the Euclidean distance, the geodesic distance is intrinsically tangled with the locally curved low-dimensional geometry of the manifold. Without knowledge of local geometry, calculating the geodesic distance or building a kernel becomes a tangled problem like manifold learning (Ma and Fu,
We argue that the answer is affirmative. Our basic intuition is based on the observation that it is easier to untangle a manifold in a higher-dimensional space (Fusi et al.,
To quantify the effectiveness of manifold untangling, the manifold capacity (Chung et al.,
3. Manifold embedding and flattening
3.1. Manifold embedding and unknotting theory
Theorem 1. Whitney Embedding Theorem (1936).
Any smooth manifold M of dimension m ≥ 2 can be embedded into $\mathbb{R}^{2m+1}$.
In 1958, W.T. Wu proved that every connected n-manifold unknots in $\mathbb{R}^{2n+1}$ for n > 1 (Wu, 2008). In the 1960s, J. Milnor extended the theory of differentiable manifolds into surgery theory, which became a major tool in high-dimensional topology. An important approach to smoothing manifolds was to use obstruction theories (Hirsch,
The intuition that a higher-dimensional space facilitates manifold untangling has not been well documented in the literature; the closest result seems to be that of Tauro et al. (2014). To shed some light on the blessing of dimensionality, we conducted a simple experiment with the synthetic two-moon data (see Figure 2A). These data are clearly not linearly separable in $\mathbb{R}^2$. However, we have verified that after locally linear embedding (LLE) (Roweis and Saul, 2000) into $\mathbb{R}^4$, a linear dichotomy exists, as shown in Figure 2B. Note that, unlike the kernel trick in support vector machines, we rely not on non-linearity but on the blessing of dimensionality to obtain a less tangled data representation.
Figure 2

Blessing of dimensionality. (A) Two-moon data are not linearly separable in $\mathbb{R}^2$; (B) t-SNE visualization of the LLE embedding in $\mathbb{R}^4$. Note that the two-moon data become linearly separable after embedding in the higher-dimensional space $\mathbb{R}^4$ through locally linear embedding (LLE) (Roweis and Saul, 2000).
Based on the above line of reasoning, the basic ideas behind our approach to maximizing the manifold capacity in a higher-dimensional space are as follows. On the one hand, we want to increase the number of distinct manifolds by promoting the selectivity of data representations (i.e., pushing more manifolds away from each other). This objective can be achieved by embedding the manifold into a higher-dimensional space using generalized kernel tricks such as LLE or IsoMAP (Tenenbaum et al., 2000). Note that we use these techniques in the direction opposite to non-linear dimensionality reduction, i.e., as tools for non-linear dimensionality increase. On the other hand, we want to increase the number of separable dichotomies by promoting the tolerance of data representations. This is aligned with the idea of manifold flattening by constructing identity-preserving transformations (DiCarlo et al.,
3.2. Global manifold embedding
At the global level (i.e., working with the entire manifold as a whole), there are two broad classes of manifold embedding techniques: kernel methods and sparse coding. Both of them can re-represent input data in a higher-dimensional space to facilitate the task of manifold untangling.
3.2.1. Recursive and generalized kernel methods
A well-known method, named the kernel trick, generalizes distance-based algorithms to operate in a feature space (Schölkopf, 2000). The key idea is to construct a non-linear mapping function $\phi : X \to Y$, where X and Y denote the input and feature spaces, respectively ($x \in X$, $\phi(x) \in Y$). The kernel trick is then implemented by the dot product in the feature space, i.e., $k(x, x') = \langle \phi(x), \phi(x') \rangle$. For the class of positive definite kernels, rigorous results such as Mercer's theorem (Vapnik, 1999) guarantee the generalization of the distance metric for a wide range of kernel constructions (e.g., the radial basis function and the neural tangent kernel). As a concrete example, Figure 3 illustrates the idea behind the kernel trick for a toy example of separating points within a circle from those outside.
Figure 3

Kernel trick in the inner product space (left: input space, right: feature space). The feature map is $\phi((a, b)) = (a, b, a^2 + b^2)$, which induces the kernel $K(x, y) = x \cdot y + \|x\|^2 \|y\|^2$. Training points are mapped to a 3-dimensional space, where a separating hyperplane can easily be found.
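The sketch below checks the Figure 3 construction numerically. It verifies that the closed-form kernel equals the dot product of the explicit 3D features, and that the third feature coordinate alone separates points inside the unit circle from points outside it. The random test points and the helper names are our assumptions.

```python
import random

def phi(v):
    """Feature map from Figure 3: (a, b) -> (a, b, a^2 + b^2)."""
    a, b = v
    return (a, b, a * a + b * b)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def kernel(x, y):
    """Closed form of <phi(x), phi(y)>: K(x, y) = x . y + ||x||^2 ||y||^2."""
    return dot(x, y) + dot(x, x) * dot(y, y)

rng = random.Random(0)
pairs = [((rng.uniform(-2, 2), rng.uniform(-2, 2)), (rng.uniform(-2, 2), rng.uniform(-2, 2)))
         for _ in range(100)]
max_gap = max(abs(kernel(x, y) - dot(phi(x), phi(y))) for x, y in pairs)

# Points inside vs. outside the unit circle are split by the plane z = 1 in feature space.
inside = [(0.3, 0.4), (-0.5, 0.1), (0.0, -0.9)]
outside = [(1.2, 0.0), (-1.0, 1.0), (0.5, -1.5)]
separable = all(phi(p)[2] < 1 for p in inside) and all(phi(p)[2] > 1 for p in outside)
print(max_gap, separable)
```

The kernel never forms the 3D features explicitly, yet it agrees with their dot product up to floating-point rounding. This is what lets a linear method operate in the lifted space.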
The effectiveness of the kernel trick is often attributed to its non-linearity with respect to the input space. However, dealing with non-linearity is always challenging. For example, despite the conceptual simplicity of the kernel trick, it is often much more difficult to reason about the optimality of different approaches to kernel construction. More importantly, as shown in Figure 3, the blessing of dimensionality offers a refreshing perspective on the kernel trick. The new dimension introduced by the kernel geometrically warps the data points so that they can be more easily separated by a linear classifier. This simple observation inspires us to tackle manifold untangling by recursively applying the kernel trick.
More specifically, we propose to generalize the non-linear mapping function to $\phi_n : X_n \to X_{n+1}$, $n \in \mathbb{N}$. Here $X_n$ and $X_{n+1}$, with $\dim(X_{n+1}) > \dim(X_n)$, denote the input and output spaces of the n-th layer, respectively ($x_n \in X_n$ and $\phi_n(x_n) \in X_{n+1}$). Our intuition is that manifold untangling is closely related to the approximation by non-linear sigmoid functions (Cybenko,
Theorem 2. Universal Approximation Theorem.
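For any continuous function f on $I_n$, any sigmoidal function $\sigma$, and any $\epsilon > 0$, there exists a finite sum $g(x) = \sum_{j=1}^{N} \alpha_j \, \sigma(w_j^{\top} x + \theta_j)$ such that $|f(x) - g(x)| < \epsilon$ for all $x \in I_n$, where $I_n$ denotes the n-dimensional unit cube.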
The approximation result above can be interpreted as untangling the non-linear function f(x) by the successive concatenation of N sigmoid units in a single hidden layer. Each unit partially untangles the non-linear function until the input function is straightened into a linear one. Connecting this result with our manifold untangling intuition, we can interpret multilayer feedforward networks as universal approximators (Hornik et al.,
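The sketch below makes the picture of each unit partially untangling the function concrete. It builds a one-hidden-layer sum of sigmoids for a 1D target on [0, 1]. The construction places one steep sigmoid per grid cell, weighted by the jump of f across that cell, and adds a constant offset f(0). This constructive staircase is our illustrative choice, not Cybenko's proof, which is non-constructive. The target sin(2πx) and the sharpness constant are also assumptions. For a Lipschitz target, the maximum error is bounded by roughly the variation of f over one grid cell, so it shrinks as units are added.

```python
import math

def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def staircase_network(f, n_units, sharpness=40.0):
    """One hidden layer of n_units sigmoids approximating f on [0, 1].

    g(x) = f(0) + sum_j alpha_j * sigmoid(w * (x - m_j)), where m_j is the midpoint of
    the j-th grid cell and alpha_j is the jump f(x_j) - f(x_{j-1}) across that cell.
    """
    h = 1.0 / n_units
    w = sharpness * n_units                      # steepness: w * h / 2 = sharpness / 2
    knots = [j * h for j in range(n_units + 1)]
    alphas = [f(knots[j]) - f(knots[j - 1]) for j in range(1, n_units + 1)]
    centers = [(knots[j - 1] + knots[j]) / 2.0 for j in range(1, n_units + 1)]
    offset = f(0.0)

    def g(x):
        return offset + sum(a * sigmoid(w * (x - m)) for a, m in zip(alphas, centers))

    return g

def max_error(f, g, n_grid=2001):
    """Sup-norm error on a uniform grid over [0, 1]."""
    xs = [i / (n_grid - 1) for i in range(n_grid)]
    return max(abs(f(x) - g(x)) for x in xs)

def target(x):
    return math.sin(2.0 * math.pi * x)

errors = {n: max_error(target, staircase_network(target, n)) for n in (10, 50, 200)}
for n, err in errors.items():
    print(f"N = {n:3d} sigmoid units: max |f - g| = {err:.4f}")
```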
3.2.2. Hierarchical sparse coding
The equivalence relationship between the kernel method in a support vector machine (SVM) (Bartlett and Shawe-Taylor,
More rigorously, we consider the class of hierarchical and redundant sparse representations [e.g., steerable pyramids (Simoncelli and Freeman, 1995) and overcomplete dictionaries (Olshausen and Field,
Under the framework of manifold untangling, we claim that hierarchical sparse coding increases the number of manifolds (manifold capacity) while keeping the feature dimension (N) constant. In view of the lack of a rigorous definition of manifold capacity in the literature, we resort to a closely-related concept (the capacity of associative memory) in our analysis. A mathematical analysis of why sparse coding increases the capacity of associative memory can be found in Okada (
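The sketch below gives a minimal illustration of coding a signal with an overcomplete dictionary. It infers a sparse code by iterative shrinkage-thresholding (ISTA), a standard solver for the ℓ1-penalized sparse coding objective. ISTA, the random dictionary, the penalty λ = 0.01, and all sizes are our assumptions. The sketch covers a single layer only, not the hierarchical schemes (steerable pyramids, learned dictionaries) cited above.

```python
import numpy as np

def ista(D, x, lam=0.01, n_iter=2000):
    """Sparse code a minimizing 0.5 * ||x - D a||^2 + lam * ||a||_1 via ISTA."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2           # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))            # gradient step on the reconstruction term
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding (sparsity)
    return a

rng = np.random.default_rng(0)
n_dim, n_atoms = 20, 50                               # overcomplete: 50 atoms for 20-D signals
D = rng.standard_normal((n_dim, n_atoms))
D /= np.linalg.norm(D, axis=0)                        # unit-norm dictionary atoms
a_true = np.zeros(n_atoms)
a_true[[3, 17, 41]] = [1.0, -0.8, 0.6]                # hypothetical 3-sparse ground truth
x = D @ a_true

a_hat = ista(D, x)
rel_residual = np.linalg.norm(x - D @ a_hat) / np.linalg.norm(x)
n_active = int(np.sum(np.abs(a_hat) > 1e-3))
print(f"relative residual = {rel_residual:.3f}, active coefficients = {n_active} of {n_atoms}")
```

The code lives in a space with more coordinates than the signal, yet only a few of them are active. This combination of redundancy and sparsity is what the capacity argument relies on.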
To show how improved sparsity increases the capacity of associative memory, we consider the non-holographic associative memory model of Willshaw et al. (1969), which consists of $N_A \times N_B$ grid points on a square lattice. Let $r_A$ and $r_B$ denote the ratios of active grid points (along the two sides of the lattice) responsible for the associative recall of R cross-link patterns. Then, the memory capacity of such an associative network is given by
where $N_c = N_A \times N_B$ and the collision probability p can be calculated by
It is easy to observe that, to maintain a low collision probability p, both $r_A$ and $r_B$ need to be small, implying a small percentage of active grid points along the horizontal and vertical directions. Improving the sparsity of the data representation helps reduce the probability of collision (less crosstalk) (Olshausen and Field,
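The qualitative claim can be checked numerically under the standard Willshaw estimate. This estimate is our assumption and may differ in detail from the expressions intended above. It says each stored pair switches on a random fraction $r_A r_B$ of the junctions. After R pairs, the fraction of switched-on junctions is therefore $p = 1 - (1 - r_A r_B)^R$, and this fraction governs the chance of a spurious (colliding) recall. The helper names and activity levels are hypothetical.

```python
import math

def collision_probability(r_a, r_b, n_patterns):
    """Fraction of switched-on junctions after storing n_patterns pairs.

    Standard Willshaw-style estimate (an assumption here): each pair switches on a
    random r_a * r_b fraction of the N_A x N_B junctions, so a given junction is
    still off with probability (1 - r_a * r_b) ** n_patterns.
    """
    return 1.0 - (1.0 - r_a * r_b) ** n_patterns

def max_patterns(r_a, r_b, p_max):
    """Largest number of stored pairs R that keeps the collision probability <= p_max."""
    return math.floor(math.log(1.0 - p_max) / math.log(1.0 - r_a * r_b))

R = 1000
dense = collision_probability(0.10, 0.10, R)    # 10% active units on each side
sparse = collision_probability(0.01, 0.01, R)   # 1% active units on each side
print(f"p(dense) = {dense:.4f}, p(sparse) = {sparse:.4f}")
print(max_patterns(0.10, 0.10, 0.5), max_patterns(0.01, 0.01, 0.5))
```

With ten times sparser activity on each side, the same 1000 stored pairs leave most junctions off. The number of pairs storable at a fixed collision level therefore grows by roughly two orders of magnitude.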
3.3. Local manifold flattening
At the local level (i.e., dealing with the local geometry of a manifold), we can smooth either the rugged surface underlying the data observations or the curved decision boundaries separating different classes.
3.3.1. Identity-preserving transformations
The other important new insight deals with the discovery of local geometry on a manifold to promote tolerance within the same class/identity. The importance of tolerance to object recognition can be mathematically justified by flattening the manifold with identity-preserving transformations (see Figure 2B in DiCarlo et al.,
The manifold untangling framework offers a refreshing perspective on the well-studied binding problem (Treisman, 1996). After manifold flattening, each untangled subspace is characterized by the neural population geometry, whose representation simultaneously conveys explicit information about not only object identity but also tangled subspace attributes such as position, size, pose, and context. Even when multiple objects are present, one can imagine that identity-preserving transformations can flatten their corresponding manifolds to improve the manifold capacity. There is no need to rebind those subspace attributes because they are implicitly embedded into identity-preserving transformations.
To better illustrate the concept of manifold flattening, we can think of the three pairs of legs in jacks as an analogy to the identity, position, and scale subspaces. Mathematically, these jacks can be interpreted as a 1D manifold embedded into a 3D Euclidean space. The problem of packing object manifolds is challenging because the legs of those jacks interfere with each other. Identity-preserving transformations facilitate the packing task by flattening the two subspaces of position and scale (we will discuss the biological implementation of this strategy later). In the transformed space after manifold untangling (i.e., conditioned on the knowledge about the position and scale), the jacks are flattened to ellipsoids suitable for packing or linear separation.
3.3.2. Decision boundary smoothing
An alternative approach to achieve the objective of local manifold flattening is via smoothing the decision boundary among different classes/identities. Along this line of reasoning, several closely related ideas have recently been proposed such as manifold mixing (Verma et al., 2019), manifold charting (Mangla et al.,
The objective of manifold flattening is to reduce the number of directions with significant variance (refer to Figure 2B). Following the notation in Verma et al. (2019), we use $\mathcal{X}$, $\mathcal{H}$, and $\mathcal{Y}$ to denote the input space, representation space, and output space, respectively. The representation space can be the hidden states of a DNN, the support vectors of an SVM, or the sparse coefficients in hierarchical sparse coding. We can obtain the following theoretical result.
Theorem 3. Manifold Flattening Theorem.
Let $\mathcal{H}$ be a space of dimension $\dim(\mathcal{H})$, and let d represent the number of classes/identities in the dataset. If $\dim(\mathcal{H}) \geq d - 1$, then there exists a linear function/dichotomy that can separate the d different classes.
The proof of the above result for the hidden states of DNN representations can be found in Verma et al. (2019). Generally speaking, if the dimensionality of the representation is at least d − 1, then the resulting representations for each class will fall into a subspace of dimension $\dim(\mathcal{H}) - d + 1$.
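The result above concerns the manifold mixup objective of Verma et al. (2019), which trains on convex combinations of hidden states and of their labels. The sketch below shows only the mixing step for one pair of examples. The hidden vectors, the one-hot labels, and α = 2.0 for the Beta(α, α) mixing distribution are hypothetical values. In the full method, a layer is chosen at random and mixing happens there. The mixed state is then propagated through the remaining layers, and the loss is computed against the mixed target.

```python
import random

def mixup_hidden(h1, y1, h2, y2, alpha=2.0, rng=random):
    """Mix two hidden states and their one-hot targets with lam ~ Beta(alpha, alpha).

    The same coefficient is used for the representation and the target, as in
    manifold mixup; the caller would propagate h_mix through the remaining layers.
    """
    lam = rng.betavariate(alpha, alpha)
    h_mix = [lam * a + (1.0 - lam) * b for a, b in zip(h1, h2)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return lam, h_mix, y_mix

rng = random.Random(0)
h_a, y_a = [0.9, 0.1, 0.4], [1.0, 0.0]   # hypothetical hidden state and label, class 0
h_b, y_b = [0.2, 0.8, 0.5], [0.0, 1.0]   # hypothetical hidden state and label, class 1
lam, h_mix, y_mix = mixup_hidden(h_a, y_a, h_b, y_b, rng=rng)
print(f"lam = {lam:.3f}, h_mix = {[round(v, 3) for v in h_mix]}, y_mix = {[round(v, 3) for v in y_mix]}")
```

The mixed target asks the network to respond linearly along the segment between two class representations. This is the mechanism by which the decision boundary is smoothed.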
It is enlightening to compare the strategy of decision boundary smoothing with that of identity-preserving transformations. The former improves the performance of the classifier in the presence of distribution shifts, outliers, and adversarial examples under few-shot learning constraints (i.e., it does not require much training data). The latter requires more training data to achieve the desired objective of X-invariant recognition (where X refers to an environmental uncertainty factor) by learning identity-preserving transformations. The two approaches are complementary because they flatten the manifold from different (inter-class vs. intra-class) perspectives.
4. Model-agnostic manifold untangling
4.1. Multi-view visual object recognition
Visual object recognition has been extensively studied by the computer vision community (Zhang et al., 2013; Bakry and Elgammal,
A fundamental weakness of those conventional approaches is their limited ability to generalize. It is often assumed a priori that the topology of the viewpoint manifold of individual objects is known. The derived manifold untangling solution easily breaks down when this assumption becomes invalid (e.g., due to the tangling of other uncertainty factors such as scale, illumination, and clutter, Johnson and Hebert,
This work offers attractive alternative solutions to multiview visual object recognition. In several challenging datasets with the presence of pose and expression variations, it has been shown in Chen et al. (
Identity-preserving transformations are often applied to generalize the performance of deep learning models to previously unseen data (Connor et al.,
A closely related idea to manifold untangling is the learning of disentangled representations. For example, the GAN for disentangled representation learning (DR-GAN) (Tran et al., 2017) can take one or multiple images as input and explicitly output the pose code along with an arbitrary number of synthetic images. This GAN-based deep generative model cleverly combines the pose code in the generator and the pose estimation in the discriminator into a closed loop. It can be interpreted as achieving tolerance by simultaneously resolving the uncertainty of identity and pose. This is mathematically equivalent to maximum a posteriori (MAP) estimation in the joint space of object identity and identity-preserving transformations (refer to Figure 4D in DiCarlo et al.,
4.2. Invariant speech and language recognition
Unlike image data, speech signals are characterized by dynamic patterns in the temporal domain. Since language is unique to humans, language models serve as a strong supervisor in speech recognition. From words and phrases to paragraphs and part-of-speech, the principle of hierarchical organization has been widely studied in natural language processing. Computational maps in the auditory cortex share an organizational principle similar to that in the visual cortex (Krumhansl,
Compared to images, speech and language data are arguably less tangled owing to their different physical origins. From a manifold untangling perspective, embedding plays a relatively more important role than flattening for speech and language data than it does for images. This difference is supported by the popularity of word embedding models [e.g., word2vec (Goldberg and Levy,
4.3. Perceptual straightening of video data
By contrast, video data have been much less studied than images or speech. Depending on the definition of object category, we can revisit several classical video processing tasks from a manifold untangling perspective. First, the class of natural videos defines a manifold that is related to visual quality. The amount of perturbation (e.g., jittering artifacts) away from the manifold of natural videos is often correlated with the degradation of visual quality. One recent work (Hénaff et al.,
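Perceptual straightening is usually quantified by the curvature of a sequence's trajectory in a representation space, as in the work of Hénaff et al. cited above. Curvature here is the angle between successive displacement vectors, averaged over time. The sketch below computes this quantity for arbitrary coordinate lists, and the two test trajectories are hypothetical. A straight trajectory has curvature 0, and a regular 8-gon turns by 45° at every step.

```python
import math

def mean_curvature_deg(frames):
    """Mean angle (degrees) between successive displacement vectors of a trajectory.

    frames: list of equal-length coordinate lists (e.g., flattened video frames or
    their representations). A perfectly straight trajectory has curvature 0.
    """
    disp = [[b - a for a, b in zip(f0, f1)] for f0, f1 in zip(frames, frames[1:])]
    angles = []
    for v, w in zip(disp, disp[1:]):
        nv = math.sqrt(sum(c * c for c in v))
        nw = math.sqrt(sum(c * c for c in w))
        cos = sum(a * b for a, b in zip(v, w)) / (nv * nw)
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos)))))  # clamp rounding error
    return sum(angles) / len(angles)

line = [[t, 2.0 * t, -t] for t in range(10)]                        # straight trajectory
octagon = [[math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)] for k in range(9)]
print(round(mean_curvature_deg(line), 6), round(mean_curvature_deg(octagon), 6))
```

Applied to the pixel-domain and representation-domain trajectories of the same video, this measure shows whether a representation straightens the natural-video manifold.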
Second, the concept of probabilistic appearance manifold has been introduced for video-based face recognition (FR) (Lee et al.,
Third, a problem dual to image-based object recognition is dynamic scene classification (Theriault et al., 2013), where the object category is semantically defined by the scene of the video data. By learning the slowest features with slow feature analysis (SFA) (Wiskott and Sejnowski, 2002), one can untangle the classes of different semantic categories. The key idea behind SFA is to learn invariant representations from transformation sequences, which is closely related to Laplacian eigenmaps (Sprekeler, 2011). From the perspective of manifold untangling, SFA can be interpreted as an alternative to the selectivity-and-tolerance route to learning invariance (Franzius et al.,
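A minimal linear version of SFA makes the slowest-feature idea concrete. The method whitens the signal and then keeps the whitened directions along which the temporal derivative has the smallest variance. The sketch below recovers a slowly varying source from a linear mixture. The two sine sources and the mixing matrix are hypothetical. Full SFA typically applies a non-linear (e.g., quadratic) expansion before this linear step, which fits the embedding-then-linear-readout picture used throughout this paper.

```python
import numpy as np

def linear_sfa(x, n_features=1):
    """Linear slow feature analysis for a (T, D) time series; returns the slowest features."""
    x = x - x.mean(axis=0)                                   # center
    evals, evecs = np.linalg.eigh(np.cov(x, rowvar=False))
    z = x @ (evecs / np.sqrt(evals))                         # whiten: cov(z) = identity
    dz = np.diff(z, axis=0)                                  # discrete temporal derivative
    _, d_evecs = np.linalg.eigh(np.cov(dz, rowvar=False))    # eigenvalues in ascending order
    return z @ d_evecs[:, :n_features]                       # smallest derivative variance first

t = np.linspace(0.0, 4.0 * np.pi, 2000)
sources = np.column_stack([np.sin(t), np.sin(13.0 * t)])     # slow and fast hypothetical sources
mixing = np.array([[1.0, 0.5], [0.7, -1.2]])                 # hypothetical invertible mixing
x = sources @ mixing.T                                       # observed mixed signals, shape (2000, 2)
slow = linear_sfa(x)[:, 0]
corr = abs(np.corrcoef(slow, np.sin(t))[0, 1])
print(f"|corr(slowest feature, slow source)| = {corr:.4f}")
```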
5. Biological connections with sensory processing, motor control, and binding problem
5.1. Cortically local subspace untangling in ventral stream
How is manifold untangling achieved by the ventral stream of the visual cortex? In DiCarlo et al. (
In the hierarchical HMAX model for object recognition (Riesenhuber and Poggio, 1999), two classes of cells (simple vs. complex) are responsible for selectivity and tolerance operations, respectively. There exists a canonical circuit to model simple and complex cells in V1 (Kouh and Poggio,
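The division of labor between simple and complex cells in HMAX can be sketched in one dimension. S units respond with Gaussian tuning to a preferred template at each position (selectivity), and a C unit takes the MAX over positions (tolerance). The circular 1D signal, the template, and σ = 1 are our simplifying assumptions. HMAX itself operates on 2D images with many templates and scales.

```python
import math

def s_layer(signal, template, sigma=1.0):
    """Simple (S) units: Gaussian tuning to the template at every circular position."""
    n, k = len(signal), len(template)
    responses = []
    for pos in range(n):
        patch = [signal[(pos + i) % n] for i in range(k)]
        dist2 = sum((p - q) ** 2 for p, q in zip(patch, template))
        responses.append(math.exp(-dist2 / (2.0 * sigma ** 2)))
    return responses

def c_layer(s_responses):
    """Complex (C) unit: MAX pooling over positions, giving tolerance to translation."""
    return max(s_responses)

template = [0.0, 1.0, 2.0, 1.0, 0.0]
scene = [0.0] * 20
scene[4:9] = template                        # preferred pattern at position 4
shifted = scene[-7:] + scene[:-7]            # same scene circularly shifted by 7 positions
other = [0.0] * 20
other[4:9] = [2.0, 0.0, 2.0, 0.0, 2.0]       # a different pattern at the same position

for name, sig in (("scene", scene), ("shifted", shifted), ("other", other)):
    s = s_layer(sig, template)
    print(f"{name:8s} best S position = {s.index(max(s)):2d}, C response = {c_layer(s):.3f}")
```

The S-layer response pattern moves with the stimulus, while the C response stays the same for the shifted scene and drops for a different pattern. Selectivity survives the pooling and position is discarded.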
The temporal continuity hypothesis states that “input patterns that occur close together in time tend to lead to similar output responses” (DiCarlo et al.,
5.2. Trajectory untangling in motor control
J. Gibson says that “we move because we see; we see because we move.” This dual view of perception and motion inspires us to consider the problem of manifold untangling in the motor cortex as the dual of that in the visual cortex. Russo et al. (2018) observed that, unlike muscle activity, neural activity is structured in such a way as to avoid tangling. That is, similar neural activity patterns do not lead to dissimilar action patterns in the future (an action-related counterpart of object recognition). How does the motor cortex encode muscle-like commands? Hypotheses about the encoding of movement velocity or direction exist in the literature (e.g., Gallego et al.,
Based on the premise that the present network state strongly influences the future state, we conjecture that trajectory untangling is also achieved recursively by the motor cortex, although via hierarchical timescales instead of spatial scales. Conceptually similar to tangling in object recognition, the principle of trajectory untangling implies that two similar patterns of neural activity, observed at different moments, should not produce highly dissimilar action patterns in the near future. Violation of this principle often leads to trajectory tangling, a potential instability in the network dynamics of motor control. A key finding from the cycling experiment of Russo et al. (2018) is that “muscle-like signals are present, but are relatively modest ‘ripples’ that ride on top of larger signals that confer minimal tangling.”
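Russo et al. (2018) quantify tangling at time t as $Q(t) = \max_{t'} \|\dot{x}_t - \dot{x}_{t'}\|^2 / (\|x_t - x_{t'}\|^2 + \epsilon)$. A trajectory is tangled when nearby states have very different derivatives. The sketch below evaluates Q on two hypothetical 2D trajectories. Setting ε to 0.1 times the total variance of the states is our illustrative choice. A circle never revisits a state, so its tangling stays below 1. A figure eight crosses itself at the origin with two different velocities, so its tangling is large.

```python
import numpy as np

def tangling(x, dt, eps_scale=0.1):
    """Tangling Q(t) = max_t' ||dx_t - dx_t'||^2 / (||x_t - x_t'||^2 + eps) for a (T, D) trajectory.

    Derivatives use finite differences; eps = eps_scale * total variance of x
    (an illustrative choice that keeps the ratio finite when states coincide).
    """
    dx = np.gradient(x, dt, axis=0)
    eps = eps_scale * x.var(axis=0).sum()
    state_d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)    # ||x_t - x_t'||^2
    deriv_d2 = ((dx[:, None, :] - dx[None, :, :]) ** 2).sum(axis=-1)  # ||dx_t - dx_t'||^2
    return (deriv_d2 / (state_d2 + eps)).max(axis=1)

t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
dt = t[1] - t[0]
circle = np.column_stack([np.cos(t), np.sin(t)])                    # never revisits a state
figure_eight = np.column_stack([np.sin(t), np.sin(t) * np.cos(t)])  # crosses itself at the origin
q_circle = tangling(circle, dt).max()
q_eight = tangling(figure_eight, dt).max()
print(f"max tangling: circle = {q_circle:.2f}, figure eight = {q_eight:.2f}")
```

In this picture, a network whose dynamics generate the figure eight must disambiguate the crossing point with extra state dimensions. This is the trajectory analog of untangling a manifold by embedding it in a higher-dimensional space.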
The perspective of trajectory untangling is consistent with the closed-loop theory of motor learning (Adams,
5.3. From perceptual untangling to internal representation
According to Helmholtz (Lee,
Figure 4

Long-range architecture of the cortex (cited from Larkum,
Thalamo-cortical interaction must occur simultaneously in both the feed-forward and feedback streams to support the predictive coding hypothesis in the visual cortex (Rao and Ballard, 1999). The feed-forward visual stream transmits information about external stimuli to higher cortical areas through manifold untangling; pyramidal neurons act as associative elements that detect coincidences between present stimuli and experience (internal representation). The feedback stream then serves as the predictive coding scheme (Rao and Ballard, 1999) of the cortex that determines the firing of pyramidal neurons. Given that 90% of the synaptic input to layer 1 (L1) comes from long-range feedback connections, the backpropagation-activated coupling (BAC) (Larkum,
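The division of labor between the two streams in predictive coding can be sketched with a linear model. The feedback stream carries a prediction $U r$ of the input, and the feed-forward stream carries the residual $x - U r$. The internal representation r is then updated along the error gradient. This is a deliberately simplified version of Rao and Ballard (1999), with no priors, no non-linearity, and fixed rather than learned weights. U, the sizes, and the learning rate are hypothetical.

```python
import numpy as np

def infer_causes(x, U, lr=0.1, n_steps=1000):
    """Linear predictive-coding inference: settle r so that the feedback prediction U @ r explains x."""
    r = np.zeros(U.shape[1])
    errors = []
    for _ in range(n_steps):
        prediction = U @ r                       # feedback: top-down prediction of the input
        error = x - prediction                   # feed-forward: bottom-up prediction error
        errors.append(float(error @ error))
        r = r + lr * (U.T @ error)               # move the internal representation down the error gradient
    return r, errors

rng = np.random.default_rng(1)
U = rng.standard_normal((16, 4)) / 4.0          # hypothetical generative (feedback) weights
r_true = rng.standard_normal(4)                  # hypothetical hidden causes
x = U @ r_true                                   # input generated by those causes
r_hat, errors = infer_causes(x, U)
print(f"squared error: start {errors[0]:.3f}, end {errors[-1]:.2e}")
```

Only the residual travels forward, so once the internal representation explains the input, the feed-forward signal falls silent.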
The bridging of feed-forward and feedback streams is consistent with a new perspective on how the binding problem is solved by base grouping (feed-forward processing) and incremental grouping (feedback connections) (Roelfsema, 2023). It has been argued that the distribution of visual attention is largely determined by motor control or action planning. More specifically, the process of selecting objects for perceptual processing and object recognition is coupled with that of providing the information necessary for motor action through a single attentional mechanism (Deubel and Schneider,
Finally, the hippocampus, seated at the top of the neocortical pyramid, is responsible for storing memories of specific events and places. It plays a key role in constructing an internal representation of the external world, which involves integrating information from different sensory modalities and binding it into a coherent memory. The dentate gyrus (DG), a subregion of the hippocampus, interacts with the other subregions (e.g., the CA1 and CA3 regions) to form a functional network that is critical for memory processing and retrieval. In feed-forward processing, the entorhinal cortex sends sensory information from the neocortex to the dentate gyrus, which then processes and integrates the information with other sensory inputs in the hippocampus. Manifold unfolding is implemented by the DG, which decorrelates and sparsifies input signals by projecting them into a higher-dimensional space. In feedback processing, manifold projection simply projects the stored information back to the neocortical regions, which is consistent with the hippocampal index theory (Teyler and DiScenna, 1986).
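The DG's role described above is to decorrelate inputs by projecting them into a higher-dimensional space while keeping the activity sparse. This can be sketched with a random expansion followed by k-winners-take-all. The random projection, the 100-fold expansion, the 5% activity level, and the input correlation of about 0.9 are all hypothetical choices for illustration. The point is only that two similar inputs yield sparse codes that are less correlated than the inputs themselves.

```python
import numpy as np

def expand_and_sparsify(x, W, k):
    """Project x to a higher-dimensional space with W, then keep the k most active units (k-WTA)."""
    h = W @ x
    code = np.zeros_like(h)
    code[np.argsort(h)[-k:]] = 1.0               # binary sparse code: winners are set to 1
    return code

rng = np.random.default_rng(0)
n_in, n_out, k = 50, 5000, 250                   # 100-fold expansion, 5% active (illustrative)
W = rng.standard_normal((n_out, n_in))           # hypothetical random expansion weights
x1 = rng.standard_normal(n_in)
x2 = 0.9 * x1 + np.sqrt(1.0 - 0.81) * rng.standard_normal(n_in)  # a similar input pattern
code1 = expand_and_sparsify(x1, W, k)
code2 = expand_and_sparsify(x2, W, k)
corr_in = np.corrcoef(x1, x2)[0, 1]
corr_out = np.corrcoef(code1, code2)[0, 1]
print(f"input correlation = {corr_in:.2f}, sparse-code correlation = {corr_out:.2f}")
```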
6. Conclusions
It has been hypothesized that, through neuronal population dynamics, the neocortex solves the problem of object recognition via perceptual untangling. In this paper, we formulate the problem of manifold untangling as an abstraction of object recognition. Two complementary approaches to untangling an object manifold are presented: embedding (selectivity-promoting) and flattening (tolerance-promoting). We have discussed two classes of embedding strategies (generalized kernel methods and hierarchical sparse coding) as well as two classes of flattening strategies (identity-preserving transformations and decision boundary smoothing). Under the framework of manifold unfolding, we present a unified interpretation of multiview image recognition, invariant audio/language recognition, and perceptual straightening of video. Finally, we connect the theory of manifold unfolding with the neuroscience literature, which suggests a biologically plausible implementation of perceptual untangling.
Future work requires the development of experimentally or computationally testable hypotheses or models built upon the theory of manifold untangling. Deep neural networks have been shown to exhibit some interesting manifold disentangling properties in Brahma et al. (
Statements
Data availability statement
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.
Author contributions
Both authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.
Funding
This research was supported by the AFOSR (FA9550-21-1-0088), NSF (BCS-1945230 and IIS-2114644), and NIH (R01MH129426).
Acknowledgments
The authors thank the reviewers for constructive comments that helped improve the presentation of this paper.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1
AdamsJ. A. (1971). A closed-loop theory of motor learning. J. Motor Behav.3, 111–150.
2
AhonenT.HadidA.PietikäinenM. (2004). “Face recognition with local binary patterns,” in European Conference on Computer Vision (Prague: Springer), 469–481.
3
AmodeiD.AnanthanarayananS.AnubhaiR.BaiJ.BattenbergE.CaseC.et al. (2016). “Deep speech 2: -to-end speech recognition in English and Mandarin,” in International Conference on Machine Learning (New York, NY: PMLR), 173–182.
4
BakryA.ElgammalA. (2014). “Untangling object-view manifold for multiview recognition and pose estimation,” in European Conference on Computer Vision (Zurich: Springer), 434–449.
5
BarlowH. (2001). Redundancy reduction revisited. Netw. Comput. Neural Syst.12, 241.
6
BartlettP.Shawe-TaylorJ. (1999). “Generalization performance of support vector machines and other pattern classifiers,” in Advances in Kernel Methods–Support Vector Learning, eds SchölkopfB.BurgesC. J. C., 43–54.
7
BellmanR. (1966). Dynamic programming. Science153, 34–37.
8
BrahmaP. P.WuD.SheY. (2015). Why deep learning works: a manifold disentanglement perspective. IEEE Trans. Neural Netw. Learn. Syst.27, 1997–2008. 10.1109/TNNLS.2015.2496947
9
ChenD.CaoX.WenF.SunJ. (2013). “Blessing of dimensionality: high-dimensional feature and its efficient compression for face verification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Portland, OR), 3025–3032.
10
ChungS.AbbottL. (2021). Neural population geometry: an approach for understanding biological and artificial neural networks. Curr. Opin. Neurobiol.70, 137–144. 10.1016/j.conb.2021.10.010
11
ChungS.LeeD. D.SompolinskyH. (2018). Classification and geometry of general perceptual manifolds. Phys. Rev. X8, 031003. 10.1103/PhysRevX.8.031003
12
CohenU.ChungS.LeeD. D.SompolinskyH. (2020). Separability and geometry of object manifolds in deep neural networks. Nat. Commun.11, 1–13. 10.1038/s41467-020-14578-5
13
ConnorM.FallahK.RozellC. (2021). Learning identity-preserving transformations on data manifolds. arXiv preprint arXiv:2106.12096. 10.48550/arXiv.2106.12096
14
CrowellR. H.FoxR. H. (2012). Introduction to Knot Theory, Vol. 57. New York, NY: Springer Science & Business Media.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2, 303–314.
Deubel, H., and Schneider, W. X. (1996). Saccade target selection and object recognition: evidence for a common attentional mechanism. Vision Res. 36, 1827–1837.
DiCarlo, J. J., and Cox, D. D. (2007). Untangling invariant object recognition. Trends Cogn. Sci. 11, 333–341. doi: 10.1016/j.tics.2007.06.010
DiCarlo, J. J., Zoccolan, D., and Rust, N. C. (2012). How does the brain solve visual object recognition? Neuron 73, 415–434. doi: 10.1016/j.neuron.2012.01.010
Donoho, D. L. (2000). High-dimensional data analysis: the curses and blessings of dimensionality. AMS Math Challenges Lecture 1, 32.
Du, S. S., Zhai, X., Poczos, B., and Singh, A. (2019). “Gradient descent provably optimizes over-parameterized neural networks,” in International Conference on Learning Representations (ICLR) (New Orleans).
Edelman, G. M. (1993). Neural Darwinism: selection and reentrant signaling in higher brain function. Neuron 10, 115–125.
Franzius, M., Wilbert, N., and Wiskott, L. (2008). “Invariant object recognition with slow feature analysis,” in International Conference on Artificial Neural Networks (Prague: Springer), 961–970.
Fusi, S., Miller, E. K., and Rigotti, M. (2016). Why neurons mix: high dimensionality for higher cognition. Curr. Opin. Neurobiol. 37, 66–74. doi: 10.1016/j.conb.2016.01.010
Gallego, J. A., Perich, M. G., Miller, L. E., and Solla, S. A. (2017). Neural manifolds for the control of movement. Neuron 94, 978–984. doi: 10.1016/j.neuron.2017.05.025
Girosi, F. (1998). An equivalence between sparse approximation and support vector machines. Neural Comput. 10, 1455–1480.
Goldberg, Y., and Levy, O. (2014). word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722. doi: 10.48550/arXiv.1402.3722
Hasson, U., Nastase, S. A., and Goldstein, A. (2020). Direct fit to nature: an evolutionary perspective on biological and artificial neural networks. Neuron 105, 416–434. doi: 10.1016/j.neuron.2019.12.002
Hatcher, A. (2005). Algebraic Topology. Cambridge: Cambridge University Press.
Hénaff, O. J., Goris, R. L., and Simoncelli, E. P. (2019). Perceptual straightening of natural videos. Nat. Neurosci. 22, 984–991. doi: 10.1038/s41593-019-0377-4
Hirsch, M. W. (1963). Obstruction theories for smoothing manifolds and maps. Bull. Am. Math. Soc. 69, 352–356.
Horan, D., Richardson, E., and Weiss, Y. (2021). “When is unsupervised disentanglement possible?” in Advances in Neural Information Processing Systems, 5150–5161.
Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366.
Johnson, A. E., and Hebert, M. (1999). Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Trans. Pattern Anal. Mach. Intell. 21, 433–449.
Keemink, S. W., and Machens, C. K. (2019). Decoding and encoding (de)mixed population responses. Curr. Opin. Neurobiol. 58, 112–121. doi: 10.1016/j.conb.2019.09.004
Kell, A. J., Yamins, D. L., Shook, E. N., Norman-Haignere, S. V., and McDermott, J. H. (2018). A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron 98, 630–644. doi: 10.1016/j.neuron.2018.03.044
Kobak, D., Brendel, W., Constantinidis, C., Feierstein, C. E., Kepecs, A., Mainen, Z. F., et al. (2016). Demixed principal component analysis of neural population data. eLife 5, e10989. doi: 10.7554/eLife.10989
Kouh, M., and Poggio, T. (2008). A canonical neural circuit for cortical nonlinear operations. Neural Comput. 20, 1427–1451. doi: 10.1162/neco.2008.02-07-466
Krumhansl, C. L. (2001). Cognitive Foundations of Musical Pitch, Vol. 17. Oxford: Oxford University Press.
Langdon, C., Genkin, M., and Engel, T. A. (2023). A unifying perspective on neural manifolds and circuits for cognition. Nat. Rev. Neurosci. 1–15. doi: 10.1038/s41583-023-00693-x
Larkum, M. (2013). A cellular mechanism for cortical associations: an organizing principle for the cerebral cortex. Trends Neurosci. 36, 141–151. doi: 10.1016/j.tins.2012.11.006
Lee, K.-C., Ho, J., Yang, M.-H., and Kriegman, D. (2003). “Video-based face recognition using probabilistic appearance manifolds,” in 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Madison, WI: IEEE).
Lee, T. S. (2015). The visual system's internal model of the world. Proc. IEEE 103, 1359–1378. doi: 10.1109/JPROC.2015.2434601
Liu, C., and Wechsler, H. (2002). Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition. IEEE Trans. Image Process. 11, 467–476. doi: 10.1109/TIP.2002.999679
Ma, Y., and Fu, Y. (2012). Manifold Learning Theory and Applications, Vol. 434. Boca Raton, FL: CRC Press.
Mamou, J., Le, H., Del Rio, M., Stephenson, C., Tang, H., Kim, Y., and Chung, S. (2020). “Emergence of separable manifolds in deep language representations,” in International Conference on Machine Learning (PMLR), 6713–6723.
Mangla, P., Kumari, N., Sinha, A., Singh, M., Krishnamurthy, B., and Balasubramanian, V. N. (2020). “Charting the right manifold: manifold mixup for few-shot learning,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (Snowmass Village, CO), 2218–2227.
Mattar, A. A., and Gribble, P. L. (2005). Motor learning by observing. Neuron 46, 153–160. doi: 10.1016/j.neuron.2005.02.009
Mocz, V., Vaziri-Pashkam, M., Chun, M. M., and Xu, Y. (2021). Predicting identity-preserving object transformations across the human ventral visual stream. J. Neurosci. 41, 7403–7419. doi: 10.1523/JNEUROSCI.2137-20.2021
Okada, M. (1996). Notions of associative memory and sparse coding. Neural Netw. 9, 1429–1458.
Olshausen, B. A., and Field, D. J. (1997). Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Res. 37, 3311–3325.
Olshausen, B. A., and Field, D. J. (2004). Sparse coding of sensory inputs. Curr. Opin. Neurobiol. 14, 481–487.
Pagan, M., Urban, L. S., Wohl, M. P., and Rust, N. C. (2013). Signals in inferotemporal and perirhinal cortex suggest an untangling of visual target information. Nat. Neurosci. 16, 1132–1139. doi: 10.1038/nn.3433
Palafox, P., Božič, A., Thies, J., Nießner, M., and Dai, A. (2021). “NPMs: neural parametric models for 3D deformable shapes,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 12695–12705.
Pennington, J., Socher, R., and Manning, C. D. (2014). “GloVe: global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Doha), 1532–1543.
Rao, R. P., and Ballard, D. H. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87.
Riesenhuber, M., and Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, 1019–1025.
Roberto e Souza, M., Maia, H. A., and Pedrini, H. (2022). Survey on digital video stabilization: concepts, methods, and challenges. ACM Comput. Surv. 55, 1–37. doi: 10.1145/3494525
Rodríguez, P., Laradji, I., Drouin, A., and Lacoste, A. (2020). “Embedding propagation: smoother manifold for few-shot classification,” in European Conference on Computer Vision (Springer), 121–138.
Roelfsema, P. R. (2023). Solving the binding problem: assemblies form when neurons enhance their firing rate–they don't need to oscillate or synchronize. Neuron 111, 1003–1019. doi: 10.1016/j.neuron.2023.03.016
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386.
Roweis, S. T., and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326. doi: 10.1126/science.290.5500.2323
Russo, A. A., Bittner, S. R., Perkins, S. M., Seely, J. S., London, B. M., Lara, A. H., et al. (2018). Motor cortex embeds muscle-like commands in an untangled population response. Neuron 97, 953–966. doi: 10.1016/j.neuron.2018.01.004
Schölkopf, B. (2000). “The kernel trick for distances,” in Advances in Neural Information Processing Systems 13, eds T. Leen, T. Dietterich, and V. Tresp.
Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., and Poggio, T. (2007). Robust object recognition with cortex-like mechanisms. IEEE Trans. Pattern Anal. Mach. Intell. 29, 411–426. doi: 10.1109/TPAMI.2007.56
Shen, Y., Yang, C., Tang, X., and Zhou, B. (2020). InterfaceGAN: interpreting the disentangled face representation learned by GANs. IEEE Trans. Pattern Anal. Mach. Intell. 44.
Simoncelli, E. P., and Freeman, W. T. (1995). “The steerable pyramid: a flexible architecture for multi-scale derivative computation,” in Proceedings International Conference on Image Processing (Washington, DC: IEEE), 444–447.
Skopenkov, A. B. (2008). Embedding and knotting of manifolds in Euclidean spaces. arXiv preprint arXiv:math/0604045. doi: 10.48550/arXiv.math/0604045
Sprekeler, H. (2011). On the relation of slow feature analysis and Laplacian eigenmaps. Neural Comput. 23, 3287–3302. doi: 10.1162/NECO_a_00214
Stefan, K., Cohen, L. G., Duque, J., Mazzocchio, R., Celnik, P., Sawaki, L., et al. (2005). Formation of a motor memory by action observation. J. Neurosci. 25, 9339–9346. doi: 10.1523/JNEUROSCI.2282-05.2005
Stephenson, C., Feather, J., Padhy, S., Elibol, O., Tang, H., McDermott, J., et al. (2019). “Untangling in invariant speech recognition,” in Advances in Neural Information Processing Systems 32 (Vancouver, BC).
Tauro, F., Grimaldi, S., and Porfiri, M. (2014). Unraveling flow patterns through nonlinear manifold learning. PLoS ONE 9, e91131. doi: 10.1371/journal.pone.0091131
Tenenbaum, J. B., Silva, V. D., and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323. doi: 10.1126/science.290.5500.2319
Teyler, T. J., and DiScenna, P. (1986). The hippocampal memory indexing theory. Behav. Neurosci. 100, 147.
Theriault, C., Thome, N., and Cord, M. (2013). “Dynamic scene classification: learning motion descriptors with slow features analysis,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Portland, OR), 2603–2610.
Tran, L., Yin, X., and Liu, X. (2017). “Disentangled representation learning GAN for pose-invariant face recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Honolulu, HI), 1415–1424.
Treisman, A. (1996). The binding problem. Curr. Opin. Neurobiol. 6, 171–178.
Vapnik, V. (1999). The Nature of Statistical Learning Theory. New York, NY: Springer Science & Business Media.
Verma, V., Lamb, A., Beckham, C., Najafi, A., Mitliagkas, I., Lopez-Paz, D., et al. (2019). “Manifold mixup: better representations by interpolating hidden states,” in International Conference on Machine Learning (Long Beach, CA: PMLR), 6438–6447.
Von Der Malsburg, C. (1994). The Correlation Theory of Brain Function. Berlin: Springer.
Vyas, S., Golub, M. D., Sussillo, D., and Shenoy, K. V. (2020). Computation through neural population dynamics. Annu. Rev. Neurosci. 43, 249. doi: 10.1146/annurev-neuro-092619-094115
Whitney, H. (1936). Differentiable manifolds. Ann. Math. 37, 645–680.
Willshaw, D. J., Buneman, O. P., and Longuet-Higgins, H. C. (1969). Non-holographic associative memory. Nature 222, 960–962.
Wiskott, L., and Sejnowski, T. J. (2002). Slow feature analysis: unsupervised learning of invariances. Neural Comput. 14, 715–770. doi: 10.1162/089976602317318938
Wu, W.-T. (2008). “On the realization of complexes in Euclidean spaces I,” in Selected Works of Wen-Tsun Wu (World Scientific), 23–69.
Zhai, X., Kolesnikov, A., Houlsby, N., and Beyer, L. (2022). “Scaling vision transformers,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (New Orleans, LA), 12104–12113.
Zhang, H., El-Gaaly, T., Elgammal, A., and Jiang, Z. (2013). “Joint object and pose recognition using homeomorphic manifold analysis,” in Proceedings of the AAAI Conference on Artificial Intelligence (Bellevue, WA), 1012–1019.
Zhang, Z., and Tao, D. (2012). Slow feature analysis for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 34, 436–450. doi: 10.1109/TPAMI.2011.157
Keywords
blessing of dimensionality, object recognition, motor control, manifold embedding, manifold flattening
Citation
Li X and Wang S (2023) Toward a computational theory of manifold untangling: from global embedding to local flattening. Front. Comput. Neurosci. 17:1197031. doi: 10.3389/fncom.2023.1197031
Received
30 March 2023
Accepted
11 May 2023
Published
31 May 2023
Volume
17 - 2023
Edited by
Nicolangelo Iannella, University of Oslo, Norway
Reviewed by
Ivan Raikov, Stanford University, United States; Jian K. Liu, University of Leeds, United Kingdom; Sadra Sadeh, Imperial College London, United Kingdom
Copyright
© 2023 Li and Wang.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Xin Li, xin.li@mail.wvu.edu
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.