Generative Models of Brain Dynamics

Ramezanian-Panahi, Mahta; Abrevaya, Germán; Gagnon-Audet, Jean-Christophe; Voleti, Vikram; Rish, Irina; Dumas, Guillaume

doi:10.3389/frai.2022.807406

REVIEW article

Front. Artif. Intell., 15 July 2022

Sec. Machine Learning and Artificial Intelligence

Volume 5 - 2022 | https://doi.org/10.3389/frai.2022.807406

Generative Models of Brain Dynamics

Mahta Ramezanian-Panahi^1,2^*

Germán Abrevaya^1,3

Jean-Christophe Gagnon-Audet^1,2

Vikram Voleti^1,2

Irina Rish^1,2

Guillaume Dumas^1,2,4

¹Mila-Quebec AI Institute, Montréal, QC, Canada
²Université de Montréal, Montréal, QC, Canada
³Departamento de Física, Facultad de Ciencias Exactas y Naturales, Instituto de Física de Buenos Aires (IFIBA), CONICET, Universidad de Buenos Aires, Buenos Aires, Argentina
⁴Department of Psychiatry, CHU Sainte-Justine Research Center, Mila-Quebec AI Institute, Université de Montréal, Montréal, QC, Canada

This review article gives a high-level overview of the approaches across different scales of organization and levels of abstraction. The studies covered in this paper include fundamental models in computational neuroscience, nonlinear dynamics, data-driven methods, as well as emergent practices. While not all of these models span the intersection of neuroscience, AI, and system dynamics, all of them do or can work in tandem as generative models, which, as we argue, provide superior properties for the analysis of neuroscientific data. We discuss the limitations and unique dynamical traits of brain data and the complementary need for hypothesis- and data-driven modeling. By way of conclusion, we present several hybrid generative models from recent literature in scientific machine learning, which can be efficiently deployed to yield interpretable models of neural dynamics.

Introduction

“What I cannot create I do not understand.” — Richard Feynman

The explosion of novel data acquisition and computation methods has motivated neuroscientists to tailor these tools for ad-hoc problems. While attempts at pattern detection in enormous datasets are commonplace in the literature-representing a logical first step in applying learning algorithms to complex data-such efforts provide little insight into the observed mechanisms and emission properties. As the above quote from R. Feynman suggests, such methods are understanding the brain. The importance of developing interpretable algorithms for biological data-beyond the standard “black-box” models of conventional machine learning-is underscored by the pressing need for superior explainability seen in medical and health-related research. To this end, formal modeling (the practice of expressing some dependent variable unequivocally in terms of some other set of independent variables Wills and Pothos, 2012) is the only way for transparent and reproducible theories (Guest and Martin, 2020). In the present review, we propose that a class of architectures known as generative models constitute an emergent set of tools with superior properties for reconstructing segregated and whole-brain dynamics. A generative model may consist of, for example, a set of equations that determine the evolution of the signals from a human patient based on system parameters. In general, generative models have the benefit over black-box models containing inference mechanisms rather than simple predictive capacity.

Why Prefer Inference Over Prediction?

Put simply: the goal of science is to leverage prior knowledge, not merely to forecast the future (a task well suited to engineering problems), but to answer “why,” questions, and to facilitate the discovery of mechanisms and principles of operation. Bzdok and Ioannidis (2019) discuss why inference should be prioritized over prediction for building a reproducible and expandable body of knowledge. We argue that this priority should be especially respected for clinical neuroscience.

It is important to note that modeling is, and should be, beyond prediction (Epstein, 2008). Not only does explicit modeling allow for explanation (which is the main point of science), but it also directs experiments and allows for the generation of new scientific questions.

In this paper, we demonstrate why focusing on the multi-scale dynamics of the brain is essential for biologically plausible and explainable results. For this goal, we review a large spectrum of computational models for reconstructing neural dynamics developed by diverse scientific fields, such as biological neuroscience (biological models), physics, and applied mathematics (phenomenological models), as well as statistics and computer science (data-driven models). On this path, it is crucial to consider the uniqueness of neural dynamics and the shortcomings of data collection. Neural dynamics are different from other forms of physical time series. In general, neural ensembles diverge from many canonical examples of dynamical systems in the following ways:

Neural Dynamics Is Different

A neural ensemble is distinctive from the general notion of the dynamical system:

• Unlike chemical oscillations and power grids, the nervous system is a product of biological evolution, which makes it special regarding complexity and organization.

• Like many biophysical systems, it is highly dissipative and functions in non-equilibrium regimes (at least while working as a living organ).

• Although the brain exhibits continuous neuromodulation, the anatomical structure of the brain is encoded in the genome, hence it is essentially determined (Rabinovich et al., 2006).

• There are meaningful similarities in brain activity across species. This is especially good news because, unlike humans, neural properties of less-complicated species are well-characterized (White et al., 1986).

These characteristics help narrow down the search for useful models.

Neural Data Is Different

Neural recordings—especially of human subjects—are noisy and often scarce. Due to requirements of medical certification, cost of imaging assays, and the challenges with recruitment, acquiring these datasets can be both expensive and time-consuming. Moreover, such data can be difficult to wrangle and contains inconsistent noise -not only across participants, but quite often for a single participant at different times (e.g., artifacts, skin condition, and time resolution in the case of EEG).

Overview of Generative Models

Our focus is on generative models. Generative modeling can, in the current context, be distinguished from discriminative or classification modeling; in the sense that there is a probabilistic model of how observable data is generated by unobservable latent states. Almost invariably, generative models in imaging neuroscience are state space or dynamic models based upon differential equations or density dynamics (in continuous or discrete state spaces). Generative models can be used in one of two ways: first, they can be used to simulate or generate plausible neuronal dynamics (at multiple scales), with an emphasis on reproducing emergent phenomena of the sort seen in real brains. Second, the generative model can be inverted, given some empirical data, to make inferences about the functional form and architecture of distributed neuronal processing. In this use, the generative model is used as an observation model and is optimized to best explain some data. Crucially, this optimization entails identifying both the parameters of the generative model and its structure, via the process of model inversion and selection, respectively. When applied in this context, generative modeling is usually deployed to test hypotheses about functional brain architecture is (or neuronal circuits) using (Bayesian) model selection. In other words, comparing the evidence (a.k.a. marginal likelihood) for one model against some others.

Current generative models fall into three main categories as shown in Figure 1 with respect to their modeling assumption and objective:

1. Biophysical models: Biophysical models are realistic models which encapsulate biological assumptions and constraints. Due to large number of components and the empirical complexity of the systems modeled, examples of biophysical models run the gamut, from very small, with a high degree of realism (e.g., Hodgkin and Huxley's model of squid giant axon), to large scale (e.g., Izhikevich and Edelman, 2008 model of whole cortex). Due to computational limitations, large-scale models are often accompanied by increasing levels of simplification. Blue Brain Project (Markram, 2006) is an example of this type of modeling.

2. Phenomenological models: Analogies and behavioral similarities between neural populations and established physical models open the possibility of using well-developed tools in Statistical Physics and Complex Systems for brain simulations. In such models, some priors of the dynamics are given but not by realistic biological assumptions. A famous example is the model of Kuramoto oscillators (Bahri et al., 2020) in which the goal is to find the parameters that best reconstruct the behavior of the system. These parameters describe the property of the phenomenon (e.g., the strength of the synchrony), although they do not directly express the fabric of the organism.

3. Agnostic computational: Data-driven methods that, given a “sufficient” amount of data, can learn reconstruct the behavior with little prior knowledge. Examples of such approaches are some self-supervised methods such as latent ODEs (Chen et al., 2018). The term “sufficient” expresses the main limitation of these approaches. Such approaches often need unrealistically large datasets and come with intrinsic biases. In addition, the representation that these models provide can be analytically far from the physics of the system or the phenomenon.

FIGURE 1

Figure 1. Venn diagram of the generative models of interest. Based on the abstraction and assumption, methods might belong to one or more of the three worlds of machine learning, neuroscience, and dynamical systems. This review is structured into three main categories that are in fact, intersections of these fields: biophysical (Section 1), phenomenological (Section 2), and agnostic modeling (Section 3). Tools developed independently in each of these fields can be combined to overcome the limitation of data.

Figure 2 shows an overview of various generative models and the presence in the literature up to this date.

FIGURE 2

Figure 2. Overview of generative models: well-developed models (blue), partially-explored approaches (purple), and modern pathways with little or no literature on neural data (red).

Key Contributions: The objective is to bridge a gap in the literature of computational neuroscience, dynamical systems, and AI and to review the usability of the proposed generative models concerning the limitation of data, the objective of the study and the problem definition, prior knowledge of the system, and sets of assumptions (see Figure 2).

1. Biophysical Models

Understanding how cognition “emerges” from complex biophysical processes has been one of the main objectives of computational neuroscience. Although inferring high-level cognitive tasks from biological processes is not easily achieved, different biophysical simulations provide some “explanation” of how neural information relates to behavior. Those attempts are motivated by the need for interpretable and biologically-detailed models.

While there is as yet no “unifying theory of neuroscience,” biological neuronal models are being developed at different scales and with different degrees of abstraction (see Figure 3). These models are usually grouped into two main categories:the first represents a “bottom-up” approach, which emphasizes biophysical details for fine-scale simulation and expects the emergence¹. An example of this approach is the Blue Brain project (Markram, 2006).

FIGURE 3

Figure 3. Instances of modeling across different levels of the organization and problem dimension. The conceptual scope is an indicator of biophysical details incorporated in the model. It determines how the focus of the model is directed toward mechanistic reality or the behavioral output. It is also an indicator of where a given model sits on the Marr's level.

Conversely, “top-down” schemes focus on explicit high-level functions and design frameworks based on some targeted behavior. Each of the two approaches works with a different knowledge domain and has its own pitfalls. The top-down approach can incorporate behavioral insights without concerning itself with hard-to-code biological details to generate high-level observed behavior. Models of this kind do not provide low-level explanations and are prone to biases related to data collection (Srivastava et al., 2020). The bottom-up perspective, on the other hand, benefits from a customized level of biophysical insight. At the same time, its description is not generalizable to behavior, and it can be difficult to scale (thanks to unknown priors and numerous parallel mechanisms). Also, the reductionist approach to complex systems (e.g., the brain) is subject to substantial criticism. In particular, while a reductionist approach can help to examine causality, it is not enough for understanding how the brain maps onto the behavior (Anderson, 1972; Krakauer et al., 2017).

In this section, we review brain models across different scales that are faithful to biological constraints. We focus primarily on the first column from the left in Figure 3, starting from the realistic models with mesoscopic details to more coarse-grained frameworks.

1.1. Modeling at the Synaptic Level

The smallest interacting blocks of the nervous system are proteins (van den Heuvel et al., 2019). Genetic expression maps and atlases are useful for discovering the functions of these blocks in the neural circuit (Mazziotta et al., 2000). However, these maps are not uniformly expressed in the brain (Lein et al., 2007). While the expression maps of those proteins continue to unfold (Hawrylycz et al., 2012), combined with connectivity data, they can help quantify dynamics. These maps link the spatial distribution of gene expression patterns, and neural coupling (Richiardi et al., 2015) as well as other large-scale dynamics, e.g., dynamic connectivity as a dependent of neurogenetic profile (Diez and Sepulcre, 2018).

A notable effort in this regard is the Allen Brain Atlas (Jones et al., 2009) in which genomic data of mice, humans, and non-human primates (Hawrylycz et al., 2014) have been collected and mapped for understanding structural and functional architecture of the brain (Gilbert, 2018). While genomic data by itself is valuable for mapping out connectivity in different cell types, a fifth division of AA, Allen Institute for Neural Dynamics was recently announced, with the aim of studying the link between neural circuits of laboratory mice and behaviors related to foraging (Chen and Miller, 2021).

On a slightly larger scale, a considerable amount of work concerns the relationship between cellular and intracellular events and neural dynamics. Intracellular events and interactions models could generate accurate responses on small (Dougherty et al., 2005) and large scales (Whittington et al., 1995). Some of these models laid the foundation of computational neuroscience and are reviewed in Section 1.2. In what follows, we start with neurocomputational models at the mesoscale level (realistic models of small groups of neurons, i.e., the top-left corner of Figure 3), after which we move on toward macro-scale levels with different degrees of abstraction.

1.2. Basic Biophysics of Neurons: A Quick History

Zooming out from the intra-neuron synaptic level, inter-neuron communication emerges as a principal determinant of the dynamics. Information transmission is mainly based on the emission of action potentials. The mechanism of this flow of ions was first explained by the influential Hodgkin-Huxley equations and corresponding circuits. The electrical current of the equivalent circuit is described by four differential equations that incorporate membrane capacity and the gating variables of the channels (Hodgkin and Huxley, 1952). While the Hodgkin-Huxley model agrees with a wide range of experiments (Patlak and Ortiz, 1985; Traub et al., 1991) and continues to be a reference for models of ion channels, it needs to be simplified to be expandable to the models of the neuronal population. The main difficulty with the Hodgkin-Huxley model is that it requires solving a system of differential equations for each of the gating parameters of each of the single ion channels of a cell while there are more than 300 types of ion channels discovered as of today (Gabashvili et al., 2007). Various relaxing assumptions have been proposed, one of which is to dismiss the time dependence of membrane conductance and the dynamics of the action potential by simply assuming the firing happens when the electrical input accumulated at the membrane exceeds a threshold (Abbott and Kepler, 1990). The latter model is known as integrate-and-fire (Stein and Hodgkin, 1967), and it comes in different flavors depending on the form of nonlinearity assumed for the dynamics of leaky or refractory synapses (Michaels et al., 2016).

To model the interesting dynamics of various ion channels, a model of compartments of dendrites, called the multicompartment model, can be employed. An exclusive review article by Herz et al. (2006) categorizes compartmental models into five groups based on the level of balance and details involved from Hodgkin-Huxley description to black-box.

While the research on hyper-realistic modeling of many neurons continues, other frameworks focus on simulating the biophysics of the population of neurons. In Section 1.3, we pause on the state of large-scale synaptic simulations to show how a change in computational paradigm helps in overcoming some of the limitations inherent in these models. Models of Neural mass and Wilson-Cowan are examples of such alternatives (see Sections 1.3.1, 1.3.2, respectively).

1.3. Population-Level Models

Izhikevich and Edelman (2008) describe the first attempt in reconstructing the whole cortex. Their simulation includes a microcircuitry of 22 basic types of neurons with simplified dendrite trees and fewer synapses. The underlying structural data based on the geometry of the white matter is drawn from diffusion tensor imaging (DTI) (Honey et al., 2009) of the human brain. The microcircuitry of the six-layered neocortex was reconstructed based on cats' visual cortex. The spiking dynamics employed in this model comes from Izhikevich (2003) and it is a simplification of the Hodgkin-Huxley model as it outputs the firing rates instead of currents. On a larger scale, some subcortical dynamics (e.g., dopaminergic rewarding from the brainstem) are also implemented.

The significance of this simulation compared to preceding efforts is its inclusion of all cortical regions and some of their interplays in the form of cortico-cortical connections. The researchers also considered synaptic plasticity a significant factor in studying developmental changes such as learning. The model demonstrates several emergent phenomena such as self-sustained spontaneous activity, chaotic dynamics, and avalanches, alongside delta, alpha, and beta waves, and other heterogeneous oscillatory activities similar to those in the human brain.

Complexity aside, the model has its shortcomings, including extreme sensitivity to the initial condition. To address this, the authors suggest studying the population behavior instead of single-cell simulations. Despite all the limits, Izhikevich and Edelman (2008) is the first benchmark of whole-cortex modeling and the foundation of future detailed projects such as Blue Brain project (Markram, 2006) and MindScope (Koch et al., 2014).

Following Izhikevich and Edelman (2008), the Blue Brain Project (Markram, 2006) was founded in 2005 as a biological simulation of synapses and neurons of the neocortical microcircuitry. The ambitious goal was to extend this effort to a whole-brain level and build “the brain in a box.” The initial simulated subject was only a 2mm tall and 210μm in radius fragment of the somatosensory cortex of a juvenile rat (~100, 000 neurons). The efforts for further expansion to larger scales, i.e., mouse whole-brain and human-whole brain, are far-fetched by many critics (Abbott, 2020).

Far from the initial promise of “understanding” of the brain, the Blue Brain Project is still far from incorporating the full map of connections (also known as connectome Horn et al., 2014) in the mouse brain, which is still an order of magnitude smaller than the human brain (Frégnac and Laurent, 2014). That being said, acquiring the connectomic map does not necessarily result in a better understanding of function. Note that while the connectomic structure of the roundworm Caenorhabditis elegans nervous system has been entirely constructed since 1986 (White et al., 1986), research is still unable to explain the behavior of the network, e.g., predicting stimuli based on excitation (Koch, 2012). Finally, strong concerns regarding the validity of the experiments rise from the fact that the simulation still does not account for glial cells. Glial cells constitute 90% of the brain cells. They have distinctive mechanisms as they do not output electric impulses (Fields et al., 2014) but are responsible for inactivating and discharging products of neuronal activities which influence the synaptic properties (Henn and Hamberger, 1971) and consequently learning and cognitive processes (Fields et al., 2014). This point of incompleteness sheds extra doubt on the achievability of brain in silico from the Human Brain Project.

The above critiques have been called for a revision of the objectives of the Blue Brain project with more transparency. Hence, new strategies such as the division of Allen Institute, MindScope (Hawrylycz et al., 2016), and the Human Brain Project (Amunts et al., 2016) aim for adaptive granularity, more focused research on human data, and pooling of resources through cloud-based collaboration and open science (Fecher and Friesike, 2014). Alternatively, smaller teams developed less resource-intensive simulation tools such as Brian (Stimberg et al., 2019) and NEST (Gewaltig and Diesmann, 2007).

There are several readily-available simulators of large networks of spiking neurons to reconstruct many-neuron biophysics. Brian (Stimberg et al., 2019) is a Python package for defining a customized spiking network. The package can automatically generate the code for simulating a computationally-optimal language (e.g., C++, Python, or Cython). With GPUs available, it can also enable parallelism for faster execution. Brian is more focused on single-compartment models while GENESIS (Bower and Beeman, 2012) and NEURON (Carnevale and Hines, 2006) center around multicompartment cells.

NEST is another popular package for building ad-hoc models of spiking neurons with adjusted parameters. These parameters include the spiking rules (such as IF, Hodgkin-Huxley AdEx), networks (topological or random neural networks), synaptic dynamics (plasticity expressions, neuromodulation) (Gewaltig and Diesmann, 2007).

While working with mid-level packages, Technical limitations and the objective of the study should be considered. These include computational efficiency and the code generation pipeline. Interested researchers are encouraged to refer to the review by Blundell et al. (2018) to learn more about the guidelines and proposed solutions.

The steep price of high-resolution computation and the remoteness from high-level cognition can be levitated by replacing the detailed dynamics of single neurons with the collective equations of the population. This dimensionality-reduction strategy is the essence of the neural mass models (David and Friston, 2003), spiking neural network (Vreeken, 2003), and dynamical causal modeling (Friston et al., 2003).

1.3.1. Neural Mass Models

Staying faithful to the biophysical truth of the system can happen at scales larger than a few cells. In other words, by reducing the degrees of freedom, one can reduce a massive collection of individual integrate-and-fire equations (mentioned in Section 1.2) to a functional DE of the probabilistic evolution of the whole population known as Fokker-Planck DE. However, since Fokker-Planck equations are generally high-dimensional and intractable, a complimentary formalism, known as the mean-field approximation, is proposed for finessing the system (Deco et al., 2008).

In statistical physics, the mean-field approximation is a conventional way of lessening the dimensions of a many-body problem by averaging over the degrees of freedom. A well-known classic example is the problem of finding collective parameters (such as pressure or temperature) of a bulk of gas with known microscopic parameters (such as velocity or mass of the particles) by the Boltzmann distribution. The analogy of the classic gas shows the gist of the neural mass model: the temperature is an emergent phenomenon of the gas ensemble. Although higher temperatures correspond to higher average velocity of the particles, one needs a computational bridge to map microscopic parameters to the macroscopic one(s). To be clear, remember that each particle has many relevant attributes (e.g., velocity, mass, and the interaction force relative to other particles). Each attribute denotes one dimension in the phase space. One can immediately see how this problem can become computationally impossible even for 1cm³ of gas with ~10¹⁹ molecules.

The current state of thermodynamics accurately describes the macroscopic behavior of gas, so why not use this approximation to the many-body problems of neuronal populations? The analogous problem for a neural mass model can be described with the single-neuron activity and membrane potential as the microscopic parameter and the state of the neural ensemble in phase space as the macroscopic parameter. The computational bridge is based on Fokker-Planck equations for separate ensembles.

Neural mass models can be used both for understanding the basic principles of neural dynamics and building generative models (Friston, 2008). They can also be generalized to neural fields with wave equations of the states in phase space (Coombes, 2005) as well as other interesting dynamical patterns (Coombes et al., 2014). Moreover, these models are applicable across different scales and levels of granularity from subpopulations to the brain as a whole. This generalizability makes them a good candidate for analysis on different levels of granularity, ranging from modeling the average firing rate to decision-making and seizure-related phase transitions. The interested readers are encouraged to refer to the review in Deco et al. (2008) to see how neural mass models can provide a unifying framework to link single-neuron activity to emergent properties of the cortex. Neural mass and field models build the foundation for many of the large-scale in-silico brain simulations (Coombes and Byrne, 2019) and have been deployed in many of the recent computational environments (Ito et al., 2007; Jirsa et al., 2010). Note that the neural mass model can show inconsistency in the limits of synchrony and require complementary adjustments for systems with rich dynamics (Deschle et al., 2021) by mixing with other models of neural dynamics such as Wilson-Cowan (Wilson and Cowan, 1972) as in Coombes and Byrne (2019).

1.3.2. Wilson-Cowan

Wilson-Cowan is a large-scale model of the collective activity of a neural population based on mean-field approximation (see Section 1.3.1). Seemingly the most influential model in computational neuroscience after Hodgkin-Huxley (Hodgkin and Huxley, 1952) is Wilson-Cowan (Wilson and Cowan, 1972) with presently over 3,000 mentions in the literature.

The significance of this work in comparison to its proceedings (e.g., in Beurle, 1956; Anninos et al., 1970) is more than a formal introduction of tools from dynamical systems in neuroscience. This model acknowledges the diversity of synapses by integrating distinct inhibitory and excitatory subpopulations. Consequently, the system is described by two state variables instead of one. Moreover, the model accounts for Dale's principle (Eccles et al., 1954) for a more realistic portrayal. That is to say, each neuron is considered purely inhibitory or excitatory. The four theorems proved in the seminal paper (Wilson and Cowan, 1972) conclude the existence of oscillations as a response to a specific class of stimulus configuration and the exhibition of simple hysteresis for other classes of stimuli.

Wilson-Cowan model lays the foundation for many of the major theoretical advances. Examples of the derivative studies include energy function optimization for formulating associative memory (Hopfield, 1982), artificial neural networks as a special case with binary spiking neurons (Hinton and Sejnowski, 1983), pattern formation (Amari, 1977), brain wave propagation (Roberts et al., 2019), movement preparation (Erlhagen and Schöner, 2002), and Dynamic Causal Modeling (Sadeghi et al., 2020). Other studies also demonstrate the possibility of diverse nonlinear behavior of networks of Wilson-Cowan oscillators (MacLaurin et al., 2018; Wilson, 2019). More detailed extensions are on the way. For example, second-order approximations (El Boustani and Destexhe, 2009) and simulation of intrinsic structures such as spiking-frequency adaptation or depressing synapses (Chen and Miller, 2018). For a comprehensive list of continuations, see Destexhe and Sejnowski (2009).

1.3.3. Spiking Neural Network: Artificial Neural Networks as a Model of Natural Nervous System

With the introduction of neural networks, the idea of implementing neural circuits and biological constraints into artificial neural networks (ANN) gained momentum. McCulloch and Pitts (1943) is an early example that uses ANN with threshold spiking behavior. Despite being oversimplified, their idea formed the basis for a particular type of trainable network known as spiking neural networks (SNN) or biological neural networks (as in Vreeken, 2003). Note that the distinction here with the other forms of spiking networks like Izhinevich's and derivatives (discussed earlier in Section 1.3) is that here we are talking about the networks that demonstrate a function approximation as a deep learning algorithm would do (Box 1).

Box 1. Neuromorphic computers: Architectures tailored for spiking networks.

The disparity in energy consumption and computing architecture of biological and silicon neurons are the most important factors that raise eyebrows in assessing brain-like algorithms. The brain consumes ~20 watts of power while this amount for a supercomputer is in the order of megawatts (Zhang et al., 2018). This twist verifies that the processing of information in these simulations is far from the biological truth. Apart from the energy consumption gap, the non-Von Neumann architecture of the brain is another discrepancy that stands in the way of realistic brain simulation in silico. There is no Von Neumann bottleneck in the brain as there is no limitation on throughput as a result of separation of memory and computing unit (Wulf and McKee, 1995). The brain also has other features that are greatly missed in deep networks. These include synaptic plasticity, high parallelism due to a large number of neurons, high connectivity due to a large number of synapses, resilience to degradation, and low speed and frequency of communication, among other things. Although many of the aspects of biological cognition are complicated to reconstruct (e.g., embodiment and social interaction), the research in neuromorphic computing is addressing the disparities above by targeting hardware design (Cai and Li, 2021).

A potential solution for narrowing this computation gap can be sought at the hardware level. An instance of such a dedicated pipeline is neuromorphic processing units (NPU) that are power efficient and take time and dynamics into the equation from the beginning. An NPU is an array of neurosynaptic cores that contain computing models (neurons) and memory within the same unit. In short, the advantage of using NPUs is that they resemble the brain more realistically than a CPU or GPU because of asynchronous (event-based) communication, extreme parallelism (100–1,000,000 cores), and low power consumption (Eli, 2022). Their efficiency and robustness also result from the Physical proximity of the computing unit and memory. Below popular examples of such NPUs are listed. Each of them stemmed from different initiatives.

• SpiNNaker or “Spiking Neural Network architecture” is an architecture based on low-power microprocessors and was first introduced in 2005 to help the European Brain Project with computations of large cortical area. The first version could imitate ten thousand spiking neurons and four million synapses with 43 nano Joules of energy per synaptic event (Sharp et al., 2012).

• TrueNorth chips are arrays of 4,096 neurosynaptic cores amounting to 1 million digital neurons and 256 million synapses. IBM builds TrueNorth primarily as a low-power processor suitable for drones, but it is highly scalable and customizable (Akopyan et al., 2015).

• Loihi chips have demonstrated significant performance in optimization problems. Intel's fifth NPUs has incorporated biophysical reconstruction of hierarchical connectivity, dendritic compartments, synaptic delays, reward traces. Its circuit is composed of dandrite units (for updating state variables), axon units (generating feed for the subsequent cores), and learning unit (for updating weights based on customized learning rules) (Davies et al., 2018)

An integrative example of the implementation discussed above is NeuCube. NueCube is a 3D SNN with plasticity that learns the connections among populations from various STBD modulations such as EEG, fMRI, genetic, DTI, MEG, and NIRS. Gene regulatory networks can be incorporated as well if available. Finally, This implementation reproduces trajectories of neural activity. It has more robustness to noise and higher accuracy in classifying STBD than standard machine learning methods such as SVM (Kasabov, 2014).

Beyond biological alikeness, neuromorphic computing has important technical aspects that are missing in conventional compute units and can revolutionize neural data processing. They demonstrate lower latency, power consumption, and high portability required for real-time interpretation. These attributes make them useful for recent signal collectors like wearable EEG. On the other hand, although they have shown to be highly scalable and adaptable, their high cost per bit is a major pitfall (Davies, 2021; Sharifshazileh et al., 2021).

In contrast to deep neural networks, the activity in this architecture (transmission) is not continuous in time (i.e., during each propagation cycle). Instead, the activities are event-based occurrences with the event being the action potential depolarization². Although ANN architectures that are driven by spiking dynamics have been long used for optimization problems such as pattern recognition (Kasabov, 2007) and classification (Soltic et al., 2008), they lag behind conventional learning algorithms in many tasks, but that is not the end of the story.

Maass (1997) argues that concerning network size, spiking networks are more efficient in computation compared to other types of neural networks such as sigmoidal. Therefore it is worthwhile to implement SNNs in a more agnostic manner as spiking RNNs. Examples of such promising implementations are reservoir computing, liquid, and each state machine. For more on those architectures, see Section 3.1.2.

1.4. Brain Atlases: Whole- and Population-Level Modeling

The 21^st century has been the bursting era of large-scale brain initiatives. The objective of the simulation partly justifies this multitude. As it was previously mentioned, the notion simulation is highly versatile in meaning depending on the goal of the project (de Garis et al., 2010), i.e., where it sits on the Figure 3. Some of the projects of this spectrum are listed below.

• BigBrain: a free-access and few-cell-resolution model of human brain in 3D (Landhuis, 2017).

• Allen Brain Atlas: genome-wide map of gene expression for the human adult and mouse brain (Jones et al., 2009).

• Human Connectome Project: a large-scale structural and functional connectivity map of the human brain (coined as connectome in Sporns et al., 2005; Van Essen et al., 2013).

• Brain Research through Advancing Innovate Neurotechnologies: BRAIN (Devor et al., 2013).

• The Virtual Brain (TVB): an open-source neural dynamics simulator using real anatomical connectivity (Jirsa et al., 2010).

• Human Brain Project (HBP): aimed to realistically simulate the human brain in supercomputers (Miller, 2011).

Scaling compute power does not suffice for leveling up to the whole-brain models. Another challenge is the integration of time delays that become significant at the whole-brain level. In local connections, the time delays are small enough to be ignored (Jirsa et al., 2010) the transmission happens in a variety of finite speeds from 1 to 10 m per second. As a result of this variation, time delays between different brain parts are no longer negligible. Additional spatial features emerge by the implementation of this heterogeneity (Jirsa and Kelso, 2000; Petkoski and Jirsa, 2019).

Larger scale approaches could adapt neural mechanisms that rely on intra-region interactions (da Silva, 1991) in order to ditch the problems related to the synaptic level studies mentioned earlier. The Virtual Brain (TVB) project is one of these initiatives. TVB captures the network dynamics of the brain by stimulating the neural population structural connectivity, the variant time scales, and noise (Sanz Leon et al., 2013). TVB allows testing subject-specific hypotheses as the structural connectivity is based on individual DTI. The large-scale activity is an integration of local neural masses connected through large-range dynamics. It has a web platform GUI and can run on a personal computer and has already implemented many types of dynamics for different types of brain signals, namely EEG, MEG, BOLD, fMRI.

With models like TVB, one should note the shift in paradigm from the fine-scale simulations like Blue Brain. Contrary to the Blue Brain, the nodes consist of large groups of neurons (order of a few millimeters), not one or a few neurons. Consequently, the governing equations are the ones for deriving population dynamics and statistical modes. Another essential point is that TVB allows researchers to study the brain's phenomenology parametrically. The following section is dedicated to such studies.

2. Phenomenological Models

In contrast to realistic biological models of in-vivo events, phenomenological³ models offer a way of qualitative simulation of certain observable behaviors [or, as it is discussed in Dynamical Systems literature (Strogatz, 2018), phase trajectories]. The key assumption is that although short- and long-range dynamics depend on intricate biophysical events, the emerging observables can be encoded in significantly lower dimensions. This dimensionality reduction is thanks to dynamics that are capable of constructing similar statistical features of interest. Since a detailed enough biophysical model should eventually exhibit the same collective statistics, one may argue that the phenomenological models offer a detour to system-level reconstruction by ditching lots of cellular and physiological considerations.

Compared to detailed biophysical models, coarse-grained approaches rely on a smaller set of biological constraints and might be considered “too simplistic.” However, they are capable of reconstructing many collective phenomena that are still inaccessible to hyper-realistic simulations of neurons (Piccinini et al., 2021). A famous example of emergence at this level is synchronizations in cortex (Arenas et al., 2008). Moreover, experiments show that the population-level dynamics that are ignorant about the fine-grained detail better explain the behavior (Briggman et al., 2005; Churchland et al., 2012).

The significance of phenomenological models in the reconstruction of brain dynamics is also because of their intuitiveness and reproducibility. They may demonstrate critical properties of the neuronal population. An interesting example is noise-driven dynamics of the brain, which is responsible for multistability and criticality during resting state (Deco and Jirsa, 2012; Deco et al., 2017).

2.1. Problem Formulation, Data, and Tools

The idea of using phenomenological models for neural dynamics is mainly motivated by the possibility of using tools from dynamical system theory. The goal is to quantify the evolution of a state space built upon the state variables of the system. For example, if one can find two population variables (x, y) that determine the state of a neural ensemble, then all the possible pairs of x and ys form the basis for the state space of the system, let us call this 2-dimensional space A. The state of this ensemble at any given time t can always be expressed as a 2-D vector in A. In mathematics, A is called a vector space defined by the sets of differential equations that describe the evolution of x and y in time. As an intuitive visualization of a vector space, imagine a water swirl: each point of the surface of a water swirl can be represented by a vector with the magnitude and direction of local velocity. One can see how at each point in this space, there is a flow that pushes the system in a specific trajectory. Reproducing features of the brain signals or identifying such a sparse state space and the dynamics of a parsimonious set of state variables allows for forecasting the fate of the neural ensemble in future timesteps (Saggio and Jirsa, 2020). The evolution of the state variables is described by differential equations. In what follows in this section, some of the most prominent phenomenological models and their findings are discussed.

Interactions and connectivity can be observed in a wide set of settings from resting-state activity (Piccinini et al., 2021) to task-specific experiments (Pillai and Jirsa, 2017) by various imaging techniques including fMRI (Hutchison et al., 2013), EEG (Atasoy et al., 2018), MEG (Tait et al., 2021), and Calcium imaging (Abrevaya et al., 2021). In order to study segregation and integration of dynamics, networks of brain connectivity need to be constructed based on imaging data. Brain connectivity here refers to either anatomical, functional, or effective connectivity as described in Table 1. The anatomical connectivity is based in the physiological components and the morphology on fiber pathways, functional connectivity represents the correlation of activity between different regions, and the effective connectivity demonstrates the information flow (Sporns, 2007). For a more comprehensive review of such networks, refer to Wein et al. (2021).

TABLE 1

Table 1. Complex brain networks are measured through Structural Connectivity (SC), Functional Connectivity (FC), or Effective Connectivity (EC). Computational Connectomics is a common ways of formulating structural and functional networks of the whole brain.

2.1.1. Model Selection for Brain Connectivity

Deducing the effective connectivity of functionally-segregated brain regions is crucial in developing bio-plausible and explainable models. It is important to note that while anatomical connectivity rests directly upon data and functional connectivity is based on statistical dependencies in data space; effective connectivity could only be estimated through the inversion of a generative model. In other words, functional connectivity (FC) is data-driven and effective connectivity (EC) is hypothesis-driven, meaning that the FC is derived statistically from spatiotemporal data while EC is not directly observable from imaging and is parameterized as the causal relations among brain regions for different tasks. To find the best descriptive parameters, one needs to test various hypotheses. Table 1 shows examples of formulating brain connectivities. Granger causality is only a validation tool that is used both for optimizing functional and effective connectivity (Valdes-Sosa et al., 2011). Dynamical Causal Modeling (DCM), introduced in Friston et al. (2003), quantitatively generates the connectivities that fit the observed data by maximizing model evidence, aka marginal likelihood of the model (Daunizeau et al., 2011).

DCM can be thought of as a method of finding the optimal parameters of the causal relations that best fit the observed data. The parameters of the connectivity network are (1) anatomical and functional couplings, (2) induced effect of stimuli, and (3) the parameters that describe the influence of the input on the intrinsic couplings. The expectation-maximization (EM) algorithm is the widely-used optimizer. However, EM is slow for large, changing, and/or noisy networks. Zhuang et al. (2021) showed Multiple-Shooting Adjoint Method for Whole-Brain Dynamic outperforming EM on classification tasks while being used for continuous and changing networks.

DCM is, in fact, a method for testing hypotheses and guiding experiments, not a predictive or generative model by itself. Models of the intra-connected regions can be built based on the earlier subsections, e.g., neural mass model, neural fields, or conductance-based models. For a review of such hybrid approaches, see Moran et al. (2013).

Connectivity matrices introduced in Table 1 are the backbone of the information process pipeline. That being said, this parameter needs to be married to the dynamics of the states in the brain. To date, a large portion of studies have focused on mapping these networks onto the resting-state network, and a lot of structure-function questions remained to be answered by studying the task-related data (Cabral et al., 2017). In what follows, the models that quantify these dynamics based on the phenomenology of the behavior are discussed.

2.1.2. Generative Graph Models

Recent progress in the science of complex networks and information theory has paved the way for analytical and numerical models of structural and functional connectivity (Lurie et al., 2020). The network approach to the neural population is a conventional way to study neural processes as information transmission in time-varying networks. This analogy allows for examining the path and behavior of the system in terms of different dynamical properties.

An insightful interplay of function vs. structure is observed along the biologically plausible line of work by Deco and Jirsa (2012). They reconstructed the emergence of equilibrium states around multistable attractors and characteristic critical behavior like scaling-law distribution of inter-brain pair correlations as a function of global coupling parameters. Furthermore, new studies show that synchrony not only depends on the topology of the graph but also on its hysteresis (Qian et al., 2020).

Tools from graph theory and network science (Newman et al., 2006) are used to formulate this relation. Spectral mapping (Becker et al., 2018) and structure-function topological mapping (Liang and Wang, 2017) are proofs of concept in this regard. Generative graph models (traditionally developed by graph theory such as the one for random graph introduced in Erdős and Rényi, 1960) are principle tools of inference in this approach and now have been enhanced by machine learning, see for example, deep-network generative models in Kolda et al. (2014) and Li et al. (2018). Simulations of brain network dynamics and study of controllability (Kailath, 1980) has shown how differently regions are optimized for diverse behavior (Tang et al., 2017).

2.2. Inspiration From Statistical Physics and Nonlinear Dynamical Models

In addition to network science, another axis for interpreting neural data is based on well-established tools initially developed for parametrizing the time evolution of physical systems. Famous examples of these systems include spin-glass (Deco et al., 2012), different types of coupled oscillators (Cabral et al., 2014; Abrevaya et al., 2021), and multistable and chaotic many-body systems (Deco et al., 2017; Piccinini et al., 2021). This type of modeling has already offered promising and intuitive results. In the following subsections, we review some of the recent literature on various methodologies.

2.2.1. Brain as a Complex System

It is not easy to define what a complex system is. Haken (2006) defines the degree of complexity of a sequence as the minimum length of the program and of the initial data that a Turing machine (aka the universal computer) needs to produce that sequence. Despite being a debatable definition, one can conclude that according to it, the spatiotemporal dynamics of the mammalian brain qualifies as a complex system (Hutchison et al., 2011; Sforazzini et al., 2014). Therefore, one needs a complex mechanism to reconstruct the neural dynamics. In the following few subsections, we review candidate equations for the oscillations in cortical network (Buzsáki and Draguhn, 2004).

2.2.1.1. Equilibrium Solutions and Deterministic Chaos

Whole-brain phenomenological models like the Virtual Brain (Sanz Leon et al., 2013) are conventional generators for reconstructing spontaneous brain activity. There are various considerations to have in mind to choose the right model for the right task. A major trade-off is between the complexity and abstractiveness of the parameters (Breakspear, 2017). In other words, to capture the behavior of detailed cytoarchitectural and physiological make-up with a reasonably-parametrized model. Another consideration is the incorporation of noise, which is a requirement for multistable behavior (Piccinini et al., 2021) i.e., transitions between stable patterns of reverberating activity (aka attractors) in a neural population in response to perturbation (Kelso, 2012).

2.2.1.2. Kuramoto

Kuramoto model is a mathematical descriptor of coupled oscillators, one that can be written down as simple as a system of ODEs solely based on sinusoidal interactions (Kuramoto, 1984; Nakagawa and Kuramoto, 1994). Kuramoto model is widely used in physics for studying synchronization phenomena. It is relevant to neurobiological systems as it enables a phase reduction approach: Neural populations can be regarded as similar oscillators that are weakly coupled together. These couplings are parameterized in the model. Kuramoto can be extended to incorporate anatomical and effective connectivity and can expand from a low-level model of few-neuron activity to a stochastic population-level model with partial synchrony and rich dynamical properties. One way to do that is to upgrade the classic linear statistics to nonlinear Fokker-Planck equations (Breakspear et al., 2010).

There is significant literature on Kuramoto models on neural dynamics on different scales and levels. Strogatz (2000) is a conceptual review of decades of research on the principles of the general form of the Kuramoto model. Numerous studies have found consistency between the results from Kuramoto and other classical models in computational neuroscience like Wilson-Cowan (Wilson and Cowan, 1972; Hoppensteadt and Izhikevich, 1998). Kuramoto model is frequently used for quantifying phase synchrony and for controlling unwanted phase transitions in neurological diseases like epileptic seizures and Parkinson's (Boaretto et al., 2019). Still, there are many multistability questions regarding cognitive maladaptation yet to be explored, potentially with the help of Kuramoto models and the maps of effective connectivity. Anyaeji et al. (2021) is a review targeting clinical researchers and psychiatrists. It is a good read for learning about the current challenges that could be formulated as a Kuramoto model. Kuramoto is also unique in adaptability to different scales: from membrane resolution with each neuron acting as a delayed oscillator (Hansel et al., 1995) to the social setting where each subject couples with the other one in the dyad by means of interpersonal interactions (Dumas et al., 2012).

2.2.1.3. Van der Pol

Another model relevant to neuroscience is the van der Pol (VDP) oscillator which is probably the simplest relaxation oscillator (Guckenheimer and Holmes, 2013) and a special case of the FitzHugh-Nagumo model, which is, in turn, a simplification of the Hodgkin-Huxley model (see Section 1.2) (FitzHugh, 1961). Through the Wilson-Cowan approximation (Kawahara, 1980), VDP can also model neural populations. For more information about the Wilson-Cowan model, please see Section 1.3.2. Recently, Abrevaya et al. (2021) have used coupled VDP oscillators to model a low-dimensional representation of neural activity in different living organisms (larval zebrafish, rats, and humans) measured by different brain imaging modalities, such as calcium imaging (CaI) and fMRI. Besides proposing a method for inferring functional connectivity by using the coupling matrices of the fitted models, it was demonstrated that dynamical systems models could be a valuable resource of data augmentation for spatiotemporal deep learning problems.

Looking at the brain as a complex system of interacting oscillators is a detour for expanding the modeling to larger organization scales. The emergent behavior of the system can be described with “order parameters.” Although this is a description with much lower dimensions than the biophysical equations, it still expresses many remarkable phenomena such as phase transitions, instabilities, multiple stable points, metastability, and saddle points (Haken, 2006). However, parametrizing such models is still an ongoing challenge, and many related studies are limited to the resting-state network. The following section reviews the prospects of recent data-driven methods and how they can leverage the study of system-level behavior.

3. Agnostic Computational Models

Jim Garys's framework (Hey, 2009) divides the history of science into four paradigms. Since centuries ago, there have been experimental and theoretical paradigms. Then the phenomena of interest became too complicated to be quantified analytically, so the computational paradigms started with the rise of numerical estimations and simulations. Today, with the bursting advances in recording, storage, and computation capacity of neural signals, neuroscience is now exploring the fourth paradigm of Jim Garys's framework (Hey, 2009) i.e., data exploration in which the scientific models are fit to the data by learning algorithms.

In the introduction, we reflected on how scientists should not settle for mere prediction. While the literature on data-driven methods is enormous, this review focuses mainly on the strategies that help gain mechanistic insights rather than those that reproduce data through operations that are difficult to relate to biological knowledge. Instances of these unfavored methods include strict generative adversarial networks with uninterpretable latent spaces or black-box RNN with hard-to-explain parameters. The following section categorizes these methods into established and emerging techniques and discusses some showcases.

3.1. Established Learning Models

Data-driven models have long been used in identifying structure-function relations (similar to the ones mentioned in Table 1) (McKeown et al., 1998; Koppe et al., 2019). The shift of studies from single-neuron to networks of neurons, has accelerated in the last decade. This trend is because relying on collective properties of a population of neurons to infer behavior seems more promising than reconstructing the physiological activity of single neurons in hopes of achieving emergence. Yuste (2015) argues that the mere representations that relate the state of individual neurons to a higher level of activity have serious shortcomings (Michaels et al., 2016). However, these shortcomings can be addressed by incorporating temporal dynamics and collective measures into the model. We review the models that satisfy this consideration.

3.1.1. Dimensionality Reduction Techniques

Clustering and unsupervised learning are useful for mapping inputs (X) to features (Y). Later, this set of (X, Y) can be extrapolated to unseen data. There are various methods for identifying this mapping or, in other words, for approximating this function. Principal Component Analysis (PCA) is a primary one. PCA maps data onto a subspace with the maximal variance (Markopoulos et al., 2014). It is a common method of dimensionality reduction. However, the orthogonal set of features found by this method are not necessarily statistically-dependent. Therefore, they are not always helpful in finding sources and effective connectivity. Alternatively, Independent Component Analysis, commonly known as ICA, was introduced as a solution to the Blind Source Separation (BSS) problem. Each sample of the data is an ensemble of the state of different sources. However, the characteristics of these sources are the hidden variable (Pearlmutter and Parra, 1997). ICA is effective in finding the related source as it maps the data onto the feature space by minimizing the statistical independence for each feature rather than by minimizing the variance. Conventional use of component analysis is with fMRI and EEG recordings. In each time window, each sensor receives a noisy mix of activities in segregated brain regions. One is usually interested in inferring effective connectivity based on such data. Having a large number of electrodes around the scalp enables ICA to identify the independent sources of activities and artifacts. ICA algorithms come in different flavors depending on the dataset and the property of interest. For example temporal- (Calhoun et al., 2001), spatial- (McKeown et al., 1998), and spatiotemporal-ICA (Wang et al., 2014; Goldhacker et al., 2017) are tailored for different types of sampling. Hybrid approaches, e.g., ICA amalgamated with structural equation modeling (SEM), have shown better performance in given setups with less prior knowledge than SEM alone (Rajapakse et al., 2006). The interested reader is encouraged to refer to Calhoun and Adali (2012) for a dedicated review of ICA methods.

3.1.2. Recurrent Neural Networks

Recurrent neural networks (RNN) are the Turing-complete (Kilian and Siegelmann, 1996) algorithms for learning dynamics and are widely used in computational neuroscience. In a nutshell, RNN processes data by updating a “state vector.” The state vector holds the memory across steps in the sequence. This state vector contains long-term information of the sequence from the past steps (LeCun et al., 2015).

Current studies validate diverse types of RNNs as promising candidates for generating neural dynamics. Sherstinsky (2020) shows how the implicit “additive model,” which evolves the state signal, incorporates some of the interesting bio-dynamical behavior such as saturation bounds and the effects of time delays. Several studies modeled the cerebellum as an RNN with granular (Buonomano and Mauk, 1994; Medina et al., 2000; HofstoÈtter et al., 2002; Yamazaki and Tanaka, 2005) or randomly-connected layers (Yamazaki and Tanaka, 2007). Moreover, similarities of performance and adaptability to limited computational power (as in biological systems) are observed both in recurrent convolutional neural networks and in the human visual cortex (Spoerer et al., 2020).

RNNs vary greatly in architecture. The choice of architecture can be implied by the output of interest (for example text Sutskever et al., 2011 vs. natural scenes Socher et al., 2011) or the approaches to overcome the problem with vanishing and exploding gradient (e.g., long short-term memory (LSTM) Hochreiter and Schmidhuber, 1997, hierarchical Hihi and Bengio, 1995, or gated RNNs Chung et al., 2014).

3.1.2.1. Hopfield

Hopfield network (Hopfield, 1982) is a type of RNN inspired by the dynamics of Ising model (Brush, 1967; Little, 1974). In the original Hopfield mechanism, the units are threshold (McCulloch and Pitts, 1943) neurons, connected in a recurrent fashion. The state of the system is described by a vector V which represents the states of all units. In other words, the network is in fact, an undirected graph of artificial neurons. The strength of connection between units i and j is described by w_ij which is trained by a given learning rule i.e., commonly Storkey (Storkey, 1997) or Hebbian rule (stating that “neurons that fire together, wire together”) (Hebb, 1949). After the training, these weights are set, and an energy landscape is defined as a function of V. The system evolves to minimize the energy and moves toward the basin of the closest attractor. This landscape can exhibit the stability and function of the network (Yan et al., 2013).

The Hopfield model can accommodate some biological assumptions and work in tandem with cortical realizations. Similar to the human brain, Hopfield connections are mostly symmetric. Most importantly, since its appearance, it has been widely used for replicating associative memory. However, soon it was revealed that other dynamical phenomena like cortical oscillations and stochastic activity (Wang, 2010) need to be incorporated in order to capture a comprehensive image of cognition.

3.1.2.2. LSTM

In addition to the problem of vanishing and exploding gradients, other pitfalls also demand careful architecture adjustment. Early in the history of deep learning, RNNs demonstrated poor performance on sequences with long-term dependencies (Schmidhuber, 1992). Long short term memory (LSTM) is specifically designed to resolve this problem. The principle difference of LSTM and vanilla RNN is that instead of a single recurrent layer, it has a “cell” composed of four layers that interact with each other through three gates: input gate, output gate and forget gate. These gates control the flow of old and new information in the “cell state” (Hochreiter and Schmidhuber, 1997). On certain scales of computation, LSTM still has considerable performance compared to trendy sequential models like transformers.

3.1.2.3. Reservoir Computing

A reservoir computer (RC) (Maass et al., 2002) is an RNN with a reservoir of interconnected spiking neurons. Broadly speaking, the distinction of RC among RNNs, in general, is the absence of granular layers between input and output. RCs themselves are dynamical systems that help learn the dynamics of data. Traditionally, the units of a reservoir have nonlinear activation functions that allow them to be universal approximators. Gauthier et al. (2021) show that this nonlinearity can be consolidated in an equivalent nonlinear vector autoregressor. With the nonlinear activation function out of the picture, the required computation, data, and metaparameter optimization complexity are significantly reduced, the interpretability is consequently improved while the performance stays the same.

3.1.2.4. Liquid State Machine

LSM can be thought of as an RNN soup that maps the input data to a higher dimension that more explicitly represents the features. The word liquid come from the analogy of a stone (here an input) dropping into the water (here a spiking network) and propagating waves. LSM maintains intrinsic memory and can be simplified so much that it processes real-time data (Polepalli et al., 2016). Zoubi et al. (2018) shows LSM performs notably in building latent space of EEG data (extendable to fMRI). As for the faithfulness to the biological truth, Several studies argue that LSM surpasses RNNs with granular layers in matching organization and circuitry of cerebellum (Yamazaki and Tanaka, 2007) and cerebral cortex (Maass et al., 2002). Lechner et al. (2019) demonstrate the superiority of a biologically-designed LTM on given accuracy benchmarks to other ANNs, including LSTM.

3.1.2.5. Echo State Network

ESN works as a tunable frequency generator developed by Maass et al. (2002) at the same time and with similar fundamentals as LSM but independent of that. The idea is to have the input induce nonlinear responses in the neurons of a large reservoir and train linear combinations of these responses to produce the desired output. ESN used to be one of the gold standards of nonlinear dynamics modeling before 2010 (Jaeger and Haas, 2004; Jaeger et al., 2007). The first significance of ESN is due to training with linear regression, enabling easy implementation and freedom from gradient decent problems (e.g., bifurcation and vanishing/exploding gradient). With the vast development of deep learning, this feature is no longer a remarkable advantage. However, ESN is still a plausible architecture for non-digital computation substrates such as neuromorphic hardware (Jaeger, 2007; Bürger et al., 2015).

3.1.2.6. Physically-Informed RNN

A prominent factor in determining the dynamical profile of the brain is the intrinsic time delays (Chang et al., 2018). Integrating these time delays into artificial networks was initially an inspiration from neuroscience for AI. Later, they came back as a successful tool for integrated sequence modeling for multiple populations. In the last decade, RNN has been used for reconstructing neural dynamics via interpretable latent space in different recording modalities such as fMRI (Koppe et al., 2019) and Calcium imaging (Abrevaya et al., 2021).

Continuity of time is another extension that can make RNNs more compatible with various forms of sampling and thus neural dynamics from spikes to oscillations. Continous time RNNs (CT-RNNs) are RNNs with activation functions made up of differential equations. They have been proved to be universal function approximators (Funahashi and Nakamura, 1993) and have surfaced recently in the literature as reservoir computers (Verstraeten et al., 2007; Gauthier et al., 2021) and liquid time-constant neural networks (Hasani et al., 2020).

Essentially, finding the optimal architecture and hyperparameters for a given problem does not have a straightforward recipe. The loss function in a deep neural network can be arbitrarily complex and usually takes more than a convex optimization. Li et al. (2018) shows how parameters of the network can change the loss landscape and trainability. Another more specific issue to the algorithms trained on a temporal sequence is catastrophic forgetting and attention bottleneck. These complications arise from the limitation of memory and attention to the past time steps. New attention models such as transformers and recurrent independent mechanisms (see Sections 3.1.4, 3.1.5, respectively) are specifically built to address these issues. As memory-enhanced components, RNN layers appear in other deep and shallow architectures with sequential data as input, including encoder-decoders.

3.1.3. Variational Autoencoder

Variational autoencoder (VAE for short) is a type of neural network that encodes the ground truth as the input onto a “latent space” and then decodes that space for reconstructing the input (Kingma and Welling, 2013). The network is parameterized by minimizing the reconstruction loss, which is, in this case, a metric of information gained by a metric called Kullback-Leibler divergence (Kullback and Leibler, 1951). This metric is also known as variational free energy or evidence lower bound (ELBO). It is the same objective function used in dynamic causal modeling (Winn et al., 2005).

An example of VAE used for regenerating dynamics is by Perl et al. (2020) in which the coupling dynamics of the whole brain and the transitions between the states of wake-sleep progression is generated. The goal is to find low (e.g., as low as 2-) dimensional manifolds that can capture the signature structure-function relationship that demonstrates the stage in the wake-sleep cycle (Vincent et al., 2007; Barttfeld et al., 2015) and the parameters of generic coupled Stuart-Landau oscillators as in Deco et al. (2017). An idea for regenerating dynamics is to use a deep-network embedded differential equations (as in Section 3.2.2) in the latent VAE structure (Chen et al., 2018).

3.1.4. Transformers

The transformer is a relatively new class of ML models that recently has shown state-of-the-art performance on sequence modeling such as natural language processing (NLP) field of research (Vaswani et al., 2017). Beyond NLP, this architecture demonstrates good performance on a wide variety of data, including brain imaging (Kostas et al., 2021; Song et al., 2021; Sun et al., 2021). Similar to RNNs, transformers aim to process sequential data such as natural language or temporal signals. It differs from the RNN paradigm because it does not process the data sequentially; instead, it looks at whole sequences with a mechanism called “attention,” and by doing so, it alleviates the problem of forgetting long dependencies, which is common in RNN and LSTMs. This mechanism can make both long- and short-term connections between points in the sequence and prioritize them. Transformers are widely used for generating foundation models (i.e., models that are pretrained on big data Bommasani et al., 2021) and they can outperform recurrent networks like LSTM with large models/data (Kaplan et al., 2020).

3.1.5. Recurrent Independent Mechanisms

Recurrent independent mechanisms (RIM) are a form of attention model that learns and combines independent mechanisms to boost generalizability and robustness in executing a task. The task in the sense of signal processing can be generating a sequence based on the observed data. The hypothesis is that the dynamics can be learned as a sparse modular structure. In this recurrent architecture, each module independently specializes in a particular mechanism. Then all the RIMs compete through an attention bottleneck so that only the most relevant mechanisms get activated to communicate sparsely with others to perform the task (Goyal et al., 2020).

3.2. Hybrid Approaches, Scientific ML, and the New Frontiers

The independence from prior knowledge sounds interesting as it frees the methodology from inductive biases and makes the models more generalizable by definition. However, this virtue comes at the cost of a need for large training sets. In other words, the trade-off of bias and computation should be considered: Applying lots of prior knowledge and inductive biases result in a lesser need for data and computation. In contrast, little to no inductive bias calls for a great need for big and curated data. It is true that with the advancement of recording techniques, the scarcity of data is less of a problem than it was before, but even with all these advances, having clean and sufficiently large medical dataset that helps with the problem in hand is not guaranteed.

Total reliance on data is especially questionable when the data has significant complications (as discussed in the introduction). Opting for a methodology guided by patterns rather than prior knowledge is problematic in particular when the principle patterns of data arise from uninteresting phenomena such as the particular way a given facility may print out the brain images (Ng, 2021).

The thirst for data aside, one of the prominent drawbacks of agnostic modeling and ML, in particular, is that they are famous for providing opaque blackbox solutions, meaning that by leaving biological priors out of the picture, the explainability of the outcome is weak. Lack of explainability is a pet peeve for people in science as they are interested in both prediction and the reasoning behind those predictions.

In addition to the implicit assumption of the adequacy of training data, the explicit assumption that these models rely on is that the solution is parsimonious, i.e., there are few descriptive parameters. Despite some possibility of error with this assumption in given problems (Su et al., 2017), it is particularly useful in having arbitrarily less complicated descriptions that are generalizable, interpretable, and less prone to overfitting.

The following sections describe general function approximators that could identify data dynamics without injecting any prior knowledge about the system. They could provide a perfect solution for a well-observed system with unknown dynamics. Although some of neural ODE methods have already been applied to fMRI and EEG data (Zhuang et al., 2021), other deep architectures such as GOKU-net and latent ODEs are new frontiers.

3.2.1. Sparse Identification of Nonlinear Dynamics

Kaheman et al. (2020) proposed a novel approach for quantifying underlying brain dynamics. The key assumption is that the governing multi-dimensional principles can be derived by a system of equations describing the first-order rate of change. In order to use sparse regression methods such as Sparse Identification of Nonlinear Dynamics (SINDy), one needs to precisely specify the set of parsimonious state variables (Quade et al., 2018). That being said, SINDy does not work for small datasets. If it is given fewer data than possible terms, the system of governing equations is underspecified. Therefore, the underfitting as a result of insufficient training data is a secondary problem. One approach to address this issue is incorporating the known terms and dismissing the learning for those parts. An example is discussed in Section 3.2.3.

3.2.2. Differential Equations With Deep Neural Networks

A relatively new class of dynamical frameworks combines differential equations with machine learning in a more explicit fashion. It is noteworthy to mention that by neural ODE here, we are referring to the term used in Chen et al. (2018). Neural ODEs are a family of deep neural networks that learn the governing differential equations of the system, not to be confused with the differential equations of neuronal dynamics. This class of frameworks has been used to model the dynamics of time-varying signals. They begin by assuming that the underlying dynamics follow a differential equation. They can then be used to discover the parameters of that differential equation by using standard optimization of deep neural networks. As is evident, such formulations are quite useful in modeling and analyzing brain dynamics, especially using deep networks. Below we describe some of the relevant works in this subfield.

3.2.2.1. Neural Ordinary Differential Equations

Combining ordinary differential equations (ODEs) with deep neural networks has recently emerged as a feasible method of incorporating differentiable physics into machine learning. A Neural Ordinary Differential Equation (Neural ODE) (Chen et al., 2018) uses a parametric model as the differential function in an ODE. This architecture can learn the dynamics of a process without explicitly stating the differential function, as has been done previously in different fields. Instead, standard deep learning optimization techniques could be used to train a parameterized differential function that can accurately describe the dynamics of a system. In the recent past, this has been used to infer the dynamics of various time-varying signals with practical applications (Chen et al., 2018; Jia and Benson, 2019; Kanaa et al., 2019; Rubanova et al., 2019; Yildiz et al., 2019; Kidger et al., 2020; Li et al., 2020; Liu et al., 2020).

3.2.2.2. Latent ODE

A dynamic model such as the Neural ODE can be incorporated in an encoder-decoder framework, resembling a Variational Auto-Encoder, as mentioned in Chen et al. (2018). Such models assume that latent variables can capture the dynamics of the observed data. Previous works (Chen et al., 2018; Kanaa et al., 2019; Rubanova et al., 2019; Yildiz et al., 2019) have successfully used this framework to define and train a generative model on time series data.

3.2.2.3. Stochastic Neural ODEs

Parametric models can also be incorporated into stochastic differential equations to make Neural Stochastic Differential Equations (Neural SDEs) (Li et al., 2020; Liu et al., 2020). Prior works have also introduced discontinuous jumps (Jia and Benson, 2019) in the differential equations.

3.2.2.4. Neural Controlled Differential Equations

Furthermore, latent ODE models can add another layer of abstraction. The observed data is assumed to be regularly/irregularly sampled from a continuous stream of data, following the dynamics described by a continuously changing hidden state. Both the dynamics of the hidden state and the relationship between the interpolated observations and the hidden state can be described by neural networks. Such systems are called Neural Control Differential Equations (Neural CDE) (Kidger et al., 2020). Broadly speaking, they are the continuous equivalent of RNNs.

3.2.3. Differential Equations Enhanced by Deep Neural Networks

The above methods use deep neural networks to define the differential function in ordinary differential equations. In contrast, UDEs and GOKU-nets (described below) take the help of deep neural networks to enhance differential equations. UDEs replace only the unknown parts of a known partial differential equation, while GOKU-nets use explicit differential equations as part of deep neural network pipelines.

3.2.3.1. Universal Differential Equations

Universal Differential Equations (UDE) offer an alternate way of incorporating neural networks into differential equations while accounting for prior knowledge. In their seminal work, Rackauckas et al. (2020) demonstrate how it is possible to aid a partial differential equation model by learning the unknown terms with universal approximators such as neural networks. Furthermore, they show how by combining this approach with a symbolic regression, such as SINDy, these models can accelerate the discovery of dynamics in limited data with significant accuracy.

3.2.3.2. Generative ODE Modeling With Known Unknowns

Another promising approach is the case of the Generative ODE Modeling with Known Unknowns, aka GOKU-nets (Linial et al., 2021). GOKU-net consists of a variational autoencoder structure with ODEs inside. In contrast with Latent ODEs, here, the ODEs are not parameterized but given explicit forms. Hence, it is possible to use some prior knowledge of the dynamics governing the system, such as in SINDy and UDEs, but there is no need to have direct observations of the state variables as in those cases. For example, one could hypothesize that the latent dynamics of a system follow some particular differential model such as Kuramoto or van der Pol. This model then jointly learns the transformation from the data space to a lower-dimensional feature representation and the parameters of the explicit differential equation.

The machine learning techniques are now routinely used for classification and regression of brain states (see Wein et al., 2021 for a review). However, they have much more potential than black-box, data-intensive classifiers. This is because new sequential models are sometimes designed to identify the missing pieces of the puzzle of dynamics. They can also act as generative models and provide a broad potential for testing biophysical and system-level hypotheses. Some of the methods introduced in this section are explained in detail in Kutz (2013) textbook. Moreover, extremely helpful tutorials can be found in Brunton (2011) YouTube^TM channel.

4. Conclusion

The key purpose of this review was to dive into samples of already popular paradigms or the ones authors found most promising for reconstructing neural dynamics with all the special considerations. To achieve this, we sorted the computational models with respect to two indicators: the scale of organization and the level of abstraction (Figure 3).

The scope of our study is broadly generative models of neural dynamics in biophysics, complex systems, and AI with some limitations. This paper is an interdisciplinary study that covers a time span from the mid-twentieth century when the pioneer models like Wilson-Cowan (Wilson and Cowan, 1972), and Hodgkin-Huxley (Hodgkin and Huxley, 1952) arose, up until the recent decade when gigantic brain atlas initiatives, groundbreaking research in ML, and unprecedented computation power became available. Given the rate of publication in the related fields, a systematic review was impossible. Therefore, this paper is a starting point for gaining an eagle-eye view of the current landscape. It is up to the reader to adjust the model scale and abstraction depending on the problem at hand (see Figure 3).

There is established work on formal hypothesis testing and model selection procedure for generating effective connectivity. While the advances in ML literature enable new frontiers of generative models, it is crucial to be aware of standard practices in generative modeling, such as Bayesian model reduction for selecting the model with the priors that fit the data best (Friston et al., 2003). Model inversion is a crucial procedure for model validation and can be helpful in opening the black box of deep neural networks by computing the model evidence and posteriors based on the prior parameters suggested by predominantly data-driven models. Furthermore, the model inversion can be extended to large, continuous, and noisy systems by improving parameter estimation using new optimization tools (Zhuang et al., 2021).

We emphasized the distinctiveness of the problems in computational neuroscience and cognitive science. One key factor is the trade-off of complexity and inductive bias with the availability of data and prior knowledge of the system. While there is still no ultimate recipe yet, hybrid methods could simultaneously tackle explainability, interpretability, plausibility, and generalizability.

Author Contributions

MR-P authored the bulk of the paper. GA significantly contributed to the general idea of the paper, review of the literature on scientific machine learning and phenomenological models, and the writing and feedback process. J-CG-A and VV to the review of the agnostic models. Senior authors directed the idea behind paper and provided feedback and mentoring. The figures are the result of brainstorming with all the authors. All authors contributed to the article and approved the submitted version.

Funding

GD is funded by the Institute for Data Valorization (IVADO), Montréal and the Fonds de recherche du Québec (FRQ). MR-P, J-CG-A, VV, and IR acknowledge the support from Canada CIFAR AI Chair Program and from the Canada Excellence Research Chairs (CERC) program. GA acknowledges the funding from CONICET.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

The authors are grateful for the discussions and revisions from Hadi Nekoei, Timothy Nest, Quentin Moreau, Eilif Muller, and Guillaume Lajoie.

Footnotes

1. ^Emergence is the manifestation of collective behavior that cannot be deduced from the sum of the behavior of the parts (Johnson, 2002; Krakauer et al., 2017).

2. ^For a more comprehensive overview on types and applications, see Schliebs and Kasabov (2013).

3. ^Note that here, the notion of “phenomena” here is different than that used by e.g., Revonsuo (2006) where phenomenological architecture and properties are regarded as a representation of environment in the first-person mind (Smith, 2018), complementary to “physiological” architecture in the brain-as in Fingelkurts et al. (2009).

References

Abbott, A.. (2020). Documentary follows implosion of billion-euro brain project. Nature 588, 215–216. doi: 10.1038/d41586-020-03462-3

PubMed Abstract | CrossRef Full Text | Google Scholar

Abbott, L., and Kepler, T. B. (1990). “Model neurons: from hodgkin-huxley to hopfield,” in Statistical Mechanics of Neural Networks (Springer), 5–18.