# Generative Models of Brain Dynamics

^{1}Mila-Quebec AI Institute, Montréal, QC, Canada^{2}Université de Montréal, Montréal, QC, Canada^{3}Departamento de Física, Facultad de Ciencias Exactas y Naturales, Instituto de Física de Buenos Aires (IFIBA), CONICET, Universidad de Buenos Aires, Buenos Aires, Argentina^{4}Department of Psychiatry, CHU Sainte-Justine Research Center, Mila-Quebec AI Institute, Université de Montréal, Montréal, QC, Canada

This review article gives a high-level overview of the approaches across different scales of organization and levels of abstraction. The studies covered in this paper include fundamental models in computational neuroscience, nonlinear dynamics, data-driven methods, as well as emergent practices. While not all of these models span the intersection of neuroscience, AI, and system dynamics, all of them do or can work in tandem as generative models, which, as we argue, provide superior properties for the analysis of neuroscientific data. We discuss the limitations and unique dynamical traits of brain data and the complementary need for hypothesis- and data-driven modeling. By way of conclusion, we present several hybrid generative models from recent literature in scientific machine learning, which can be efficiently deployed to yield interpretable models of neural dynamics.

## Introduction

“*What I cannot create, I do not understand.*” — Richard Feynman

The explosion of novel data acquisition and computation methods has motivated neuroscientists to tailor these tools for *ad-hoc* problems. While attempts at pattern detection in enormous datasets are commonplace in the literature (representing a logical first step in applying learning algorithms to complex data), such efforts provide little insight into the observed mechanisms and emission properties. As the above quote from R. Feynman suggests, such methods fall short of *understanding* the brain. The importance of developing interpretable algorithms for biological data, beyond the standard “black-box” models of conventional machine learning, is underscored by the pressing need for explainability in medical and health-related research. To this end, formal modeling (the practice of expressing some dependent variable unequivocally in terms of some other set of independent variables; Wills and Pothos, 2012) is the only route to transparent and reproducible theories (Guest and Martin, 2020). In the present review, we propose that a class of architectures known as generative models constitutes an emergent set of tools with superior properties for reconstructing segregated and whole-brain dynamics. A generative model may consist of, for example, a set of equations that determine the evolution of the signals recorded from a human patient based on system parameters. In general, generative models have an advantage over black-box models in that they contain inference mechanisms rather than mere predictive capacity.

### Why Prefer Inference Over Prediction?

Put simply: the goal of science is to leverage prior knowledge not merely to forecast the future (a task well suited to engineering problems), but to answer “why” questions and to facilitate the discovery of mechanisms and principles of operation. Bzdok and Ioannidis (2019) discuss why inference should be prioritized over prediction for building a reproducible and expandable body of knowledge. We argue that this priority should be especially respected in clinical neuroscience.

It is important to note that modeling is, and should be, beyond prediction (Epstein, 2008). Not only does explicit modeling allow for explanation (which is the main point of science), but it also directs experiments and allows for the generation of new scientific questions.

In this paper, we demonstrate why focusing on the multi-scale dynamics of the brain is essential for biologically plausible and explainable results. To this end, we review a large spectrum of computational models for reconstructing neural dynamics developed in diverse scientific fields, such as biological neuroscience (biological models), physics and applied mathematics (phenomenological models), and statistics and computer science (data-driven models). Along the way, it is crucial to consider the uniqueness of neural dynamics and the shortcomings of data collection. Neural dynamics differ from other forms of physical time series, and neural ensembles diverge from many canonical examples of dynamical systems.

### Neural Dynamics Is Different

A neural ensemble differs from the generic notion of a dynamical system in the following ways:

• Unlike chemical oscillations and power grids, the nervous system is a product of biological evolution, which makes its complexity and organization distinctive.

• Like many biophysical systems, it is highly dissipative and functions in non-equilibrium regimes (at least while working as a living organ).

• Although the brain undergoes continuous neuromodulation, its anatomical structure is encoded in the genome and is therefore largely predetermined (Rabinovich et al., 2006).

• There are meaningful similarities in brain activity across species. This is especially good news because the neural properties of less complex species are far better characterized than those of humans (White et al., 1986).

These characteristics help narrow down the search for useful models.

### Neural Data Is Different

Neural recordings, especially of human subjects, are noisy and often scarce. Due to requirements of medical certification, the cost of imaging assays, and the challenges of recruitment, acquiring these datasets can be both expensive and time-consuming. Moreover, such data can be difficult to wrangle and contain inconsistent noise, not only across participants but quite often within a single participant at different times (e.g., artifacts, skin condition, and time resolution in the case of EEG).

### Overview of Generative Models

Our focus is on generative models. In the current context, generative modeling can be distinguished from discriminative or classification modeling in the sense that it posits a probabilistic model of how observable data are generated by unobservable latent states. Almost invariably, generative models in imaging neuroscience are state space or dynamic models based upon differential equations or density dynamics (in continuous or discrete state spaces). Generative models can be used in one of two ways. First, they can be used to simulate or generate plausible neuronal dynamics (at multiple scales), with an emphasis on reproducing emergent phenomena of the sort seen in real brains. Second, the generative model can be inverted, given some empirical data, to make inferences about the functional form and architecture of distributed neuronal processing. In this use, the generative model serves as an observation model and is optimized to best explain some data. Crucially, this optimization entails identifying both the parameters of the generative model and its structure, *via* the process of model inversion and selection, respectively. In this context, generative modeling is usually deployed to test hypotheses about functional brain architecture (or neuronal circuits) using (Bayesian) model selection, that is, by comparing the evidence (a.k.a. marginal likelihood) for one model against others.

With respect to their modeling assumptions and objectives, current generative models fall into three main categories, as shown in Figure 1:
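To make the notion of model evidence concrete, the following is a minimal sketch that assumes a linear-Gaussian observation model rather than the differential-equation state space models discussed above: each candidate model is a design matrix `Phi`, the weights have a Gaussian prior with precision `alpha`, and the observation noise has precision `beta`. Under these assumptions the marginal likelihood has a closed form, so two hypothetical models of a synthetic 10 Hz signal (intercept only versus intercept plus an oscillatory component) can be compared directly. The function name, parameter values, and data are illustrative only.

```python
import numpy as np

def log_evidence(y, Phi, alpha=1.0, beta=1.0):
    """Log marginal likelihood log p(y | model) for y = Phi @ w + noise,
    with prior w ~ N(0, I / alpha) and noise ~ N(0, I / beta).
    Integrating out w gives y ~ N(0, Phi Phi^T / alpha + I / beta)."""
    n = y.size
    C = Phi @ Phi.T / alpha + np.eye(n) / beta
    _, logdet = np.linalg.slogdet(C)          # C is positive definite
    quad = y @ np.linalg.solve(C, y)          # y^T C^{-1} y without an explicit inverse
    return -0.5 * (quad + logdet + n * np.log(2.0 * np.pi))

# Synthetic "recording": a 10 Hz oscillation plus Gaussian noise (illustrative only).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200, endpoint=False)
noise_sd = 0.3
y = 0.8 * np.sin(2 * np.pi * 10 * t) + noise_sd * rng.standard_normal(t.size)

# Two competing hypotheses, each encoded as a set of regressors.
phi_null = np.ones((t.size, 1))                              # constant only
phi_osc = np.column_stack([np.ones_like(t),
                           np.sin(2 * np.pi * 10 * t),
                           np.cos(2 * np.pi * 10 * t)])      # adds a 10 Hz component

beta = 1.0 / noise_sd**2
log_bf = log_evidence(y, phi_osc, beta=beta) - log_evidence(y, phi_null, beta=beta)
print(f"log Bayes factor (oscillatory vs. null): {log_bf:.1f}")
```

A positive log Bayes factor favors the oscillatory model. For nonlinear dynamic models such as those used in dynamic causal modeling, the evidence has no closed form and is typically approximated (e.g., with a variational free-energy bound), so this sketch only illustrates the comparison step.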

1. **Biophysical models:** Biophysical models are *realistic* models that encapsulate biological assumptions and constraints. Given the large number of components and the empirical complexity of the systems modeled, biophysical models run the gamut from very small models with a high degree of realism (e.g., Hodgkin and Huxley's model of the squid giant axon) to large-scale ones (e.g., the whole-cortex model of Izhikevich and Edelman, 2008). Due to computational limitations, large-scale models are often accompanied by increasing levels of simplification. The Blue Brain Project (Markram, 2006) is an example of this type of modeling.

2. **Phenomenological models:** Analogies and behavioral similarities between neural populations and established physical models open the possibility of using well-developed tools from Statistical Physics and Complex Systems for brain simulations. In such models, some priors on the dynamics are given, but not in the form of realistic biological assumptions. A famous example is the model of Kuramoto oscillators (Bahri et al., 2020), in which the goal is to find the parameters that best reconstruct the behavior of the system. These parameters describe properties of the phenomenon (e.g., the strength of the synchrony), although they do not directly express the fabric of the organism.

3. **Agnostic computational models:** Data-driven methods that, given a “sufficient” amount of data, can *learn* to reconstruct the behavior with little prior knowledge. Examples of such approaches include some self-supervised methods such as latent ODEs (Chen et al., 2018). The term “sufficient” expresses the main limitation of these approaches: they often need unrealistically large datasets and come with intrinsic biases. In addition, the representations these models provide can be analytically far from the physics of the system or the phenomenon.
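As an illustration of the phenomenological category, the sketch below integrates the classic Kuramoto model, `dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j - θ_i)`, with forward Euler and tracks the order parameter `r = |mean_j exp(iθ_j)|` as a summary of synchrony. It assumes standard all-to-all coupling rather than any connectome-based coupling, and the function name, Gaussian natural frequencies, and step sizes are choices made for the example. It shows only the forward (simulation) direction, not the parameter fitting described above.

```python
import numpy as np

def kuramoto_order_parameter(omega, K, dt=0.01, steps=3000, seed=0):
    """Simulate N all-to-all coupled Kuramoto oscillators with forward Euler.

    Uses the identity (K/N) * sum_j sin(theta_j - theta_i) = K * r * sin(psi - theta_i),
    where r * exp(i * psi) is the mean of exp(i * theta) (the complex order parameter).
    Returns the time series of r: ~1 for full synchrony, ~0 for incoherence.
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=omega.size)   # random initial phases
    r_trace = np.empty(steps)
    for k in range(steps):
        z = np.mean(np.exp(1j * theta))
        r, psi = np.abs(z), np.angle(z)
        r_trace[k] = r
        theta = theta + dt * (omega + K * r * np.sin(psi - theta))
    return r_trace

omega = np.random.default_rng(1).normal(0.0, 1.0, size=200)  # natural frequencies
for K in (0.0, 1.0, 8.0):
    r = kuramoto_order_parameter(omega, K)
    print(f"K = {K:3.1f}: time-averaged r over second half = {r[1500:].mean():.2f}")
```

For unit-variance Gaussian frequencies, the infinite-population critical coupling is `K_c = 2 / (π g(0)) ≈ 1.6`, so K = 1 should remain close to incoherence and K = 8 close to full synchrony. Fitting K (and the ω_i) to recorded phases would amount to inverting this forward model.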

**Figure 1**. Venn diagram of the generative models of interest. Depending on their level of abstraction and assumptions, methods might belong to one or more of the three worlds of machine learning, neuroscience, and dynamical systems. This review is structured into three main categories that are, in fact, intersections of these fields: biophysical (Section 1), phenomenological (Section 2), and agnostic modeling (Section 3). Tools developed independently in each of these fields can be combined to overcome the limitations of data.

Figure 2 shows an overview of various generative models and their presence in the literature to date.

**Figure 2**. Overview of generative models: well-developed models (blue), partially-explored approaches (purple), and modern pathways with little or no literature on neural data (red).

**Key Contributions:** Our objective is to bridge a gap between the literatures of computational neuroscience, dynamical systems, and AI, and to review the usability of the proposed generative models with respect to data limitations, the objective of the study and the problem definition, prior knowledge of the system, and sets of assumptions (see Figure 2).

## 1. Biophysical Models

Understanding how cognition “emerges” from complex biophysical processes has been one of the main objectives of computational neuroscience. Although inferring high-level cognitive tasks from biological processes is not easily achieved, various biophysical simulations provide some “explanation” of how neural information relates to behavior. These attempts are motivated by the need for interpretable and biologically detailed models.

While there is as yet no “unifying theory of neuroscience,” biological neuronal models are being developed at different scales and with different degrees of abstraction (see Figure 3). These models are usually grouped into two main categories. The first represents a “bottom-up” approach, which emphasizes biophysical details for fine-scale simulation and relies on higher-level function emerging from them^{1}. An example of this approach is the Blue Brain Project (Markram, 2006).

**Figure 3**. Instances of modeling across different levels of organization and problem dimensions. The *conceptual scope* indicates the biophysical detail incorporated in the model. It determines whether the focus of the model is directed toward mechanistic reality or behavioral output. It also indicates where a given model sits among Marr's levels of analysis.

Conversely, “top-down” schemes focus on explicit high-level functions and design frameworks based on some targeted behavior. Each of the two approaches works with a different knowledge domain and has its own pitfalls. The top-down approach can incorporate behavioral insights to generate high-level observed behavior without concerning itself with hard-to-code biological details. Models of this kind do not provide low-level explanations and are prone to biases related to data collection (Srivastava et al., 2020). The bottom-up perspective, on the other hand, benefits from a customizable level of biophysical insight. At the same time, its description does not generalize to behavior, and it can be difficult to scale (owing to unknown priors and numerous parallel mechanisms). Also, the reductionist approach to complex systems (e.g., the brain) is subject to substantial criticism. In particular, while a reductionist approach can help to examine causality, it is not enough for understanding how the brain maps onto behavior (Anderson, 1972; Krakauer et al., 2017).

In this section, we review brain models across different scales that are faithful to biological constraints. We focus primarily on the leftmost column of Figure 3, starting from realistic models with mesoscopic details and moving toward more coarse-grained frameworks.

### 1.1. Modeling at the Synaptic Level

The smallest interacting blocks of the nervous system are proteins (van den Heuvel et al., 2019). Gene expression maps and atlases are useful for discovering the functions of these blocks in the neural circuit (Mazziotta et al., 2000). However, gene expression is not uniform across the brain (Lein et al., 2007). While the expression maps of these proteins continue to unfold (Hawrylycz et al., 2012), combined with connectivity data, they can help quantify dynamics. These maps link the spatial distribution of gene expression patterns to neural coupling (Richiardi et al., 2015) as well as to other large-scale dynamics, e.g., dynamic connectivity as a function of neurogenetic profile (Diez and Sepulcre, 2018).

A notable effort in this regard is the Allen Brain Atlas (Jones et al., 2009), in which genomic data of mice, humans, and non-human primates (Hawrylycz et al., 2014) have been collected and mapped to understand the structural and functional architecture of the brain (Gilbert, 2018). Beyond genomic data, which is valuable by itself for mapping out connectivity across cell types, a fifth division of the Allen Institute, the *Allen Institute for Neural Dynamics*, was recently announced with the aim of studying the link between neural circuits of laboratory mice and foraging-related behaviors (Chen and Miller, 2021).

On a slightly larger scale, a considerable amount of work concerns the relationship between cellular and intracellular events and neural dynamics. Models of intracellular events and interactions can generate accurate responses at small (Dougherty et al., 2005) and large scales (Whittington et al., 1995). Some of these models laid the foundation of computational neuroscience and are reviewed in Section 1.2. In what follows, we start with neurocomputational models at the mesoscale level (realistic models of small groups of neurons, i.e., the top-left corner of Figure 3), after which we move toward macro-scale levels with different degrees of abstraction.

### 1.2. Basic Biophysics of Neurons: A Quick History

Zooming out from the intra-neuron synaptic level, inter-neuron communication emerges as a principal determinant of the dynamics. Information transmission is mainly based on the emission of action potentials. The ionic mechanism underlying the action potential was first explained by the influential Hodgkin-Huxley equations and the corresponding equivalent circuit. The electrical current of the equivalent circuit is described by four differential equations that incorporate the membrane capacitance and the gating variables of the channels (Hodgkin and Huxley, 1952). While the Hodgkin-Huxley model agrees with a wide range of experiments (Patlak and Ortiz, 1985; Traub et al., 1991) and continues to be a reference for models of ion channels, it must be simplified before it can be extended to models of neuronal populations. The main difficulty with the Hodgkin-Huxley model is that it requires solving a system of differential equations for each gating parameter of each ion channel of a cell, and more than 300 types of ion channels have been discovered to date (Gabashvili et al., 2007). Various relaxing assumptions have been proposed. One of them is to dismiss the time dependence of membrane conductance and the dynamics of the action potential by simply assuming that firing happens when the electrical input accumulated at the membrane exceeds a threshold (Abbott and Kepler, 1990). This model is known as integrate-and-fire (Stein and Hodgkin, 1967), and it comes in different flavors depending on the form of nonlinearity assumed for the dynamics of leaky or refractory synapses (Michaels et al., 2016).

To model the rich dynamics of various ion channels, dendrites can be divided into compartments, an approach called the multicompartment model. A review by Herz et al. (2006) categorizes compartmental models into five groups according to the balance of detail involved, from the Hodgkin-Huxley description to black-box models.

While research on hyper-realistic modeling of many neurons continues, other frameworks focus on simulating the biophysics of neuronal populations. In Section 1.3, we pause on the state of large-scale synaptic simulations to show how a change in computational paradigm helps overcome some of the limitations inherent in these models. Neural mass models and the Wilson-Cowan model are examples of such alternatives (see Sections 1.3.1 and 1.3.2, respectively).
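The four Hodgkin-Huxley equations (membrane potential plus three gating variables) can be integrated in a few lines. The sketch below assumes the widely used textbook parameterization of the squid axon, with voltages shifted so that rest sits near -65 mV (units: mV, ms, µA/cm², mS/cm²), and plain forward Euler with a small step. The helper names are hypothetical, and a production simulator would use a more robust integrator.

```python
import math
import numpy as np

C_M = 1.0                              # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3      # maximal conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387  # reversal potentials, mV

def gate_rates(v):
    """Opening (alpha) and closing (beta) rates, 1/ms, for gates m, h, n."""
    a_m = 0.1 * (v + 40.0) / (1.0 - math.exp(-(v + 40.0) / 10.0))
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * (v + 55.0) / (1.0 - math.exp(-(v + 55.0) / 10.0))
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return a_m, b_m, a_h, b_h, a_n, b_n

def simulate_hh(i_ext, t_ms=50.0, dt=0.01):
    """Forward-Euler integration under a constant injected current (uA/cm^2)."""
    v = -65.0
    a_m, b_m, a_h, b_h, a_n, b_n = gate_rates(v)
    # Start each gate at its steady-state value alpha / (alpha + beta).
    m, h, n = a_m / (a_m + b_m), a_h / (a_h + b_h), a_n / (a_n + b_n)
    trace = np.empty(int(round(t_ms / dt)))
    for k in range(trace.size):
        a_m, b_m, a_h, b_h, a_n, b_n = gate_rates(v)
        i_na = G_NA * m**3 * h * (v - E_NA)   # sodium current
        i_k = G_K * n**4 * (v - E_K)          # potassium current
        i_l = G_L * (v - E_L)                 # leak current
        v += dt * (i_ext - i_na - i_k - i_l) / C_M
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        n += dt * (a_n * (1.0 - n) - b_n * n)
        trace[k] = v
    return trace

def count_spikes(trace, threshold=0.0):
    """Number of upward crossings of the threshold voltage."""
    return int(np.sum((trace[:-1] < threshold) & (trace[1:] >= threshold)))

for i_ext in (0.0, 10.0):
    print(f"I = {i_ext:4.1f} uA/cm^2 -> {count_spikes(simulate_hh(i_ext))} spikes in 50 ms")
```

With no injected current the membrane stays at rest, while a sustained 10 µA/cm² step produces repetitive firing. Note that each ion-channel type adds its own gating equations, which is the cost that reductions such as integrate-and-fire avoid.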
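A minimal multicompartment example, assuming purely passive membranes, is a soma coupled to a single dendritic compartment through an axial conductance. The sketch below (hypothetical names and parameter values) computes the steady state analytically and checks it against a forward-Euler simulation. It shows how current injected into the dendrite is attenuated by the time it reaches the soma, something a single-compartment model cannot express.

```python
import numpy as np

G_LEAK, G_C, C_M, E_LEAK = 0.1, 0.05, 1.0, -65.0   # mS/cm^2, mS/cm^2, uF/cm^2, mV

def steady_state(i_soma, i_dend):
    """Solve G @ (V - E_LEAK) = I for the (soma, dendrite) steady state."""
    G = np.array([[G_LEAK + G_C, -G_C],
                  [-G_C, G_LEAK + G_C]])
    return E_LEAK + np.linalg.solve(G, np.array([i_soma, i_dend]))

def simulate(i_soma, i_dend, t_ms=500.0, dt=0.1):
    """Forward-Euler integration of the two coupled passive compartments."""
    v_s = v_d = E_LEAK
    for _ in range(int(round(t_ms / dt))):
        dv_s = (-G_LEAK * (v_s - E_LEAK) + G_C * (v_d - v_s) + i_soma) / C_M
        dv_d = (-G_LEAK * (v_d - E_LEAK) + G_C * (v_s - v_d) + i_dend) / C_M
        v_s, v_d = v_s + dt * dv_s, v_d + dt * dv_d
    return np.array([v_s, v_d])

v_ss = steady_state(0.0, 1.0)   # inject 1 uA/cm^2 into the dendrite only
v_sim = simulate(0.0, 1.0)
attenuation = (v_ss[0] - E_LEAK) / (v_ss[1] - E_LEAK)
print(v_ss, v_sim, attenuation)
```

With these values the somatic depolarization is `G_C / (G_LEAK + G_C) = 1/3` of the dendritic one (about -62.5 mV versus -57.5 mV). Adding voltage-gated currents to each compartment turns this passive skeleton into the kind of active multicompartment model reviewed by Herz et al. (2006).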

### 1.3. Population-Level Models

Izhikevich and Edelman (2008) describe the first attempt at reconstructing the whole cortex. Their simulation includes a microcircuitry of 22 basic types of neurons with simplified dendritic trees and fewer synapses. The underlying structural data on the geometry of the white matter are drawn from diffusion tensor imaging (DTI) (Honey et al., 2009) of the human brain. The microcircuitry of the six-layered neocortex was reconstructed based on the cat visual cortex. The spiking dynamics employed in this model come from Izhikevich (2003), a simplification of the Hodgkin-Huxley model that reduces each neuron to two variables (a membrane potential and a recovery variable) with an after-spike reset. On a larger scale, some subcortical dynamics (e.g., dopaminergic reward signals from the brainstem) are also implemented.

The significance of this simulation compared to preceding efforts lies in its inclusion of all cortical regions and some of their interplay in the form of cortico-cortical connections. The researchers also considered synaptic plasticity a significant factor in studying developmental changes such as learning. The model demonstrates several emergent phenomena, such as self-sustained spontaneous activity, chaotic dynamics, and avalanches, alongside delta, alpha, and beta waves and other heterogeneous oscillatory activities similar to those in the human brain.

Complexity aside, the model has its shortcomings, including extreme sensitivity to initial conditions. To address this, the authors suggest studying population behavior instead of single-cell simulations. Despite these limits, Izhikevich and Edelman (2008) is the first benchmark of whole-cortex modeling and a reference point for detailed projects such as the Blue Brain Project (Markram, 2006) and MindScope (Koch et al., 2014).

In parallel with these efforts, the Blue Brain Project (Markram, 2006) was founded in 2005 as a biological simulation of the synapses and neurons of the neocortical microcircuitry. The ambitious goal was to extend this effort to the whole-brain level and build “the brain in a box.” The initial simulated subject was only a fragment of the somatosensory cortex of a juvenile rat, 2 *mm* tall and 210 μ*m* in radius (~100,000 neurons). Many critics consider further expansion to larger scales, i.e., the mouse whole brain and the human whole brain, far-fetched (Abbott, 2020).
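For comparison with Hodgkin-Huxley, the Izhikevich (2003) neuron used in this whole-cortex model reduces spiking to a membrane potential `v`, a recovery variable `u`, and an after-spike reset. The sketch below assumes the regular-spiking parameter set (a, b, c, d) = (0.02, 0.2, -65, 8) from that paper and a simple Euler step. It is a single-neuron illustration, not the network of Izhikevich and Edelman (2008).

```python
import numpy as np

def izhikevich(i_ext, t_ms=1000.0, dt=0.1, a=0.02, b=0.2, c=-65.0, d=8.0):
    """Simulate one Izhikevich neuron under constant input; returns spike times in ms."""
    v, u = c, b * c                     # start at the reset potential
    spikes = []
    for k in range(int(round(t_ms / dt))):
        if v >= 30.0:                   # spike peak reached: record and reset
            spikes.append(k * dt)
            v, u = c, u + d
        dv = 0.04 * v**2 + 5.0 * v + 140.0 - u + i_ext
        du = a * (b * v - u)
        v, u = v + dt * dv, u + dt * du
    return np.array(spikes)

for i_ext in (0.0, 10.0):
    print(f"I = {i_ext:4.1f}: {izhikevich(i_ext).size} spikes in 1 s")
```

Two equations per neuron instead of one per gating variable is what makes cortex-scale networks of such units tractable; changing (a, b, c, d) switches between firing patterns such as bursting or fast spiking.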

Despite the initial promise of “understanding” the brain, the Blue Brain Project has yet to incorporate the full map of connections (also known as the connectome; Horn et al., 2014) of the mouse brain, which is itself orders of magnitude smaller than the human brain (Frégnac and Laurent, 2014). That being said, acquiring the connectomic map does not necessarily result in a better understanding of function. Note that while the connectome of the roundworm *Caenorhabditis elegans* has been fully mapped since 1986 (White et al., 1986), research is still unable to fully explain the behavior of the network, e.g., predicting stimuli based on excitation (Koch, 2012). Finally, strong concerns regarding the validity of the experiments arise from the fact that the simulation still does not account for glial cells. Glial cells, long estimated to make up 90% of brain cells (more recent counts suggest a ratio closer to one glial cell per neuron), have distinctive mechanisms: they do not emit electrical impulses (Fields et al., 2014), but they are responsible for inactivating and clearing products of neuronal activity that influence synaptic properties (Henn and Hamberger, 1971) and, consequently, learning and cognitive processes (Fields et al., 2014). This incompleteness casts further doubt on the achievability of a *brain in silico* from the Human Brain Project.

The above critiques have prompted calls for a revision of the objectives of the Blue Brain Project with more transparency. Hence, new strategies such as the Allen Institute's MindScope division (Hawrylycz et al., 2016) and the Human Brain Project (Amunts et al., 2016) aim for adaptive granularity, more focused research on human data, and pooling of resources through cloud-based collaboration and open science (Fecher and Friesike, 2014). Alternatively, smaller teams developed less resource-intensive simulation tools such as Brian (Stimberg et al., 2019) and NEST (Gewaltig and Diesmann, 2007).

There are several readily available simulators of large networks of spiking neurons for reconstructing many-neuron biophysics. Brian (Stimberg et al., 2019) is a Python package for defining customized spiking networks. The package can automatically generate simulation code in a computationally efficient target language (e.g., C++, Python, or Cython). With GPUs available, it can also exploit parallelism for faster execution. Brian is more focused on single-compartment models, while GENESIS (Bower and Beeman, 2012) and NEURON (Carnevale and Hines, 2006) center around multicompartment cells.

NEST is another popular package for building *ad-hoc* models of spiking neurons with adjustable parameters. These parameters include the spiking rules (such as IF, Hodgkin-Huxley, and AdEx), networks (topological or random neural networks), and synaptic dynamics (plasticity expressions, neuromodulation) (Gewaltig and Diesmann, 2007).

When working with mid-level packages, technical limitations and the objective of the study should be considered. These include computational efficiency and the code generation pipeline. Interested researchers are encouraged to refer to the review by Blundell et al. (2018) for guidelines and proposed solutions.

The steep price of high-resolution computation and the remoteness from high-level cognition can be alleviated by replacing the detailed dynamics of single neurons with the collective equations of the population. This dimensionality-reduction strategy is the essence of neural mass models (David and Friston, 2003), spiking neural networks (Vreeken, 2003), and dynamic causal modeling (Friston et al., 2003).
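Simulators such as Brian and NEST automate what the following plain-NumPy sketch does by hand: a network of leaky integrate-and-fire neurons with random excitatory connections, a threshold-and-reset rule, and an absolute refractory period. This is not Brian's or NEST's API. The function name, the delta-synapse assumption (each presynaptic spike instantly adds `w` mV to its targets), and all parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_lif_network(n=200, p_conn=0.1, w=0.5, drive=20.0, t_ms=200.0,
                         dt=0.1, tau_m=10.0, v_rest=-65.0, v_th=-50.0,
                         v_reset=-65.0, t_ref=2.0, seed=0):
    """Leaky integrate-and-fire network; returns per-neuron firing rates in Hz.

    drive is the external input expressed in mV (R * I); each presynaptic
    spike adds w mV to its targets on the next step (delta synapses)."""
    rng = np.random.default_rng(seed)
    W = w * (rng.random((n, n)) < p_conn)            # W[i, j]: weight from j to i
    np.fill_diagonal(W, 0.0)                         # no self-connections
    v = v_rest + rng.uniform(0.0, v_th - v_rest, n)  # random sub-threshold start
    refractory = np.zeros(n)                         # remaining refractory time, ms
    spiked = np.zeros(n, dtype=bool)
    counts = np.zeros(n, dtype=int)
    for _ in range(int(round(t_ms / dt))):
        syn = W @ spiked.astype(float)               # input from last step's spikes
        dv = dt * (-(v - v_rest) + drive) / tau_m    # leaky integration
        active = refractory <= 0.0
        v = np.where(active, v + dv + syn, v_reset)  # clamp refractory cells at reset
        refractory -= dt
        spiked = v >= v_th                           # threshold crossing
        v[spiked] = v_reset
        refractory[spiked] = t_ref
        counts += spiked
    return counts / (t_ms / 1000.0)

print(f"mean rate, drive 20 mV: {simulate_lif_network(drive=20.0).mean():.1f} Hz")
print(f"mean rate, drive 10 mV: {simulate_lif_network(drive=10.0).mean():.1f} Hz")
```

With a 10 mV drive the membrane settles below threshold and the network stays silent, while a 20 mV drive makes every neuron fire. Dedicated simulators add what this sketch omits: conductance-based synapses, delays, plasticity rules, efficient event handling, and code generation.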

#### 1.3.1. Neural Mass Models

Staying faithful to the biophysical truth of the system is possible at scales larger than a few cells. In other words, by reducing the degrees of freedom, one can reduce a massive collection of individual integrate-and-fire equations (mentioned in Section 1.2) to a functional differential equation for the probabilistic evolution of the whole population, known as the Fokker-Planck equation. However, since Fokker-Planck equations are generally high-dimensional and intractable, a complementary formalism, known as the mean-field approximation, is used to finesse the system (Deco et al., 2008).

In statistical physics, the mean-field approximation is a conventional way of reducing the dimensionality of a many-body problem by averaging over the degrees of freedom. A well-known classic example is the problem of finding collective parameters (such as pressure or temperature) of a bulk of gas with known microscopic parameters (such as the velocity or mass of the particles) via the Boltzmann distribution. The analogy of the classic gas shows the gist of the neural mass model: temperature is an emergent phenomenon of the gas *ensemble*. Although higher temperatures correspond to a higher *average* velocity of the particles, one needs a computational bridge to map microscopic parameters to the macroscopic one(s). To see why, note that each particle has many relevant attributes (e.g., velocity, mass, and the interaction force relative to other particles). Each attribute denotes one dimension in phase space. One can immediately see how this problem becomes computationally intractable even for 1 *cm*^{3} of gas with ~10^{19} molecules.

The current state of thermodynamics accurately describes the macroscopic behavior of gas, so why not apply this approximation to the many-body problems of neuronal populations? The analogous problem for a neural mass model can be described with the single-neuron activity and membrane potential as the microscopic parameters and the state of the neural ensemble in phase space as the macroscopic parameter. The computational bridge is based on Fokker-Planck equations for separate ensembles.
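The gas analogy can be made concrete with a short calculation that uses translational kinetic energy only. By equipartition, `<(1/2) m |v|^2> = (3/2) k_B T`, so a single macroscopic number, T, is recovered by averaging over many microscopic velocities. The particle species (N₂), temperature, and sample size below are arbitrary choices for illustration.

```python
import numpy as np

K_B = 1.380649e-23              # Boltzmann constant, J/K
N_A = 6.02214076e23             # Avogadro constant, 1/mol
MASS_N2 = 28.0134e-3 / N_A      # mass of one N2 molecule, kg

def temperature_from_velocities(v, mass):
    """Equipartition of translational kinetic energy: T = m <|v|^2> / (3 k_B)."""
    return mass * np.mean(np.sum(v**2, axis=1)) / (3.0 * K_B)

# Sample microscopic velocities from the Maxwell-Boltzmann distribution at 300 K:
# each Cartesian component is Gaussian with variance k_B T / m.
rng = np.random.default_rng(0)
sigma = np.sqrt(K_B * 300.0 / MASS_N2)
velocities = rng.normal(0.0, sigma, size=(200_000, 3))

print(f"recovered T = {temperature_from_velocities(velocities, MASS_N2):.1f} K")
```

A neural mass model plays the same role for a neuronal population: it replaces the state of every cell with a few ensemble statistics.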
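The Fokker-Planck bridge is easiest to see in a case where it is exactly solvable. The sketch below assumes a deliberately simple population of N independent linear (Ornstein-Uhlenbeck) rate units, `dx = (-x + I)/τ dt + σ dW`. For this process the Fokker-Planck equation keeps the population density Gaussian, and its mean m and variance s² obey the closed equations `dm/dt = (-m + I)/τ` and `ds²/dt = -2 s²/τ + σ²`. Simulating many units with Euler-Maruyama and comparing their empirical moments with these equations shows how a high-dimensional stochastic system collapses to a low-dimensional description. Real neural mass models involve nonlinear units, for which such a closure holds only approximately; the names and parameter values here are hypothetical.

```python
import numpy as np

def simulate_population(n=20_000, tau=10.0, i_ext=2.0, sigma=1.0,
                        t_ms=50.0, dt=0.1, seed=0):
    """Euler-Maruyama for n independent units: dx = (-x + I)/tau dt + sigma dW."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)                                 # all units start at 0
    for _ in range(int(round(t_ms / dt))):
        dW = np.sqrt(dt) * rng.standard_normal(n)   # Wiener increments
        x = x + dt * (-x + i_ext) / tau + sigma * dW
    return x

def fokker_planck_moments(t, tau=10.0, i_ext=2.0, sigma=1.0):
    """Closed-form mean and variance of the Gaussian density, starting from x(0) = 0."""
    mean = i_ext * (1.0 - np.exp(-t / tau))
    var = 0.5 * sigma**2 * tau * (1.0 - np.exp(-2.0 * t / tau))
    return mean, var

x = simulate_population()
m_fp, v_fp = fokker_planck_moments(50.0)
print(f"mean: simulated {x.mean():.3f} vs Fokker-Planck {m_fp:.3f}")
print(f"var:  simulated {x.var():.3f} vs Fokker-Planck {v_fp:.3f}")
```

Twenty thousand stochastic equations are summarized by two deterministic ones; the agreement improves as N grows, since the sampling error of the empirical moments shrinks roughly as 1/√N.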

Neural mass models can be used both for understanding the basic principles of neural dynamics and for building generative models (Friston, 2008). They can also be generalized to neural fields described by wave equations of the states in phase space (Coombes, 2005), as well as to other interesting dynamical patterns (Coombes et al., 2014). Moreover, these models are applicable across scales, from subpopulations to the brain as a whole. This generalizability makes them good candidates for analyses ranging from modeling the average firing rate to decision-making and seizure-related phase transitions. Interested readers are encouraged to refer to the review by Deco et al. (2008) to see how neural mass models can provide a unifying framework linking single-neuron activity to emergent properties of the cortex. Neural mass and field models form the foundation of many large-scale *in-silico* brain simulations (Coombes and Byrne, 2019) and have been deployed in many recent computational environments (Ito et al., 2007; Jirsa et al., 2010). Note that neural mass models can be inconsistent in the limit of synchrony and require complementary adjustments for systems with rich dynamics (Deschle et al., 2021), for example by mixing them with other models of neural dynamics such as Wilson-Cowan (Wilson and Cowan, 1972), as in Coombes and Byrne (2019).

#### 1.3.2. Wilson-Cowan

Wilson-Cowan is a large-scale model of the collective activity of a neural population based on the mean-field approximation (see Section 1.3.1). Arguably the most influential model in computational neuroscience after Hodgkin-Huxley (Hodgkin and Huxley, 1952), the Wilson-Cowan model (Wilson and Cowan, 1972) presently has over 3,000 mentions in the literature.

The significance of this work in comparison to its predecessors (e.g., Beurle, 1956; Anninos et al., 1970) goes beyond a formal introduction of tools from dynamical systems into neuroscience. This model acknowledges the diversity of synapses by integrating distinct inhibitory and excitatory subpopulations. Consequently, the system is described by two state variables instead of one. Moreover, the model accounts for Dale's principle (Eccles et al., 1954) for a more realistic portrayal; that is, each neuron is considered purely inhibitory or excitatory. The four theorems proved in the seminal paper (Wilson and Cowan, 1972) establish the existence of oscillations in response to a specific class of stimulus configurations and the exhibition of simple hysteresis for other classes of stimuli.

The Wilson-Cowan model lays the foundation for many major theoretical advances. Examples of derivative studies include energy function optimization for formulating associative memory (Hopfield, 1982), artificial neural networks as a special case with binary spiking neurons (Hinton and Sejnowski, 1983), pattern formation (Amari, 1977), brain wave propagation (Roberts et al., 2019), movement preparation (Erlhagen and Schöner, 2002), and Dynamic Causal Modeling (Sadeghi et al., 2020). Other studies also demonstrate the possibility of diverse nonlinear behavior in networks of Wilson-Cowan oscillators (MacLaurin et al., 2018; Wilson, 2019). More detailed extensions are under way, for example, second-order approximations (El Boustani and Destexhe, 2009) and simulations of intrinsic mechanisms such as spike-frequency adaptation or depressing synapses (Chen and Miller, 2018). For a comprehensive list of continuations, see Destexhe and Sejnowski (2009).
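The two-population structure described above translates into two coupled ordinary differential equations for the excitatory and inhibitory activities E and I. The sketch below implements one common form of the Wilson-Cowan equations, `τ_e dE/dt = -E + (k_e - r_e E) S_e(c1 E - c2 I + P)` and `τ_i dI/dt = -I + (k_i - r_i I) S_i(c3 E - c4 I + Q)`, with a logistic response function shifted so that S(0) = 0. The coupling constants and thresholds are assumed values of the kind quoted for the oscillatory regime of the original paper and should be checked (e.g., against a phase portrait) before being relied on; forward Euler and the function names are likewise illustrative choices.

```python
import numpy as np

def response(x, a, theta):
    """Logistic response function shifted so that response(0) = 0."""
    return 1.0 / (1.0 + np.exp(-a * (x - theta))) - 1.0 / (1.0 + np.exp(a * theta))

def wilson_cowan(P=1.25, Q=0.0, t_end=100.0, dt=0.01,
                 c1=16.0, c2=12.0, c3=15.0, c4=3.0,
                 a_e=1.3, theta_e=4.0, a_i=2.0, theta_i=3.7,
                 r_e=1.0, r_i=1.0, tau_e=1.0, tau_i=1.0):
    """Forward-Euler integration of the excitatory (E) and inhibitory (I) activities."""
    k_e = 1.0 - 1.0 / (1.0 + np.exp(a_e * theta_e))   # supremum of the E response
    k_i = 1.0 - 1.0 / (1.0 + np.exp(a_i * theta_i))   # supremum of the I response
    n = int(round(t_end / dt))
    E = np.zeros(n + 1)
    I = np.zeros(n + 1)
    for k in range(n):
        e, i = E[k], I[k]
        # Excitation drives both populations (c1, c3); inhibition suppresses both (c2, c4).
        dE = (-e + (k_e - r_e * e) * response(c1 * e - c2 * i + P, a_e, theta_e)) / tau_e
        dI = (-i + (k_i - r_i * i) * response(c3 * e - c4 * i + Q, a_i, theta_i)) / tau_i
        E[k + 1] = e + dt * dE
        I[k + 1] = i + dt * dI
    return E, I

E, I = wilson_cowan()
tail = slice(-2000, None)   # last 20 time units
print(f"E range: {E[tail].min():.3f} .. {E[tail].max():.3f}")
print(f"I range: {I[tail].min():.3f} .. {I[tail].max():.3f}")
```

A persistently wide range in the tail indicates a limit cycle (oscillation), whereas a collapsed range indicates a stable fixed point. Sweeping the external input `P` is one way to probe the stimulus-dependent oscillations and hysteresis described in the original paper.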
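Hopfield's associative memory, listed above among the derivative studies, can be sketched in a few lines: binary ±1 units, Hebbian weights `W = (1/N) Σ_μ ξ^μ (ξ^μ)^T` with zero diagonal, and asynchronous sign updates that never increase the energy `E(s) = -(1/2) sᵀ W s`. The pattern sizes, corruption level, and function names below are assumptions for illustration.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian weights W = (1/N) * sum_mu xi_mu xi_mu^T, with zero self-coupling."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def energy(W, s):
    return -0.5 * s @ W @ s

def recall(W, s, sweeps=10, seed=0):
    """Asynchronous updates in random order; with symmetric W and zero diagonal,
    each update can only lower (or keep) the energy."""
    rng = np.random.default_rng(seed)
    s = s.copy()
    for _ in range(sweeps):
        for i in rng.permutation(s.size):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

rng = np.random.default_rng(42)
patterns = rng.choice([-1, 1], size=(3, 100))   # three random +/-1 memories
W = train_hopfield(patterns)

cue = patterns[0].copy()
flipped = rng.choice(100, size=10, replace=False)
cue[flipped] *= -1                              # corrupt 10% of the bits

out = recall(W, cue)
print("overlap with stored pattern:", out @ patterns[0] / 100)
print("energy before/after:", energy(W, cue), energy(W, out))
```

Retrieval here is a descent on an energy landscape whose minima are the stored patterns; with only three patterns in 100 units, the network is far below its storage capacity, so the corrupted cue falls back into the correct basin.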

#### 1.3.3. Spiking Neural Network: Artificial Neural Networks as a Model of Natural Nervous System

With the introduction of neural networks, the idea of implementing neural circuits and biological constraints in artificial neural networks (ANN) gained momentum. McCulloch and Pitts (1943) is an early example that uses ANNs with threshold spiking behavior. Despite being oversimplified, their idea formed the basis for a particular type of trainable network known as spiking neural networks (SNN) or biological neural networks (as in Vreeken, 2003). Note that the distinction here from other forms of spiking networks, such as Izhikevich's and its derivatives (discussed earlier in Section 1.3), is that here we refer to networks that perform function approximation as a deep learning algorithm would (Box 1).
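A McCulloch-Pitts unit outputs 1 when the weighted sum of its binary inputs reaches a threshold and 0 otherwise. The sketch below builds AND, OR, and NOT gates from such units. Representing inhibition as a negative weight is a common modern simplification (the 1943 paper treated inhibition as absolute), and the weights and thresholds are chosen by hand rather than learned.

```python
import numpy as np

def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: fire (1) iff the weighted input sum reaches the threshold."""
    return int(np.dot(inputs, weights) >= threshold)

def AND(x1, x2):
    return mp_neuron([x1, x2], [1, 1], threshold=2)

def OR(x1, x2):
    return mp_neuron([x1, x2], [1, 1], threshold=1)

def NOT(x):
    return mp_neuron([x], [-1], threshold=0)   # inhibitory input silences the unit

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", AND(x1, x2), "OR:", OR(x1, x2))
```

Trainable spiking networks replace these hand-set weights with learned ones and the instantaneous threshold with membrane dynamics over time.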

**Box 1**. Neuromorphic computers: Architectures tailored for spiking networks.

The disparity in energy consumption and computing architecture between biological and silicon neurons is the most important factor that raises eyebrows in assessing *brain-like* algorithms. The brain consumes ~20 watts of power, while a supercomputer consumes on the order of megawatts (Zhang et al., 2018). This gap indicates that the processing of information in these simulations is far from the biological truth. Apart from the energy consumption gap, the non-von Neumann architecture of the brain is another discrepancy that stands in the way of realistic brain simulation *in silico*. There is no *von Neumann bottleneck* in the brain, as there is no limitation on throughput resulting from a separation of memory and computing unit (Wulf and McKee, 1995). The brain also has other features that are largely missing in deep networks. These include synaptic plasticity, high parallelism due to a large number of neurons, high connectivity due to a large number of synapses, resilience to degradation, and low speed and frequency of communication, among other things. Although many aspects of biological cognition are complicated to reconstruct (e.g., embodiment and social interaction), research in neuromorphic computing is addressing the disparities above by targeting hardware design (Cai and Li, 2021).

A potential solution for narrowing this computation gap can be sought at the hardware level. An instance of such a dedicated pipeline is the neuromorphic processing unit (NPU), which is power efficient and takes time and dynamics into account from the start. An NPU is an array of neurosynaptic cores that contain computing models (neurons) and memory within the same unit. In short, NPUs resemble the brain more closely than a CPU or GPU because of asynchronous (event-based) communication, extreme parallelism (100–1,000,000 cores), and low power consumption (Eli, 2022). Their efficiency and robustness also result from the physical proximity of the computing unit and memory. Popular examples of such NPUs, each stemming from a different initiative, are listed below.

• **SpiNNaker** or “Spiking Neural Network architecture” is an architecture based on low-power microprocessors; it was first introduced in 2005 to help the European Brain Project with computations over large cortical areas. The first version could emulate ten thousand spiking neurons and four million synapses, using 43 nJ of energy per *synaptic event* (Sharp et al., 2012).

• **TrueNorth** chips are arrays of 4,096 neurosynaptic cores amounting to 1 million digital neurons and 256 million synapses. IBM built TrueNorth primarily as a low-power processor suitable for drones, but it is highly scalable and customizable (Akopyan et al., 2015).

• **Loihi** chips have demonstrated strong performance on optimization problems. Loihi, Intel's fifth neuromorphic chip, incorporates biophysical features such as hierarchical connectivity, dendritic compartments, synaptic delays, and reward traces. Its circuit is composed of *dendrite units* (for updating state variables), *axon units* (for generating spike messages for downstream cores), and *learning units* (for updating weights based on customized learning rules) (Davies et al., 2018).

An integrative example of the implementations discussed above is NeuCube. NeuCube is a 3D SNN with plasticity that learns the connections among populations from various spatiotemporal brain data (STBD) modalities, such as EEG, fMRI, genetic data, DTI, MEG, and NIRS. Gene regulatory networks can be incorporated as well, if available. This implementation reproduces trajectories of neural activity, is more robust to noise, and classifies STBD more accurately than standard machine learning methods such as SVM (Kasabov, 2014).

Beyond biological resemblance, neuromorphic computing has important technical advantages that are missing in conventional compute units and that can revolutionize neural data processing. Neuromorphic chips offer the lower latency, lower power consumption, and high portability required for real-time interpretation. These attributes make them useful for recent signal collectors such as wearable EEG. On the other hand, although they have been shown to be highly scalable and adaptable, their high cost per bit is a major pitfall (Davies, 2021; Sharifshazileh et al., 2021).

In contrast to deep neural networks, in which activity is transmitted during every propagation cycle, activity in this architecture is not continuous in time. Instead, activities are event-based occurrences, the event being the depolarization of an action potential^{2}. ANN architectures driven by spiking dynamics have long been used for problems such as pattern recognition (Kasabov, 2007) and classification (Soltic et al., 2008). Although they lag behind conventional learning algorithms in many tasks, that is not the end of the story.
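To make the notion of event-based activity concrete, the sketch below simulates a single leaky integrate-and-fire (LIF) neuron, a textbook spiking model chosen here for illustration rather than one of the architectures cited above. All parameter values (membrane time constant, threshold, reset, drive) are assumptions; the point is that the only information the neuron passes on is a list of spike times.

```python
import numpy as np

def simulate_lif(i_ext, t_max=500.0, dt=0.1, tau_m=20.0,
                 v_rest=-65.0, v_thresh=-50.0, v_reset=-65.0, r_m=1.0):
    """Leaky integrate-and-fire neuron driven by a constant current.

    Illustrative units: ms for time, mV for voltage, r_m * i_ext in mV.
    Returns the membrane trace and the spike times (the "events").
    """
    n_steps = int(round(t_max / dt))
    v = np.full(n_steps, v_rest)
    spike_times = []
    for k in range(1, n_steps):
        # Forward-Euler step of tau_m dV/dt = -(V - V_rest) + R I
        dv = (-(v[k - 1] - v_rest) + r_m * i_ext) / tau_m
        v[k] = v[k - 1] + dt * dv
        if v[k] >= v_thresh:
            # Emit an event and reset: only the spike time is communicated
            spike_times.append(k * dt)
            v[k] = v_reset
    return v, spike_times

_, spikes_strong = simulate_lif(i_ext=20.0)  # steady state -45 mV, above threshold
_, spikes_weak = simulate_lif(i_ext=10.0)    # steady state -55 mV, below threshold
print(len(spikes_strong), len(spikes_weak))  # prints: 18 0
```

A subthreshold drive produces no events at all, so downstream units receive nothing; a suprathreshold drive produces regular events with an interspike interval close to tau_m * ln(4), about 27.7 ms, for these hypothetical values.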

Maass (1997) argues that, for a given network size, spiking networks are computationally more efficient than other types of neural networks, such as sigmoidal ones. It is therefore worthwhile to implement SNNs in a more agnostic manner, as spiking RNNs. Examples of such promising implementations are reservoir computing, liquid state machines, and echo state networks. For more on those architectures, see Section 3.1.2.

### 1.4. Brain Atlases: Whole- and Population-Level Modeling

The 21^{st} century has seen a burst of large-scale brain initiatives. This multitude is partly explained by the diversity of simulation objectives. As mentioned previously, the meaning of *simulation* varies widely depending on the goal of the project (de Garis et al., 2010), i.e., on where the project sits on the spectrum of Figure 3. Some of the projects along this spectrum are listed below.

• BigBrain: a freely accessible 3D model of the human brain at near-cellular resolution (Landhuis, 2017).

• Allen Brain Atlas: genome-wide map of gene expression for the human adult and mouse brain (Jones et al., 2009).

• Human Connectome Project: a large-scale map of the structural and functional connectivity of the human brain (Van Essen et al., 2013); the term *connectome* was coined in Sporns et al. (2005).

• Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative (Devor et al., 2013).

• The Virtual Brain (TVB): an open-source neural dynamics simulator using real anatomical connectivity (Jirsa et al., 2010).

• Human Brain Project (HBP): aimed to realistically simulate the human brain in supercomputers (Miller, 2011).

Scaling compute power is not sufficient for moving up to whole-brain models. Another challenge is the integration of time delays, which become significant at the whole-brain level. In local connections, the time delays are small enough to be ignored (Jirsa et al., 2010), but at the whole-brain scale transmission happens at a variety of finite speeds, from 1 to 10 m per second. As a result of this variation, time delays between different brain regions are no longer negligible. Implementing this heterogeneity gives rise to additional spatial features (Jirsa and Kelso, 2000; Petkoski and Jirsa, 2019).
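A quick calculation shows why these delays matter: the delay along a fiber is its length divided by the conduction speed. The sketch below uses hypothetical tract lengths; over a 120 mm tract, the 1 to 10 m/s range quoted above translates into delays between 120 ms and 12 ms. It also rounds the delays to whole integration steps, as a fixed-step simulator would have to.

```python
import numpy as np

def conduction_delays(tract_lengths_mm, velocity_m_per_s):
    """Delay matrix tau_ij = L_ij / v in milliseconds.

    Because 1 m/s equals 1 mm/ms, dividing lengths in mm by a speed
    in m/s directly yields delays in ms.
    """
    return np.asarray(tract_lengths_mm, dtype=float) / velocity_m_per_s

def delays_in_steps(delays_ms, dt_ms):
    """Round delays to whole integration steps of size dt_ms."""
    return np.rint(np.asarray(delays_ms) / dt_ms).astype(int)

# Hypothetical tract lengths (mm) between three regions
lengths = np.array([[0.0, 10.0, 120.0],
                    [10.0, 0.0, 80.0],
                    [120.0, 80.0, 0.0]])
for v in (1.0, 10.0):
    tau = conduction_delays(lengths, v)
    print(f"v = {v:4.1f} m/s: tau_02 = {tau[0, 2]:.0f} ms, "
          f"{delays_in_steps(tau, 0.1)[0, 2]} steps at dt = 0.1 ms")
```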

Larger-scale approaches can adopt neural mechanisms that rely on intra-region interactions (da Silva, 1991) in order to sidestep the problems of synaptic-level studies mentioned earlier. The Virtual Brain (TVB) project is one of these initiatives. TVB captures the network dynamics of the brain by simulating neural populations together with their structural connectivity, their varying time scales, and noise (Sanz Leon et al., 2013). TVB allows testing subject-specific hypotheses, as the structural connectivity is based on individual DTI. The large-scale activity results from integrating local neural masses coupled through long-range connections. TVB has a web-based GUI, can run on a personal computer, and already implements many types of dynamics for different brain signals, namely EEG, MEG, and fMRI BOLD.
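The structure described above, local nodes coupled through a structural connectivity matrix with conduction delays, can be sketched in a few lines. This is a toy illustration and not TVB's API or any of its built-in neural mass models: each node is a hypothetical first-order rate unit, the coupling is scaled by an assumed global gain `g`, and delayed activity is read from a history buffer.

```python
import numpy as np

def simulate_delayed_network(sc, delays_ms, g=0.5, dt=0.1, t_max=200.0,
                             tau=10.0, drive=None):
    """Forward-Euler integration of a toy network of rate nodes:

        tau dx_i/dt = -x_i + tanh(g * sum_j sc[i, j] * x_j(t - delays_ms[i, j]) + drive_i)

    sc[i, j] is the (hypothetical) connection strength from region j to region i.
    Returns an array of shape (n_steps + 1, n_regions); row m is time m * dt.
    """
    n = sc.shape[0]
    lag = np.rint(np.asarray(delays_ms) / dt).astype(int)  # delays in steps
    max_lag = int(lag.max())
    n_steps = int(round(t_max / dt))
    x = np.zeros((max_lag + n_steps + 1, n))   # zero history before t = 0
    drive = np.zeros(n) if drive is None else np.asarray(drive, dtype=float)
    cols = np.arange(n)
    for k in range(max_lag, max_lag + n_steps):
        delayed = x[k - lag, cols]             # delayed[i, j] = x_j(t_k - tau_ij)
        coupling = g * np.sum(sc * delayed, axis=1)
        x[k + 1] = x[k] + dt / tau * (-x[k] + np.tanh(coupling + drive))
    return x[max_lag:]

# Region 0 is driven; region 1 receives input from region 0 with a 20 ms delay
sc = np.array([[0.0, 0.0], [1.0, 0.0]])
delays = np.array([[0.0, 0.0], [20.0, 0.0]])
out = simulate_delayed_network(sc, delays, drive=[1.0, 0.0])
print(out[150, 1], out[-1, 1])   # silent at t = 15 ms, active by t = 200 ms
```

Keeping the full history in one array is the simplest correct choice for a sketch; a production simulator would typically use a ring buffer of length `max_lag + 1` instead.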

With models like TVB, one should note the paradigm shift from fine-scale simulations like Blue Brain. In contrast to Blue Brain, the nodes consist of large groups of neurons (on the order of a few millimeters), not one or a few neurons. Consequently, the governing equations describe population dynamics and population-level statistics rather than single-cell activity. Another essential point is that TVB allows researchers to study the brain's phenomenology parametrically. The following section is dedicated to such studies.

## 2. Phenomenological Models

In contrast to realistic biological models of *in-vivo* events, phenomenological^{3} models offer a way to qualitatively simulate certain observable behaviors [or, in the language of the dynamical systems literature (Strogatz, 2018), phase trajectories]. The key assumption is that although short- and long-range dynamics depend on intricate biophysical events, the emerging observables can be encoded in significantly lower dimensions. This dimensionality reduction is possible because low-dimensional dynamics can reproduce the statistical features of interest. Since a sufficiently detailed biophysical model should eventually exhibit the same collective statistics, one may argue that phenomenological models offer a shortcut to system-level reconstruction by setting aside many cellular and physiological details.

Compared to detailed biophysical models, coarse-grained approaches rely on a smaller set of biological constraints and might be considered “too simplistic.” However, they are capable of reconstructing many collective phenomena that are still inaccessible to hyper-realistic simulations of neurons (Piccinini et al., 2021). A famous example of emergence at this level is cortical synchronization (Arenas et al., 2008). Moreover, experiments show that population-level dynamics, which ignore fine-grained detail, *better* explain behavior (Briggman et al., 2005; Churchland et al., 2012).

The significance of phenomenological models in reconstructing brain dynamics also stems from their intuitiveness and reproducibility. They can reveal critical properties of neuronal populations. An interesting example is the noise-driven dynamics of the brain, which is responsible for multistability and criticality during the resting state (Deco and Jirsa, 2012; Deco et al., 2017).

### 2.1. Problem Formulation, Data, and Tools

The idea of using phenomenological models for neural dynamics is mainly motivated by the possibility of using tools from dynamical systems theory. The goal is to quantify how the system moves through a *state space* built upon its *state variables*. For example, if one can find two population variables (*x, y*) that determine the state of a neural ensemble, then all possible pairs (*x, y*) form the *state space* of the system; let us call this 2-dimensional space *A*. The state of the ensemble at any given time *t* can always be expressed as a 2-D vector in *A*. The differential equations that describe the evolution of *x* and *y* in time assign a velocity vector to each point of *A*; together, these vectors form a *vector field* on *A*. As an intuitive visualization of a vector field, imagine a water swirl: each point on the surface of the swirl can be represented by a vector with the magnitude and direction of the local velocity. At each point in this space, there is thus a *flow* that pushes the system along a specific trajectory. Reproducing features of brain signals, or identifying such a sparse state space and the dynamics of a parsimonious set of state variables, makes it possible to forecast the fate of the neural ensemble at future time steps (Saggio and Jirsa, 2020). In what follows in this section, some of the most prominent phenomenological models and their findings are discussed.
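The water-swirl picture can be made concrete with a toy two-variable system whose flow spirals inward. The equations below are a hypothetical stand-in for a neural ensemble, not a model from the cited works. The sketch evaluates the vector field on a grid of the state space *A* and then forecasts a trajectory by following the flow with forward-Euler steps.

```python
import numpy as np

def swirl(state, damping=0.1):
    """Toy 2-D vector field (a damped spiral, i.e., a 'water swirl'):
    dx/dt = -damping * x - y,   dy/dt = x - damping * y
    """
    x, y = state
    return np.array([-damping * x - y, x - damping * y])

# Velocity vectors on a coarse 5 x 5 grid of the state space A
xs, ys = np.meshgrid(np.linspace(-2, 2, 5), np.linspace(-2, 2, 5))
u, v = swirl(np.array([xs, ys]))

def forecast(state0, t_max=50.0, dt=0.01):
    """Follow the flow from an initial state with forward Euler."""
    traj = [np.asarray(state0, dtype=float)]
    for _ in range(int(round(t_max / dt))):
        traj.append(traj[-1] + dt * swirl(traj[-1]))
    return np.array(traj)

traj = forecast([1.5, 0.0])
print(np.linalg.norm(traj[0]), np.linalg.norm(traj[-1]))  # spirals toward the origin
```

Plotting `u, v` with a quiver plot would show the swirl itself; the trajectory is one particular path through that field.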

Interactions and connectivity can be observed in a wide range of settings, from resting-state activity (Piccinini et al., 2021) to task-specific experiments (Pillai and Jirsa, 2017), with various imaging techniques, including fMRI (Hutchison et al., 2013), EEG (Atasoy et al., 2018), MEG (Tait et al., 2021), and calcium imaging (Abrevaya et al., 2021). To study segregation and integration of dynamics, networks of brain connectivity need to be constructed from imaging data. *Brain connectivity* here refers to *anatomical*, *functional*, or *effective connectivity*, as described in Table 1. Anatomical connectivity is based on the physical fiber pathways and their morphology, functional connectivity represents the correlation of activity between different regions, and effective connectivity describes the directed flow of information (Sporns, 2007). For a more comprehensive review of such networks, refer to Wein et al. (2021).
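In its simplest form, functional connectivity is a matrix of pairwise Pearson correlations between regional time series. In the sketch below the three time series are synthetic, an assumption for illustration: regions 0 and 1 share a common driver and region 2 is independent. The resulting matrix is symmetric, so it carries no information about the direction of influence, which is precisely what effective connectivity adds.

```python
import numpy as np

def functional_connectivity(ts):
    """Pearson-correlation FC from an array of shape (n_timepoints, n_regions)."""
    return np.corrcoef(ts, rowvar=False)

rng = np.random.default_rng(0)
shared = rng.standard_normal(500)                # hypothetical common driver
ts = np.column_stack([
    shared + 0.5 * rng.standard_normal(500),     # region 0
    shared + 0.5 * rng.standard_normal(500),     # region 1 (same driver as region 0)
    rng.standard_normal(500),                    # region 2 (independent)
])
fc = functional_connectivity(ts)
print(np.round(fc, 2))
```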

**Table 1**. Complex brain networks are measured through Structural Connectivity (SC), Functional Connectivity (FC), or Effective Connectivity (EC). Computational Connectomics is a common way of formulating structural and functional networks of the whole brain.

#### 2.1.1. Model Selection for Brain Connectivity

Deducing the effective connectivity of functionally segregated brain regions is crucial in developing bio-plausible and explainable models. It is important to note that while anatomical connectivity rests directly on data and functional connectivity is based on statistical dependencies in data space, effective connectivity can only be estimated through the inversion of a generative model. In other words, functional connectivity (FC) is data-driven and effective connectivity (EC) is hypothesis-driven: FC is derived statistically from spatiotemporal data, whereas EC is not directly observable from imaging and is parameterized as the causal relations among brain regions for different tasks. To find the best descriptive parameters, one needs to test various hypotheses. Table 1 shows examples of formulating brain connectivity. Granger causality serves only as a validation tool and is used for optimizing both functional and effective connectivity (Valdes-Sosa et al., 2011). Dynamic Causal Modeling (DCM), introduced in Friston et al. (2003), quantitatively generates the connectivities that fit the observed data by maximizing model evidence, also known as the marginal likelihood of the model (Daunizeau et al., 2011).

DCM can be thought of as a method for finding the optimal parameters of the causal relations that best fit the observed data. The parameters of the connectivity network are (1) anatomical and functional couplings, (2) the induced effects of stimuli, and (3) the parameters that describe the influence of the inputs on the intrinsic couplings. The expectation-maximization (EM) algorithm is the most widely used optimizer. However, EM is slow for large, changing, and/or noisy networks. Zhuang et al. (2021) showed that the Multiple-Shooting Adjoint method for whole-brain dynamic causal modeling outperforms EM on classification tasks and can be used for continuous and changing networks.
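The three parameter sets listed above correspond to the matrices of the bilinear neural state equation of DCM (Friston et al., 2003), dz/dt = (A + Σ_j u_j B^(j)) z + C u, where A holds the intrinsic couplings, C the direct driving effects of the inputs, and B^(j) the modulation of the couplings by input j. The sketch below only integrates this forward model with forward Euler for hypothetical parameter values. A full DCM would add a hemodynamic observation model and invert the whole generative model by Bayesian (e.g., EM-based) estimation, which is omitted here.

```python
import numpy as np

def dcm_neural_states(A, B, C, u, dt=0.01):
    """Forward-Euler integration of dz/dt = (A + sum_j u_j(t) B[j]) z + C u(t).

    A: (n, n) intrinsic coupling, B: (m, n, n) input-dependent modulation,
    C: (n, m) direct input effects, u: (T, m) inputs. Returns z, shape (T, n).
    """
    z = np.zeros((u.shape[0], A.shape[0]))
    for t in range(1, u.shape[0]):
        u_t = u[t - 1]
        J = A + np.tensordot(u_t, B, axes=1)   # effective coupling at this instant
        z[t] = z[t - 1] + dt * (J @ z[t - 1] + C @ u_t)
    return z

# Hypothetical two-region model: the input drives region 0, region 0 excites region 1
A = np.array([[-1.0, 0.0],
              [0.4, -1.0]])
C = np.array([[1.0],
              [0.0]])
B_mod = np.zeros((1, 2, 2))
B_mod[0, 1, 0] = 0.4          # the same input also strengthens the 0 -> 1 connection
u = np.zeros((2000, 1))
u[:1000] = 1.0                # boxcar stimulus during the first half of the run

z_plain = dcm_neural_states(A, np.zeros_like(B_mod), C, u)
z_mod = dcm_neural_states(A, B_mod, C, u)
print(z_plain[:, 1].max(), z_mod[:, 1].max())   # modulation boosts region 1's response
```

Comparing models that differ only in which entries of `B` are allowed to be nonzero is the kind of hypothesis test that DCM formalizes through model evidence.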

DCM is, in fact, a method for testing hypotheses and guiding experiments, not a predictive or generative model by itself. Models of the interconnected regions can be built using the approaches from the earlier subsections, e.g., neural mass models, neural fields, or conductance-based models. For a review of such hybrid approaches, see Moran et al. (2013).

Connectivity matrices introduced in Table 1 are the backbone of the information-processing pipeline. That said, these matrices need to be coupled with the dynamics of brain states. To date, a large portion of studies have focused on mapping these networks onto resting-state networks, and many structure-function questions remain to be answered by studying task-related data (Cabral et al., 2017). In what follows, models that quantify these dynamics based on the phenomenology of behavior are discussed.

#### 2.1.2. Generative Graph Models

Recent progress in the science of complex networks and information theory has paved the way for analytical and numerical models of structural and functional connectivity (Lurie et al., 2020). The network approach to the neural population is a conventional way to study neural processes as information transmission in time-varying networks. This analogy allows for examining the path and behavior of the system in terms of different dynamical properties.

An insightful interplay of function and structure is observed in the biologically plausible line of work by Deco and Jirsa (2012). They reconstructed the emergence of equilibrium states around multistable attractors and characteristic critical behavior, such as a scaling-law distribution of pairwise inter-regional correlations, as a function of the global coupling parameter. Furthermore, recent studies show that synchrony depends not only on the topology of the graph but also on hysteresis (Qian et al., 2020).

Tools from graph theory and network science (Newman et al., 2006) are used to formulate this relation. Spectral mapping (Becker et al., 2018) and structure-function topological mapping (Liang and Wang, 2017) are proofs of concept in this regard. Generative graph models (traditionally developed in graph theory, such as the random graph model introduced in Erdős and Rényi, 1960) are principal tools of inference in this approach and have now been enhanced by machine learning; see, for example, the deep-network generative models in Kolda et al. (2014) and Li et al. (2018). Simulations of brain network dynamics and studies of controllability (Kailath, 1980) have shown how differently regions are optimized for diverse behaviors (Tang et al., 2017).
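The Erdős–Rényi model mentioned above is the simplest generative graph model: every possible edge is present independently with the same probability *p*. The sketch below is a minimal NumPy implementation (NetworkX offers an equivalent `erdos_renyi_graph` generator). Random graphs of this kind are often used as null models against which an empirical connectome of the same size and density is compared; the values of *n* and *p* here are purely illustrative.

```python
import numpy as np

def erdos_renyi(n, p, seed=None):
    """G(n, p): each of the n(n-1)/2 possible undirected edges exists
    independently with probability p. Returns a 0/1 adjacency matrix."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < p, k=1)   # sample edges above the diagonal
    adj = upper | upper.T                          # symmetrize; diagonal stays False
    return adj.astype(int)

n, p = 200, 0.1
adj = erdos_renyi(n, p, seed=0)
density = adj.sum() / (n * (n - 1))                # fraction of possible edges present
print(density)                                     # close to p
```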

### 2.2. Inspiration From Statistical Physics and Nonlinear Dynamical Models

In addition to network science, another axis for interpreting neural data is based on well-established tools initially developed for parametrizing the time evolution of physical systems. Famous examples of these systems include spin glasses (Deco et al., 2012), different types of coupled oscillators (Cabral et al., 2014; Abrevaya et al., 2021), and multistable and chaotic many-body systems (Deco et al., 2017; Piccinini et al., 2021). This type of modeling has already offered promising and intuitive results. In the following subsections, we review some of the recent literature on various methodologies.

#### 2.2.1. Brain as a Complex System

It is not easy to define what a complex system is. Haken (2006) defines the *degree of complexity* of a sequence as the minimum length of the program and of the initial data that a universal Turing machine needs to produce that sequence. Although this definition is debatable, one can conclude from it that the spatiotemporal dynamics of the mammalian brain qualify as a complex system (Hutchison et al., 2011; Sforazzini et al., 2014). Therefore, one needs a complex mechanism to reconstruct neural dynamics. In the following subsections, we review candidate equations for the oscillations in cortical networks (Buzsáki and Draguhn, 2004).
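The complexity in Haken's definition is not computable in general, but the length of a losslessly compressed sequence gives a practical upper bound on it. The sketch below uses Python's standard `zlib` compressor as that proxy, an assumption for illustration rather than the measure used in the cited studies, to contrast a sequence generated by a short periodic rule with a random sequence of the same length.

```python
import zlib
import numpy as np

def compression_complexity(symbols):
    """Compressed size in bytes of a small-integer symbol sequence: a computable
    upper-bound proxy for the (uncomputable) minimal program length."""
    data = np.asarray(symbols, dtype=np.uint8).tobytes()
    return len(zlib.compress(data, 9))

rng = np.random.default_rng(0)
periodic = np.tile([0, 1, 1, 0], 2500)          # 10,000 symbols from a short rule
random_seq = rng.integers(0, 2, size=10_000)    # 10,000 symbols with no short rule
print(compression_complexity(periodic), compression_complexity(random_seq))
```

Applied to neural recordings, such proxies require first discretizing the signal (e.g., binarizing it around a threshold), a choice that itself affects the result.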

##### 2.2.1.1. Equilibrium Solutions and Deterministic Chaos

Whole-brain phenomenological models like The Virtual Brain (Sanz Leon et al., 2013) are conventional generators for reconstructing spontaneous brain activity. There are various considerations to keep in mind when choosing the right model for the task. A major trade-off is between complexity and the abstraction of the parameters (Breakspear, 2017), i.e., how to capture the behavior of a detailed cytoarchitectural and physiological make-up with a reasonably parametrized model. Another consideration is the incorporation of noise, which is a requirement for multistable behavior (Piccinini et al., 2021), i.e., transitions between stable patterns of reverberating activity (aka attractors) in a neural population in response to perturbation (Kelso, 2012).
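The role of noise in multistability can be seen in a one-dimensional toy system with two attractors at x = -1 and x = +1. This textbook double-well model is chosen for illustration and is not one of the whole-brain models cited above; the noise amplitudes are arbitrary. Without noise the state stays in its initial attractor indefinitely, whereas with noise it hops between the two.

```python
import numpy as np

def count_switches(sigma, t_max=1000.0, dt=0.01, seed=0):
    """Euler-Maruyama simulation of dx = (x - x**3) dt + sigma dW, a noisy
    double-well system with stable states at x = -1 and x = +1.
    Returns the number of transitions between the two wells."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_max / dt))
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n_steps)
    x, well, switches = 1.0, 1, 0
    for k in range(n_steps):
        x += (x - x**3) * dt + noise[k]
        # Count a switch only once the state is well inside the opposite basin
        if well == 1 and x < -0.5:
            well, switches = -1, switches + 1
        elif well == -1 and x > 0.5:
            well, switches = 1, switches + 1
    return switches

print(count_switches(sigma=0.0), count_switches(sigma=0.6))   # the first count is 0
```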

##### 2.2.1.2. Kuramoto

The Kuramoto model is a mathematical description of coupled oscillators that can be written as a simple system of ODEs based solely on sinusoidal interactions (Kuramoto, 1984; Nakagawa and Kuramoto, 1994). It is widely used in physics for studying synchronization phenomena. It is relevant to neurobiological systems because it enables a *phase reduction approach*: neural populations can be regarded as similar oscillators that are weakly coupled together, with the couplings parameterized in the model. The Kuramoto model can be extended to incorporate anatomical and effective connectivity, and it can be expanded from a low-level model of few-neuron activity to a stochastic population-level model with partial synchrony and rich dynamical properties. One way to do so is to move from classic linear statistics to nonlinear Fokker-Planck equations (Breakspear et al., 2010).
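In its classic all-to-all form, the model reads dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j - θ_i), and the degree of synchrony is summarized by the order parameter r = |(1/N) Σ_j exp(iθ_j)|, which is near 0 for incoherent phases and 1 for full synchrony. The sketch below integrates this form with forward Euler using the equivalent mean-field expression K r sin(ψ - θ_i). The frequency distribution and coupling values are illustrative assumptions; replacing K/N by a coupling matrix (e.g., a structural connectome) gives the network variants mentioned above.

```python
import numpy as np

def kuramoto_order(K, n=100, t_max=50.0, dt=0.01, freq_sd=0.5, seed=0):
    """All-to-all Kuramoto model, integrated with forward Euler via the
    mean-field identity (K/N) sum_j sin(theta_j - theta_i) = K r sin(psi - theta_i).
    Returns the final order parameter r in [0, 1]."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, freq_sd, n)        # natural frequencies (assumed Gaussian)
    theta = rng.uniform(0.0, 2 * np.pi, n)     # random initial phases
    for _ in range(int(round(t_max / dt))):
        z = np.mean(np.exp(1j * theta))        # complex order parameter r * exp(i psi)
        r, psi = np.abs(z), np.angle(z)
        theta = theta + dt * (omega + K * r * np.sin(psi - theta))
    return float(np.abs(np.mean(np.exp(1j * theta))))

print(kuramoto_order(K=0.0), kuramoto_order(K=4.0))   # incoherent vs. synchronized
```

Sweeping `K` between these two values traces the transition from incoherence to partial and then near-full synchrony.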

There is a substantial literature on Kuramoto models of neural dynamics at different scales and levels. Strogatz (2000) is a conceptual review of decades of research on the principles of the general form of the Kuramoto model. Numerous studies have found consistency between results from the Kuramoto model and other classical models in computational neuroscience, like Wilson-Cowan (Wilson and Cowan, 1972; Hoppensteadt and Izhikevich, 1998). The Kuramoto model is frequently used for quantifying phase synchrony and for controlling unwanted phase transitions in neurological diseases like epileptic seizures and Parkinson's disease (Boaretto et al., 2019). Still, many multistability questions regarding cognitive maladaptation are yet to be explored, potentially with the help of Kuramoto models and maps of effective connectivity. Anyaeji et al. (2021) is a review targeting clinical researchers and psychiatrists; it is a good read for learning about current challenges that could be formulated as Kuramoto models. The Kuramoto model is also uniquely adaptable to different scales: from membrane resolution, with each neuron acting as a delayed oscillator (Hansel et al., 1995), to the social setting, where each subject in a dyad is coupled to the other through interpersonal interactions (Dumas et al., 2012).

##### 2.2.1.3. Van der Pol

Another model relevant to neuroscience is the van der Pol (VDP) oscillator, which is probably the simplest relaxation oscillator (Guckenheimer and Holmes, 2013) and a special case of the FitzHugh-Nagumo model, itself a simplification of the Hodgkin-Huxley model (see Section 1.2) (FitzHugh, 1961). Through the Wilson-Cowan approximation (Kawahara, 1980), VDP can also model neural populations. For more information about the Wilson-Cowan model, please see Section 1.3.2. Recently, Abrevaya et al. (2021) used coupled VDP oscillators to model a low-dimensional representation of neural activity in different organisms (larval zebrafish, rats, and humans) measured by different brain imaging modalities, such as calcium imaging (CaI) and fMRI. Besides proposing a method for inferring functional connectivity from the coupling matrices of the fitted models, the authors demonstrated that dynamical systems models can be a valuable source of data augmentation for spatiotemporal deep learning problems.
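The sketch below integrates a network of van der Pol oscillators, x_i'' - μ(1 - x_i²) x_i' + x_i = Σ_j W_ij x_j, written as first-order equations and solved with a fixed-step fourth-order Runge-Kutta scheme. The linear, position-based coupling through a matrix `W` is an assumption made for illustration and may differ from the coupling used by Abrevaya et al. (2021); in a fitting setting, `W` is the quantity whose estimate would be read as functional connectivity.

```python
import numpy as np

def coupled_vdp(W, mu=1.0, t_max=100.0, dt=0.01):
    """RK4 integration of N van der Pol oscillators with (assumed) linear coupling:
        dx_i/dt = y_i
        dy_i/dt = mu * (1 - x_i**2) * y_i - x_i + sum_j W[i, j] * x_j
    Returns the trajectory [x_1..x_N, y_1..y_N], shape (n_steps + 1, 2N)."""
    n = W.shape[0]

    def rhs(s):
        x, y = s[:n], s[n:]
        return np.concatenate([y, mu * (1 - x**2) * y - x + W @ x])

    s = np.concatenate([np.full(n, 0.5), np.zeros(n)])   # arbitrary initial state
    n_steps = int(round(t_max / dt))
    out = np.empty((n_steps + 1, 2 * n))
    out[0] = s
    for k in range(n_steps):
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * dt * k1)
        k3 = rhs(s + 0.5 * dt * k2)
        k4 = rhs(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        out[k + 1] = s
    return out

single = coupled_vdp(np.zeros((1, 1)))                     # uncoupled: limit cycle
pair = coupled_vdp(np.array([[0.0, 0.3], [0.3, 0.0]]))     # hypothetical coupling
print(np.abs(single[-2000:, 0]).max())                     # limit-cycle amplitude near 2
```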

Looking at the brain as a complex system of interacting oscillators offers a route to extending modeling to larger scales of organization. The emergent behavior of the system can be described with “order parameters.” Although this description has far fewer dimensions than the biophysical equations, it still expresses many remarkable phenomena, such as phase transitions, instabilities, multiple stable points, metastability, and saddle points (Haken, 2006). However, parametrizing such models is still an ongoing challenge, and many related studies are limited to the resting-state network. The following section reviews the prospects of recent data-driven methods and how they can support the study of system-level behavior.

## 3. Agnostic Computational Models

Jim Gray's framework (Hey, 2009) divides the history of science into four paradigms. For centuries, there have been experimental and theoretical paradigms. Then, as the phenomena of interest became too complicated to be quantified analytically, the computational paradigm emerged with the rise of numerical estimation and simulation. Today, with rapid advances in the recording, storage, and computation capacity for neural signals, neuroscience is exploring the *fourth paradigm* of this framework, i.e., data exploration, in which scientific models are fit to data by learning algorithms.

In the introduction, we argued that scientists should not settle for mere prediction. While the literature on data-driven methods is enormous, this review focuses mainly on strategies that help gain mechanistic insights rather than those that reproduce data through operations that are difficult to relate to biological knowledge. Examples of these less favored methods include purely generative adversarial networks with uninterpretable latent spaces or black-box RNNs with hard-to-explain parameters. The following section categorizes data-driven methods into established and emerging techniques and discusses some showcases.

### 3.1. Established Learning Models

Data-driven models have long been used in identifying structure-function relations (similar to the ones mentioned in Table 1) (McKeown et al., 1998; Koppe et al., 2019). The shift of studies from single neurons to networks of neurons has accelerated in the last decade. This trend arises because relying on the collective properties of a population of neurons to infer behavior seems more promising than reconstructing the physiological activity of single neurons in the hope of achieving emergence. Yuste (2015) argues that representations that merely relate the state of individual neurons to a higher level of activity have serious shortcomings (Michaels et al., 2016). However, these shortcomings can be addressed by incorporating temporal dynamics and collective measures into the model. We review the models that satisfy this consideration.

#### 3.1.1. Dimensionality Reduction Techniques

Clustering and unsupervised learning are useful for mapping inputs (**X**) to features (**Y**). The learned mapping from **X** to **Y** can later be extrapolated to unseen data. There are various methods for identifying this mapping or, in other words, for approximating this function. Principal Component Analysis (PCA) is a primary one. PCA maps data onto the subspace of maximal variance (Markopoulos et al., 2014) and is a common method of dimensionality reduction. However, the orthogonal (uncorrelated) features found by PCA are not necessarily statistically independent. Therefore, they are not always helpful in finding sources and effective connectivity. Alternatively, Independent Component Analysis (ICA) was introduced as a solution to the Blind Source Separation (BSS) problem: each sample of the data is a mixture of the states of different sources, whose characteristics are hidden variables (Pearlmutter and Parra, 1997). ICA is effective in finding the underlying sources because it maps the data onto the feature space by maximizing the statistical independence of the features rather than by maximizing variance. Component analysis is conventionally used with fMRI and EEG recordings. In each time window, each sensor receives a noisy mix of activity from segregated brain regions, and one is usually interested in inferring effective connectivity from such data. A large number of electrodes around the scalp enables ICA to identify the independent sources of activity and artifacts. ICA algorithms come in different flavors depending on the dataset and the property of interest. For example, temporal (Calhoun et al., 2001), spatial (McKeown et al., 1998), and spatiotemporal ICA (Wang et al., 2014; Goldhacker et al., 2017) are tailored to different types of sampling. Hybrid approaches, e.g., ICA combined with structural equation modeling (SEM), have shown better performance in given setups with less prior knowledge than SEM alone (Rajapakse et al., 2006). The interested reader is encouraged to refer to Calhoun and Adali (2012) for a dedicated review of ICA methods.
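The PCA/ICA contrast above can be reproduced on a classic blind-source-separation toy problem: two synthetic sources (a sinusoid and a square wave, standing in for regional activity) are linearly mixed into two "sensors" by a hypothetical mixing matrix. The sketch implements PCA via the covariance eigenvectors and a minimal symmetric FastICA with a tanh nonlinearity, one common ICA algorithm chosen here as an assumption rather than the method of any cited study. PCA returns uncorrelated mixtures, while ICA should recover each source up to sign and scale.

```python
import numpy as np

def whiten(X):
    """Center and whiten (n_channels, n_samples) data to identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(Xc))
    return E @ np.diag(d ** -0.5) @ E.T @ Xc

def sym_decorrelate(W):
    """W <- (W W^T)^(-1/2) W, keeping the unmixing rows orthonormal."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W

def fast_ica(X, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA with the tanh (log-cosh) contrast; returns estimated sources."""
    Z = whiten(X)
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((Z.shape[0], Z.shape[0])))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update E[z g(w^T z)] - E[g'(w^T z)] w for every row w of W
        W_new = sym_decorrelate(G @ Z.T / Z.shape[1]
                                - np.diag((1 - G**2).mean(axis=1)) @ W)
        done = np.max(np.abs(np.abs(np.einsum("ij,ij->i", W_new, W)) - 1)) < tol
        W = W_new
        if done:
            break
    return W @ Z

# Two synthetic sources mixed into two sensors (hypothetical mixing matrix A)
t = np.linspace(0, 8, 2000)
S = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])
A = np.array([[1.0, 1.0], [0.5, 2.0]])
X = A @ S

# PCA: project onto covariance eigenvectors (uncorrelated, not independent)
Xc = X - X.mean(axis=1, keepdims=True)
_, vecs = np.linalg.eigh(np.cov(Xc))
S_pca = vecs.T @ Xc
S_ica = fast_ica(X)

def best_match(est, true):
    """For each true source, the largest |correlation| with any estimated component."""
    c = np.corrcoef(np.vstack([est, true]))[: len(est), len(est):]
    return np.abs(c).max(axis=0)

print(best_match(S_pca, S), best_match(S_ica, S))
```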

#### 3.1.2. Recurrent Neural Networks

Recurrent neural networks (RNNs) are Turing-complete (Kilian and Siegelmann, 1996) models for learning dynamics and are widely used in computational neuroscience. In a nutshell, an RNN processes data by updating a “state vector” that carries memory across the steps of the sequence, including long-term information from past steps (LeCun et al., 2015).
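In its simplest (Elman) form, the state-vector update described above is h_t = tanh(W_h h_{t-1} + W_x x_t + b). The NumPy sketch below runs this recursion over a random input sequence; the sizes and weight scales are arbitrary assumptions, and no training is shown. Perturbing only the first input and observing a changed state several steps later illustrates how the state vector carries memory.

```python
import numpy as np

def rnn_forward(x_seq, W_h, W_x, b, h0=None):
    """Vanilla (Elman) RNN: h_t = tanh(W_h h_{t-1} + W_x x_t + b).
    x_seq: (T, n_in). Returns all hidden states, shape (T, n_hidden)."""
    h = np.zeros(W_h.shape[0]) if h0 is None else h0
    states = []
    for x_t in x_seq:
        h = np.tanh(W_h @ h + W_x @ x_t + b)   # the state vector carries memory forward
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(0)
n_in, n_hidden, T = 3, 16, 50
W_h = rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, n_hidden))
W_x = rng.normal(0, 1 / np.sqrt(n_in), (n_hidden, n_in))
X_seq = rng.standard_normal((T, n_in))
H = rnn_forward(X_seq, W_h, W_x, np.zeros(n_hidden))

# Changing only the first input still changes the state five steps later
X_alt = X_seq.copy()
X_alt[0] += 1.0
H_alt = rnn_forward(X_alt, W_h, W_x, np.zeros(n_hidden))
print(np.linalg.norm(H[5] - H_alt[5]))
```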

Current studies validate diverse types of RNNs as promising candidates for generating neural dynamics. Sherstinsky (2020) shows how the implicit “additive model,” which evolves the state signal, incorporates some interesting bio-dynamical behaviors, such as saturation bounds and the effects of time delays. Several studies modeled the cerebellum as an RNN with granular (Buonomano and Mauk, 1994; Medina et al., 2000; Hofstötter et al., 2002; Yamazaki and Tanaka, 2005) or randomly connected layers (Yamazaki and Tanaka, 2007). Moreover, similarities in performance and in adaptability to limited computational power (as in biological systems) have been observed between recurrent convolutional neural networks and the human visual cortex (Spoerer et al., 2020).

RNNs vary greatly in architecture. The choice of architecture can be dictated by the output of interest (for example, text, Sutskever et al., 2011, vs. natural scenes, Socher et al., 2011) or by the approach taken to overcome vanishing and exploding gradients (e.g., long short-term memory (LSTM), Hochreiter and Schmidhuber, 1997; hierarchical RNNs, Hihi and Bengio, 1995; or gated RNNs, Chung et al., 2014).

##### 3.1.2.1. Hopfield

A Hopfield network (Hopfield, 1982) is a type of RNN inspired by the dynamics of the Ising model (Brush, 1967; Little, 1974). In the original Hopfield mechanism, the units are threshold neurons (McCulloch and Pitts, 1943) connected in a recurrent fashion. The state of the system is described by a vector *V* of the states of all units; in other words, the network is an undirected graph of artificial neurons. The strength of the connection between units *i* and *j* is described by *w*_{ij}, which is trained by a learning rule, commonly the Storkey rule (Storkey, 1997) or the Hebbian rule (“neurons that fire together, wire together”) (Hebb, 1949). After training, these weights are fixed, and an energy landscape is defined as a function of *V*. The system evolves to minimize the energy and moves toward the basin of the closest attractor. This landscape characterizes the stability and function of the network (Yan et al., 2013).
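The sketch below implements the classic recipe in NumPy: Hebbian weights w_ij = (1/N) Σ_μ ξ_i^μ ξ_j^μ with no self-connections, the energy E(V) = -½ Σ_ij w_ij V_i V_j, and asynchronous threshold updates, which never increase E. Storing three random ±1 patterns in 100 units and probing with a corrupted copy of one of them is an illustrative setup, not an experiment from the cited works.

```python
import numpy as np

def train_hebbian(patterns):
    """Hebbian weights w_ij = (1/N) sum_mu xi_i^mu xi_j^mu, with zero diagonal."""
    P = np.asarray(patterns, dtype=float)
    W = P.T @ P / P.shape[1]
    np.fill_diagonal(W, 0.0)
    return W

def energy(W, s):
    """Hopfield energy E = -1/2 s^T W s."""
    return -0.5 * s @ W @ s

def recall(W, s, n_sweeps=5, seed=0):
    """Asynchronous updates s_i <- sign(sum_j w_ij s_j) in random order."""
    rng = np.random.default_rng(seed)
    s = s.copy()
    energies = [energy(W, s)]
    for _ in range(n_sweeps):
        for i in rng.permutation(len(s)):
            s[i] = 1.0 if W[i] @ s >= 0 else -1.0
        energies.append(energy(W, s))
    return s, energies

rng = np.random.default_rng(1)
patterns = rng.choice([-1.0, 1.0], size=(3, 100))    # 3 random memories, N = 100 units
W = train_hebbian(patterns)
probe = patterns[0].copy()
probe[rng.choice(100, size=10, replace=False)] *= -1  # corrupt 10 units
recovered, energies = recall(W, probe)
print(np.array_equal(recovered, patterns[0]), energies[0], energies[-1])
```

The probe starts on the slope of the energy landscape and descends into the basin of the stored pattern, which is the associative-memory behavior described above.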

The Hopfield model can accommodate some biological assumptions and work in tandem with cortical realizations. Similar to the human brain, Hopfield connections are mostly symmetric. Most importantly, since its introduction, it has been widely used for replicating associative memory. However, it soon became clear that other dynamical phenomena, like cortical oscillations and stochastic activity (Wang, 2010), need to be incorporated to capture a comprehensive picture of cognition.

##### 3.1.2.2. LSTM

In addition to the problem of vanishing and exploding gradients, other pitfalls also demand careful architecture adjustment. Early in the history of deep learning, RNNs performed poorly on sequences with long-term dependencies (Schmidhuber, 1992). Long short-term memory (LSTM) was specifically designed to resolve this problem. The principal difference between an LSTM and a vanilla RNN is that, instead of a single recurrent layer, an LSTM has a “cell” composed of four layers that interact with each other through three gates: the input gate, the output gate, and the forget gate. These gates control the flow of old and new information in the “cell state” (Hochreiter and Schmidhuber, 1997). At certain computational scales, LSTMs remain competitive with newer sequence models such as transformers.
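A single LSTM step, following the now-standard equations with a forget gate, can be written in a few lines of NumPy. The ordering of the four stacked blocks in `W`, `U`, and `b` is an assumed convention (libraries differ), and the weights in the usage example are hand-set rather than learned: a nearly open forget gate and a nearly closed input gate let the cell state persist over 100 steps, which is the mechanism behind LSTM's long-term memory.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, n_in), U: (4H, H), b: (4H,), stacked in the
    assumed order [input gate, forget gate, candidate, output gate]."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * H:3 * H])    # candidate values
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell state to expose
    c = f * c_prev + i * g         # additive cell-state update
    h = o * np.tanh(c)
    return h, c

# Hand-set (hypothetical) parameters that make the cell hold its content
H_dim, n_in = 4, 3
W = np.zeros((4 * H_dim, n_in))
U = np.zeros((4 * H_dim, H_dim))
b = np.concatenate([np.full(H_dim, -10.0),   # input gate almost closed
                    np.full(H_dim, 10.0),    # forget gate almost fully open
                    np.zeros(H_dim),         # candidate
                    np.zeros(H_dim)])        # output gate half open
h, c = np.zeros(H_dim), np.ones(H_dim)       # start with a stored value of 1
rng = np.random.default_rng(0)
for _ in range(100):
    h, c = lstm_step(rng.standard_normal(n_in), h, c, W, U, b)
print(c)   # stays close to the stored value of 1
```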

##### 3.1.2.3. Reservoir Computing

A reservoir computer (RC) (Maass et al., 2002) is an RNN with a reservoir of interconnected spiking neurons. Broadly speaking, what distinguishes RCs from other RNNs is the absence of granular layers between input and output. RCs are themselves dynamical systems that help learn the dynamics of data. Traditionally, the units of a reservoir have nonlinear activation functions that make the reservoir a universal approximator. Gauthier et al. (2021) show that this nonlinearity can be consolidated into an equivalent *nonlinear vector autoregressor*. With the nonlinear activation function out of the picture, the required computation, data, and metaparameter optimization are significantly reduced, and interpretability is consequently improved, while performance stays the same.
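A minimal nonlinear vector autoregression (NVAR) in the spirit of Gauthier et al. (2021) builds a feature vector from a constant, a few time-delayed copies of the signal, and their quadratic monomials, and then fits a linear readout by ridge regression. In the sketch below, the number of delays (`k = 2`), the ridge parameter, and the test signal (the x-coordinate of the Hénon map, chosen because its one-step update is exactly quadratic in two delays) are assumptions for illustration, not the experimental setup of that paper.

```python
import numpy as np

def nvar_features(x, k=2):
    """Feature rows for t = k-1, ..., n-1:
    [1, x_t, ..., x_{t-k+1}, all unique quadratic monomials of those values]."""
    n = len(x)
    lin = np.column_stack([x[k - 1 - d: n - d] for d in range(k)])   # delay part
    iu = np.triu_indices(k)
    quad = (lin[:, :, None] * lin[:, None, :])[:, iu[0], iu[1]]     # x_a * x_b, a <= b
    return np.hstack([np.ones((len(lin), 1)), lin, quad])

def henon_x(n, a=1.4, b=0.3, discard=100):
    """x-coordinate of the Henon map after discarding a transient."""
    x, y, out = 0.0, 0.0, []
    for i in range(n + discard):
        x, y = 1.0 - a * x**2 + y, b * x
        if i >= discard:
            out.append(x)
    return np.array(out)

x = henon_x(1200)
phi = nvar_features(x)[:-1]      # features at t = 1, ..., n-2
target = x[2:]                   # one-step-ahead values x_{t+1}
n_train, alpha = 1000, 1e-8      # training length and ridge parameter (assumed)
P, y = phi[:n_train], target[:n_train]
w_out = np.linalg.solve(P.T @ P + alpha * np.eye(P.shape[1]), P.T @ y)
pred = phi[n_train:] @ w_out
nrmse = np.sqrt(np.mean((pred - target[n_train:]) ** 2)) / np.std(target[n_train:])
print(nrmse)                     # tiny: the readout is linear in the features
```

Because the learned model is just a weight vector over named monomials, it can be read off directly (here it should approach 1 for the constant, -1.4 for x_t², and 0.3 for x_{t-1}), which is the interpretability gain mentioned above.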

##### 3.1.2.4. Liquid State Machine

A liquid state machine (LSM) can be thought of as an *RNN soup* that maps the input data to a higher dimension in which the features are represented more explicitly. The word *liquid* comes from the analogy of a stone (here, an input) dropped into water (here, a spiking network) and propagating waves. An LSM maintains intrinsic memory and can be simplified enough to process real-time data (Polepalli et al., 2016). Zoubi et al. (2018) show that LSMs perform notably well in building a latent space of EEG data (extendable to fMRI). As for faithfulness to the biological truth, several studies argue that LSMs surpass RNNs with granular layers in matching the organization and circuitry of the cerebellum (Yamazaki and Tanaka, 2007) and the cerebral cortex (Maass et al., 2002). Lechner et al. (2019) demonstrate the superiority of a biologically designed LTM over other ANNs, including LSTM, on given accuracy benchmarks.

##### 3.1.2.5. Echo State Network

The echo state network (ESN), which can work as a tunable frequency generator, was developed by Jaeger at the same time as, and independently of, the LSM of Maass et al. (2002), with similar fundamentals. The idea is to have the input induce nonlinear responses in the neurons of a large reservoir and to train linear combinations of these responses to produce the desired output. The ESN used to be one of the gold standards of nonlinear dynamics modeling before 2010 (Jaeger and Haas, 2004; Jaeger et al., 2007). Its first notable advantage is training by linear regression, which allows easy implementation and freedom from gradient-descent problems (e.g., bifurcations and vanishing/exploding gradients). With the vast development of deep learning, this feature is no longer a remarkable advantage. However, the ESN is still a plausible architecture for non-digital computation substrates such as neuromorphic hardware (Jaeger, 2007; Bürger et al., 2015).
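The recipe just described, a fixed random reservoir driven by the input plus a linear readout trained by regression, fits in a short NumPy sketch. The reservoir size, the spectral radius of 0.9 (a common heuristic for obtaining fading memory), the input scaling, the ridge parameter, and the two-sinusoid test signal are illustrative assumptions. Only `w_out` is learned, by ridge regression, so no gradient descent is involved.

```python
import numpy as np

def make_reservoir(n_res=200, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Random recurrent weights rescaled to a target spectral radius, plus input weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, (n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.uniform(-input_scale, input_scale, n_res)
    return W, W_in

def run_reservoir(W, W_in, u):
    """Collect states x(t) = tanh(W x(t-1) + W_in u(t)); these weights stay fixed."""
    x = np.zeros(W.shape[0])
    states = np.empty((len(u), W.shape[0]))
    for t, u_t in enumerate(u):
        x = np.tanh(W @ x + W_in * u_t)
        states[t] = x
    return states

# One-step-ahead prediction of a two-frequency signal
t = np.arange(3000)
u = np.sin(0.2 * t) + 0.5 * np.sin(0.311 * t)
W, W_in = make_reservoir()
states = run_reservoir(W, W_in, u[:-1])
target = u[1:]
washout, n_train = 100, 2000           # discard the initial transient, then train
X_tr, y_tr = states[washout:n_train], target[washout:n_train]
w_out = np.linalg.solve(X_tr.T @ X_tr + 1e-6 * np.eye(X_tr.shape[1]), X_tr.T @ y_tr)
pred = states[n_train:] @ w_out
nrmse = np.sqrt(np.mean((pred - target[n_train:]) ** 2)) / np.std(target[n_train:])
print(nrmse)
```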

##### 3.1.2.6. Physically-Informed RNN

A prominent factor in determining the dynamical profile of the brain is intrinsic time delays (Chang et al., 2018). Integrating these time delays into artificial networks was initially a case of neuroscience inspiring AI; later, such networks came back as a successful tool for integrated sequence modeling of multiple populations. In the last decade, RNNs have been used for reconstructing neural dynamics via an interpretable latent space in different recording modalities, such as fMRI (Koppe et al., 2019) and calcium imaging (Abrevaya et al., 2021).

Continuous time is another extension that makes RNNs compatible with various forms of sampling and, thus, with neural dynamics ranging from spikes to oscillations. Continuous-time RNNs (CT-RNNs) are RNNs whose hidden state evolves according to differential equations. They have been proven to be universal function approximators (Funahashi and Nakamura, 1993) and have recently resurfaced in the literature as reservoir computers (Verstraeten et al., 2007; Gauthier et al., 2021) and liquid time-constant neural networks (Hasani et al., 2020).
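
A minimal sketch of a CT-RNN, assuming the common form τ dh/dt = -h + W tanh(h) + W_in u(t) integrated with forward Euler. Because each step takes its own time increment, irregularly sampled inputs are handled naturally; in practice an adaptive ODE solver would usually replace the Euler step. Function and variable names are hypothetical.

```python
import numpy as np


def ctrnn_rollout(h0, inputs, dts, W, W_in, tau=0.5):
    """Forward-Euler integration of a continuous-time RNN,
        tau * dh/dt = -h + W @ tanh(h) + W_in @ u(t),
    where each step may use a different time increment (irregular sampling)."""
    h = np.asarray(h0, dtype=float)
    trajectory = [h]
    for u_t, dt in zip(inputs, dts):
        h = h + (dt / tau) * (-h + W @ np.tanh(h) + W_in @ u_t)
        trajectory.append(h)
    return np.array(trajectory)


rng = np.random.default_rng(0)
n_hidden, n_in, n_steps = 8, 2, 1000
dts = rng.uniform(0.5e-3, 1.5e-3, n_steps)    # irregular sampling intervals
inputs = rng.normal(size=(n_steps, n_in))
W = rng.normal(scale=1.0 / np.sqrt(n_hidden), size=(n_hidden, n_hidden))
W_in = rng.normal(size=(n_hidden, n_in))
traj = ctrnn_rollout(np.zeros(n_hidden), inputs, dts, W, W_in)
```

With `W` and `W_in` set to zero the state simply decays as exp(-t/τ), which is a convenient sanity check on the integrator.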

There is no straightforward recipe for finding the optimal architecture and hyperparameters for a given problem. The loss function of a deep neural network can be arbitrarily complex, and minimizing it is generally a non-convex optimization problem. Li et al. (2018) show how the parameters of a network can change the loss landscape and its trainability. Issues more specific to algorithms trained on temporal sequences are catastrophic forgetting and the attention bottleneck, which arise from limited memory of, and attention to, past time steps. Newer attention models such as transformers and recurrent independent mechanisms (see Sections 3.1.4 and 3.1.5, respectively) are specifically built to address these issues. As memory-enhanced components, RNN layers also appear in other deep and shallow architectures that take sequential data as input, including encoder-decoders.

#### 3.1.3. Variational Autoencoder

A variational autoencoder (VAE) is a type of neural network that encodes its input into a “latent space” and then decodes from that space to reconstruct the input (Kingma and Welling, 2013). The network is trained by maximizing the evidence lower bound (ELBO), which combines a reconstruction term with a Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951) between the encoder's approximate posterior over the latent variables and a prior. The negative ELBO is also known as the variational free energy, the same objective function used in dynamic causal modeling (Winn et al., 2005).
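
The objective described above can be written as a short function. This sketch assumes a Gaussian encoder with diagonal covariance, a standard-normal prior, and a unit-variance Gaussian decoder, for which the KL term has a closed form; the encoder and decoder are passed in as arbitrary (hypothetical) functions, and gradients, which a deep learning framework would normally supply, are omitted.

```python
import numpy as np


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(mu ** 2 + np.exp(logvar) - logvar - 1.0, axis=-1)


def negative_elbo(x, encode, decode, rng):
    """Single-sample Monte Carlo estimate of the negative ELBO (variational
    free energy): reconstruction error plus KL to the prior, averaged over
    the batch. Lower is better."""
    mu, logvar = encode(x)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps          # reparameterization trick
    x_hat = decode(z)
    # -log p(x|z) for a unit-variance Gaussian, dropping the 0.5*D*log(2*pi) constant
    recon = 0.5 * np.sum((x - x_hat) ** 2, axis=-1)
    return np.mean(recon + gaussian_kl(mu, logvar))


# Usage with hypothetical linear encoder/decoder weights.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))
B = rng.normal(size=(2, 5))


def encode(x):
    return x @ A, np.full((len(x), 2), -1.0)


def decode(z):
    return z @ B


x = rng.normal(size=(16, 5))
loss = negative_elbo(x, encode, decode, rng)
```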

An example of a VAE used for regenerating dynamics is the work of Perl et al. (2020), which generates the coupling dynamics of the whole brain and the transitions between states of the wake-sleep progression. The goal is to find low-dimensional (e.g., as low as two-dimensional) manifolds that capture the signature structure-function relationship characterizing each stage of the wake-sleep cycle (Vincent et al., 2007; Barttfeld et al., 2015) and the parameters of generic coupled Stuart-Landau oscillators, as in Deco et al. (2017). Another idea for regenerating dynamics is to embed differential equations parameterized by deep networks (as in Section 3.2.2) in the latent structure of a VAE (Chen et al., 2018).
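
For reference, a network of coupled Stuart-Landau oscillators (the normal form of a Hopf bifurcation) can be simulated as below. The complex-valued form, the Euler-Maruyama scheme, and all parameter values are illustrative assumptions; in whole-brain applications `C` is typically a structural connectivity matrix and the bifurcation parameter `a` is set near zero.

```python
import numpy as np


def simulate_hopf_network(C, a, omega, G=0.5, sigma=0.0, dt=1e-2,
                          n_steps=5000, seed=0):
    """Euler-Maruyama simulation of coupled Stuart-Landau oscillators,
        dz_j = [(a_j + i*omega_j - |z_j|^2) z_j + G * sum_k C_jk (z_k - z_j)] dt
               + sigma * dW_j,
    with complex-valued z_j. Returns the real parts, shape (n_steps, N)."""
    rng = np.random.default_rng(seed)
    N = len(a)
    z = 0.1 * (rng.normal(size=N) + 1j * rng.normal(size=N))
    out = np.empty((n_steps, N))
    for t in range(n_steps):
        coupling = G * (C @ z - C.sum(axis=1) * z)    # diffusive coupling
        drift = (a + 1j * omega - np.abs(z) ** 2) * z + coupling
        noise = sigma * np.sqrt(dt) * (rng.normal(size=N) + 1j * rng.normal(size=N))
        z = z + drift * dt + noise
        out[t] = z.real
    return out


# Usage: uncoupled, noise-free nodes with a > 0 settle on limit cycles of radius sqrt(a).
C = np.zeros((3, 3))
a = np.full(3, 0.25)
omega = np.full(3, 2 * np.pi * 0.1)
x = simulate_hopf_network(C, a, omega, G=0.0, sigma=0.0)
```

For a < 0 the isolated node has a stable fixed point and only noise and coupling produce fluctuations, which is why the regime around a = 0 is of interest.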

#### 3.1.4. Transformers

The transformer is a relatively new class of ML models that has recently shown state-of-the-art performance on sequence modeling tasks such as natural language processing (NLP) (Vaswani et al., 2017). Beyond NLP, this architecture performs well on a wide variety of data, including brain imaging (Kostas et al., 2021; Song et al., 2021; Sun et al., 2021). Like RNNs, transformers aim to process sequential data such as natural language or temporal signals. Unlike RNNs, however, they do not process the data step by step; instead, they look at whole sequences through a mechanism called “attention,” which alleviates the problem of forgetting long-range dependencies that is common in RNNs and LSTMs. This mechanism can form both long- and short-range connections between points in the sequence and prioritize them. Transformers are widely used to build *foundation models* (i.e., models pretrained on large-scale data; Bommasani et al., 2021), and they can outperform recurrent networks such as LSTMs at large model and data scales (Kaplan et al., 2020).
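
The core of the attention mechanism is scaled dot-product attention, softmax(QKᵀ/√d_k)V, sketched below for a single head. The random projections stand in for learned weights, and the optional causal mask (which stops a time point from attending to later ones) is an illustrative choice for temporal signals; multi-head attention, positional encodings, and feed-forward layers are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V, causal=False):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    Q: (T_q, d_k), K: (T_k, d_k), V: (T_k, d_v). Every query position can
    attend to every key position at once, however far apart they are."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if causal:  # forbid attending to future time steps
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights


rng = np.random.default_rng(0)
T, d_model = 50, 16                        # e.g., 50 time points of an embedded signal
X = rng.normal(size=(T, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3))
out, attn = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v, causal=True)
```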

#### 3.1.5. Recurrent Independent Mechanisms

Recurrent independent mechanisms (RIMs) are a form of attention model that learns and combines independent mechanisms to improve generalization and robustness in executing a *task*. In signal processing, the task can be generating a sequence based on the observed data. The hypothesis is that the dynamics can be learned as a sparse modular structure. In this recurrent architecture, each module independently specializes in a particular mechanism. The modules then compete through an attention bottleneck so that only the most relevant mechanisms are activated and communicate sparsely with the others to perform the task (Goyal et al., 2020).
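
The competition step can be sketched as follows: each module attends over the real input and a "null" input, and only the k modules that attend most to the real input are updated. This is a simplified illustration, not the architecture of Goyal et al. (2020): the per-module LSTM/GRU cells are replaced by a single tanh update, the inter-module communication step is omitted, and all parameter names are hypothetical.

```python
import numpy as np


def rim_step(h, x, params, k):
    """One simplified RIM update. h: (M, d_h) module states; x: (d_x,) input.
    Returns the new states and the indices of the k active modules."""
    W_q, W_k, W_v, U, R = params                       # hypothetical parameter names
    candidates = np.stack([x, np.zeros_like(x)])       # (2, d_x): real input + null input
    q = np.einsum("md,mde->me", h, W_q)               # per-module queries, (M, d_k)
    keys = candidates @ W_k                            # (2, d_k)
    scores = q @ keys.T / np.sqrt(keys.shape[1])       # (M, 2)
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    active = np.argsort(-attn[:, 0])[:k]               # top-k by attention to the real input
    read = attn @ (candidates @ W_v)                   # what each module reads, (M, d_v)
    h_new = h.copy()                                   # inactive modules keep their state
    for m in active:                                   # each active module has its own weights
        h_new[m] = np.tanh(read[m] @ U[m] + h[m] @ R[m])
    return h_new, active


rng = np.random.default_rng(0)
M, d_h, d_x, d_k, d_v, k = 6, 8, 4, 8, 8, 2
params = (rng.normal(size=(M, d_h, d_k)), rng.normal(size=(d_x, d_k)),
          rng.normal(size=(d_x, d_v)), 0.3 * rng.normal(size=(M, d_v, d_h)),
          0.3 * rng.normal(size=(M, d_h, d_h)))
h = rng.normal(size=(M, d_h))
x = rng.normal(size=d_x)
h_next, active = rim_step(h, x, params, k)
```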

### 3.2. Hybrid Approaches, Scientific ML, and the New Frontiers

Independence from prior knowledge sounds appealing, as it frees the methodology from inductive biases and, by definition, makes the models more general. However, this virtue comes at the cost of requiring large training sets. In other words, the trade-off between bias and computation should be considered: applying ample prior knowledge and inductive biases reduces the need for data and computation, whereas little to no inductive bias calls for large, curated datasets. With the advancement of recording techniques, data scarcity is less of a problem than it used to be, but even with all these advances, a *clean* and *sufficiently large* medical dataset suited to the problem at hand is not guaranteed.

Total reliance on data is especially questionable when the data have significant complications (as discussed in the introduction). Opting for a methodology guided by *patterns* rather than *prior knowledge* is particularly problematic when the principal patterns in the data arise from uninteresting phenomena, such as the particular way a given facility prints out brain images (Ng, 2021).

The thirst for data aside, one of the prominent drawbacks of agnostic modeling, and of ML in particular, is that it is known for providing *opaque black-box* solutions: by leaving biological priors out of the picture, the explainability of the outcome is weak. Lack of explainability is a major concern for scientists, who are interested both in predictions and in the reasoning behind them.

In addition to the implicit assumption that the training data are adequate, these models rely on the explicit assumption that the solution is parsimonious, i.e., that it has few descriptive parameters. Although this assumption may fail in some problems (Su et al., 2017), it is particularly useful because it yields simpler descriptions that are more generalizable, more interpretable, and less prone to overfitting.

The following sections describe general function approximators that can identify the dynamics of data without injecting any prior knowledge about the system. They could provide an ideal solution for a well-observed system with unknown dynamics. Although some neural ODE methods have already been applied to fMRI and EEG data (Zhuang et al., 2021), other deep architectures such as GOKU-nets and latent ODEs are new frontiers.

#### 3.2.1. Sparse Identification of Nonlinear Dynamics

Kaheman et al. (2020) proposed a novel approach for quantifying underlying brain dynamics. The key assumption is that the governing multi-dimensional principles can be expressed as a system of equations describing the first-order rate of change. To use sparse regression methods such as Sparse Identification of Nonlinear Dynamics (SINDy), one needs to specify the set of parsimonious state variables precisely (Quade et al., 2018). Moreover, SINDy does not work well for small datasets: if it is given fewer data points than candidate terms, the system of governing equations is underdetermined. Insufficient training data is therefore a second problem. One approach to address this issue is to incorporate the known terms and skip learning for those parts. An example is discussed in Section 3.2.3.
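
The sparse-regression idea behind SINDy can be sketched with the sequentially thresholded least-squares procedure commonly used with it: regress the time derivatives onto a library of candidate terms, zero out small coefficients, and refit. The polynomial library, threshold, and the damped-oscillator test system are illustrative assumptions, and exact derivatives are used here only to keep the example short; with real recordings they must be estimated from noisy data.

```python
import numpy as np


def poly_library(X):
    """Candidate terms for a 2-D state: [1, x, y, x^2, xy, y^2]."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])


def stlsq(Theta, dXdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: fit, zero out small
    coefficients, refit on the surviving terms, and repeat."""
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for j in range(dXdt.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                Xi[keep, j] = np.linalg.lstsq(Theta[:, keep], dXdt[:, j], rcond=None)[0]
    return Xi


# Usage: recover a damped linear oscillator from a simulated trajectory.
A = np.array([[-0.1, 2.0], [-2.0, -0.1]])
dt, X = 0.01, [np.array([2.0, 0.0])]
for _ in range(999):
    X.append(X[-1] + dt * A @ X[-1])
X = np.array(X)
dXdt = X @ A.T              # exact derivatives at the sampled states
Xi = stlsq(poly_library(X), dXdt)
print(np.round(Xi, 3))      # rows: 1, x, y, x^2, xy, y^2; columns: dx/dt, dy/dt
```

If the trajectory had fewer samples than the six library columns, the first least-squares fit would already be underdetermined, which is the small-data limitation described above.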

#### 3.2.2. Differential Equations With Deep Neural Networks

A relatively new class of dynamical frameworks combines differential equations with machine learning in a more explicit fashion. Note that by neural ODE we refer to the term as used in Chen et al. (2018). Neural ODEs are a family of deep neural networks that learn the governing differential equations of a system, not to be confused with the differential equations of neuronal dynamics. This class of frameworks has been used to model the dynamics of time-varying signals. These models begin by assuming that the underlying dynamics follow a differential equation, whose parameters can then be discovered with the standard optimization techniques of deep neural networks. Such formulations are therefore well suited to modeling and analyzing brain dynamics. Below we describe some of the relevant works in this subfield.

##### 3.2.2.1. Neural Ordinary Differential Equations

Combining ordinary differential equations (ODEs) with deep neural networks has recently emerged as a feasible way of incorporating differentiable physics into machine learning. A Neural Ordinary Differential Equation (Neural ODE) (Chen et al., 2018) uses a parametric model as the differential function of an ODE. This architecture can learn the dynamics of a process without stating the differential function explicitly, as was previously required in many fields. Instead, standard deep learning optimization techniques can be used to train a parameterized differential function that accurately describes the dynamics of a system. This approach has recently been used to infer the dynamics of various time-varying signals with practical applications (Chen et al., 2018; Jia and Benson, 2019; Kanaa et al., 2019; Rubanova et al., 2019; Yildiz et al., 2019; Kidger et al., 2020; Li et al., 2020; Liu et al., 2020).
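
The forward pass of a Neural ODE amounts to using a small neural network as the right-hand side of dh/dt = f(h) and handing it to an ODE solver. The sketch below assumes a one-hidden-layer tanh network and a fixed-step RK4 solver evaluated at irregular observation times; training, which would differentiate a loss on the trajectory with respect to the network weights (via automatic differentiation through the solver or the adjoint method of Chen et al., 2018), is not shown. Function names are hypothetical.

```python
import numpy as np


def mlp_vector_field(params):
    """Return f(h) = W2 @ tanh(W1 @ h + b1) + b2, a small neural network
    used as the right-hand side of dh/dt = f(h)."""
    W1, b1, W2, b2 = params

    def f(h):
        return W2 @ np.tanh(W1 @ h + b1) + b2

    return f


def odeint_rk4(f, h0, ts):
    """Integrate dh/dt = f(h) with classic RK4 steps between the (possibly
    irregular) output times ts; returns an array of shape (len(ts), dim)."""
    hs = [np.asarray(h0, dtype=float)]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h, dt = hs[-1], t1 - t0
        k1 = f(h)
        k2 = f(h + 0.5 * dt * k1)
        k3 = f(h + 0.5 * dt * k2)
        k4 = f(h + dt * k3)
        hs.append(h + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.stack(hs)


rng = np.random.default_rng(0)
dim, width = 2, 32
params = (rng.normal(size=(width, dim)), np.zeros(width),
          rng.normal(size=(dim, width)) / width, np.zeros(dim))
ts = np.sort(rng.uniform(0.0, 5.0, size=40))   # irregular observation times
traj = odeint_rk4(mlp_vector_field(params), np.array([1.0, 0.0]), ts)
```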

##### 3.2.2.2. Latent ODE

A dynamic model such as the Neural ODE can be incorporated in an encoder-decoder framework, resembling a Variational Auto-Encoder, as mentioned in Chen et al. (2018). Such models assume that latent variables can capture the dynamics of the observed data. Previous works (Chen et al., 2018; Kanaa et al., 2019; Rubanova et al., 2019; Yildiz et al., 2019) have successfully used this framework to define and train a generative model on time series data.
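
The generative pass of a latent ODE chains an encoder, a sampled initial latent state, a latent ODE, and a decoder. In the sketch below, the encoder is a hypothetical linear map applied to the mean of the observations (a stand-in for the recurrent encoder used in the cited works), the latent vector field is a small tanh network, and the decoder is linear; all weights are random placeholders rather than trained values.

```python
import numpy as np


def rk4_path(f, z0, ts):
    """Fixed-step RK4 solution of dz/dt = f(z) evaluated at times ts."""
    zs = [np.asarray(z0, dtype=float)]
    for dt in np.diff(ts):
        z = zs[-1]
        k1 = f(z)
        k2 = f(z + 0.5 * dt * k1)
        k3 = f(z + 0.5 * dt * k2)
        k4 = f(z + dt * k3)
        zs.append(z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.stack(zs)


def latent_ode_generate(x_obs, ts, enc, dyn, dec, rng):
    """Encoder -> sample z0 -> integrate the latent ODE -> decode.
    x_obs: (T_obs, D) observations used to infer the initial latent state."""
    E_mu, E_logvar = enc                 # hypothetical linear encoder weights
    summary = x_obs.mean(axis=0)         # stand-in for a recurrent encoder
    mu, logvar = E_mu @ summary, E_logvar @ summary
    z0 = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)  # reparameterization
    W1, W2 = dyn                         # latent vector field f(z) = W2 @ tanh(W1 @ z)

    def f(z):
        return W2 @ np.tanh(W1 @ z)

    zs = rk4_path(f, z0, ts)             # latent trajectory, (len(ts), latent_dim)
    return zs @ dec.T                    # linear decoder to data space, (len(ts), D)


rng = np.random.default_rng(0)
D, L, H = 10, 3, 16
enc = (0.1 * rng.normal(size=(L, D)), 0.1 * rng.normal(size=(L, D)))
dyn = (rng.normal(size=(H, L)), 0.1 * rng.normal(size=(L, H)))
dec = rng.normal(size=(D, L))
x_obs = rng.normal(size=(20, D))
ts = np.linspace(0.0, 5.0, 50)
x_gen = latent_ode_generate(x_obs, ts, enc, dyn, dec, rng)
```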

##### 3.2.2.3. Stochastic Neural ODEs

Parametric models can also be incorporated into stochastic differential equations to make Neural Stochastic Differential Equations (Neural SDEs) (Li et al., 2020; Liu et al., 2020). Prior works have also introduced discontinuous jumps (Jia and Benson, 2019) in the differential equations.
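
A Neural SDE replaces the deterministic vector field with a learned drift and a learned diffusion, dz = f(z) dt + g(z) dW. The sketch below draws one sample path with the Euler-Maruyama scheme, assuming a tanh network for the drift and an elementwise (diagonal) softplus diffusion; the architecture and names are illustrative, and training is not shown.

```python
import numpy as np


def neural_sde_sample(z0, ts, drift_params, diff_params, rng):
    """Euler-Maruyama sample path of dz = f(z) dt + g(z) * dW, where f is a
    small tanh network and g is a positive, diagonal (elementwise) diffusion
    given by a softplus layer. Returns an array of shape (len(ts), dim)."""
    A1, A2 = drift_params
    B = diff_params
    z = np.asarray(z0, dtype=float)
    path = [z]
    for dt in np.diff(ts):
        f = A2 @ np.tanh(A1 @ z)                           # drift network
        g = np.logaddexp(0.0, B @ z)                       # softplus keeps g > 0
        dW = rng.normal(scale=np.sqrt(dt), size=z.shape)   # Brownian increment
        z = z + f * dt + g * dW
        path.append(z)
    return np.stack(path)


rng = np.random.default_rng(0)
d, h = 3, 16
A1 = rng.normal(size=(h, d))
A2 = 0.1 * rng.normal(size=(d, h))
B = 0.1 * rng.normal(size=(d, d))
ts = np.linspace(0.0, 2.0, 201)
path_a = neural_sde_sample(np.zeros(d), ts, (A1, A2), B, np.random.default_rng(1))
path_b = neural_sde_sample(np.zeros(d), ts, (A1, A2), B, np.random.default_rng(1))
path_c = neural_sde_sample(np.zeros(d), ts, (A1, A2), B, np.random.default_rng(2))
```

Sample paths are reproducible for a fixed random seed and differ across seeds, as expected of a stochastic generative model.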

##### 3.2.2.4. Neural Controlled Differential Equations

Furthermore, latent ODE models can add another layer of abstraction. The observed data are assumed to be sampled, regularly or irregularly, from a continuous stream of data whose dynamics are described by a continuously changing hidden state. Both the dynamics of the hidden state and the relationship between the interpolated observations and the hidden state can be described by neural networks. Such systems are called Neural Controlled Differential Equations (Neural CDEs) (Kidger et al., 2020). Broadly speaking, they are the continuous-time analogue of RNNs.
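
In a Neural CDE the hidden state is driven by the observation path itself, dz(t) = f(z(t)) dX(t). The sketch below assumes piecewise-linear interpolation of the observations (Kidger et al., 2020 use cubic splines) and one Euler step per observation interval, so each step uses the increment X[t+1] - X[t]. Time is included as a channel of X, and the network `f` and its weights are hypothetical.

```python
import numpy as np


def neural_cde(z0, X, f):
    """Euler scheme for a controlled differential equation
        dz(t) = f(z(t)) dX(t),
    where X, of shape (T, d_x), holds the observations (time included as a
    channel) and f maps z of shape (d_z,) to a (d_z, d_x) matrix. With a
    linearly interpolated control path, each step uses X[t+1] - X[t]."""
    z = np.asarray(z0, dtype=float)
    for dX in np.diff(X, axis=0):
        z = z + f(z) @ dX
    return z


rng = np.random.default_rng(0)
d_z, d_x, H = 4, 2, 16
W1 = rng.normal(size=(H, d_z))
W2 = 0.1 * rng.normal(size=(d_z * d_x, H))


def f(z):
    """Small network returning a (d_z, d_x) matrix."""
    return (W2 @ np.tanh(W1 @ z)).reshape(d_z, d_x)


t = np.sort(rng.uniform(0.0, 10.0, size=60))    # irregular sample times
X = np.column_stack([t, np.sin(t)])             # time is included as a channel
z_T = neural_cde(np.ones(d_z), X, f)
```

Like an RNN, the model consumes one observation increment per update, but the increments carry the (irregular) time gaps, so no resampling to a fixed grid is needed.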

#### 3.2.3. Differential Equations Enhanced by Deep Neural Networks

The above methods use deep neural networks to define the differential function of ordinary differential equations. In contrast, UDEs and GOKU-nets (described below) use deep neural networks to enhance differential equations: UDEs replace only the unknown parts of a known differential equation, while GOKU-nets use explicit differential equations as part of deep neural network pipelines.

##### 3.2.3.1. Universal Differential Equations

Universal Differential Equations (UDEs) offer an alternative way of incorporating neural networks into differential equations while accounting for prior knowledge. In their seminal work, Rackauckas et al. (2020) demonstrate how a differential equation model can be aided by learning its unknown terms with universal approximators such as neural networks. Furthermore, they show that by combining this approach with symbolic regression, such as SINDy, these models can accelerate the discovery of dynamics from limited data with significant accuracy.
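
The UDE idea of keeping the known mechanism and learning only the missing term can be sketched in one dimension. Two simplifications are assumed and should not be mistaken for the method of Rackauckas et al. (2020): a random-feature tanh model fitted by ridge least squares stands in for a trained neural network, and the model is fitted to observed rates of change (derivative matching) rather than by differentiating through an ODE solver. The "known" decay term and the "unknown" sine term are hypothetical choices.

```python
import numpy as np


def known_part(x):
    """Mechanistic term assumed known: linear decay."""
    return -x


def unknown_truth(x):
    """Term hidden from the model; used only to generate training data."""
    return np.sin(x)


# Random tanh features act as the universal approximator for the unknown term.
rng = np.random.default_rng(0)
n_feat = 100
a = rng.normal(scale=2.0, size=n_feat)
b = rng.uniform(-3.0, 3.0, size=n_feat)


def features(x):
    return np.tanh(np.outer(x, a) + b)


# Observed states and their rates of change (noise-free here).
x_obs = np.linspace(-3.0, 3.0, 200)
dxdt_obs = known_part(x_obs) + unknown_truth(x_obs)

# Fit only what the known mechanism does not explain (ridge least squares).
residual = dxdt_obs - known_part(x_obs)
Phi = features(x_obs)
w = np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(n_feat), Phi.T @ residual)


def ude_rhs(x):
    """Hybrid vector field: known mechanism plus learned correction."""
    return known_part(x) + features(np.atleast_1d(x)) @ w
```

In the full workflow, the learned correction would next be passed to a sparse symbolic regression such as SINDy to recover a closed-form expression for it.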

##### 3.2.3.2. Generative ODE Modeling With Known Unknowns

Another promising approach is Generative ODE Modeling with Known Unknowns, also known as GOKU-nets (Linial et al., 2021). A GOKU-net consists of a variational autoencoder structure with ODEs inside. In contrast with latent ODEs, the ODEs here are not parameterized by neural networks but have explicit forms. Hence, it is possible to use some prior knowledge of the dynamics governing the system, as in SINDy and UDEs, but without needing direct observations of the state variables as in those cases. For example, one could hypothesize that the latent dynamics of a system follow a particular differential model such as Kuramoto or van der Pol. The model then jointly learns the transformation from the data space to a lower-dimensional feature representation and the parameters of the explicit differential equation.
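
The generative side of this design can be sketched with a van der Pol oscillator as the hypothesized latent model. In this illustration, a hypothetical linear encoder infers the ODE's initial state and a positive parameter mu from the data, the explicit ODE is integrated with RK4, and a linear emission maps the latent trajectory back to observation space. In GOKU-nets the encoder and emission are neural networks trained jointly with the ODE parameters; that training is not shown, and all names and weights here are placeholders.

```python
import numpy as np


def van_der_pol(state, mu):
    """Explicit, hypothesis-driven latent dynamics: x'' - mu (1 - x^2) x' + x = 0."""
    x, v = state
    return np.array([v, mu * (1.0 - x ** 2) * v - x])


def integrate(f, s0, mu, dt, n_steps):
    """Fixed-step RK4 for ds/dt = f(s, mu); returns shape (n_steps, dim)."""
    s, out = np.asarray(s0, dtype=float), []
    for _ in range(n_steps):
        k1 = f(s, mu)
        k2 = f(s + 0.5 * dt * k1, mu)
        k3 = f(s + 0.5 * dt * k2, mu)
        k4 = f(s + dt * k3, mu)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(s)
    return np.array(out)


def goku_generate(x_obs, enc_z0, enc_mu, dec, dt=0.01, n_steps=2000):
    """GOKU-style generative pass (sketch): infer the initial state and a
    positive ODE parameter from data, integrate the explicit ODE, and map
    the latent trajectory to observation space with a linear emission."""
    summary = x_obs.mean(axis=0)
    s0 = enc_z0 @ summary
    mu = np.logaddexp(0.0, enc_mu @ summary)   # softplus keeps mu > 0
    latent = integrate(van_der_pol, s0, mu, dt, n_steps)
    return latent @ dec.T, latent, mu


rng = np.random.default_rng(0)
D = 8
x_obs = rng.normal(size=(30, D))
y, latent, mu = goku_generate(x_obs, 0.3 * rng.normal(size=(2, D)),
                              rng.normal(size=D), rng.normal(size=(D, 2)))
```

Because the latent model is explicit, the inferred `mu` keeps its meaning as the damping parameter of the oscillator, which is what makes such hybrids interpretable.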

Machine learning techniques are now routinely used for classification and regression of brain states (see Wein et al., 2021 for a review). However, they have much more potential than black-box, data-intensive classifiers, because new sequential models are sometimes designed to identify the missing pieces of the puzzle of dynamics. They can also act as generative models and offer broad potential for testing biophysical and system-level hypotheses. Some of the methods introduced in this section are explained in detail in the textbook by Kutz (2013). Moreover, helpful tutorials can be found on Brunton's (2011) YouTube™ channel.

## 4. Conclusion

The key purpose of this review was to explore samples of already popular paradigms, or of the ones the authors found most promising, for reconstructing neural dynamics with all their special considerations. To this end, we sorted the computational models with respect to two indicators: the scale of organization and the level of abstraction (Figure 3).

The scope of our study is, broadly, generative models of neural dynamics in biophysics, complex systems, and AI, with some limitations. This interdisciplinary paper covers a time span from the mid-twentieth century, when pioneering models such as Hodgkin-Huxley (Hodgkin and Huxley, 1952) and Wilson-Cowan (Wilson and Cowan, 1972) arose, up to the recent decade, when gigantic brain atlas initiatives, groundbreaking research in ML, and unprecedented computational power became available. Given the rate of publication in the related fields, a systematic review was impossible; this paper is therefore a starting point for gaining a bird's-eye view of the current landscape. It is up to the reader to adjust the model scale and abstraction to the problem at hand (see Figure 3).

There is established work on formal hypothesis testing and model selection procedures for inferring effective connectivity. While advances in the ML literature open new frontiers for generative models, it is crucial to be aware of standard practices in generative modeling, such as Bayesian model reduction for selecting the model whose priors best fit the data (Friston et al., 2003). Model inversion is a crucial procedure for model validation and can help in *opening the black box* of deep neural networks by computing the model evidence and posteriors based on the prior parameters suggested by predominantly data-driven models. Furthermore, model inversion can be extended to large, continuous, and noisy systems by improving parameter estimation with new optimization tools (Zhuang et al., 2021).

We emphasized the distinctiveness of the problems in computational neuroscience and cognitive science. One key factor is the trade-off between complexity and inductive bias on the one hand and the availability of data and prior knowledge of the system on the other. While there is still no universal recipe, hybrid methods could simultaneously tackle explainability, interpretability, plausibility, and generalizability.

## Author Contributions

MR-P authored the bulk of the paper. GA contributed significantly to the general idea of the paper, the review of the literature on scientific machine learning and phenomenological models, and the writing and feedback process. J-CG-A and VV contributed to the review of the agnostic models. The senior authors directed the idea behind the paper and provided feedback and mentoring. The figures are the result of brainstorming with all the authors. All authors contributed to the article and approved the submitted version.

## Funding

GD is funded by the Institute for Data Valorization (IVADO), Montréal and the Fonds de recherche du Québec (FRQ). MR-P, J-CG-A, VV, and IR acknowledge the support from Canada CIFAR AI Chair Program and from the Canada Excellence Research Chairs (CERC) program. GA acknowledges the funding from CONICET.

## Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

## Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

## Acknowledgments

The authors are grateful for the discussions and revisions from Hadi Nekoei, Timothy Nest, Quentin Moreau, Eilif Muller, and Guillaume Lajoie.

## Footnotes

1. ^Emergence is the manifestation of collective behavior that cannot be deduced from the sum of the behavior of the parts (Johnson, 2002; Krakauer et al., 2017).

2. ^For a more comprehensive overview of types and applications, see Schliebs and Kasabov (2013).

3. ^Note that the notion of “phenomena” here is different from that used by, e.g., Revonsuo (2006), where phenomenological architecture and properties are regarded as a representation of the environment in the first-person mind (Smith, 2018), complementary to the “physiological” architecture of the brain, as in Fingelkurts et al. (2009).

## References

Abbott, A. (2020). Documentary follows implosion of billion-euro brain project. *Nature* 588, 215–216. doi: 10.1038/d41586-020-03462-3

Abbott, L., and Kepler, T. B. (1990). “Model neurons: from hodgkin-huxley to hopfield,” in *Statistical Mechanics of Neural Networks* (Springer), 5–18.

Abrevaya, G., Dumas, G., Aravkin, A. Y., Zheng, P., Gagnon-Audet, J.-C., Kozloski, J., et al. (2021). Learning brain dynamics with coupled low-dimensional nonlinear oscillators and deep recurrent networks. *Neural Comput*. 33, 2087–2127. doi: 10.1162/neco_a_01401

Akopyan, F., Sawada, J., Cassidy, A., Alvarez-Icaza, R., Arthur, J., Merolla, P., et al. (2015). Truenorth: Design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip. *IEEE Trans. Comput. Aided Design Integrat. Circ. Syst*. 34, 1537–1557. doi: 10.1109/TCAD.2015.2474396

Amari, S.-I. (1977). Dynamics of pattern formation in lateral-inhibition type neural fields. *Biol. Cybern*. 27, 77–87. doi: 10.1007/BF00337259

Amunts, K., Ebell, C., Muller, J., Telefont, M., Knoll, A., and Lippert, T. (2016). The human brain project: creating a european research infrastructure to decode the human brain. *Neuron* 92, 574–581. doi: 10.1016/j.neuron.2016.10.046

Anninos, P., Beek, B., Csermely, T., Harth, E., and Pertile, G. (1970). Dynamics of neural structures. *J. Theor. Biol*. 26, 121–148. doi: 10.1016/S0022-5193(70)80036-4

Anyaeji, C. I., Cabral, J., and Silbersweig, D. (2021). On a quantitative approach to clinical neuroscience in psychiatry: lessons from the kuramoto model. *Harv. Rev. Psychiatry* 29, 318–326. doi: 10.1097/HRP.0000000000000301

Arenas, A., Diaz-Guilera, A., Kurths, J., Moreno, Y., and Zhou, C. (2008). Synchronization in complex networks. *Phys. Rep*. 469, 93–153. doi: 10.1016/j.physrep.2008.09.002

Atasoy, S., Deco, G., Kringelbach, M. L., and Pearson, J. (2018). Harmonic brain modes: a unifying framework for linking space and time in brain dynamics. *Neuroscientist* 24, 277–293. doi: 10.1177/1073858417728032

Bahri, Y., Kadmon, J., Pennington, J., Schoenholz, S. S., Sohl-Dickstein, J., and Ganguli, S. (2020). Statistical mechanics of deep learning. *Ann. Rev. Condensed Matter Phys*. 11, 501–528. doi: 10.1146/annurev-conmatphys-031119-050745

Barttfeld, P., Uhrig, L., Sitt, J. D., Sigman, M., Jarraya, B., and Dehaene, S. (2015). Signature of consciousness in the dynamics of resting-state brain activity. *Proc. Natl. Acad. Sci. U.S.A*. 112, 887–892. doi: 10.1073/pnas.1418031112

Becker, C. O., Pequito, S., Pappas, G. J., Miller, M. B., Grafton, S. T., Bassett, D. S., et al. (2018). Spectral mapping of brain functional connectivity from diffusion imaging. *Sci. Rep*. 8, 1–15. doi: 10.1038/s41598-017-18769-x

Beurle, R. L. (1956). Properties of a mass of cells capable of regenerating pulses. *Philos. Trans. R. Soc. Lond. B Biol. Sci*. 240, 55–94. doi: 10.1098/rstb.1956.0012

Blundell, I., Brette, R., Cleland, T. A., Close, T. G., Coca, D., Davison, A. P., et al. (2018). Code generation in computational neuroscience: a review of tools and techniques. *Front. Neuroinform*. 12, 68. doi: 10.3389/fninf.2018.00068

Boaretto, B., Budzinski, R., Prado, T., Kurths, J., and Lopes, S. (2019). Protocol for suppression of phase synchronization in hodgkin–huxley-type networks. *Physica A*. 528, 121388. doi: 10.1016/j.physa.2019.121388

Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., et al. (2021). On the opportunities and risks of foundation models. *arXiv preprint arXiv:2108.07258*. doi: 10.48550/arXiv.2108.07258

Bower, J. M., and Beeman, D. (2012). *The Book of GENESIS: Exploring Realistic Neural Models With the GEneral NEural SImulation System*. Springer Science & Business Media.

Breakspear, M. (2017). Dynamic models of large-scale brain activity. *Nat. Neurosci*. 20, 340–352. doi: 10.1038/nn.4497

Breakspear, M., Heitmann, S., and Daffertshofer, A. (2010). Generative models of cortical oscillations: neurobiological implications of the kuramoto model. *Front. Hum. Neurosci*. 4, 190. doi: 10.3389/fnhum.2010.00190

Briggman, K. L., Abarbanel, H. D., and Kristan, W. B. (2005). Optical imaging of neuronal populations during decision-making. *Science* 307, 896–901. doi: 10.1126/science.1103736

Brunton, S. (2011). *Steve Brunton's YouTube Channel*. Available online at: https://www.youtube.com/c/Eigensteve/featured.

Brush, S. G. (1967). History of the lenz-ising model. *Rev. Mod. Phys*. 39, 883. doi: 10.1103/RevModPhys.39.883

Buonomano, D. V., and Mauk, M. D. (1994). Neural network model of the cerebellum: temporal discrimination and the timing of motor responses. *Neural Comput*. 6, 38–55. doi: 10.1162/neco.1994.6.1.38

Bürger, J., Goudarzi, A., Stefanovic, D., and Teuscher, C. (2015). “Hierarchical composition of memristive networks for real-time computing,” in *Proceedings of the 2015 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH '15)* (Boston, MA: IEEE), 33–38.

Buzsáki, G., and Draguhn, A. (2004). Neuronal oscillations in cortical networks. *Science* 304, 1926–1929. doi: 10.1126/science.1099745

Bzdok, D., and Ioannidis, J. P. A. (2019). Exploration, inference, and prediction in neuroscience and biomedicine. *Trends Neurosci.* 42, 251–262. doi: 10.1016/j.tins.2019.02.001

Cabral, J., Kringelbach, M. L., and Deco, G. (2017). Functional connectivity dynamically evolves on multiple time-scales over a static structural connectome: models and mechanisms. *Neuroimage* 160, 84–96. doi: 10.1016/j.neuroimage.2017.03.045

Cabral, J., Luckhoo, H., Woolrich, M., Joensson, M., Mohseni, H., Baker, A., et al. (2014). Exploring mechanisms of spontaneous functional connectivity in meg: how delayed network interactions lead to structured amplitude envelopes of band-pass filtered oscillations. *Neuroimage* 90, 423–435. doi: 10.1016/j.neuroimage.2013.11.047

Cai, Z., and Li, X. (2021). “Neuromorphic brain-inspired computing with hybrid neural networks,” in *2021 IEEE International Conference on Artificial Intelligence and Industrial Design (AIID)* (Guangzhou: IEEE), 343–347.

Calhoun, V. D., and Adali, T. (2012). Multisubject independent component analysis of fmri: a decade of intrinsic networks, default mode, and neurodiagnostic discovery. *IEEE Rev. Biomed. Eng*. 5, 60–73. doi: 10.1109/RBME.2012.2211076

Calhoun, V. D., Adali, T., Pearlson, G., and Pekar, J. J. (2001). Spatial and temporal independent component analysis of functional mri data containing a pair of task-related waveforms. *Hum. Brain Mapp*. 13, 43–53. doi: 10.1002/hbm.1024

Chang, B., Meng, L., Haber, E., Ruthotto, L., Begert, D., and Holtham, E. (2018). “Reversible architectures for arbitrarily deep residual neural networks,” in *Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32*.

Chen, B., and Miller, P. (2018). *Attractors in Networks of Bistable Neuronal Units with Depressing Synapses*.

Chen, B., and Miller, P. (2021). *Announcing the Allen Institute for Neural Dynamics, A New Neuroscience Division of the Allen Institute*. Available online at: https://bit.ly/30SFpqP

Chen, R. T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. (2018). Neural ordinary differential equations. *Adv. Neural Inf. Process. Syst*. 31, 6571. doi: 10.1007/978-3-030-04167-0

Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. *arXiv preprint arXiv:1412.3555*. doi: 10.48550/arXiv.1412.3555

Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Foster, J. D., Nuyujukian, P., Ryu, S. I., et al. (2012). Neural population dynamics during reaching. *Nature* 487, 51–56. doi: 10.1038/nature11129

Coombes, S. (2005). Waves, bumps, and patterns in neural field theories. *Biol. Cybern*. 93, 91–108. doi: 10.1007/s00422-005-0574-y

Coombes, S., beim Graben, P., Potthast, R., and Wright, J. (2014). *Neural Fields: Theory and Applications*. Springer.

Coombes, S., and Byrne, Á. (2019). Next generation neural mass models. *Nonlinear Dyn. Comput. Neurosci*. 2020, 726–742. doi: 10.1007/978-3-319-71048-8_1

da Silva, F. L. (1991). Neural mechanisms underlying brain waves: from neural membranes to networks. *Electroencephalogr. Clin. Neurophysiol*. 79, 81–93. doi: 10.1016/0013-4694(91)90044-5

Daunizeau, J., David, O., and Stephan, K. E. (2011). Dynamic causal modelling: a critical review of the biophysical and statistical foundations. *Neuroimage* 58, 312–322. doi: 10.1016/j.neuroimage.2009.11.062

David, O., and Friston, K. J. (2003). A neural mass model for meg/eeg: coupling and neuronal dynamics. *Neuroimage* 20, 1743–1755. doi: 10.1016/j.neuroimage.2003.07.015

Davies, M. (2021). “Lessons from loihi: progress in neuromorphic computing,” in *2021 Symposium on VLSI Circuits* (Kyoto: IEEE), 1–2.

Davies, M., Srinivasa, N., Lin, T.-H., Chinya, G., Cao, Y., Choday, S. H., et al. (2018). Loihi: a neuromorphic manycore processor with on-chip learning. *IEEE Micro* 38, 82–99. doi: 10.1109/MM.2018.112130359

de Garis, H., Shuo, C., Goertzel, B., and Ruiting, L. (2010). A world survey of artificial brain projects, part i: large-scale brain simulations. *Neurocomputing* 74, 3–29. doi: 10.1016/j.neucom.2010.08.004

Deco, G., and Jirsa, V. K. (2012). Ongoing cortical activity at rest: criticality, multistability, and ghost attractors. *J. Neurosci*. 32, 3366–3375. doi: 10.1523/JNEUROSCI.2523-11.2012

Deco, G., Jirsa, V. K., Robinson, P. A., Breakspear, M., and Friston, K. (2008). The dynamic brain: from spiking neurons to neural masses and cortical fields. *PLoS Comput. Biol*. 4, e1000092. doi: 10.1371/journal.pcbi.1000092

Deco, G., Kringelbach, M. L., Jirsa, V. K., and Ritter, P. (2017). The dynamics of resting fluctuations in the brain: metastability and its dynamical cortical core. *Sci. Rep*. 7, 1–14. doi: 10.1038/s41598-017-03073-5

Deco, G., Senden, M., and Jirsa, V. (2012). How anatomy shapes dynamics: a semi-analytical study of the brain at rest by a simple spin model. *Front. Comput. Neurosci*. 6, 68. doi: 10.3389/fncom.2012.00068

Deschle, N., Ignacio Gossn, J., Tewarie, P., Schelter, B., and Daffertshofer, A. (2021). On the validity of neural mass models. *Front. Comput. Neurosci*. 14, 118. doi: 10.3389/fncom.2020.581040

Destexhe, A., and Sejnowski, T. J. (2009). The wilson-cowan model, 36 years later. *Biol. Cybern*. 101, 1–2. doi: 10.1007/s00422-009-0328-3

Devor, A., Bandettini, P., Boas, D., Bower, J., Buxton, R., Cohen, L., et al. (2013). The challenge of connecting the dots in the brain. *Neuron* 80, 270–274. doi: 10.1016/j.neuron.2013.09.008

Diez, I., and Sepulcre, J. (2018). Neurogenetic profiles delineate large-scale connectivity dynamics of the human brain. *Nat. Commun*. 9, 1–10. doi: 10.1038/s41467-018-06346-3

Ding, M., Chen, Y., and Bressler, S. L. (2006). “17 granger causality: basic theory and application to neuroscience,” in *Handbook of Time Series Analysis: Recent Theoretical Developments and Applications 437*.

Dougherty, D. P., Wright, G. A., and Yew, A. C. (2005). Computational model of the camp-mediated sensory response and calcium-dependent adaptation in vertebrate olfactory receptor neurons. *Proc. Natl. Acad. Sci. U.S.A*. 102, 10415–10420. doi: 10.1073/pnas.0504099102

Dumas, G., Chavez, M., Nadel, J., and Martinerie, J. (2012). Anatomical connectivity influences both intra-and inter-brain synchronizations. *PLoS ONE* 7, e36414. doi: 10.1371/journal.pone.0036414

Eccles, J. C., Fatt, P., and Koketsu, K. (1954). Cholinergic and inhibitory synapses in a pathway from motor-axon collaterals to motoneurones. *J. Physiol*. 126, 524–562. doi: 10.1113/jphysiol.1954.sp005226

El Boustani, S., and Destexhe, A. (2009). A master equation formalism for macroscopic modeling of asynchronous irregular activity states. *Neural Comput*. 21, 46–100. doi: 10.1162/neco.2009.02-08-710

Eli (2022). Available online at: https://books.google.com/books/about/How_to_Build_a_Brain.html?id=BK0YRJPmuzgC.

Epstein, J. M. (2008). Why model? *J. Artif. Societies Soc. Simulat*. 11, 12. Available online at: https://www.jasss.org/11/4/12.html

Erdős, P., and Rényi, A. (1960). On the evolution of random graphs. *Publ. Math. Inst. Hung. Acad. Sci*. 5, 17–60.

Erlhagen, W., and Schöner, G. (2002). Dynamic field theory of movement preparation. *Psychol. Rev*. 109, 545. doi: 10.1037/0033-295X.109.3.545

Fecher, B., and Friesike, S. (2014). “Open science: one term, five schools of thought,” in *Opening Science*, eds S. Bartling and S. Friesike (Cham: Springer).

Fields, R. D., Araque, A., Johansen-Berg, H., Lim, S.-S., Lynch, G., Nave, K.-A., et al. (2014). Glial biology in learning and cognition. *Neuroscientist* 20, 426–431. doi: 10.1177/1073858413504465

Fingelkurts, A. A., Fingelkurts, A. A., and Neves, C. F. H. (2009). Phenomenological architecture of a mind and operational architectonics of the brain: the unified metastable continuum. *New Math. Natural Comput*. 05, 221–244. doi: 10.1142/S1793005709001258

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. *Biophys. J*. 1, 445–466. doi: 10.1016/S0006-3495(61)86902-6

Frégnac, Y., and Laurent, G. (2014). Neuroscience: where is the brain in the human brain project? *Nat. News* 513, 27. doi: 10.1038/513027a

Friston, K. (2008). Mean-fields and neural masses. *PLoS Comput. Biol*. 4, e1000081. doi: 10.1371/journal.pcbi.1000081

Friston, K., Harrison, L., and Penny, W. (2003). Dynamic causal modelling. *Neuroimage* 19, 1273–1302. doi: 10.1016/S1053-8119(03)00202-7

Funahashi, K.-I., and Nakamura, Y. (1993). Approximation of dynamical systems by continuous time recurrent neural networks. *Neural Netw*. 6, 801–806. doi: 10.1016/S0893-6080(05)80125-X

Gabashvili, I. S., Sokolowski, B. H., Morton, C. C., and Giersch, A. B. (2007). Ion channel gene expression in the inner ear. *J. Assoc. Res. Otolaryngol*. 8, 305–328. doi: 10.1007/s10162-007-0082-y

Gauthier, D. J., Bollt, E., Griffith, A., and Barbosa, W. A. (2021). Next generation reservoir computing. *arXiv preprint arXiv:2106.07688*. doi: 10.1038/s41467-021-25801-2

Gewaltig, M.-O., and Diesmann, M. (2007). NEST (NEural Simulation Tool). *Scholarpedia* 2, 1430. doi: 10.4249/scholarpedia.1430

Gilbert, T. L. (2018). The Allen Brain Atlas as a resource for teaching undergraduate neuroscience. *J. Undergrad. Neurosci. Educ*. 16, A261. Available online at: https://www.funjournal.org/2018-volume-16-issue-3/

Goldhacker, M., Keck, P., Igel, A., Lang, E. W., and Tomé, A. M. (2017). A multi-variate blind source separation algorithm. *Comput. Methods Programs Biomed*. 151, 91–99. doi: 10.1016/j.cmpb.2017.08.019

Goyal, A., Lamb, A., Hoffmann, J., Sodhani, S., Levine, S., Bengio, Y., et al. (2020). “Recurrent independent mechanisms,” in *International Conference on Learning Representations*.

Guckenheimer, J., and Holmes, P. (2013). *Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields, Vol. 42*. Springer Science & Business Media.

Guest, O., and Martin, A. E. (2020). How computational modeling can force theory building in psychological science. *Perspect. Psychol. Sci*. 16, 789–802. doi: 10.31234/osf.io/rybh9

Haken, H. (2006). *Information and Self-Organization: A Macroscopic Approach to Complex Systems. Springer Series in Synergetics, 3rd Edn*. Springer-Verlag.

Hansel, D., Mato, G., and Meunier, C. (1995). Synchrony in excitatory neural networks. *Neural Comput*. 7, 307–337. doi: 10.1162/neco.1995.7.2.307

Hasani, R., Lechner, M., Amini, A., Rus, D., and Grosu, R. (2020). Liquid time-constant networks. *arXiv preprint arXiv:2006.04439*. doi: 10.48550/arXiv.2006.04439

Hawrylycz, M., Anastassiou, C., Arkhipov, A., Berg, J., Buice, M., Cain, N., et al. (2016). Inferring cortical function in the mouse visual system through large-scale systems neuroscience. *Proc. Natl. Acad. Sci. U.S.A*. 113, 7337–7344. doi: 10.1073/pnas.1512901113

Hawrylycz, M., Ng, L., Feng, D., Sunkin, S., Szafer, A., and Dang, C. (2014). "The Allen Brain Atlas," in *Springer Handbook of Bio-Neuroinformatics*, 1111–1126.

Hawrylycz, M. J., Lein, E. S., Guillozet-Bongaarts, A. L., Shen, E. H., Ng, L., Miller, J. A., et al. (2012). An anatomically comprehensive atlas of the adult human brain transcriptome. *Nature* 489, 391–399. doi: 10.1038/nature11405

Hebb, D. O. (1949). *The Organization of Behavior: A Neuropsychological Theory*. New York, NY: Science Editions.

Henn, F. A., and Hamberger, A. (1971). Glial cell function: Uptake of transmitter substances. *Proc. Natl. Acad. Sci. U.S.A*. 68, 2686–2690. doi: 10.1073/pnas.68.11.2686

Herz, A. V., Gollisch, T., Machens, C. K., and Jaeger, D. (2006). Modeling single-neuron dynamics and computations: a balance of detail and abstraction. *Science* 314, 80–85. doi: 10.1126/science.1127240

Hey, A. J. (2009). "The fourth paradigm: data-intensive scientific discovery," in *E-Science and Information Management. IMCW 2012. Communications in Computer and Information Science, Vol. 317*, eds S. Kurbanoglu, U. Al, P. L. Erdogan, Y. Tonta, and N. Uçak (Berlin; Heidelberg: Springer).

Hihi, S., and Bengio, Y. (1995). “Hierarchical recurrent neural networks for long-term dependencies,” in *Advances in Neural Information Processing Systems, Vol. 8*, eds D. Touretzky, M. C. Mozer, and M. Hasselmo (MIT Press). Available online at: https://proceedings.neurips.cc/paper/1995/file/c667d53acd899a97a85de0c201ba99be-Paper.pd

Hinton, G. E., and Sejnowski, T. J. (1983). "Optimal perceptual inference," in *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vol. 448* (Citeseer).

Hochreiter, S., and Schmidhuber, J. (1997). Long short-term memory. *Neural Comput*. 9, 1735–1780. doi: 10.1162/neco.1997.9.8.1735

Hodgkin, A. L., and Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. *J. Physiol*. 117, 500–544. doi: 10.1113/jphysiol.1952.sp004764

Hofstötter, C., Mintz, M., and Verschure, P. F. (2002). The cerebellum in action: a simulation and robotics study. *Eur. J. Neurosci*. 16, 1361–1376. doi: 10.1046/j.1460-9568.2002.02182.x

Honey, C. J., Sporns, O., Cammoun, L., Gigandet, X., Thiran, J.-P., Meuli, R., et al. (2009). Predicting human resting-state functional connectivity from structural connectivity. *Proc. Natl. Acad. Sci. U.S.A*. 106, 2035–2040. doi: 10.1073/pnas.0811168106

Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. *Proc. Natl. Acad. Sci. U.S.A*. 79, 2554–2558. doi: 10.1073/pnas.79.8.2554

Hoppensteadt, F. C., and Izhikevich, E. M. (1998). Thalamo-cortical interactions modeled by weakly connected oscillators: could the brain use FM radio principles? *Biosystems* 48, 85–94. doi: 10.1016/S0303-2647(98)00053-7

Horn, A., Ostwald, D., Reisert, M., and Blankenburg, F. (2014). The structural–functional connectome and the default mode network of the human brain. *Neuroimage* 102, 142–151. doi: 10.1016/j.neuroimage.2013.09.069

Hutchison, R. M., Leung, L. S., Mirsattari, S. M., Gati, J. S., Menon, R. S., and Everling, S. (2011). Resting-state networks in the macaque at 7 T. *Neuroimage* 56, 1546–1555. doi: 10.1016/j.neuroimage.2011.02.063

Hutchison, R. M., Womelsdorf, T., Allen, E. A., Bandettini, P. A., Calhoun, V. D., Corbetta, M., et al. (2013). Dynamic functional connectivity: promise, issues, and interpretations. *Neuroimage* 80, 360–378. doi: 10.1016/j.neuroimage.2013.05.079

Ito, J., Nikolaev, A. R., and van Leeuwen, C. (2007). Dynamics of spontaneous transitions between global brain states. *Hum. Brain Mapp*. 28, 904–913. doi: 10.1002/hbm.20316

Izhikevich, E. M. (2003). Simple model of spiking neurons. *IEEE Trans. Neural Netw*. 14, 1569–1572. doi: 10.1109/TNN.2003.820440

Izhikevich, E. M., and Edelman, G. M. (2008). Large-scale model of mammalian thalamocortical systems. *Proc. Natl. Acad. Sci. U.S.A*. 105, 3593–3598. doi: 10.1073/pnas.0712231105

Jaeger, H. (2007). Echo state network. *Scholarpedia* 2, 2330. revision #196567. doi: 10.4249/scholarpedia.2330

Jaeger, H., and Haas, H. (2004). Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. *Science* 304, 78–80. doi: 10.1126/science.1091277

Jaeger, H., Maass, W., and Principe, J. (2007). Special issue on echo state networks and liquid state machines. *Neural Netw*. 20, 287–289. doi: 10.1016/j.neunet.2007.04.001

Jia, J., and Benson, A. R. (2019). “Neural jump stochastic differential equations,” in *Advances in Neural Information Processing Systems, Vol. 32*, eds H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Curran Associates, Inc.).

Jirsa, V., Sporns, O., Breakspear, M., Deco, G., and McIntosh, A. R. (2010). Towards the virtual brain: network modeling of the intact and the damaged brain. *Arch. Ital. Biol*. 148, 189–205. doi: 10.4449/aib.v148i3.1223

Jirsa, V. K., and Kelso, J. A. S. (2000). Spatiotemporal pattern formation in neural systems with heterogeneous connection topologies. *Phys. Rev. E* 62, 8462. doi: 10.1103/PhysRevE.62.8462

Johnson, S. (2002). *Emergence: The Connected Lives of Ants, Brains, Cities, and Software*. Simon and Schuster.

Jones, A. R., Overly, C. C., and Sunkin, S. M. (2009). The Allen Brain Atlas: 5 years and beyond. *Nat. Rev. Neurosci*. 10, 821–828. doi: 10.1038/nrn2722

Kaheman, K., Kutz, J. N., and Brunton, S. L. (2020). SINDy-PI: a robust algorithm for parallel implicit sparse identification of nonlinear dynamics. *arXiv preprint arXiv:2004.02322*. doi: 10.1098/rspa.2020.0279

Kanaa, D., Voleti, V., Kahou, S., Pal, C., and Cifar, M. (2019). "Simple video generation using neural ODEs," in *Workshop on Learning With Rich Experience, Advances in Neural Information Processing Systems, Vol. 32*.

Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., et al. (2020). Scaling laws for neural language models. *arXiv preprint arXiv:2001.08361*. doi: 10.48550/arXiv.2001.08361

Kasabov, N. K. (2007). *Evolving Connectionist Systems: The Knowledge Engineering Approach*. Springer Science & Business Media.

Kasabov, N. K. (2014). NeuCube: a spiking neural network architecture for mapping, learning and understanding of spatio-temporal brain data. *Neural Netw*. 52, 62–76. doi: 10.1016/j.neunet.2014.01.006

Kawahara, T. (1980). Coupled van der Pol oscillators: a model of excitatory and inhibitory neural interactions. *Biol. Cybern*. 39, 37–43. doi: 10.1007/BF00336943

Kelso, J. A. S. (2012). Multistability and metastability: understanding dynamic coordination in the brain. *Philos. Trans. R. Soc. B Biol. Sci*. 367, 906–918. doi: 10.1098/rstb.2011.0351

Kidger, P., Morrill, J., Foster, J., and Lyons, T. (2020). Neural controlled differential equations for irregular time series. *arXiv preprint arXiv:2005.08926*. doi: 10.48550/arXiv.2005.08926

Kilian, J., and Siegelmann, H. T. (1996). The dynamic universality of sigmoidal neural networks. *Inform. Comput*. 128, 48–56. doi: 10.1006/inco.1996.0062

Kingma, D. P., and Welling, M. (2013). Auto-encoding variational Bayes. *arXiv preprint arXiv:1312.6114*. doi: 10.48550/arXiv.1312.6114

Koch, C., Reid, C., Zeng, H., Mihalas, S., Hawrylycz, M., Philips, J., et al. (2014). "Project MindScope," in *The Future of the Brain* (Princeton, NJ: Princeton University Press), 25–39.

Kolda, T. G., Pinar, A., Plantenga, T., and Seshadhri, C. (2014). A scalable generative graph model with community structure. *SIAM J. Scientific Comput*. 36, C424–C452. doi: 10.1137/130914218

Koppe, G., Toutounji, H., Kirsch, P., Lis, S., and Durstewitz, D. (2019). Identifying nonlinear dynamical systems via generative recurrent neural networks with applications to fMRI. *PLoS Comput. Biol*. 15, e1007263. doi: 10.1371/journal.pcbi.1007263

Kostas, D., Aroca-Ouellette, S., and Rudzicz, F. (2021). BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data. *Front. Hum. Neurosci*. 15, 253. doi: 10.3389/fnhum.2021.653659

Krakauer, J. W., Ghazanfar, A. A., Gomez-Marin, A., MacIver, M. A., and Poeppel, D. (2017). Neuroscience needs behavior: correcting a reductionist bias. *Neuron* 93, 480–490. doi: 10.1016/j.neuron.2016.12.041

Kullback, S., and Leibler, R. A. (1951). On information and sufficiency. *Ann. Math. Stat*. 22, 79–86. doi: 10.1214/aoms/1177729694

Kuramoto, Y. (1984). "Chemical turbulence," in *Chemical Oscillations, Waves, and Turbulence. Springer Series in Synergetics, Vol. 19* (Berlin; Heidelberg: Springer).

Kutz, J. N. (2013). *Data-Driven Modeling & Scientific Computation: Methods for Complex Systems & Big Data*. Oxford: Oxford University Press.

Lechner, M., Hasani, R., Zimmer, M., Henzinger, T. A., and Grosu, R. (2019). "Designing worm-inspired neural networks for interpretable robotic control," in *2019 International Conference on Robotics and Automation (ICRA)* (Montreal, QC: IEEE), 87–94.

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. *Nature* 521, 436–444. doi: 10.1038/nature14539

Lein, E. S., Hawrylycz, M. J., Ao, N., Ayres, M., Bensinger, A., Bernard, A., et al. (2007). Genome-wide atlas of gene expression in the adult mouse brain. *Nature* 445, 168–176. doi: 10.1038/nature05453

Li, H., Xu, Z., Taylor, G., Studer, C., and Goldstein, T. (2018). "Visualizing the loss landscape of neural nets," in *Advances in Neural Information Processing Systems, Vol. 31*, eds S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Curran Associates, Inc.). Available online at: https://proceedings.neurips.cc/paper/2018/file/a41b3bb3e6b050b6c9067c67f663b915-Paper.pdf.

Li, X., Wong, T.-K. L., Chen, R. T. Q., and Duvenaud, D. (2020). “Scalable gradients for stochastic differential equations,” in *Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research*, eds S. Chiappa, and R. Calandra (PMLR), 3870–3882.

Li, Y., Vinyals, O., Dyer, C., Pascanu, R., and Battaglia, P. (2018). Learning deep generative models of graphs. *arXiv preprint arXiv:1803.03324*. doi: 10.48550/arXiv.1803.03324

Liang, H., and Wang, H. (2017). Structure-function network mapping and its assessment via persistent homology. *PLoS Comput. Biol*. 13, e1005325. doi: 10.1371/journal.pcbi.1005325

Linial, O., Ravid, N., Eytan, D., and Shalit, U. (2021). "Generative ODE modeling with known unknowns," in *Proceedings of the Conference on Health, Inference, and Learning*, 79–94.

Little, W. A. (1974). The existence of persistent states in the brain. *Math. Biosci*. 19, 101–120. doi: 10.1016/0025-5564(74)90031-5

Liu, X., Xiao, T., Si, S., Cao, Q., Kumar, S., and Hsieh, C.-J. (2020). "How does noise help robustness? Explanation and exploration under the neural SDE framework," in *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)* (Seattle, WA: IEEE).

Lurie, D. J., Kessler, D., Bassett, D. S., Betzel, R. F., Breakspear, M., Keilholz, S., et al. (2020). Questions and controversies in the study of time-varying functional connectivity in resting fMRI. *Netw. Neurosci*. 4, 30–69. doi: 10.1162/netn_a_00116

Maass, W. (1997). Networks of spiking neurons: the third generation of neural network models. *Neural Netw*. 10, 1659–1671.

Maass, W., Natschläger, T., and Markram, H. (2002). Real-time computing without stable states: a new framework for neural computation based on perturbations. *Neural Comput*. 14, 2531–2560. doi: 10.1162/089976602760407955

MacLaurin, J., Salhi, J., and Toumi, S. (2018). Mean field dynamics of a Wilson–Cowan neuronal network with nonlinear coupling term. *Stochastics Dyn*. 18, 1850046. doi: 10.1142/S0219493718500466

Markopoulos, P. P., Karystinos, G. N., and Pados, D. A. (2014). Optimal algorithms for L1-subspace signal processing. *IEEE Trans. Signal Process*. 62, 5046–5058. doi: 10.1109/TSP.2014.2338077

Mazziotta, J. C., Toga, A., Evans, A., Fox, P., Lancaster, J., and Woods, R. (2000). "A probabilistic approach for mapping the human brain: the International Consortium for Brain Mapping (ICBM)," in *Brain Mapping: The Systems* (Elsevier), 141–156.

McCulloch, W. S., and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. *Bull. Math. Biophys*. 5, 115–133. doi: 10.1007/BF02478259

McKeown, M. J., Makeig, S., Brown, G. G., Jung, T.-P., Kindermann, S. S., Bell, A. J., et al. (1998). Analysis of fMRI data by blind separation into independent spatial components. *Hum. Brain Mapp*. 6, 160–188. doi: 10.1002/(SICI)1097-0193(1998)6:3<160::AID-HBM5>3.0.CO;2-1

Medina, J. F., Garcia, K. S., Nores, W. L., Taylor, N. M., and Mauk, M. D. (2000). Timing mechanisms in the cerebellum: testing predictions of a large-scale computer simulation. *J. Neurosci*. 20, 5516–5525. doi: 10.1523/JNEUROSCI.20-14-05516.2000

Michaels, J. A., Dann, B., and Scherberger, H. (2016). Neural population dynamics during reaching are better explained by a dynamical system than representational tuning. *PLoS Comput. Biol*. 12, e1005175. doi: 10.1371/journal.pcbi.1005175

Miller, G. (2011). Blue brain founder responds to critics, clarifies his goals. *Science* 334, 748–749. doi: 10.1126/science.334.6057.748

Moran, R., Pinotsis, D. A., and Friston, K. (2013). Neural masses and fields in dynamic causal modeling. *Front. Comput. Neurosci*. 7, 57. doi: 10.3389/fncom.2013.00057

Nakagawa, N., and Kuramoto, Y. (1994). From collective oscillations to collective chaos in a globally coupled oscillator system. *Physica D* 75, 74–80. doi: 10.1016/0167-2789(94)90275-5

Newman, M. E., Barabási, A.-L., and Watts, D. J. (2006). *The Structure and Dynamics of Networks*. Princeton, NJ: Princeton University Press.

Ng, A. (2021). *Andrew Ng X-Rays the AI Hype. AI Pioneer Says Machine Learning May Work On Test Sets, But That's A Long Way From Real World Use*. Available online at: https://spectrum.ieee.org/andrew-ng-xrays-the-ai-hype

Patlak, J. B., and Ortiz, M. (1985). Slow currents through single sodium channels of the adult rat heart. *J. Gen. Physiol*. 86, 89–104. doi: 10.1085/jgp.86.1.89

Pearlmutter, B. A., and Parra, L. C. (1997). "Maximum likelihood blind source separation: a context-sensitive generalization of ICA," in *Advances in Neural Information Processing Systems*, 613–619.

Perl, Y. S., Boccacio, H., Pérez-Ipiña, I., Zamberlán, F., Laufs, H., Kringelbach, M., et al. (2020). Generative embeddings of brain collective dynamics using variational autoencoders. *arXiv preprint arXiv:2007.01378*. doi: 10.48550/arXiv.2007.01378

Petkoski, S., and Jirsa, V. K. (2019). Transmission time delays organize the brain network synchronization. *Philos. Trans. R. Soc. A Math. Phys. Eng. Sci*. 377, 20180132. doi: 10.1098/rsta.2018.0132

Piccinini, J., Ipiña, I. P., Laufs, H., Kringelbach, M., Deco, G., Sanz Perl, Y., et al. (2021). Noise-driven multistability vs deterministic chaos in phenomenological semi-empirical models of whole-brain activity. *Chaos* 31, 023127. doi: 10.1063/5.0025543

Pillai, A. S., and Jirsa, V. K. (2017). *Symmetry Breaking in Space-Time Hierarchies Shapes Brain Dynamics and Behavior, volume 94*. Cell Press.

Polepalli, A., Soures, N., and Kudithipudi, D. (2016). “Digital neuromorphic design of a liquid state machine for real-time processing,” in *2016 IEEE International Conference on Rebooting Computing (ICRC)* (San Diego, CA: IEEE), 1–8.

Qian, W., Papadopoulos, L., Lu, Z., Wiley, K., Pasqualetti, F., and Bassett, D. S. (2020). Path-dependent dynamics induced by rewiring networks of inertial oscillators. *Phys. Rev. E* 105, 024304. doi: 10.1103/PhysRevE.105.024304

Quade, M., Abel, M., Kutz, J. N., and Brunton, S. (2018). Sparse identification of nonlinear dynamics for rapid model recovery. *Chaos* 28, 063116. doi: 10.1063/1.5027470

Rabinovich, M. I., Varona, P., Selverston, A. I., and Abarbanel, H. D. (2006). Dynamical principles in neuroscience. *Rev. Mod. Phys*. 78, 1213. doi: 10.1103/RevModPhys.78.1213

Rackauckas, C., Ma, Y., Martensen, J., Warner, C., Zubov, K., and Supekar, R. (2020). *Universal Differential Equations for Scientific Machine Learning*. Available online at: https://arxiv.org/abs/2001.04385v3

Rajapakse, J. C., Tan, C. L., Zheng, X., Mukhopadhyay, S., and Yang, K. (2006). Exploratory analysis of brain connectivity with ICA. *IEEE Eng. Med. Biol. Mag*. 25, 102–111. doi: 10.1109/MEMB.2006.1607674

Richiardi, J., Altmann, A., Milazzo, A.-C., Chang, C., Chakravarty, M. M., Banaschewski, T., et al. (2015). Correlated gene expression supports synchronous activity in brain networks. *Science* 348, 1241–1244. doi: 10.1126/science.1255905

Roberts, J. A., Gollo, L. L., Abeysuriya, R. G., Roberts, G., Mitchell, P. B., Woolrich, M. W., et al. (2019). Metastable brain waves. *Nat. Commun*. 10, 1–17. doi: 10.1038/s41467-019-08999-0

Rubanova, Y., Chen, R. T. Q., and Duvenaud, D. (2019). "Latent ODEs for irregularly-sampled time series," in *Advances in Neural Information Processing Systems 32 (NeurIPS 2019)*.

Sadeghi, S., Mier, D., Gerchen, M. F., Schmidt, S. N. L., and Hass, J. (2020). Dynamic causal modeling for fMRI with Wilson–Cowan-based neuronal equations. *Front. Neurosci*. 14, 1205. doi: 10.3389/fnins.2020.593867

Saggio, M. L., and Jirsa, V. (2020). Phenomenological mesoscopic models for seizure activity. *arXiv preprint arXiv:2007.02783*. doi: 10.48550/arXiv.2007.02783

Sanz Leon, P., Knock, S. A., Woodman, M. M., Domide, L., Mersmann, J., McIntosh, A. R., et al. (2013). The virtual brain: a simulator of primate brain network dynamics. *Front. Neuroinform*. 7, 10. doi: 10.3389/fninf.2013.00010

Schliebs, S., and Kasabov, N. (2013). Evolving spiking neural network: a survey. *Evolving Syst*. 4, 87–98. doi: 10.1007/s12530-013-9074-9

Schmidhuber, J. (1992). Learning complex, extended sequences using the principle of history compression. *Neural Comput*. 4, 234–242. doi: 10.1162/neco.1992.4.2.234

Sforazzini, F., Schwarz, A. J., Galbusera, A., Bifone, A., and Gozzi, A. (2014). Distributed BOLD and CBV-weighted resting-state networks in the mouse brain. *Neuroimage* 87, 403–415. doi: 10.1016/j.neuroimage.2013.09.050

Sharifshazileh, M., Burelo, K., Sarnthein, J., and Indiveri, G. (2021). An electronic neuromorphic system for real-time detection of high frequency oscillations (HFO) in intracranial EEG. *Nat. Commun*. 12, 1–14. doi: 10.1038/s41467-021-23342-2

Sharp, T., Galluppi, F., Rast, A., and Furber, S. (2012). Power-efficient simulation of detailed cortical microcircuits on SpiNNaker. *J. Neurosci. Methods* 210, 110–118. doi: 10.1016/j.jneumeth.2012.03.001

Sherstinsky, A. (2020). Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. *Physica D* 404, 132306. doi: 10.1016/j.physd.2019.132306

Smith, D. W. (2018). "Phenomenology," in *The Stanford Encyclopedia of Philosophy*, ed E. N. Zalta (Stanford, CA: Metaphysics Research Lab, Stanford University, Summer 2018 edition).

Socher, R., Lin, C. C.-Y., Ng, A. Y., and Manning, C. D. (2011). “Parsing natural scenes and natural language with recursive neural networks,” in *ICML*.

Soltic, S., Wysoski, S. G., and Kasabov, N. K. (2008). “Evolving spiking neural networks for taste recognition,” in *2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence)* (Hong Kong: IEEE), 2091–2097.

Song, Y., Jia, X., Yang, L., and Xie, L. (2021). *Transformer-based spatial-temporal feature learning for EEG decoding*.

Spoerer, C. J., Kietzmann, T. C., Mehrer, J., Charest, I., and Kriegeskorte, N. (2020). Recurrent neural networks can explain flexible trading of speed and accuracy in biological vision. *PLoS Comput. Biol*. 16, e1008215. doi: 10.1371/journal.pcbi.1008215

Sporns, O. (2007). Brain connectivity. *Scholarpedia* 2, 4695. revision #91084. doi: 10.4249/scholarpedia.4695

Sporns, O., Tononi, G., and Kötter, R. (2005). The human connectome: a structural description of the human brain. *PLoS Comput. Biol*. 1, e42. doi: 10.1371/journal.pcbi.0010042

Srivastava, M., Hashimoto, T., and Liang, P. (2020). “Robustness to spurious correlations via human annotations,” in *International Conference on Machine Learning* (PMLR), 9109–9119.

Stein, R., and Hodgkin, A. L. (1967). The frequency of nerve action potentials generated by applied currents. *Proc. R. Soc. Lond. B Biol. Sci*. 167, 64–86. doi: 10.1098/rspb.1967.0013

Stimberg, M., Brette, R., and Goodman, D. F. (2019). Brian 2, an intuitive and efficient neural simulator. *Elife* 8, e47314. doi: 10.7554/eLife.47314

Storkey, A. (1997). "Increasing the capacity of a Hopfield network without sacrificing functionality," in *International Conference on Artificial Neural Networks* (Springer), 451–456.

Strogatz, S. H. (2000). From Kuramoto to Crawford: exploring the onset of synchronization in populations of coupled oscillators. *Physica D* 143, 1–20. doi: 10.1016/S0167-2789(00)00094-4

Strogatz, S. H. (2018). *Nonlinear Dynamics and Chaos With Student Solutions Manual: With Applications to Physics, Biology, Chemistry, and Engineering*. CRC Press.

Su, W., Bogdan, M., and Candes, E. (2017). False discoveries occur early on the lasso path. *Ann. Stat*. 45, 2133–2150. doi: 10.1214/16-AOS1521

Sun, J., Xie, J., and Zhou, H. (2021). “EEG classification with transformer-based models,” in *2021 IEEE 3rd Global Conference on Life Sciences and Technologies (LifeTech)* (Nara: IEEE), 92–93.

Surampudi, S. G., Misra, J., Deco, G., Bapi, R. S., Sharma, A., and Roy, D. (2019). Resting state dynamics meets anatomical structure: temporal multiple kernel learning (tMKL) model. *Neuroimage* 184, 609–620. doi: 10.1016/j.neuroimage.2018.09.054

Sutskever, I., Martens, J., and Hinton, G. E. (2011). “Generating text with recurrent neural networks,” in *ICML*.

Tait, L., Özkan, A., Szul, M. J., and Zhang, J. (2021). A systematic evaluation of source reconstruction of resting meg of the human brain with a new high-resolution atlas: performance, precision, and parcellation. *Hum. Brain Mapp*. 42, 4685–4707. doi: 10.1002/hbm.25578

Tang, E., Giusti, C., Baum, G. L., Gu, S., Pollock, E., Kahn, A. E., et al. (2017). Developmental increases in white matter network controllability support a growing diversity of brain dynamics. *Nat. Commun*. 8, 1252. doi: 10.1038/s41467-017-01254-4

Traub, R. D., Wong, R. K., Miles, R., and Michelson, H. (1991). A model of a CA3 hippocampal pyramidal neuron incorporating voltage-clamp data on intrinsic conductances. *J. Neurophysiol*. 66, 635–650. doi: 10.1152/jn.1991.66.2.635

Valdes-Sosa, P. A., Roebroeck, A., Daunizeau, J., and Friston, K. (2011). Effective connectivity: Influence, causality and biophysical modeling. *Neuroimage* 58, 339–361. doi: 10.1016/j.neuroimage.2011.03.058

van den Heuvel, M. P., Scholtens, L. H., and Kahn, R. S. (2019). Multiscale neuroscience of psychiatric disorders. *Biol. Psychiatry* 86, 512–522. doi: 10.1016/j.biopsych.2019.05.015

Van Essen, D. C., Smith, S. M., Barch, D. M., Behrens, T. E. J., Yacoub, E., and Ugurbil, K. (2013). The WU-Minn Human Connectome Project: an overview. *Neuroimage* 80, 62–79. doi: 10.1016/j.neuroimage.2013.05.041

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). “Attention is all you need,” in *Advances in Neural Information Processing Systems*, 5998–6008.

Verstraeten, D., Schrauwen, B., D'Haene, M., Stroobandt, D., et al. (2007). A unifying comparison of reservoir computing methods. *Neural Netw*. 20, 391–403. doi: 10.1016/j.neunet.2007.04.003

Vincent, J. L., Patel, G. H., Fox, M. D., Snyder, A. Z., Baker, J. T., Van Essen, D. C., et al. (2007). Intrinsic functional architecture in the anaesthetized monkey brain. *Nature* 447, 83–86. doi: 10.1038/nature05758

Vreeken, J. (2003). *Spiking Neural Networks, An Introduction*. Available online at: https://www.narcis.nl/publication/RecordID/oai%3Adspace.library.uu.nl%3A1874%2F24416/uquery/vreeken%202003/id/2/Language/NL

Wang, X.-J. (2010). Neurophysiological and computational principles of cortical rhythms in cognition. *Physiol. Rev*. 90, 1195–1268. doi: 10.1152/physrev.00035.2008

Wang, Z., Xia, M., Jin, Z., Yao, L., and Long, Z. (2014). Temporally and spatially constrained ICA of fMRI data analysis. *PLoS ONE* 9, e94211. doi: 10.1371/journal.pone.0094211

Wein, S., Deco, G., Tomé, A. M., Goldhacker, M., Malloni, W. M., Greenlee, M. W., et al. (2021). Brain connectivity studies on structure-function relationships: a short survey with an emphasis on machine learning. *Comput. Intell. Neurosci*. 2021:e5573740. doi: 10.1155/2021/5573740

White, J. G., Southgate, E., Thomson, J. N., and Brenner, S. (1986). The structure of the nervous system of the nematode *Caenorhabditis elegans*. *Philos. Trans. R. Soc. Lond. B Biol. Sci*. 314, 1–340. doi: 10.1098/rstb.1986.0056

Whittington, M. A., Traub, R. D., and Jefferys, J. G. (1995). Synchronized oscillations in interneuron networks driven by metabotropic glutamate receptor activation. *Nature* 373, 612–615. doi: 10.1038/373612a0

Wills, A. J., and Pothos, E. M. (2012). On the adequacy of current empirical evaluations of formal models of categorization. *Psychol. Bull*. 138, 102. doi: 10.1037/a0025715

Wilson, H. R. (2019). Hyperchaos in Wilson–Cowan oscillator circuits. *J. Neurophysiol*. 122, 2449–2457. doi: 10.1152/jn.00323.2019

Wilson, H. R., and Cowan, J. D. (1972). Excitatory and inhibitory interactions in localized populations of model neurons. *Biophys. J*. 12, 1–24. doi: 10.1016/S0006-3495(72)86068-5

Winn, J., Bishop, C. M., and Jaakkola, T. (2005). Variational message passing. *J. Mach. Learn. Res*. 6. Available online at: http://jmlr.org/papers/v6/winn05a.html

Wulf, W. A., and McKee, S. A. (1995). Hitting the memory wall: implications of the obvious. *ACM Sigarch Comp. Arch. News* 23, 20–24. doi: 10.1145/216585.216588

Yamazaki, T., and Tanaka, S. (2005). Neural modeling of an internal clock. *Neural Comput*. 17, 1032–1058. doi: 10.1162/0899766053491850

Yamazaki, T., and Tanaka, S. (2007). The cerebellum as a liquid state machine. *Neural Netw*. 20, 290–297. doi: 10.1016/j.neunet.2007.04.004

Yan, H., Zhao, L., Hu, L., Wang, X., Wang, E., and Wang, J. (2013). Nonequilibrium landscape theory of neural networks. *Proc. Natl. Acad. Sci. U.S.A*. 110, E4185–E4194. doi: 10.1073/pnas.1310692110

Yildiz, C., Heinonen, M., and Lähdesmäki, H. (2019). "ODE2VAE: deep generative second order ODEs with Bayesian neural networks," in *Advances in Neural Information Processing Systems*.

Yuste, R. (2015). From the neuron doctrine to neural networks. *Nat. Rev. Neurosci*. 16, 487–497. doi: 10.1038/nrn3962

Zhang, M., Gu, Z., and Pan, G. (2018). A survey of neuromorphic computing based on spiking neural networks. *Chin. J. Electron*. 27, 667–674. doi: 10.1049/cje.2018.05.006

Zhuang, J., Dvornek, N., Tatikonda, S., Papademetris, X., Ventola, P., and Duncan, J. S. (2021). “Multiple-shooting adjoint method for whole-brain dynamic causal modeling,” in *Information Processing in Medical Imaging, Lecture Notes in Computer Science*, eds A. Feragen, S. Sommer, J. Schnabel, and M. Nielsen (Cham: Springer International Publishing), 58–70.

Keywords: machine learning, computational neuroscience, interpretability, nonlinear dynamics, brain imaging

Citation: Ramezanian-Panahi M, Abrevaya G, Gagnon-Audet J-C, Voleti V, Rish I and Dumas G (2022) Generative Models of Brain Dynamics. *Front. Artif. Intell.* 5:807406. doi: 10.3389/frai.2022.807406

Received: 02 November 2021; Accepted: 10 June 2022;

Published: 15 July 2022.

Edited by:

Djallel Bouneffouf, IBM Research, United States

Reviewed by:

Francesco Caravelli, Los Alamos National Laboratory (DOE), United States

Karl Friston, University College London, United Kingdom

Copyright © 2022 Ramezanian-Panahi, Abrevaya, Gagnon-Audet, Voleti, Rish and Dumas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Mahta Ramezanian-Panahi, mahtaa@gmail.com