UvA-DARE (Digital Academic Repository) A Nonequilibrium-Potential Approach to Competition in Neural Populations

features of the models in this class are bistability (with implications for working memory and slow neural oscillations) and population bursts, associated with signal detection in neuroscience. In contrast, limit cycles are not found for the conditions in which the NEP is defined. Their nonexistence can be proven by resorting to the Bendixson-Dulac theorem, at least when the NEP remains positive and in the (also generic) singular limit of these models. This NEP constitutes a powerful tool to understand average neural network dynamics from a more formal standpoint, and will also be of help in the description of large heterogeneous neural networks.


INTRODUCTION
The analysis of dissipative, autonomous 1 dynamic flows (especially high-dimensional ones) can be greatly simplified, if a function can be constructed to provide an "energy landscape" to the problem. Note that only in very few cases can a nonlinear dynamical system be analytically solved; for instance, if the system itself is a quadratic form, one can use the Wei-Norman (Lie-algebraic) method to reduce it to a linear one (see e.g., [1]). Energy landscapes not only help visualize the systems' phase space and its structural changes as parameters are varied, but allow predicting the rates of activated processes [2][3][4]. Some fields that benefit from the energy landscape approach are optimization problems [5], neural networks [6], protein folding [7], cell nets [8], gene regulatory networks [9,10], ecology [11], and evolution [12].
For continuous-time flows, the possibility of this "Lyapunov function" (with its distinctive property $\dot{\Phi} < 0$ outside the attractors) was suggested in the context of the general stability problem of dynamical systems [13] and, in a sense, it adds a quantitative dimension to the qualitative theory of differential equations. The linearization of the flow around its attractors always provides such a function, but its validity breaks down well inside their own basin. In contrast, finding a global Lyapunov function is not an easy problem 2 . If only the information of the (deterministic) dynamical system is to be used, this function can be found for the so-called "gradient flows": purely irrotational flows in 3D, exact (longitudinal) forms in any dimensionality. But since for general relaxational flows (having nontrivial Helmholtz-Hodge decomposition) the integrability conditions are not automatically met, some more information is needed.
A hint of what information is needed comes from recalling that dynamical systems are models and as such, they leave aside a multitude of degrees of freedom-deemed irrelevant to the model, but nonetheless coupled to the "system." A useful framework to deal with them is the one set forward by Langevin [15], which makes the dynamical flow into a stochastic process (thus nonautonomous, albeit driven by a stationary "white noise" process).
What Graham and his collaborators [16,17] realized more than 30 years ago is that even in the deterministic limit, this space enlargement can eventually help meet the integrability conditions. Given an initial state $x_i$ of a (continuous-time, dissipative, autonomous) dynamic flow $\dot{x} = f(x)$, its conditional probability density function (pdf) $P(x,t|x_i,0)$ when submitted to a (Gaussian, centered) white noise $\xi(t)$ with variance $\gamma$, namely 3

$$\dot{x} = f(x) + \xi(t),$$

obeys the Fokker-Planck equation

$$\partial_t P = -\partial_x\left[D^{(1)} P\right] + \partial_x^2\left[D^{(2)} P\right]$$

in terms of the "drift" $D^{(1)} = f(x)$ and "diffusion" $D^{(2)} = \gamma$ Kramers-Moyal coefficients [18-20]. Since the flow is nonautonomous but dissipative, one can generically expect situations of statistical energy balance in which the pdf becomes stationary, $\partial_t P_{st}(x) = 0$, thus independent of the initial state. Then by defining

$$\Phi(x) := -\lim_{\gamma\to 0} \gamma \ln P_{st}(x;\gamma),$$

it is $f(x) = -\partial_x\Phi(x)$.

For an n-component dynamic flow submitted to m ≤ n (Gaussian, uncorrelated, centered) white noises $\xi_i(t)$ with common variance $\gamma$,

$$\dot{x} = f(x) + \sigma\,\xi(t)$$

($\sigma$ is an n × m constant matrix), the nonequilibrium potential (NEP) has been thus defined [16] as

$$\Phi(x) := -\lim_{\gamma\to 0} \gamma \ln P_{st}(x;\gamma),$$

which satisfies

$$f^T(x)\,\nabla\Phi + (\nabla\Phi)^T Q\,\nabla\Phi = 0, \qquad Q := \sigma\sigma^T, \qquad (2)$$

from which $\Phi(x)$ can in principle be found. In an attractor's basin, asymptotic stability imposes $D := \det Q > 0$. In fact, for m = n (restriction adopted hereafter) it is $D = (\det\sigma)^2$, which in turn requires $\sigma$ to be nonsingular. Using Equation (2),

$$\dot{\Phi} = f^T\nabla\Phi = -(\nabla\Phi)^T Q\,\nabla\Phi < 0$$

outside the attractors 4 . Writing $f(x) = d(x) + r(x)$ with $d(x) := -Q\nabla\Phi$, Equation (2) reads $r^T(x)\nabla\Phi = 0$, and $r(x)$ is the conservative part of $f(x)$.

2 By this we mean that we know no systematic methods other than the one we describe here. Of course, there is always room for heuristically finding a global Lyapunov function for some systems, even those containing dynamical variables that exert feedback control (see e.g., [14]).

3 We follow the usual notation in physics, which strictly means $dx = f(x)\,dt + dW(t)$ in terms of the Wiener process $W(t) = \int_0^t \xi(s)\,ds$. Note that $dx - dW(t)$ is still a (deterministic) dynamic flow, which implies some kind of connection in the x-W bundle (but not the one of gauge theory).
Note that d(x) is still irrotational (in the sense of the Helmholtz decomposition) but is not an exact form (the Hodge decomposition is made in the enlarged space).
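As a minimal illustration of the scalar case described above, the following sketch builds $\Phi(x) = -\int_0^x f(y)\,dy$ numerically and checks it against a closed form. The flow $f(x) = x - x^3$, the grid, and the noise intensity `gamma` are illustrative assumptions, not taken from the text.

```python
import numpy as np

def f(x):
    """Illustrative bistable scalar flow (an assumption, not from the text)."""
    return x - x**3

def nep_1d(x_grid):
    """Phi(x) = -int_0^x f(y) dy by the cumulative trapezoid rule."""
    fx = f(x_grid)
    steps = 0.5 * (fx[1:] + fx[:-1]) * np.diff(x_grid)
    phi = -np.concatenate(([0.0], np.cumsum(steps)))
    return phi - phi[np.argmin(np.abs(x_grid))]  # enforce Phi(0) = 0

x = np.linspace(-2.0, 2.0, 4001)
phi = nep_1d(x)
phi_exact = x**4 / 4 - x**2 / 2          # closed form for this particular f
max_err = np.max(np.abs(phi - phi_exact))

# Along the deterministic flow, dPhi/dt = Phi'(x) f(x) = -f(x)^2 <= 0.
phi_dot = -f(x)**2

# For small noise intensity gamma, the stationary pdf behaves as exp(-Phi/gamma).
gamma = 0.05
p_st = np.exp(-phi / gamma)
p_st /= np.sum(p_st) * (x[1] - x[0])     # normalize on the grid
x_mode = x[np.argmax(p_st)]
print(max_err, x_mode)
```

The most probable state sits at a minimum of the potential, which is what makes the landscape picture useful for estimating where the noisy system spends its time.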
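For n = 2 we may write $r(x) = \kappa\,\Omega\,\nabla\Phi$, with

$$\Omega := \begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix}$$

the N = 1 symplectic matrix. Hence $f(x) = -(Q - \kappa\Omega)\nabla\Phi$, with $\det(Q - \kappa\Omega) = D + \kappa^2 > 0$, and thus

$$\nabla\Phi = -(Q - \kappa\Omega)^{-1} f(x). \qquad (3)$$

For arbitrary real $\sigma_{ij}$ we can parameterize

$$\sigma = \begin{pmatrix} \sqrt{\lambda_1}\cos\alpha_1 & \sqrt{\lambda_1}\sin\alpha_1\\ \sqrt{\lambda_2}\cos\alpha_2 & \sqrt{\lambda_2}\sin\alpha_2 \end{pmatrix}$$

and define $\lambda := \sqrt{\lambda_1\lambda_2}\cos(\alpha_1 - \alpha_2)$ (note that the condition D > 0 imposes $\alpha_2 \neq \alpha_1$). Then

$$Q = \begin{pmatrix} \lambda_1 & \lambda\\ \lambda & \lambda_2 \end{pmatrix}, \qquad D = \lambda_1\lambda_2 - \lambda^2,$$

and Equation (3) reads

$$\partial_1\Phi = -\frac{\lambda_2 f_1 - (\lambda - \kappa) f_2}{\lambda_1\lambda_2 - \lambda^2 + \kappa^2}, \qquad \partial_2\Phi = -\frac{\lambda_1 f_2 - (\lambda + \kappa) f_1}{\lambda_1\lambda_2 - \lambda^2 + \kappa^2}$$

($\partial_k$ is a shorthand for $\partial/\partial x_k$). If a set $\{\lambda_1, \lambda_2, \lambda, \kappa\}$ can be found such that $\Phi(x)$ fulfills the integrability condition $\partial_2\partial_1\Phi = \partial_1\partial_2\Phi$, then a NEP exists. Early successful examples are the complex Ginzburg-Landau equation (CGLE) [21,22] and the FitzHugh-Nagumo (FHN) model [23,24] 5 . This scheme has been later reformulated [29], extended [30], and exploited in many interesting cases [6-12].

The goal of this work is to show that a NEP exists for a broad class of rate models of neural networks, of the type proposed by Wilson and Cowan [31]. The Wilson-Cowan model has been used to model many different dynamics, brain areas, and neural-network structures in the brain. Therefore, the derivation of a NEP has potential implications for many problems in computational neuroscience. Section 2 is devoted to an analysis of the model and variations thereof; section 3, to the derivation of the NEP in some of the cases studied in section 2, which are of high relevance in neuroscience. Section 4 undertakes a thorough discussion of our findings, and section 5 collects our conclusions.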
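The two-component construction above can be checked numerically with a short sketch. It assumes the sign convention $\Omega_{12} = -\Omega_{21} = 1$ for the symplectic matrix, and the potential, angles, and parameter values are purely illustrative. It builds $Q = \sigma\sigma^T$ from the angle parameterization, forms $f = -(Q - \kappa\Omega)\nabla\Phi$, and verifies Equation (2), the determinant identity, and the inversion that gives $\nabla\Phi$ back.

```python
import numpy as np

OMEGA = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])   # assumed sign convention for the symplectic matrix

def build_q(lam1, lam2, alpha1, alpha2):
    """Q = sigma sigma^T for the angle parameterization of sigma."""
    sigma = np.array([
        [np.sqrt(lam1) * np.cos(alpha1), np.sqrt(lam1) * np.sin(alpha1)],
        [np.sqrt(lam2) * np.cos(alpha2), np.sqrt(lam2) * np.sin(alpha2)],
    ])
    return sigma @ sigma.T

def grad_phi(x):
    """Gradient of the illustrative potential Phi = x0^4/4 + x0*x1 + x1^2."""
    return np.array([x[0]**3 + x[1], x[0] + 2.0 * x[1]])

LAM1, LAM2, ALPHA1, ALPHA2, KAPPA = 2.0, 1.0, 0.3, 1.2, 0.7
Q = build_q(LAM1, LAM2, ALPHA1, ALPHA2)

def flow(x):
    """f(x) = -(Q - kappa*Omega) grad Phi."""
    return -(Q - KAPPA * OMEGA) @ grad_phi(x)

x = np.array([0.4, -1.1])
g = grad_phi(x)

hj_residual = flow(x) @ g + g @ Q @ g            # Equation (2): should vanish
det_q = np.linalg.det(Q)                         # D
det_full = np.linalg.det(Q - KAPPA * OMEGA)      # should equal D + kappa^2
g_recovered = -np.linalg.solve(Q - KAPPA * OMEGA, flow(x))
off_diagonal = np.sqrt(LAM1 * LAM2) * np.cos(ALPHA1 - ALPHA2)
print(hj_residual, det_full - det_q - KAPPA**2)
```

The antisymmetric part $\kappa\Omega$ drops out of Equation (2) identically, which is why the conservative component never changes the value of $\Phi$ along a trajectory.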

THE WILSON-COWAN MODEL
Elucidating the architecture and dynamics of the neocortex is of utmost importance in neuroscience. But despite ongoing titanic efforts like the Human Brain Project or the BRAIN initiative, we are still very far from that goal. Given that the dynamics of single typical neurons has been relatively well described (in some cases even by analytical means), a fundamental approach can be practiced for small neural circuits. This means describing them as networks of excitable elements (neurons) connected by links (synapses), and solving the network dynamics by hybrid (analytical-numerical) techniques. However, the time employed in the analytical solution has poor scaling with size. Hence, this approach becomes unworkable for more than a few recurrently interconnected neurons, and one has to rely only on numerical simulations.
Fortunately (as evidenced long ago by the existence, as in a medium, of wavelike excitations) the huge connectivity of the neocortex enables coarse-grained or mean-field descriptions, which provide more concise and relevant information to understand the mesoscopic dynamics of the system 6 . Frequently obtained via mean-field techniques and commonly referred to as rate models or neural mass models, coarse-grained reductions have been widely used in the theoretical study of neural systems [31,33-35]. In particular, the one proposed by Wilson and Cowan [31] has proved to be very useful in describing the macroscopic dynamics of neural circuits. This level of description is able to capture many of the dynamical features associated with several cognitive and behavioral phenomena, such as working memory [33,36] or perceptual decision making [35,37]. It is also possible to use a rate model approach to study the dynamics of networks constituted by heterogeneous neurons [38,39], thus recovering part of the complexity lost in the averaging. Having at one's disposal an "energy function" (not restricted to symmetric couplings) for rate-level dynamics of neural networks would be a major added advantage.
The Wilson-Cowan model describes the evolution of competing populations x 1 , x 2 of excitatory and inhibitory neurons, respectively. The model is defined by [31] where x 1 and x 2 represent the coarse-grained activity of an excitatory and an inhibitory neural population, respectively, and the monotonically increasing (sigmoidal) response functions are such that s k (0) = 0, and range from − So the first crucial observation about the model is that it is asymptotically linear.
The currents $i_k$ are in turn linearly related to the $x_k$:

$$i_1 = j_{11}x_1 - j_{12}x_2 + \mu_1, \qquad i_2 = j_{21}x_1 - j_{22}x_2 + \mu_2,$$

i.e., $i = Jx + \mu$. All the parameters are real and moreover, the $j_{kl}$ are positive ($j_{11}$ and $j_{22}$ are recurrent interactions, $j_{12}$ and $j_{21}$ are cross-population interactions). The above definitions are such that for $\mu = 0$, x = 0 is a stable fixed point. To avoid confusion in the following, note that $\det J = -(j_{11}j_{22} - j_{12}j_{21})$. Wilson and Cowan [31] found interesting features such as staircases of bistable regimes and limit cycles. A thorough analysis of the model's bifurcation structure has been undertaken in Borisyuk and Kirillov [40]. The authors create a two-parameter structural portrait by fixing all the parameters but $\mu_1$ and $j_{21}$, and find that the $\mu_1$-$j_{21}$ plane is partitioned into several regions by:
• a fold point bifurcation curve (the number of fixed points changes by two when crossed),
• an Andronov-Hopf bifurcation curve (separates regions with stable and unstable foci),
• a saddle separatrix loop (a limit cycle on one side, none on the other), and
• a double limit cycle curve (the number of limit cycles changes by two).
The uncoupled case (j 12 = j 21 = 0) is clearly a gradient system, with potential where i 1 = j 11 x 1 + µ 1 and i 2 = −j 22 x 2 + µ 2 . Functions F k differ only in the values of their parameters. Their functional expression, involving polylogs, is uninteresting besides being complicated. Much more insight is obtained by observing the global features: . So it seems interesting to look at them in the limit β k → ∞ (k = 1, 2), θ (x) being Heaviside's unit step function. Unfortunately, neither Equation (4) nor their singular limit fulfill the above mentioned integrability condition.
In practice however, the names of Wilson and Cowan are associated with the broader class of rate models. In the following we shall show that the model defined by

$$\tau_k\,\dot{x}_k = -x_k + s_k(i_k), \qquad k = 1, 2, \qquad (6)$$

does admit a NEP, for any functional forms of the nonlinear single-variable functions $s_k(i_k)$ 7 , provided global stability is assured.
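A rate model of this class is straightforward to integrate numerically. The sketch below assumes the form $\tau_k\dot{x}_k = -x_k + s_k(i_k)$ with currents $i = Jx + \mu$, and picks $s_k = \tanh$, the couplings, and the Euler step purely for illustration. For these particular values the origin is a saddle, so the two initial conditions relax to two distinct attractors.

```python
import numpy as np

# Illustrative couplings: J = [[j11, -j12], [j21, -j22]] with all j_kl > 0.
J = np.array([[1.5, -1.0],
              [1.0, -2.0]])
MU = np.zeros(2)               # external inputs (assumed zero here)
TAU = np.array([1.0, 1.0])     # relaxation times

def rhs(x):
    """Right-hand side of tau_k dx_k/dt = -x_k + s_k(i_k), with s_k = tanh (assumed)."""
    return (-x + np.tanh(J @ x + MU)) / TAU

def integrate(x0, dt=0.01, n_steps=10000):
    """Plain forward-Euler integration; returns the final state."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + dt * rhs(x)
    return x

det_j = np.linalg.det(J)               # negative: j11*j22 > j12*j21
x_plus = integrate([0.1, 0.0])         # relaxes to one attractor
x_minus = integrate([-0.1, 0.0])       # relaxes to its mirror image
print(det_j, x_plus, x_minus)
```

Because $\tanh$ is odd and $\mu = 0$, the flow is symmetric under $x \to -x$, so the two attractors are mirror images of each other.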

NONEQUILIBRIUM POTENTIAL
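For the model defined by Equation (6), the integrability condition $\partial_2\partial_1\Phi = \partial_1\partial_2\Phi$ boils down to

$$j_{21}\lambda_1 = j_{22}(\lambda - \kappa), \qquad j_{12}\lambda_2 = j_{11}(\lambda + \kappa), \qquad \tau_1(\lambda - \kappa) = \tau_2(\lambda + \kappa)$$

(and these in turn to $j_{21}j_{11}\tau_1\lambda_1 = j_{12}j_{22}\tau_2\lambda_2$) so that $\lambda_2$, $\lambda$ and $\kappa$ can be expressed in terms of $\lambda_1$, which sets the global scale of $\Phi(x)$:

$$\lambda_2 = \frac{j_{11}j_{21}\tau_1}{j_{12}j_{22}\tau_2}\,\lambda_1, \qquad \lambda = \frac{j_{21}(\tau_1 + \tau_2)}{2j_{22}\tau_2}\,\lambda_1, \qquad \kappa = \frac{j_{21}(\tau_1 - \tau_2)}{2j_{22}\tau_2}\,\lambda_1$$

(with the convention $\Omega_{12} = -\Omega_{21} = 1$ for the symplectic matrix). Since $r(x) = \kappa\,\Omega\,\nabla\Phi$, $\tau_1 = \tau_2$ suffices to render the flow purely dissipative (albeit not gradient). From this, a good choice is $\lambda_1 := (j_{22}/j_{21})\,\tau_2\,\rho$. In summary,

$$\lambda_1 = \frac{j_{22}}{j_{21}}\tau_2\rho, \qquad \lambda_2 = \frac{j_{11}}{j_{12}}\tau_1\rho, \qquad \lambda = \frac{\rho}{2}(\tau_1 + \tau_2), \qquad \kappa = \frac{\rho}{2}(\tau_1 - \tau_2),$$

and Equation (3) becomes

$$\partial_1\Phi = \frac{j_{21}\left[j_{11}(x_1 - s_1) - j_{12}(x_2 - s_2)\right]}{\rho\tau_1\tau_2\,(j_{11}j_{22} - j_{12}j_{21})}, \qquad \partial_2\Phi = -\frac{j_{12}\left[j_{21}(x_1 - s_1) - j_{22}(x_2 - s_2)\right]}{\rho\tau_1\tau_2\,(j_{11}j_{22} - j_{12}j_{21})}. \qquad (7)$$

Integrating Equation (7) over any path from x = 0 yields

$$\Phi(x) = \frac{j_{11}j_{21}x_1^2 - 2j_{12}j_{21}x_1x_2 + j_{12}j_{22}x_2^2}{2\rho\tau_1\tau_2\,(j_{11}j_{22} - j_{12}j_{21})} - \frac{j_{21}\left[S_1(i_1) - S_1(\mu_1)\right] - j_{12}\left[S_2(i_2) - S_2(\mu_2)\right]}{\rho\tau_1\tau_2\,(j_{11}j_{22} - j_{12}j_{21})}, \qquad (8)$$

where $S_k$ denotes an antiderivative of $s_k$.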
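The relation between the potential and the flow can be verified numerically. The sketch below is a check under stated assumptions rather than the authors' code: it takes the model as $\tau_k\dot{x}_k = -x_k + s_k(i_k)$ with $s_k = \tanh$ (so that $S_k = \ln\cosh$), uses the quadratic-plus-$S_k$ form of $\Phi$ with the sign convention $\Omega_{12} = -\Omega_{21} = 1$, and compares $-(Q - \kappa\Omega)\nabla\Phi$, with the gradient taken by finite differences, against the flow at random points. All numerical values are illustrative.

```python
import numpy as np

J11, J12, J21, J22 = 1.5, 1.0, 1.0, 2.0
TAU1, TAU2, RHO = 1.0, 2.0, 1.0
MU1, MU2 = -0.2, 0.1
DELTA = J11 * J22 - J12 * J21          # equals -det J, positive here

def s(i):
    return np.tanh(i)                  # assumed response function

def S(i):
    return np.logaddexp(i, -i) - np.log(2.0)   # ln cosh(i), antiderivative of tanh

def currents(x):
    return J11 * x[0] - J12 * x[1] + MU1, J21 * x[0] - J22 * x[1] + MU2

def flow(x):
    i1, i2 = currents(x)
    return np.array([(-x[0] + s(i1)) / TAU1, (-x[1] + s(i2)) / TAU2])

def nep(x):
    i1, i2 = currents(x)
    quad = 0.5 * (J11 * J21 * x[0]**2 - 2 * J12 * J21 * x[0] * x[1] + J12 * J22 * x[1]**2)
    sig = -J21 * (S(i1) - S(MU1)) + J12 * (S(i2) - S(MU2))
    return (quad + sig) / (RHO * TAU1 * TAU2 * DELTA)

def grad_fd(func, x, h=1e-5):
    """Central finite-difference gradient."""
    g = np.zeros(2)
    for k in range(2):
        e = np.zeros(2)
        e[k] = h
        g[k] = (func(x + e) - func(x - e)) / (2 * h)
    return g

OMEGA = np.array([[0.0, 1.0], [-1.0, 0.0]])
LAM = 0.5 * RHO * (TAU1 + TAU2)
KAPPA = 0.5 * RHO * (TAU1 - TAU2)
Q = np.array([[RHO * TAU2 * J22 / J21, LAM],
              [LAM, RHO * TAU1 * J11 / J12]])

rng = np.random.default_rng(0)
points = rng.uniform(-2.0, 2.0, size=(20, 2))
max_residual = max(
    np.max(np.abs(flow(x) + (Q - KAPPA * OMEGA) @ grad_fd(nep, x))) for x in points
)
print(max_residual, nep(np.zeros(2)))
```

A residual at the level of the finite-difference error indicates that the potential, the matrix $Q$, and $\kappa$ are mutually consistent for these parameter values.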

THEORETICAL ANALYSIS
The first (crucial) observation is that, the $s_k(i_k)$ being sigmoidal functions, Equation (8) is at most quadratic. Global stability thus imposes det J < 0, i.e., $j_{11}j_{22} > j_{12}j_{21}$. But note that matrix J also determines the paraboloid's cross section 8 .
If the form $x^TAx$, with

$$A = \begin{pmatrix} j_{11}j_{21} & -j_{12}j_{21}\\ -j_{12}j_{21} & j_{12}j_{22} \end{pmatrix},$$

in the first term of Equation (8) could be straightforwardly factored out, then one could tell what its section is by watching whether the factors are real or complex. A more systematic approach is to reduce $x^TAx$ to canonical form by a similarity transformation that involves the normalized eigenvectors of A. Then the inverse squared lengths of the principal axes are the eigenvalues

$$\lambda_{1,2} = \tfrac{1}{2}(j_{11}j_{21} + j_{12}j_{22}) \pm \sqrt{\tfrac{1}{4}(j_{11}j_{21} + j_{12}j_{22})^2 + j_{12}j_{21}\det J}$$

of A. Since the $j_{kl}$ are positive and det J < 0, the second term is smaller than the first and the cross section is definitely elliptic. Although global stability rules out det J > 0, we can conclude that the instability proceeds through a pitchfork (codimension one) bifurcation along the minor principal axis (because of the double role of det J), not a Hopf (codimension two) one.
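The eigenvalue formula above can be compared with a direct numerical diagonalization. In this sketch the coupling values are illustrative assumptions; it also evaluates the identity $x^TAx = (Jx)_1(Jx)_2 - x_1x_2\det J$ at an arbitrary point.

```python
import numpy as np

J11, J12, J21, J22 = 1.5, 1.0, 1.0, 2.0     # illustrative positive couplings
J = np.array([[J11, -J12],
              [J21, -J22]])
A = np.array([[J11 * J21, -J12 * J21],
              [-J12 * J21, J12 * J22]])

det_j = np.linalg.det(J)                     # -(j11*j22 - j12*j21)
half_trace = 0.5 * (J11 * J21 + J12 * J22)
root = np.sqrt(half_trace**2 + J12 * J21 * det_j)
eig_formula = np.array([half_trace - root, half_trace + root])
eig_numeric = np.linalg.eigvalsh(A)          # ascending order

# Identity relating the quadratic form to J, checked at an arbitrary point.
x = np.array([0.7, -0.4])
lhs = x @ A @ x
rhs_identity = (J @ x)[0] * (J @ x)[1] - x[0] * x[1] * det_j
print(eig_formula, eig_numeric)
```

Both eigenvalues are positive whenever det J < 0, which corresponds to the elliptic cross section discussed above.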
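For the remaining terms, we note that Equation (7) can be written as

$$\nabla\Phi = -\frac{1}{\rho\tau_1\tau_2\det J}\begin{pmatrix} j_{21} & 0\\ 0 & -j_{12} \end{pmatrix} J\,(x - s)$$

and recall that the $s_k(i_k)$ have sigmoidal shape. So at large |x|, the component Js will tend to different constants (according to the signs of the $i_k$), so the asymptotic contribution of these terms will be piecewise linear, namely a collection of half planes. The reduction to the uncoupled case can be safely done by writing $j_{12} = \epsilon$ and $j_{21} = \alpha\epsilon$:

$$\Phi(x) = \frac{\tfrac{1}{2}\left(\alpha\epsilon\,j_{11}x_1^2 - 2\alpha\epsilon^2 x_1x_2 + \epsilon\,j_{22}x_2^2\right) - \left\{\alpha\epsilon\left[S_1(i_1) - S_1(\mu_1)\right] - \epsilon\left[S_2(i_2) - S_2(\mu_2)\right]\right\}}{\rho\tau_1\tau_2\,(j_{11}j_{22} - \alpha\epsilon^2)}.$$

To first order as $\epsilon \to 0$, one retrieves

$$\Phi(x) \approx \frac{\epsilon}{\rho\tau_1 j_{11}}\left\{\frac{\alpha j_{11}}{\tau_2 j_{22}}\left[\frac{x_1^2}{2} - \frac{S_1(i_1) - S_1(\mu_1)}{j_{11}}\right] + \frac{1}{\tau_2}\left[\frac{x_2^2}{2} + \frac{S_2(i_2) - S_2(\mu_2)}{j_{22}}\right]\right\}$$

so by choosing $\alpha = \tau_2 j_{22}/(\tau_1 j_{11})$ and $\rho = \epsilon/(\tau_1 j_{11})$,

$$\Phi(x) \approx \frac{1}{\tau_1}\left[\frac{x_1^2}{2} - \frac{S_1(i_1) - S_1(\mu_1)}{j_{11}}\right] + \frac{1}{\tau_2}\left[\frac{x_2^2}{2} + \frac{S_2(i_2) - S_2(\mu_2)}{j_{22}}\right].$$

A popular choice, that can be cast in the form of Equation highlights the cores of the response functions while keeping the global landscape 10 . As a check of Equation (8), we show in the next subsections the mechanism whereby Equation (6) can sustain bistability.

8 Incidentally, note that $j_{11}j_{21}x_1^2 - 2j_{12}j_{21}x_1x_2 + j_{12}j_{22}x_2^2 \equiv (Jx)_1(Jx)_2 - x_1x_2\det J$.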
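The reduction to the uncoupled case can be examined numerically. The sketch below assumes $s_k = \tanh$ (so $S_k = \ln\cosh$) and illustrative parameter values; it evaluates the coupled potential with $j_{12} = \epsilon$, $j_{21} = \alpha\epsilon$ and the stated choices of $\alpha$ and $\rho$, and compares it with the sum of two one-variable potentials as $\epsilon$ decreases.

```python
import numpy as np

J11, J22 = 1.5, 2.0
TAU1, TAU2 = 1.0, 2.0
MU1, MU2 = -0.2, 0.1

def S(i):
    return np.logaddexp(i, -i) - np.log(2.0)   # ln cosh(i); assumes s_k = tanh

def nep_coupled(x, eps):
    """Potential with j12 = eps, j21 = alpha*eps and the stated alpha, rho."""
    alpha = TAU2 * J22 / (TAU1 * J11)
    rho = eps / (TAU1 * J11)
    j12, j21 = eps, alpha * eps
    i1 = J11 * x[0] - j12 * x[1] + MU1
    i2 = j21 * x[0] - J22 * x[1] + MU2
    quad = 0.5 * (J11 * j21 * x[0]**2 - 2 * j12 * j21 * x[0] * x[1] + j12 * J22 * x[1]**2)
    sig = -j21 * (S(i1) - S(MU1)) + j12 * (S(i2) - S(MU2))
    return (quad + sig) / (rho * TAU1 * TAU2 * (J11 * J22 - j12 * j21))

def nep_uncoupled(x):
    """Sum of the two one-variable potentials of the uncoupled gradient system."""
    i1 = J11 * x[0] + MU1
    i2 = -J22 * x[1] + MU2
    term1 = (0.5 * x[0]**2 - (S(i1) - S(MU1)) / J11) / TAU1
    term2 = (0.5 * x[1]**2 + (S(i2) - S(MU2)) / J22) / TAU2
    return term1 + term2

x = np.array([0.7, -0.4])
err_large = abs(nep_coupled(x, 1e-2) - nep_uncoupled(x))
err_small = abs(nep_coupled(x, 1e-4) - nep_uncoupled(x))
print(err_large, err_small)
```

The discrepancy shrinks with $\epsilon$, as expected for a first-order expansion.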

Analytically, for
Steplike Response Function $s_k(i_k) := \nu_k\,\theta(i_k)$
• For $\mu_k < 0$ (k = 1, 2), there is no question that x = 0 is a fixed point (we may call it the "off" node); Equation (8) reduces to its first term and $\Phi(0) = 0$.
• By suitably choosing the half planes (taking advantage of the relative sign in the numerator of the second term in Equation (8)) another fixed point $N := (\nu_1, \nu_2)^T$ (the "on" node) can be induced 11 .
If $\mu_1$ is varied (as in [31,40]), equistability is achieved for
The intersection of the cores of the $s_k(i_k)$ 12 is a (singular in this limit) saddle point. Figure 1B illustrates this situation for the parameters quoted in the caption (the choice is dictated by the fact that global stability makes the condition $j_{21}\nu_1 - j_{22}\nu_2 > -\mu_2$ rather stringent).

9 Using $s_k(i_k) := \tanh i_k$, Tsodyks et al. have reported a paradoxical increase in $x_1$ as a result of an increase in $\mu_2$. Unfortunately, this occurs for det J > 0. What we can assure is that there is a saddle point involved.

10 For $\mu_k \neq 0$, Equation (10) can be arrived at from Equation (9) given that for $\beta_k \to \infty$, $\beta_k^{-1}\ln\cosh\beta_k x \to |x| - \beta_k^{-1}\ln 2$. Once Equation (10) is obtained, one can let $\mu_k \to 0$.

11 (through an inverse saddle-node bifurcation at the "on" location: in one variable,
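For the steplike response, the values of the potential at the "off" and "on" nodes can be computed in closed form. The sketch below assumes $S_k(i) = \nu_k\,i\,\theta(i)$, negative $\mu_k$, and positive currents at $N$; the helper `mu1_equistable` is a hypothetical name, and its formula follows from setting $\Phi(N) = \Phi(0) = 0$ in the quadratic-plus-$S_k$ form of the potential. Parameter values are illustrative and are not those of the figures.

```python
import numpy as np

J11, J12, J21, J22 = 3.0, 1.0, 2.0, 1.0     # illustrative couplings with j11*j22 > j12*j21
NU1, NU2 = 1.0, 1.0
TAU1, TAU2, RHO = 1.0, 1.0, 1.0
MU2 = -0.5
DELTA = J11 * J22 - J12 * J21

def S(i, nu):
    """Antiderivative of the step response nu*theta(i): nu * max(i, 0)."""
    return nu * np.maximum(i, 0.0)

def nep(x, mu1, mu2=MU2):
    i1 = J11 * x[0] - J12 * x[1] + mu1
    i2 = J21 * x[0] - J22 * x[1] + mu2
    quad = 0.5 * (J11 * J21 * x[0]**2 - 2 * J12 * J21 * x[0] * x[1] + J12 * J22 * x[1]**2)
    sig = -J21 * (S(i1, NU1) - S(mu1, NU1)) + J12 * (S(i2, NU2) - S(mu2, NU2))
    return (quad + sig) / (RHO * TAU1 * TAU2 * DELTA)

def mu1_equistable(mu2=MU2):
    """mu1 at which Phi(N) = Phi(0) = 0 (hypothetical helper; assumes mu_k < 0, i_k(N) > 0)."""
    c1 = J11 * NU1 - J12 * NU2
    c2 = J21 * NU1 - J22 * NU2
    q = J11 * J21 * NU1**2 - 2 * J12 * J21 * NU1 * NU2 + J12 * J22 * NU2**2
    return (0.5 * q + J12 * NU2 * (c2 + mu2)) / (J21 * NU1) - c1

N = np.array([NU1, NU2])
mu1_star = mu1_equistable()
i1_on = J11 * NU1 - J12 * NU2 + mu1_star     # must be positive for N to be a fixed point
i2_on = J21 * NU1 - J22 * NU2 + MU2          # j21*nu1 - j22*nu2 > -mu2
print(mu1_star, nep(N, mu1_star), nep(N, mu1_star + 0.5))
```

Raising $\mu_1$ above the equistability value lowers $\Phi(N)$ below $\Phi(0)$, so the "on" node becomes the deeper well.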
If there is room for some spreading of the core, as seen in Figures 1B-D, the former result remains valid for whatever analytical form of the response functions. In such a case, the saddle point will be analytical.
In the singular limit $s_k(i_k) := \nu_k\,\theta(i_k)$ we deal with in this subsection, we can rigorously prove the nonexistence of limit cycles (at least for large $\mu_k < 0$, k = 1, 2). The Bendixson-Dulac theorem states that if there exists a $C^1$ function $\Phi(x)$ (called the Dulac function) such that $\mathrm{div}(\Phi f)$ has the same sign almost everywhere 13 in a simply connected region of the plane, then the plane autonomous system $\dot{x} = f(x)$ has no nonconstant periodic solutions lying entirely within the region. Because of Equation (2),

$$\mathrm{div}(\Phi f) = \Phi\,\mathrm{div} f + f^T\nabla\Phi = \Phi\,\mathrm{div} f - (\nabla\Phi)^T Q\,\nabla\Phi.$$

Clearly, div f < 0 almost everywhere [i.e., except at the cores of the $s_k(i_k)$]. For $\mu_k < 0$ (k = 1, 2) and large, $\Phi(x)$ will be essentially the quadratic form in the first term of Equation (8), so it meets the conditions to be a Dulac function in a simply connected region of the plane.
Equation (6), after 100,000,000 iterations with $\Delta t = 10^{-4}$. In the contour plots of $\Phi(x)$ of frames (b)-(d) of Figure 2, $S_k(i_k) - S_k(\mu_k)$ is given by Equation (12). Even though the details differ between Figures 1 and 2, the structural picture (in particular, the inverse-direct saddle-node mechanism) remains the same.

DIFFERENT RELAXATION TIMES
When $\tau_1 \neq \tau_2$, then $r(x) = \kappa\,\Omega\,\nabla\Phi \neq 0$ and $d(x) = f(x) - r(x)$. However $\Phi(x)$ can remain the same, as long as $\tau_1\tau_2$ does not change. So whereas the contour plots of the NEP in Figure 3 reproduce those of Figures 2C,D, the displayed set of trajectories (from random initial conditions within suitably selected tiny patches) have $r(x) \neq 0$ and consequently, many of them perform a large excursion toward the attractor.
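The role of the conservative component for unequal relaxation times can be illustrated with a short simulation. The sketch below assumes the model $\tau_k\dot{x}_k = -x_k + s_k(i_k)$ with $s_k = \tanh$, the convention $\Omega_{12} = -\Omega_{21} = 1$, and illustrative parameter values. It integrates one trajectory with forward Euler, records $\Phi$ along it, and checks that $r = \kappa\,\Omega\,\nabla\Phi$ stays orthogonal to $\nabla\Phi$.

```python
import numpy as np

J11, J12, J21, J22 = 1.5, 1.0, 1.0, 2.0
TAU1, TAU2, RHO = 1.0, 2.0, 1.0            # unequal relaxation times, so kappa != 0
MU1, MU2 = -0.2, 0.1
DELTA = J11 * J22 - J12 * J21
KAPPA = 0.5 * RHO * (TAU1 - TAU2)
OMEGA = np.array([[0.0, 1.0], [-1.0, 0.0]])
A = np.array([[J11 * J21, -J12 * J21],
              [-J12 * J21, J12 * J22]])

def S(i):
    return np.logaddexp(i, -i) - np.log(2.0)   # ln cosh(i); assumes s_k = tanh

def currents(x):
    return np.array([J11 * x[0] - J12 * x[1] + MU1, J21 * x[0] - J22 * x[1] + MU2])

def flow(x):
    return (-x + np.tanh(currents(x))) / np.array([TAU1, TAU2])

def nep(x):
    i = currents(x)
    sig = -J21 * (S(i[0]) - S(MU1)) + J12 * (S(i[1]) - S(MU2))
    return (0.5 * x @ A @ x + sig) / (RHO * TAU1 * TAU2 * DELTA)

def grad_nep(x):
    return A @ (x - np.tanh(currents(x))) / (RHO * TAU1 * TAU2 * DELTA)

x = np.array([1.5, -1.0])
dt = 1e-3
phis, r_dot_grad = [], []
for _ in range(20000):
    g = grad_nep(x)
    r = KAPPA * OMEGA @ g                  # conservative part of the flow
    phis.append(nep(x))
    r_dot_grad.append(abs(r @ g))
    x = x + dt * flow(x)

phis = np.array(phis)
print(phis[0], phis[-1], max(r_dot_grad))
```

Since $r$ moves the state along level curves of $\Phi$, trajectories can circulate around the landscape while the potential still decreases, which is consistent with the large excursions described above.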
Excitable events such as those described here by the NEP, in which the activity of excitatory neurons in the population shows a sharp peak, are known in the computational neuroscience literature as "population bursts." These are brief events of high excitatory activity in the neural system being modeled. In neural network models composed of interconnected spiking neurons, they reflect a sudden rise in spiking activity at the level of the whole population (or a significant part of it), in such a way that a high proportion of neurons in the network fire at least one action potential during a short time window. Spiking neurons participating in the population burst are therefore transiently synchronized. In spite of not being able to properly capture synchronous phenomena, the Wilson-Cowan model may capture this phenomenon as a transient peak of activity that is later shut down by inhibition. But for more realistic models, additional biophysical mechanisms (such as actual spiking dynamics, refractory period of neurons or short-term adaptation) have to be considered since they are likely involved in population bursts in real neural systems [41]. Population bursts have several computational uses; for example, they can be used to transmit temporally precise information to other brain areas, even in the presence of noise or heterogeneity [38].

Frontiers in Physics | www.frontiersin.org, January 2019 | Volume 6 | Article 154

CONCLUSIONS
Rate (also called neural-mass) models have been a useful approach to neural networks for half a century. Today, their simplicity (not short of comprehensiveness) makes them ideal to fulfill the node dynamics in, for instance, connectome-based brain networks. So the availability of an "energy function" for rate models is expected to be welcome news. Dynamical systems of the form given by Equation (6) admit a NEP regardless of the functional forms of the nonlinear single-variable functions $s_k(i_k)$. Throughout this work, the latter are assumed to have the same functional form, of sigmoidal shape. But neither condition is necessary to satisfy the integrability condition.
A crucial observation about rate models-even the one put forward by Wilson and Cowan [31], and given by Equations (4)-(5)-is that they are asymptotically linear, so their eventual NEP can be at most quadratic. Then in principle, global stability rules out some coupling configurations. Obviously, this requirement can be relaxed if the rate model fulfills the node dynamics of a neural network, for what matters in that case is the network's global stability.
The NEP obtained here provides a more quantitative intuition on the phenomenon of bistability, which has been naturally found in real neural systems. Neural bistability underlies, e.g., the persistent activity which is commonly found in neurons of the prefrontal cortex, a mechanism that is thought to maintain information during working memory tasks [36,42]. In the presence of neural noise and other adaptation mechanisms, bistability is also a useful hypothesis to explain slow irregular dynamics or "up" and "down" dynamics, also observed across cortex and modeled using bistable dynamics [43-45].
Our results open the door to the calculation of nonequilibrium potentials of rate-based neural network models, and in particular to considering the implications of different biologically realistic dynamics in such potentials. One interesting possibility is to consider in our model the effect of short-term synaptic plasticity. Short-term plasticity has been shown to impact computational properties of neural systems, such as their signal detection abilities [46-48], their pattern storage capacity [49,50], or the statistics of neural bistable dynamics [44,45]. We would expect, for example, that changes in the ability of neural systems to detect weak signals due to short-term plasticity could be reflected in changes in the nonequilibrium potential landscape, making population bursts easier to trigger by weak stimuli. Changes in the statistics of neural bistable dynamics, or "up-down" transitions, could be reflected in shifts of the wells in the landscape, and also in the statistics of real experimental data.
Finally, it is worth mentioning that the NEP obtained here (valid, as argued, for generic transfer functions $s_k(i_k)$) opens the door to the potential use of more generic rate models in the field of artificial neural networks and deep learning. By identifying the NEP with the cost function to be minimized, gradient descent algorithms can be used to train networks of generic Wilson-Cowan units for different tasks. This implies that more realistic and less computationally expensive neural population models can be trained and used for behavioral tasks, a topic that has gathered attention recently [51,52].

AUTHOR CONTRIBUTIONS
All the listed authors have made substantial and direct intellectual contribution to this work, and approve its publication. In particular, JM proposed the idea, contributed preliminary analytical calculations, and provided key information from the field of neuroscience.