Obtaining Arbitrary Prescribed Mean Field Dynamics for Recurrently Coupled Networks of Type-I Spiking Neurons with Analytically Determined Weights

A fundamental question in computational neuroscience is how to connect a network of spiking neurons to produce desired macroscopic or mean field dynamics. One possible approach is through the Neural Engineering Framework (NEF). The NEF approach requires quantities called decoders which are solved through an optimization problem requiring large matrix inversion. Here, we show how a decoder can be obtained analytically for type I and certain type II firing rates as a function of the heterogeneity of its associated neuron. These decoders generate approximants for functions that converge to the desired function in mean-squared error like 1/N, where N is the number of neurons in the network. We refer to these decoders as scale-invariant decoders due to their structure. These decoders generate weights for a network of neurons through the NEF formula for weights. These weights force the spiking network to have arbitrary and prescribed mean field dynamics. The weights generated with scale-invariant decoders all lie on low dimensional hypersurfaces asymptotically. We demonstrate the applicability of these scale-invariant decoders and weight surfaces by constructing networks of spiking theta neurons that replicate the dynamics of various well known dynamical systems such as the neural integrator, Van der Pol system and the Lorenz system. As these decoders are analytically determined and non-unique, the weights are also analytically determined and non-unique. We discuss the implications for measured weights of neuronal networks.


INTRODUCTION
There are many spiking models that exist in the literature that can be fit to reproduce the membrane potential and the firing rates of real neurons. Examples include the leaky integrate and fire neuron, the Izhikevich model (Izhikevich, 2003(Izhikevich, , 2007, the theta model (Ermentrout and Kopell, 1986), the quartic integrate and fire model (Touboul, 2008) and the adaptive exponential integrate and fire model (Brette and Gerstner, 2005;Naud et al., 2008). When these models are coupled together to form networks, one can predict the the macroscopic or mean field behavior of a network of these neurons via a suitably derived mean field system (Nicola and Campbell, 2013a,b;Nesse et al., 2008). This can even be done when one considers the effects of heterogeneity in the neurons (Nicola and Campbell, 2013b).
While the mean field system for networks of neurons with prescribed sources of heterogeneity is important for predicting the behavior of the network of equal importance is the inverse problem: given a particular macroscopic behavior or mean field system, what distributions of heterogeneity, either in the neuronal parameters themselves, or the synaptic weights are required to produce said behavior?
One possible numerical solution to the inverse problem is through the Neural Engineering Framework (NEF) (Eliasmith and Anderson, 2004). In the NEF approach, one can specify the macroscopic dynamics or mean field for a spiking neuronal network. Given a network of neurons with a source of heterogeneity, one can find a set of optimal linear weights, referred to as linear decoders, for their firing rates in such a way that the weighted linear sum of the firing rates optimally approximates any function of choice. This allows for specifying the network connectivity in such a way as to obtain arbitrary dynamics from the network(s) of neurons (Eliasmith, 2005). For example, the NEF has been used to develop a wide variety of models, including the most behaviorally sophisticated spiking neural model to date (Eliasmith et al., 2012) as well as more specialized models of path integration (Conklin and Eliasmith, 2005), working memory (Singh and Eliasmith, 2006), visual attention (Bobier et al., 2014), motor control (DeWolf and Eliasmith, 2011), various cognitive functions (Bekolay et al., 2014;Rasmussen and Eliasmith, 2014), and many others.
However, the optimality requirement in the linear decoders introduces complications in the NEF approach. The optimal decoders are computed via least-squares optimization which is a computationally-intensive process; and yet very little information about the network can be determined once the optimal decoders have been obtained. Additionally, one cannot determine how the distribution of heterogeneity in the tuning curves of the neurons is related to the other distributions across the network, such as the distribution of connection weights.
Here, we will show that if one loosens the optimality requirement in the linear decoders, it is possible to obtain linear decoders that converge to any function of choice in the large network limit. Due to their form, we will refer to these decoders as scale-invariant linear decoders. These scale-invariant decoders have several advantages over optimal decoders, at the primary cost of a slower convergence rate in network size. However, using any gradient descent algorithm that does not directly compute the Hessian, one can decrease the error of the scale-invariant decoders very rapidly with very few iterations for any finite network size.
In Section 2.1, we will quickly introduce the NEF. A more thorough introduction can be found in (Eliasmith and Anderson, 2004). This will be followed by Section 2.2 where we will demonstrate that as the networks become arbitrarily large, the optimal decoders tend to an asymptotic limit. This will be our motivation in defining a scale-invariant decoder. In Section 2 we will determine what this asymptotic limit is for the scalar case and for multivariable functions in Section 2.4. In Section 3 we will demonstrate how the decoders can yield the weights to couple neurons together and simulate spiking networks with the specified dynamics by using these weights.

The Neural Engineering Framework
Suppose we knew the firing rate of a class of neurons, f (I) as a function of the input current I. Then we can take any input variable x and linearly transform it into a current via I = αx + β. If we allow α and β to be drawn from a random distribution, then we can generate a network of neurons with firing rates f (α i x+β i ) where α i , β i are drawn from some specified probability distribution ρ α,β (α, β). As a function of x the curve f (α i x + β i ) is typically referred to as the tuning curve of neuron i. The output of these neurons is the sum of their weighted firing rates: (1) Thus, the network takes any input x belonging to the appropriate space, and transforms it into some functionĝ N (x). If for example we wanted to compute the function g(x), we would need to pick φ i such thatĝ N (x) ≈ g(x). The φ i are referred to as the linear decoders in the NEF approach (Eliasmith and Anderson, 2004). They can be determined by minimizing the the following functional with respect to φ over some region X in x (Salinas and Abbott, 1995;Eliasmith and Anderson, 2004): where the first term in 2 corresponds to the error in the approximation and the second term penalizes large φ i . Minimizing C(φ) for φ yields the following linear system of equations: Equations (3-5) correspond to standard function approximation (Bishop, 1995), although the basis functions f are randomly drawn. We will refer to the optimal decoders as and any other decoder as φ. There are various functions f that have appeared in the literature. These are derived from complicated neural models using topological normal form theory (Ermentrout and Kopell, 1986;Izhikevich, 2007), are fits to experimental data from real neurons (Shriki et al., 2006), or are analytically derived from integrate and fire neurons. The general form the integrate-andfire models we will consider is given bẏ Specific examples include: Leaky Integrate-and-Fire Model (Lapicque, 1907Abbott, 1999Brunel and Van Rossum, 2007) F(v) = v 2 (Quadratic Integrate-and-Fire Model) (Izhikevich, 2003(Izhikevich, , 2007 (10) (Ermentrout and Kopell, 1986) (11) − v (Exponential-Integrate-and-Fire Model) (Brette and Gerstner, 2005;Naud et al., 2008).
Other FI curves are fits to the measured FI curves of more sophisticated conductance based models or experimental measurements. For example, the function can be fit to type-II firing rates when c > 0, and can be shown to be the steady state firing rate for neurons that display spike frequency adaptation when c = 0 (Ermentrout, 2006). Equation (13) has also been fit to conductance based models (Shriki et al., 2006) (with x = 0) and adequately describes the FI curves for many real cortical neurons (Stafstrom et al., 1984;Azouz et al., 1997;Ahmed et al., 1998). As x is often thought of as a real world input variable in the NEF approach, the α i , β i distribution can only be known once one specifies a distribution of maximal firing rates, r i and x-intercepts, a i for the tuning curves. For the time being, we will restrict the variable x to the interval [−1, 1]. It can be rescaled to an arbitrary interval, so this is no loss of generality. We will show later how x can also be extended to a vector as in the original NEF framework (Eliasmith and Anderson, 2004;Eliasmith, 2005). Once one specifies the distribution of (r i , a i ), one can obtain a transformation of random variables. First, let us consider a population of neurons such that the maximal firing rate is achieved at x = 1: We will refer to these neurons as ON neurons as the neurons can either increase in firing rate with respect to x (ON neurons) or decrease (OFF neurons). The maximal firing rate for the ON population is reached at x = 1 (Eliasmith and Anderson, 2004) while the maximal firing rate for the OFF population is reached at x = −1. To generate a population of OFF neurons, we can reflect the tuning curves in the x = 0 axis by multiplying α i by −1, which yields the following pair of transformations: where the ± indicates ON/OFF, respectively and by f −1 (x). Note that one can generate a population of ON and OFF neurons in different ways, for example by reflecting about the x = a i axis for each neuron. We will treat a i and r i as our primary sources of heterogeneity in the case of approximating a function of a single variable and we will assume that the marginal densities are given by ρ a (a) and ρ r (r). Furthermore, we will write a ± i and r ± i to distinguish between the heterogeneous parameters for the ON(+) and OFF(−) populations. In this case, we can rewrite the sum (1) asĝ where the first half represents the population of ON neurons and the second half of the sum represents the population of OFF neurons. Note that when we refer to the heterogeneous parameters r max i and a i for the ON and OFF populations, we do not imply that they have the same value for both populations for i = 1, 2, . . . N and they should have a superscript ± to denote which population the parameter belongs to. We do not include this superscript and note the abuse of notation for readers.
Suppose, for example, we wanted to approximate the function g(x) = x using a population of 50 quadratic integrate and fire tuning curves with 25 ON and 25 OFF neurons. This is shown in Figure 1 where the decoders are given by Equation (5). Note that reasonable accuracy is achieved despite the small population of neurons.
So far this has been fairly standard function approximation with a non-orthogonal basis (Bishop, 1995). The difference in the NEF approach is that one uses these linear decoders obtained from the firing rate curves to design a network of spiking neurons and the function is represented in the output of the network simulation. For example, the differential equation for the quadratic integrate and fire model is given bẏ where if v(t − ) = ∞, v(t + ) = −∞. This can be written as the equivalent θ model with the transformation v = tan(θ/2) yielding:θ which produces a spike when θ (t − ) = π and is reset to θ (t + ) = −π. Each of these differential equations generates a sequence of action potentials at specific spike times, t jk where t jk is the time of the kth spike fired by the jth neuron. These spike times are then fed into a post-synaptic filter s(t); The linear decoders, φ j are used to weight the spikes for their respective neuron. This post-synaptic filter equation can be explicitly integrated to yield: where E(t) = exp(−t/τ s ). The integrated spike train for the jth neuron is approximately equal to its tuning curve, f (α j x + β j ): provided that x varies on a suitably slow time scale (Dayan and Abbott, 2001;Eliasmith and Anderson, 2004). In this case, the dynamics in Equation (19) are approximately given by This allows one to approximate an arbitrary dynamical system (Eliasmith, 2005). For example, if we consider a recurrent network (x = s), then to approximate the dynamics s ′ = F(s) we merely require and where the φ i are given by Equation (3). Returning to the neural equations, if we take x = s and consider a recurrently coupled network of neurons then we have the following: where ω ij = α i φ j is the NEF equation for the weight coupling neuron j to neuron i (Eliasmith and Anderson, 2004;Eliasmith, 2005) and the quantity is the post-synaptic current going to the ith neuron.
For example, if we wanted the macroscopic dynamics to be exponential decay, F(x) = ks, then we requireĝ(s) = s(1 + τ s k). We would obtain the φ i by using Equations (3-5) which yields the optimal decoders i forĝ(x) = x(1 + τ s k) and simulate our spiking network using the weights ω ij = α i j . This yields a recurrently coupled spiking neural network with macroscopic dynamics s ′ = ks.
In addition to recurrent networks, one can also construct feedforward networks with the NEF approach. For example, we can also treat x as an input variable. This allows a network to represent an input variable x in terms of its spiking. If τ s is not large, then one can represent the input variable x as a postsynaptic current s: assuming that x varies on a suitably slow time scale (slower than τ s ). This is shown for example with networks of various sizes in Figure 2, with a synaptic time constant of τ s = 5 ms approximating the function g(x) = x. The network of differential equations for the neurons is simulated using Equation (17). These neurons then generate a spike train which is weighted by the decoders. The weighted spike train is fed into the post-synaptic current variable s(t), which acts as the approximation for g(x) = x. A time varying x(t) is used that varies on a suitably slow time scale.
It is clear that given the fact that arbitrary functions or dynamics (via recurrent networks) can be computed, then one can generate multiple networks that perform different functions, and feed into one another. In this way, one could create large networks composed of interconnected subnetworks that perform functions such as controlling limbs, detecting objects, and performing tasks by using the mathematical approaches that already exist for accomplishing these feats and translating them into an equivalent neural network representation. This is the core idea in the NEF (Eliasmith and Anderson, 2004;Eliasmith et al., 2012).
Although a network of N = 100 neural tuning curves f (α i x + β i ) is sufficient for a good approximation of many functions, depending on the dynamics being computed, significantly more neurons are needed in spiking simulations, as shown in Figure 2. Hundreds, if not thousands, of neurons are necessary for adequate approximation when spikes are used. The network size becomes even larger when we want to perform complicated functions involving more then one variable x or functions with higher frequency oscillations present. As the decoders are determined by large matrix inversion (Equation 3), this can take quite a while when dealing with more then 5000 neurons on a conventional computer. Furthermore, the smaller the synaptic time constant τ s , the more neurons are required. This is due to the fact that Equation (21) is effectively a kernel density estimator of the firing rate and when the bandwidth is too small, the resulting estimate is under-smoothed, thus requiring more neurons for a comparable degree of accuracy The representation problem is approximated by networks of various sizes by imposing the condition (26). The neurons are quadratic integrate-and-fire neurons given by Equation (17). The spike train for (B) is plotted in (A). A time varying randomly generated signal (red) is fed into the network, and is computed via the synaptic current variable s(t) using Equation 19 (blue). As the network size increases, the approximation becomes better.
as that of a network with larger τ s . Furthermore, as the NEF weights are numerically determined (via the NEF decoders), the possible analysis is very limited. Thus, an analytical solution to the NEF decoders (and thus weights) would allow a greater insight into large networks and may also facilitate faster numerics.

Decoder Asymptotics as N → ∞
In order to proceed analytically, we will first look at the behavior of the optimal decoders for large networks (N → ∞). To facilitate plotting, let us consider the case where for an arbitrary neural model, , which reduces the random variables associated with the heterogeneity to the intercept variable, a ± i . Additionally, note that for Type-I neurons: which immediately implies that we can absorb the quantitiy into the decoder φ ± i and rescale any solution we obtain by this quantity at the end. With Equation (27), the sum in Equation (16) becomes: where the i are determined by Equation (5). One should note that there is an abuse of notation here, as the optimal decoders differ for the ON/OFF subpopulations for i = 1, 2, . . . N however we have used the same symbol to denote the optimal decoders, i for both populations. Additionally, the value a + i is the threshold to firing for the ith ON neuron while the quantity −a − i is the threshold to firing for the ith off neuron. If one were to plot the decoders for large N, then one can easily see that in the limit of large network size (N → ∞), the individual decoders vanish ( i → 0, not shown here). However, for increasing N, it seems that the quantity γ i = N i /2 converges to some non-zero value γ (a i ) and thus it appears that γ i converges to some function of the x-intercept, a i , the source of heterogeneity for the neurons. This is shown in Figure 3 for increasingly large networks. The quantity FIGURE 3 | Convergence of the optimal decoders to an invariant surface. (A-D) The function g(x) = x is approximated by networks of various sizes using Equation (17) with firing rate curves. Plotted are the optimal decoders i scaled up by the network size N for ON (black dots) and OFF (red dots) neurons as a function of the intercept, f(0) for QIF firing rate functions. The quantity N i appears to converge as N → ∞ to the blue curve. The blue curve is determined by using a uniform grid of neurons over the heterogeneous parameter a i and optimizing for the resulting decoders. N i /2 is plotted vs. a i . The predicted surface for convergence, γ (a), is also plotted which is determined by optimizing over a uniform mesh in the parameter space. We will refer to any γ i that satisfies γ i = φ i N/2 for some decoder φ i as scale invariant decoders and γ ± (a) as the decoder surface. We will not necessarily use the same decoders for the ON and OFF neurons hence the superscipt on γ (a). We will show in the subsequent sections how to determine the decoder surfaces for the type-I and type-II (approximate) firing rates for single variable and multivariable functions.
In order to determine the decoder surface analytically, we need to understand the behavior of the network as N → ∞. Using the scale invariant decoders from Equation (28): where the overline denotes the finite average over the inhomogeneity in the intercepts. We should expect that as N → ∞, the finite network averaging turns into an expectation: where ρ a (a) is the probability density describing the heterogeneity variable a i for the neurons. For the Equations (32, 33) to be valid for the optimal decodoers, we would need to formally show that i → 2γ (a i )/N, where i are the optimal decoders for some particular scale-invariant decoder γ (a i ). However, this is unnecessary as we can regard φ i = 2γ (a i )/N as a suboptimal decoder and independent of the optimal decoders which are generated by minimizing the integral Equation (2). In which case, the limit exists by the law of large numbers. Furthermore, as we shall show later, i will not necessarily converge to γ (a i ) as γ (a i ) is non-unique, where as i is the optimal decoder which is unique due to the quadratic error surface in C(φ).
Note that γ (a)ρ a (a) appears as a product in the integral. These terms can be collapsed into a single functionP ± (a) = γ ± (a)ρ a (a). We will refer to these quantities as the weighted decoders of the ON/OFF neurons and use the weighted decoders to define the linear operators; which we will refer to as the tuning curve transforms (TCT). The TCTs map functions from the space of the variable(s) assigned to the heterogeneous parameters to the space of functions we are trying to approximate. Note that these operators are actually applied to different weighted decoders as the decoder surfaces are different for ON and OFF neurons. Furthermore, the density ρ a (a) need not be identical for both ON and OFF neurons. However, in all numerical implementations, ρ a (a) will be identical for the sake of simplicity. Suppose we could determineP(a) analytically. In this case, as γ (a)ρ a (a) =P(a), whenever ρ a (a) = 0, we can compute γ (a) =P(a)/ρ a (a) and leave γ (a) undefined otherwise (as there is no neuron that has parameter(s) in this region). Now, given the fact that we obtain a linear operator as N → ∞ case, the real problem becomes in finding the (P + ,P − ) such that M(P + ,P − ) maps to g(x), the function we want to approximate. That is, we have to invert the operator M for theseP. If we know theseP, then as we presumably know the distribution of tuning curve intercepts, we can determine the decoders φ i with: and thus the analytically determined scale-invariant decoders γ (a i ) are effectively weights for a Monte Carlo estimate of the integral operator TCTs.
Here, we will explicitly invert the tuning curve transforms for single variable functions in Section 2.3. The resulting equation for the weighted decoders is a convolution integral. In Section 2.4 we will show that with a basis to basis mapping, one can also invert the tuning curve transforms for multi-variable functions.

Single Variable Functions
If we work with the operators L + and L − separately, the problem becomes entirely tractable. One of the surprising things about the operators L + and L − is that provided that the functions we are considering are constrained to a subset where g + (x) vanishes to first order at x = −1 and g − (x) vanishes to first order at x = 1, and are both smooth, then the operators are invertible analytically on this constrained subspace of functions using Laplace transforms (see Appendix). Additionally, by using bothP + (a) andP − (a), one can compute any smooth function irrespective of the conditions at x = ±1. Furthermore, piecewise smooth continuous functions can also be computed (see Supplementary Materials). Closed form solutions do not exist for all neuronal firing rates as the Laplace transform cannot always be inverted explicitly. However, the type-I and type-II firing rate models do have analytically determined decoders. For the type-I/theta neuron firing rate, we have (see Appendix for derivation): The weighted decoder solutions for the type-II firing rate are also contained in the Appendix. For example, we have approximated the function g(x) = sin(2πx) using 10,000 type I tuning curves, as shown in Figure 4. An important thing to notice is the linearity in Equations (38,39) and (36) in the target function g and thus linearity for the scale-invariant decoders γ (a). Furthermore, due to the fact we have consideredP ± (a) to be separate for the ON and OFF populations, our operator for determining g( However, while our range in L + and L − was constrained, it was not constrained enough to provide a unique solution to M(P + , P − ) = g(x). In particular, if we consider any function ǫ(x) that lies in both admissable spaces (vanishes to first orders at x = 1 and x = −1), then ǫ(x) can be represented by both populations withP ± ǫ (a), respectively and thusP are also valid solutions to M(P + , P − ) = g(x). One can interpret this as a degree of freedom in terms of the decoders (and thus the synaptic weights). For example, we can use e(x) to minimize the expected squared error or other criterion. Thus, in the following we will strictly assume that e(x) = 0. We should note that it is possible to numerically invert the Laplace transforms resulting from the derivation process for the other firing rate curves. However, for now we will primarily work with the type I and type II curves. Our numerics will also primarily consist of networks of theta neurons (type-I).

Convergence Rate for Single-Variable Functions
With our decoder surfaces in hand, we can now proceed to determine the various convergence properties in the limit as N → ∞. In particular, we have the following: . The conjugate gradient descent improves the approximation (from 6*10 −1 to 7*10 −5 ) of magnitude while still maintaining a tight correlation with the scale-invariant decoders (p = 0.9811). The optimal decoders have a mean-squared error of 9*10 −7 for comparison purposes.
which immediately implies that our approximantĝ N (x) converges in mean-square to g(x) pointwise in x provided that: A derivation of Equation (42) can be found in the Supplementary Materials. Letting γ N = (γ (a i )/N, . . . γ (a N )/N), then we can also consider how the distribution of the quadratic cost function C(γ N ) scales as N → ∞ ; from Equation (2): where e i is the symbolic variable ± denoting the identify of a neuron as OFF/ON. One can show that provided that Equation (43) holds, and γ ± (a) is bounded on [−1, 1], then we have the following which implies that as N → ∞, then C(γ N ) → 0 in a meansquare sense. This is proven in the Supplementary Materials. As the cost function is strictly positive, then we can interpret this as γ N minimizing the cost asymptotically. For other neuronal models, one can merely use the maximum firing rates and intercepts to approximate their tuning curves with the type I standard form or the linear firing rate tuning curves. This will yield scale-invariant decoders that can be used on the tuning curves for the actual neuronal model with some degree of error. Additionally, gradient descent algorithms can be used to refine the scale-invariant decoders and that take into account the systematic error in using the type-I/type-II tuning curve approximation.

Multivariable Functions
It is clear that in the preceding section, one could approximate any arbitrary single variable function using scale invariant decoders. The same can be said about multi-variable functions. We will first introduce linear encoding for multi-variable inputs. In the NEF, it is assumed that the current input into each neuron takes the form: where e is the encoding vector that lies on the n-dimensional unit sphere and x lies in the interior; e, x is the standard dotproduct. The maximum firing rate occurs when x = e, and thus e, x = 1 is the maximum. The vector e is also referred to as the preferred direction vector. In this case, there are no ON and OFF neurons as they are effectively taken care of by the angle in between x and e. If x and e are colinear, then the maximum firing rate occurs when e = x, and the firing rate is zero when e, x = a, the equation for the hyperplane with normal vector e. Note that because the unit sphere in one-dimension is merely ±1, we have a direct correspondence with the e i from the single variable analysis in the previous section.
Once again, we can non-dimensionalize: To simplify the situation, we will again assume that f −1 (r i ) = 1 − a i , to remove this term. As before, for type-I firing rates this occurs without any loss of generality. While it may seem like this setup complicated matters somewhat, in the limit that N → ∞, the end result is simpler then the single variable case as we can make use of the orthogonality of the trigonometric functions to derive an appropriate basis to basis mapping. Consider suboptimal decoders of the form φ = γ (e,a) N = γ e (e)γ a (a) N . For a separable decoder we have: which in the large network limit becomes: by treating z = e, x and using Equation (38) to determinê P(a). The specific form ofP a (a) that performs this transformation varies from neural model to neural model. For the type I/type II firing rate, it is given by a recurrence relationship in terms of the binomial exponent n and is included in the Supplementary Materials. WithP a (a) determined, the decoderŝ P e (e) are characterized by the integral equation: e (e)( e, x + 1) n de To proceed further, we will exploit the orthogonoality of the Fourier series in a hyper-spherical coordinate system. For example, in two-dimensions we have: The second term in the integrand is a polynomial in cos(θ ) and sin(θ ). By DeMoivres formula, this can be expressed as a Fourier series with coefficients that depend on x and y where the series contains no cos(mθ ) or sin(mθ ) for m > n. Thus, we can extract out polynomial basis functions in the x and y usinĝ P(θ ) = cos(mθ ) sin(kθ ) for m, k < n. For example, the first feŵ P(θ ) and the correspondingĝ N (x) are shown in Table 1.
As we can obtain a polynomial basis where the maximum polynomial power is arbitrary, we can approximate any arbitrary integrable function. In general, one uses a sequence of trigonometric basis in the heterogeneous space to yield a polynomial basis in the function approximation space. One may wonder if the non-uniqueness in the scalar case was due to the peculiarities of the unit sphere in 1-dimension (an isolated pair of points). This turns out not be the case (see Supplementary Materials).

Convergence Rate for Multivariable Functions
One can again determine the order of convergence for the scaleinvariant decoders for a multi-variable funciton. An application of the law of large numbers yields: (51) and thus the expected square error converges like 1/N implying thatĝ N (x) converges to g(x) in mean-squared. The expectation is taken over the random variables e i and a i . As the convergence rate is somewhat slow, it is natural to ask whether or not it is possible to improve the the expected squared error. While there are many analytical paths one may take, we leave these approaches for future work. We will primarily use gradient descent variants that do not require computing the Hessian. If we knew the Hessian, then for a finite network we could immediately solve the system of Equations (3-5) as the problem is entirely quadratic and can be resolved numerically with the Hessian matrix. However, solving the quadratic problem with the Hessian requires large matrix inversion, and this is simply not feasible for large networks. Thus, we can use Hessian-free gradient descent methods. For example, one can use various conjugate gradient type algorithms to improve the expected  x 2 + y 2 + 2 1 π 2 1 2 (x 2 − y 2 ) sin(2θ) π 2 xy cos(2θ) π 2 squared error significantly with only a few iterations, and no large matrix inversion. Additionally, one can use the methods of stochastic gradient descent, such as weight perturbation, and node perturbation (Werfel et al., 2005). We will primarily use conjugate-gradient descent implemented with the PCG function in MATLAB (MATLAB, 2014). The crucial thing about these approaches is that we can obtain substantial improvements to the expected squared error with only slight perturbations to the scale-invariant decoders, as we shall see when we look at specific examples.

RESULTS
To simulate networks with arbitrary dynamics, we can use the decoders derived in the previous sections along with neurons that correspond to the appropriate firing rates (Eliasmith and Anderson, 2004). Suppose the variable s(t) represents a vector of decoded firing rates given by the following equation: Where φ j is the decoder for the jth neuron. Equation (54) is referred to as the rate equation while Equation (52) is the equation for s i under neuronal spiking with a simulated spiking neuronal network. The time constant used will be 50 ms unless otherwise stated. Using the same procedure as before, by integrating the spiking equation for s(t) explicitly, one can derive the NEF equation for the synaptic weights: where ω ij is the synaptic weight for the post-synaptic neuron i and the presynaptic neuron j in a recurrent neuronal network and φ j is the scale invariant decoder for the function

F(s)τ s + s
For a scale invariant decoder, this yields the following synaptic weight: Note that ω ij = F(a i , a j , r i , r j , e i , e j ), is a function of random variables for the presynaptic and post-synaptic neurons. Thus, instead of thinking the weights as a matrix of numerical values, or as a direct graph, one may think of the weights as defining a hypersurface in a higher dimensional space. For example, the formula (Equation 56) describes a hypersurface with 2m + 4 dimesions where m is the dimension of the dynamics the network simulates.
In the following examples, we will generate networks with these analytically determined weights using scale-invariant decoders that display the prescribed dynamics. Additionally, we will assume that The variable M controls the maximum firing rate of the neurons, with the range of maximum firing rates being between [0, √ 2M]. We take M to be 60 Hz for all subsequent numerics, unless otherwise specified. Note that we need α i to compute the weights. The α i differs depending on whether or not we are simulating a scalar system or a multi-variable system. For a scalar system, α i = M 2 e i where e i = 1 for ON neurons and −1 for OFF neurons. For a vector, α i = M 2 . For multi-variable dynamics we will assume uniform distributions in the hyperspherical coordinate systems. With the former assumption, the tuning curves for the neurons simply become M √ e, x − a where the M can be absorbed into the decoder. We will generate networks of spiking theta neurons that simulate a neural integrator, a Van der Pol Oscillator, and the Lorenz system.

Example 1: Neural Integrator
A neural integrator is a recursively coupled neural network that integrates an incoming signal, u(t). The coupling variable s(t), will have dynamics given by (59) where e i = 1 if neuron i is an ON neuron and −1 for OFF neurons. Note that we have scaled u(t) by τ s as this allows us to write: We then require the scale invariant decoders such thatĝ(z) A set of g ± (x) and the corresponding scale-invariant decoders is given by: which yieldsĝ(z) = z. From formula (14) for α i and (58) for the density of a i , and we have α i = e i M 2 and the neuronal weight All the synaptic weights here are given by 4 √ 2M/(Nπ) for ON/ON and OFF/OFF connections and −4 √ 2M/(Nπ) for ON/OFF and OFF/ON connections. Now, we will exploit symmetry and non-uniqueness to generate two more neuronal integrators with the same initial distributions of heterogeneity ρ a (a). In particular, consider the function ǫ(x) = (1 − x 2 ) 2 , this function lies in both function spaces for g ± (x) as it vanishes to second order at both boundaries. Additionally, it can be computed using the following scale-invariant decoders: which implies the following decoders for the ON/OFF population still give us g(x) = x (67) which yields the weight matrix Additionally, we can exploit symmetry to derive yet another weight matrix with precisely the same network of neurons: and thus, we have the following three separate weight matrices for i, j = 1, 2, . . . N that yield identical macroscopic dynamics from the same network of neurons as N → ∞. Furthermore, while all the weights converge to 0 as N → ∞, the scaled weights Nω ij do not converge toward one another in the same limit. One important point is that none of the weights necessarily satisfy the constraint that the action of neuron j on all its downstream targets is the same, either excitatory or inhibitory. Or more precisely that sign ω ij = sign ω i ′ j for all i, i ′ , j = 1, 2, . . . N. Fortunately however, this issue has already been dealt with in the existing literature (Parisien et al., 2008). To summarize, one is able to take the weights generated by the NEF solution, and linearly transform them to yield a new network consisting of excitatory and inhibitory neurons (instead of ON/OFF) with weights that respect this constraint, which is related to Dale's principle.
We have simualted four neural networks with 5000 neurons each all generated with the same random sample drawn from the distribution (Equation 58) using the weights given by , in addition to the weights generated by determining the optimal decoders. The scale-invariant decoders that correspond to Equations (73-75) were put through conjugate gradient finetuning with the final decoders being correlated to the initial decoders with a correlation coefficient greather than 0.98 in all cases. This lowered the root-mean-squared-error by 2-3 orders of magnitude with only slight perturbations off the scaleinvariant decoding surface in each case. A sample set of tuning curves is shown in Figure 5A with theĝ ± (x) that correspond to the four different weight structures in Figure 5B. The neural integrators are shown in Figures 5C-F. The synaptic weights that correspond to the integrators are shown in Figure 6. The neurons have been sorted into ON/OFF populations and increasing a within a sub-population prior to plotting the weight matrix in the left column of Figure 6. A sub-sample of 20 neurons is also selected (which is identical across the four integrator networks) and their weights are plotted in the right column of Figure 6. For the optimal decoders, g ± (x) ≈ ±(1 ± x)/2, which results in a weight structure similar to Equation (73). The weights differ substantially however in comparison to Equations (74) and (75) as theĝ ± differ substantially from ±(1 ± x)/2 This example illustrates that identical networks of neurons can have identical macroscopic dynamics with vastly different weight matrices. While this is not particularly surprising as going from a microscopic description (the individual weights) to a macroscopic description (the dynamics) of a dynamical system is seldom a unique process, the surprising thing is one can explore this issue analytically. An important point to note is that even though the weight matrices are non-unique, they all have the form and the weight matrices are nothing more than sample points drawn from different surfaces.

Exampe 2: Van der Pol Oscillator
The Van der Pol oscillator (Van der Pol, 1926) is given by the dynamical system: Here, we will simulate the oscillator with large networks of neurons using the scale-invariant decoders with conjugate gradient descent fine tuning. As the decoding is linear, then from the above equations we only require the decoders for the functions f (x, y) = x, f (x, y) = y and f (x, y) = x 3 . We use a two-dimensional spherical coordinate system for the encoding e i = (cos(θ i ), sin(θ i )) and assume uniform distributions in the a i and the θ i . Note that: which, immediately allows us to use the following scale-invariant decoders for the sub-functions x, y, and x 3 : which yields the decoders for F(x, y) and G(x, y): Thus, the weights are given by Both the rate equations and the spiking neural network are simulated using the scale invariant decoders to weight the firing rates/spikes. To more explicitly show the effects of the conjugategradient descent fine tuning, we have plotted the scale-invariant decoder surfaces Nφ F and Nφ G in Figure 7, in addition to the decoders after conjugate gradient descent fine tuning. The scaleinvariant decoders and the conjugate gradient descent fine-tuned decoders are again very tightly correlated with r > 0.95. We have simulated the Van der Pol Oscillator, as shown in Figure 8 in both the relaxation (µ = 5) and harmonic (µ = 0.7) oscillator regimes. In both cases, we have excellent correspondence with the network and the actual oscillator. The synaptic weight matrices are also shown in Figure 8 for a subset of neurons. Like the neural integrator, the weights again lie on a surface, only the surface is 3-dimensional.

Example 3: Lorenz Attractor
The Lorenz system is given by the equationṡ and is known to exhibit chaotic behavior for specific values of σ , ρ and β (Lorenz, 1963). In order to approximate the Lorenz system, we require the decoders for the functions x, y, z, xz and xy.
With a 3-dimensional spherical coordinate system, the encoding vectors e are given by e = (sin(θ ) cos(ψ), sin(θ ) sin(ψ), cos(θ )) where θ ∈ [0, π] ψ ∈ [0, 2π]. The decoders as a function of (ψ, θ, a) are given by: which yields the decoders for F, G, H as: The strange attractor generated by the Lorenz system and the neural rate equations using the decoders from Equations (88-90) are shown in Figure 9. The chaotic behavior and the strange attractor is also preserved when one uses a spiking neuronal network with the decoder weights on the spikes. Note that a great many neurons are required to adequately visualize the strange attractor, however the chaotic behavior is present even for smaller networks. We have also plotted the location of neural spiking with regards to the strange attractor. The neurons tend to spike more in specific regions of the strange attractor in accordance with their preferred orientation vectors and their a i parameters. The weights are again given by the NEF formula: which defines a five dimensional surface.

DISCUSSION
The NEF has been used to develop a wide variety of neural circuit models. The spiking networks generated from the NEF approach are spiking neural networks that are capable of functionally reproducing very sophisticated behaviors (Eliasmith, 2005). In the NEF approach, a synaptic weight between two neurons is a dot product of the post-synaptic neuron's preferred direction vector and the presynaptic neuron's linear decoding vector or "decoder." The decoders weight the tuning curves for the neurons and are determined by an optimization criterion that minimizes the L 2 error in the linear combination of tuning curves and the target dynamics of the network in addition to a factor that punishes the size of the decoders. The optimal decoders are unique, as they are determined by a convex optimization problem.
The first main point of this study is that in the large network limit, one can define a scale-invariant decoder that scales in inverse proportion to the network size N such that the scale-invariant decoders zero the cost function used to define the optimal decoders asymptotically in the mean-squared error. In the asymptotic limit, the scale-invariant decoders multiplied by the probability density governing the source(s) of heterogeneity converge to a constant function. The resulting linear combination of weighted firing rates converges to the tuning curve transforms asymptotically. We have determined this to also be the case with optimal decoders given by large matrix inversion, where the product of the optimal decoders and the heterogeneous density also converges to the weighted decoder. Furthermore, we have shown how one can invert tuning curve transform operators for type-I (theta model) and type-II firing rates. It turns out that the inversion is non-unique, which results in an infinite-dimensional family of scale-invariant decoder surfaces and thus synaptic weights.
The second contribution of this paper is a method of finding the scale-invariant decoders analytically from the decoded function and the probability density of the neuronal parameters. Additionally, we have demonstrated that as the weighted decoder is non-unique for any particular dynamics, the NEF solution to the weights with scale-invariant decoders also becomes nonunique.
Finally, we have demonstrated how the weights that couple neurons together can actually be thought of as surfaces defined entirely by the source(s) of heterogeneity in the network. In particular, a weight ω ij is defined as the inner product of a scaleinvariant decoder γ (e j , a j )/N of the post-synaptic neuron j with the encoding vector e i of the presynaptic neuron i. Thus, one can consider this inner product as a function over a higher dimensional space and the weights are merely sample points on this hypersurface. Furthermore, the weights generated via this approach have the advantage that the coupling between the jth presynaptic neuron and the ith postsynaptic neuron depends only upon the local properties of neuron i and neuron j for any specified dynamics. This is different from Hebbian plasticity, in that the coupling does not depend on presynaptic and postsynaptic activity, but rather on an intrinsic property of the presynaptic neuron times an intrinsic property of the postsynaptic neuron. While this perspective has been used before in the literature, for example to model hypercolumns in the primary visual cortex in (Shriki et al., 2006), here we show how it can be extended to arbitrary (smooth) dynamics.

Relationships between the Analytical and Optimal Decoders
The optimal decoders appear to have an asymptotic weighted decoder, the product of the density function multiplied by the scaled optimal decoder, N i , when one computes this quantity numerically after large-matrix inversion (see Figure 2). For the optimal decoders, the weighted decoder has high frequency oscillations that are related to the idiosyncracies of the particular sample of random neurons generated. These seem to attenuate with increasing network size, and regularization parameter λ. These high frequency oscillations are for example eliminated when the neurons are drawn from a grid. Indeed, when the neurons are drawn from a grid, the optimal decoders for a much larger network can be approximated by simply interpolating between the decoder values for the smaller grid network, and rescaling the interpolated decoders in accordance with the network size.
Thus, one may ask is the weighted decoder generated by the optimal decoders (1) convergent as N → ∞, and (2) does it converge to any particular weighted decoder in the set defined by the requirement thatĝ(x) = M(P + , P − ) = L + (P + ) + L − (P − ) = g(x)? It seems that numerically the quantity Nρ a (a i ) i does converge to some surface P ± that varies depending on the identity of neuron i as an ON/OFF neuron. The likely candidate for the specificP ± in the set defined by M(P + , P − ) is the surface that minimizes Equation (42). However, the relationship between the optimal decoders, and any particular scale-invariant decoder as N → ∞ is outside of the scope of this paper and is best left for future work. Relatedly, there seem to be some systematic differences between scale-invariant and optimal decoders due to finite neurons. For example, a scale-invariant decoder surface can have discontinuities, while corresponding regularized optimal decoder surfaces do not, even with very large numbers of neurons

Relationship to Other Approaches
We have demonstrated that if one defines a scale-invariant decoder that is a function of the source(s) of heterogeneity of the neurons, one can obtain arbitrary macroscopic dynamics with the NEF weight solution, which is the dot product of the decoder for the presynaptic neuron, and the preferred orientation vector or encoding vector for the post-synaptic neuron. Thus, both the weights and the decoders for the neurons can be thought of as surfaces as they are merely functions of the sources of heterogeneity of the network. While this perspective is novel in the sense that we have shown one can obtain arbitrary dynamics with this approach, similar decoders and weights have been suggested in previous work. In particular, the scale-invariant decoders here can be thought of as an extension of the neuronal population vectors of Georgopolous (Georgopoulos et al., 1986(Georgopoulos et al., , 1994. It has been previously noted (for example in Eliasmith and Anderson, 2004) that the population vector from (Georgopoulos et al., 1986(Georgopoulos et al., , 1994 is similar to the case where one uses the encoding vectors e i as the decoding vectors φ i . Here, we have derived a more general approach where φ i is a function of e i and the other sources of heterogeneity in the tuning curves. Additionally, weights that are a function of the presynaptic and post-synaptic preferred orientations have been also used in the literature (Ben-Yishai et al., 1995;Shriki et al., 2006).
There are also methods in the literature that construct networks of neurons with prescribed dynamics. For example, spiking networks of leaky integrate-and-fire models have been constructed that can display arbitrary linear dynamics with weights that are seemingly unrelated to the NEF solution for the synaptic weights (Boerlin et al., 2013). The solution obtained in (Boerlin et al., 2013) is derived through minimizing the time integral of the L2 norm of the decoded estimate of the dynamics,x(t) and the intended dynamics in addition to other terms intended to minimize the spiking, and distribute the spikes equitably across the network. The end result is a network of leaky integrate and fire neurons and weights that when coupled together display the desired dynamics. While the solution is very elegant, it is unknown if it can be extended to non-linear dynamics.

Numerical Applications
We considered whether this work has practical applications for neural simulations. As an example, we considered whether (having found the weighted decoders for a given network) it would be useful to adjust ρ in order to make the weights uniform, thus avoiding large fluctuations associated with the spiking of heavily weighted neurons. This sometimes led to modest reductions in spike noise in simulations.
As discussed above, scale-invariant decoders can serve as a starting point for iterative optimization, quickly leading to highly optimized weights for large networks. However, it should be pointed out that since the optimization problem is convex, any starting point can be used, and in our experience scale-invariant decoders are only moderately better than other reasonable choices.
It is possible that our approach could be applied to groups of neurons that are so large as to be impractical even for efficient iterative methods. The cortex consists of a fairly continuous sheet of neurons with few distinct boundaries, suggesting that it may be somehow useful to simultaneously consider the activity of billions of neurons. On the other hand, if the goal is simply to optimize synaptic weights, the degree of convergence onto single neurons (< 200, 000) is within a practical range for iterative methods. Furthermore, our solution for multivariable functions requires encoder distributions that are separable in hyper-spherical coordinates, which may be a limitation for modeling extensive sheets of neurons with overlapping tuning.
In light of these experiences, we consider this work to be valuable mainly as a source of new insights into network function and dynamics, rather than as a basis for new numerical tools.

Analyzing Measured Synaptic Weights
Using experimental methods, it is possible to measure both the synaptic weights (defined as the peak post-synaptic current) in addition to fitting integrate-and-fire type neurons with heterogeneity using the dynamic current-voltage curve approach (Badel et al., 2008;Harrison, 2014;Harrison et al., 2015). While there are typically many parameters for more complicated integrate-and-fire models (such as the AdEx), one can always reduce the number of heterogeneous parameters to a much smaller set governing the properties of the tuning curves. We refer to these generically as b i , a vector with the parameter values for the ith neuron. Given these assumptions, and the work done in this paper, one kind of analysis that can be conducted without much effort is the construction of a non-linear regression model of the weights: where F(b i , b j ) is either a non-linear or linear model with a number of free coefficients and ω ij are the experimentally determined synaptic weights of a recurrently coupled neural network. The coefficients can be estimated using optimization techniques to minimize ǫ ij . For this weight analysis to be valid, we require the following . The core result of this paper is that any dynamics are possible with synaptic weights of the form (Equation 92), and their is no unique weight matrix that confers specfic dynamics. Thus, a regression analysis of the synaptic weights (Equation 92) is a reasonable analysis to conduct if one knows the sources of heterogeneity for the neurons in the network and provided that the residuals are sufficiently small.

Future Work
While the networks constructed here display the desired macroscopic dynamics, this is not always the case. In particular, if the time constants are too small, then the collective macroscopic state can destabilize. For the weight solution we have determined to be valid, one needs to prove that the macroscopic dynamics form a stable attractor in the large network limit. Unfortunately, resolution of this problem is well outside the scope of this paper. While a great deal of work has been done in determing the stability of asynchronous states in large network limits (for example in Abbott and van Vreeswijk, 1993;van Vreeswijk, 1996), to our knowledge no work has been when the weights have structure present here. The majority of work done on large network stability analysis is devoted to weights that are constant throughout the network, ω ij = ω however there is some work on non-constant, randomly distributed weights (Hermann and Touboul, 2012). The authors of (Hermann and Touboul, 2012) note that oscillations arise when considering randomly distributed weights. Here, we demonstrate that with a little bit more structure to the weights/network (the weights are functions of the properties of the f (I) curves of the neurons), arbitrary prescribed macroscopic dynamics can be generated by the network. Networks of heterogeneous theta oscillators have been extensively analyzed in (Barreto et al., 2008;Luke et al., 2013;So et al., 2014) by using the Ott-Antonsen Anzats initially applied to networks of Kuramoto Oscillators (Martens et al., 2009;Ott and Antonsen, 2009). Additionally, one of the weight solutions for a network with one-dimensional dynamics that arises from the scale-invariant decoders sets all the weights to ±ω by setting the density to ρ ∝ |P(a)| where the constant of proportionality normalizes |P(a)|. Given that, it may be possible to apply some of the existing literature on the stability analysis of networks of heterogeneous theta neurons to this network.
In addition to stability analysis of the large network, the weight solutions were derived here under a pair of simplifying assumptions. In particular the two strongest assumptions were that the FI curves were constant in time, and that the neurons were coupled using current-based synapses instead of conductance based synapses. We intend to extend the approach we have taken here with scale-invariant decoders to neurons with conductance based synapses, and f (I) curves that vary due to forces like spike frequency adaptation (Ermentrout, 2006). Fortunately, some of the initial work on generating macroscopic rate-equations (a necessary initial step) for conductance based neurons has been done in (Ermentrout, 1994;Shriki et al., 2006), in addition to work on rate equations for adapting neurons (Nicola and Campbell, 2013b) AUTHOR CONTRIBUTIONS WN and MS performed much of the analysis. The numerical experiments were performed by WN with assistance from BT with regards to the Neural Engineering Framework. WN, BT, and MS contributed to the writing of the manuscript.