Structured chaos shapes spike-response noise entropy in balanced neural networks

Large networks of sparsely coupled excitatory and inhibitory cells occur throughout the brain. A striking feature of these networks is that they are chaotic. How does this chaos manifest in the neural code? Specifically, how variable are the spike patterns that such a network produces in response to an input signal? To answer this, we derive a bound for the entropy of multi-cell spike pattern distributions in large recurrent networks of spiking neurons responding to fluctuating inputs. The analysis is based on results from random dynamical systems theory and is complemented by detailed numerical simulations. We find that the spike pattern entropy is an order of magnitude lower than what would be extrapolated from single cells. This holds despite the fact that network coupling becomes vanishingly sparse as network size grows -- a phenomenon that depends on "extensive chaos," as previously discovered for balanced networks without stimulus drive. Moreover, we show how spike pattern entropy is controlled by temporal features of the inputs. Our findings provide insight into how neural networks may encode stimuli in the presence of inherently chaotic dynamics.

If a time-dependent signal is presented to a network whose dynamics are chaotic and whose initial conditions cannot be perfectly controlled, how much variability can one expect in its responses? Such a scenario is central to questions of stimulus encoding in the brain.
In this article, we study population-level spiking responses in neural networks with sparse, random connectivity and balanced excitation and inhibition. Such networks are ubiquitous models in neuroscience, and reproduce the irregular firing that typifies cortical activity. Moreover, the autonomous activity of such networks is known to be chaotic, with extremely strong sensitivity of spike outputs to tiny changes in a network's initial conditions [1][2][3]. Remarkably, in these autonomous systems, the chaos is invariant to the network scale (i.e., it is extensive): the same spectrum of Lyapunov exponents recurs regardless of network size, even when coupling remains localized [4,5]. Our goal is to add a stimulus drive and understand the implications for the network spike patterns that result, a task made challenging by the fact that spikes are related to phase space dynamics in a highly nonlinear way.
Intriguingly, when such chaotic networks respond to time-dependent signals, they produce spiking that is less variable than one might expect (cf. [6,7]). In theoretical work, this has been attributed to low-dimensional chaotic attractors that "project" only intermittently to produce variable spiking in any given single cell [8]. Similar phenomena occur in in vivo experiments, where fluctuating sensory stimuli are repeatedly presented to an animal. Here, cortical neurons produce spikes with a wide range of variability, with some spikes repeatedly evoked with millisecond precision [9,10]. Information theoretic methods suggest that this type of "intermittent noise" may permit information to be encoded in the spike patterns that single neurons produce over time [10,11].
However, the impact of variability on network coding cannot be understood by extrapolating from single cells alone [12][13][14][15][16][17]. Thus, to eventually understand how network chaos impacts coding, we need to capture the multi-cell spike train variability in chaotic networks, and relate this to well-quantified measurements at the level of single cells. Direct, sampling-based approaches to this problem will fail, due to the combinatorial explosion of spike patterns that can occur in high-dimensional networks. Another method is needed.
Studies of variability in recurrent networks typically address two distinct properties. On one hand, there is the question of spike-timing variability, often measured by binarized spike pattern entropy and usually studied for single cells or small cell groups [10,12,18]. On the other hand, recent theoretical work investigates the dynamical entropy production of entire networks, quantifying the state space expansion globally [4,5]. It is not clear how these two quantities are related. Here, we extend the work of [8] to bridge this gap, leveraging random dynamical systems theory to develop a direct symbolic mapping between phase-space dynamics and binary spike pattern statistics.
The result is a new bound for the variability of joint spike pattern distributions in large spiking networks that receive fluctuating input signals. This bound is expressed in terms of spike-response noise entropy, an information-theoretic quantity that is directly related to dynamical entropy production. By verifying that the previous extensivity results of [4,5] continue to hold in the presence of stimulus drive, we show how the bound applies to networks of all sizes.
We then apply this bound to make two observations about the spike-pattern variability in chaotic networks. The first is that the joint variability of spike responses across large networks is at least an order of magnitude lower than what would be extrapolated from measurements of spike-response entropy in single cells, despite noise correlations that are very low on average. Second, we show that the spike-response entropy of the network as a whole is strongly controlled by the tradeoff between the mean (i.e. DC) and higher-frequency components of the input signals. Entropy increases monotonically with the mean input strength by almost an order of magnitude, even as network firing rates remain constant.

NETWORK MODEL
To develop these results, we use large random networks of N "θ-neurons", as in [4,8]. The state of each cell is represented by a phase variable θ_i(t) ∈ [0, 1] with 0 and 1 identified (i.e., the circle S¹), and a spike is said to occur when θ_i = 1 ∼ 0. This model has non-dimensionalized units but is equivalent to the Quadratic Integrate-and-Fire model via a smooth change of coordinates [19]. In addition, the network receives a temporally structured input signal I(t), as described below.
The dynamics of the i-th cell in the network are given by the random dynamical system (RDS)

dθ_i = [ F(θ_i) + Z(θ_i)( η + Σ_{j=1}^N a_{ij} G(θ_j) + (ε²/2) Z'(θ_i) ) ] dt + ε Z(θ_i) dW_{i,t},   (1)

where F(θ_i) = 1 + cos(2πθ_i), Z(θ_i) = 1 − cos(2πθ_i), and G(θ_j) is a smooth coupling function, nonzero only on a small window of width b around θ_j = 1 ∼ 0 and zero elsewhere, mimicking the rapid rise and fall of a synaptic current (b = 1/20, with height scale d = 35/32). The ε² term comes from an Ito correction [20].
The input I(t) = (I_1, ..., I_N), represented by the last term in (1), models a temporal stimulus. It is a collection of N independent signals I_i(t) = η + ε dW_{i,t}/dt driving each neuron, where the dW_{i,t}/dt are quenched realizations of white noise -- that is, scaled increments of the independent Wiener processes W_{i,t}. Note that η controls the network's "excitability" and can take negative values [19], while ε ≥ 0 controls the amplitude of input fluctuations. Both parameters are constant across all cells. We begin by investigating network (1) in the excitable regime with parameters η = −0.5 and ε = 0.5. We emphasize that I is a signal and not stochastic noise, and study the solutions of (1) arising from distinct initial conditions (IC) but receiving the same input I. The model (1) has been analyzed previously for uncoupled neurons [21,22], and for a series of gradually more complex networks in [8,22,23], cf. [4]. We assign 20% of the N neurons to be inhibitory and 80% to be excitatory, meaning that the outgoing weights of neuron j are either a_{ij} ≤ 0 or a_{ij} ≥ 0, respectively. The coupling matrix A = {a_{ij}}_{i,j=1,...,N} is chosen randomly with mean in-degree κ, such that each neuron receives on average κ incoming connections from independently chosen neurons in each of the excitatory and inhibitory populations. Here, |a_{ij}| ∼ O(1/√κ) when non-zero, in accordance with classical balanced state coupling [1]. Throughout, we set κ = 20 but find that, as long as κ ≪ N, our findings are qualitatively robust to the choice of κ.
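For concreteness, the connectivity and one integration step of a network of this type can be sketched in Python as follows. This is our own minimal illustration, not the authors' solver: the explicit cosine form of the synaptic bump `bump` and the placement of the Ito correction are assumptions, with only the support width b = 1/20, the scale d = 35/32, the O(1/√κ) weights and the 80/20 E/I split taken from the text.

```python
import numpy as np

def balanced_coupling(N, kappa=20, frac_inh=0.2, seed=0):
    """Sparse random coupling: each neuron receives on average `kappa`
    incoming connections from each of the E and I pools, with weights of
    magnitude O(1/sqrt(kappa)) as in the classical balanced state."""
    rng = np.random.default_rng(seed)
    n_inh = int(frac_inh * N)
    sign = np.ones(N)              # columns index presynaptic cells
    sign[:n_inh] = -1.0            # first 20% inhibitory
    # per-column connection probability giving mean in-degree kappa per pool
    p = np.where(sign > 0, kappa / (N - n_inh), kappa / n_inh)
    mask = rng.random((N, N)) < p[None, :]
    np.fill_diagonal(mask, False)  # no self-connections
    return mask * sign[None, :] / np.sqrt(kappa)

def bump(theta, b=1 / 20., d=35 / 32.):
    """Hypothetical smooth synaptic bump supported on a window of width b
    below theta = 1 (the paper's exact formula is not reproduced here)."""
    x = (theta - 1.0) / b
    return np.where(x > -1.0, d * (1.0 + np.cos(np.pi * x)) / 2.0, 0.0)

def em_step(theta, A, dW, dt=0.005, eta=-0.5, eps=0.5):
    """One Euler-Maruyama step of a theta-neuron network in the spirit
    of Eqn. (1); the Ito drift correction term is an assumed form."""
    F = 1.0 + np.cos(2 * np.pi * theta)
    Z = 1.0 - np.cos(2 * np.pi * theta)
    Zp = 2 * np.pi * np.sin(2 * np.pi * theta)   # Z'(theta)
    drive = eta + A @ bump(theta)                # excitability + recurrence
    drift = F + Z * drive + 0.5 * eps ** 2 * Z * Zp
    return np.mod(theta + drift * dt + eps * Z * dW, 1.0)  # phases on S^1
```

Presenting the same quenched increments dW across trials, but different initial phases, reproduces the "frozen input, random IC" protocol of the text.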
Two consequences of this connectivity will be important below. First, as the in-degree κ is the same for all neurons, the spiking statistics of single cells are fairly stereotypical across the network. This is evident in the spike rasters of Figure 1 (a). Second, the magnitude of inputs to single cells remains similar as network size N grows, because κ is fixed.

SPIKE-RESPONSE NOISE ENTROPY AND DIRECT ESTIMATES
To quantify spike pattern variability, we treat spike trains as binary time series. We discretize time in bins of width ∆t small enough so that for a given cell, each bin contains at most a single spike. Throughout, we use time bins of width ∆t = 0.05; we found that moderately different resolutions did not significantly affect our results. Let us define finite binary words for K neurons over L time bins starting at time t_l = l∆t for some integer l: S_KL(t_l) = {S^k_l, ..., S^k_{l+L−1}}_{k=k_1,...,k_K} with S^k_j ∈ {0, 1} (see Figure 1 (a)).
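The binarization just described can be sketched as follows (our own helper names; ∆t = 0.05 as in the text):

```python
import numpy as np

def binarize(spike_times, t_max, dt=0.05):
    """Map spike times to a binary vector: bin j is 1 iff the cell spiked
    in [j*dt, (j+1)*dt). Assumes at most one spike per bin."""
    n_bins = int(np.ceil(t_max / dt))
    s = np.zeros(n_bins, dtype=np.uint8)
    idx = (np.asarray(spike_times) / dt).astype(int)
    s[idx[idx < n_bins]] = 1
    return s

def words(S, L, start=0):
    """Extract the word S_KL(t_l): rows are the K chosen cells, columns
    are L consecutive time bins beginning at bin index `start`."""
    return S[:, start:start + L]
```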
The variability of the evoked spike response S_KL(t_l) is captured by the spike-response noise entropy

H^{KL}_noise(I, t_l) = − (1/(L∆t)) Σ_{S_KL} P(S_KL(t_l)|I) log₂ P(S_KL(t_l)|I),   (2)

where P(S_KL(t_l)|I) denotes the probability of observing word S_KL(t_l) conditioned on input I, given a random initial state of the network. This quantity may also be referred to as conditional response entropy. It is normalized to have units of bits per time-unit (bits/tu), as opposed to bits per time-bin, and thus represents an entropy rate in continuous time. Since the inputs I and network dynamics are statistically stationary processes [8], the expected noise entropy rate of KL-words conditioned on any I from the same input distribution -- controlled by the parameters η and ε -- can be obtained from a long time average over any single input realization I* (see e.g. [18,24]):

H̄^{KL}_noise = lim_{M→∞} (1/M) Σ_{l=0}^{M−1} H^{KL}_noise(I*, t_l).   (3)

As demonstrated in [18] and reviewed below, (3) can be used to estimate the true entropy rate of K-neuron groups in the limit L → ∞. As we will see, this is only practical for small K; we will need other tools to understand this quantity for entire networks (K = N). Nevertheless, we begin by applying a direct sampling approach.
To estimate the probability terms in (2), we simulate network (1) in response to a randomly chosen, quenched I(t) for 10,000 time units and 2000 "trials", distinguished by different ICs. Here, we wish to choose ICs from a distribution that best describes random network states while remaining agnostic about the network's past. As discussed in [8], we assume that system (1) possesses an ergodic stationary probability measure µ(θ), which is the steady state solution of the Fokker-Planck equation associated with (1). Thus, µ is the probability measure describing how likely we are to find the network in a particular state at any moment in time, given any input I with identical statistics. We emphasize that µ serves only as an initial distribution: ensembles of "trial" trajectories as described above will have a very different distribution, as they are conditioned on a fixed input I(t) (see [8,22,23] for more details about this distinction).
To sample from µ, we first select seed ICs uniformly over the state space and evolve each of them for a "burn-in" period of 50 time units, during which a different input is presented to each trajectory. The resulting endpoints of these trajectories form a new IC ensemble that approximates µ. From then on, all ICs are integrated using the same input I(t), and we use this solution ensemble to study the variability of spike responses.
From these simulated network trajectories, we extract the binary spike output of neurons across many trials (see Figure 1 (b) for a single neuron example). Normalized cross-trial counts of S KL words in consecutive, nonoverlapping L-windows serve as estimates of the probabilities P (S KL (t l )|I) in equation (2).
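A minimal plug-in version of this estimate might look like the sketch below; it implements the normalized cross-trial counts and the entropy formula (2) in bits per time-unit, but omits the finite-sampling bias corrections discussed in [18]:

```python
import numpy as np
from collections import Counter

def noise_entropy_rate(word_samples, dt=0.05):
    """Plug-in estimate of the noise entropy of P(S_KL(t_l)|I).
    word_samples: list of K-by-L binary arrays, one per trial, all taken
    at the same window t_l under the same frozen input I.
    Returns bits per time-unit."""
    L = word_samples[0].shape[1]
    counts = Counter(w.tobytes() for w in word_samples)  # hashable word keys
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()                            # normalized cross-trial counts
    H_bits = -np.sum(p * np.log2(p))        # bits per word
    return H_bits / (L * dt)                # bits per time-unit
```

Averaging this quantity over consecutive, non-overlapping L-windows gives the long-time-average estimate (3).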

SINGLE-CELL VARIABILITY
We begin by computing noise entropy in the spike responses of single cells in the network. Using the estimation techniques described above, we compare the effect of network chaos on noise entropy to that of commonly used independent noise models. This complements similar analysis in [8], which used a different trial-to-trial metric of spike reliability.
We randomly select a cell in our network and extract its binary spike output across many simulated trials (see Figure 1 (b)). Using this data, we estimate H^{1L}_noise for word lengths up to L = 20 and plot the results in Figure 1 (c) as a function of 1/L. A system with finite autocorrelation timescales is expected to produce entropy rates that behave extensively once L becomes sufficiently large. This is readily apparent in the linearly decreasing trend of H^{1L}_noise as L grows, up to a point where the estimate drops sharply due to insufficient sampling. Following [18], we use the point of least fractional change in slope to extrapolate this extensive trend and obtain an estimate for lim_{L→∞} H^{1L}_noise (intersection with the ordinate axis in Figure 1 (c)).
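The extrapolation step can be sketched as a linear fit in 1/L; here we simplify the "least fractional change in slope" criterion of [18] to a plain least-squares fit over user-chosen word lengths:

```python
import numpy as np

def extrapolate_entropy_rate(Ls, H):
    """Fit H(L) = H_inf + c/L over the supplied word lengths and return
    the intercept H_inf, i.e. the estimated rate as L -> infinity."""
    x = 1.0 / np.asarray(Ls, dtype=float)
    slope, intercept = np.polyfit(x, np.asarray(H, dtype=float), 1)
    return intercept
```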
Our estimate of lim_{L→∞} H^{1L}_noise is 1.12 bits/tu. We note that a "purely random", homogeneous Poisson spike train with the same firing rate (0.8 spikes/tu) would have noise entropy H^{1L}_noise of 3.67 bits/tu. Thus, while chaotic dynamics produce variable spiking in single cells, the resulting noise entropy is much less than that of a totally random response, a fact also evident from the spike rasters in Figure 1. Part of the reason for this difference is simply the presence of the stimulus; inputs from other cells in the chaotic networks also play a role. To isolate the network effect, we repeat the sampling process above by simulating our chosen cell in isolation, keeping the input I_i intact but replacing the incoming spike trains it receives from upstream cells by two surrogate ensembles meant to isolate distinct statistical features of network activity. (i) Homogeneous Poisson surrogates: independent, Poisson-distributed spike trains with rate matching the mean firing rate of the corresponding upstream cells. (ii) Inhomogeneous Poisson surrogates: produced by independently drawing a binary random variable in each ∆t-bin, according to the time-dependent probability given by the normalized spike count of the corresponding network train across all original trials. For each new simulated trial, we draw independent surrogates. Figure 1 (c) shows a 66% increase in noise entropy rate for the homogeneous surrogates, and about 30% for the inhomogeneous case.
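The two surrogate ensembles can be sketched as follows (our own function names; both return binary ∆t-binned trains as in the text):

```python
import numpy as np

def homogeneous_surrogate(rate, n_bins, dt=0.05, rng=None):
    """Binary train with constant spike probability rate*dt per bin,
    i.e. a discretized homogeneous Poisson process."""
    rng = rng or np.random.default_rng()
    return (rng.random(n_bins) < rate * dt).astype(np.uint8)

def inhomogeneous_surrogate(p_t, rng=None):
    """Binary train drawn independently in each bin with time-dependent
    probability p_t, e.g. the normalized cross-trial spike count of the
    corresponding network train."""
    rng = rng or np.random.default_rng()
    p_t = np.asarray(p_t, dtype=float)
    return (rng.random(p_t.shape) < p_t).astype(np.uint8)
```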
Overall, we have shown that single, stimulus-driven cells in chaotic networks produce spike-response entropy significantly lower than that expected for single, stimulus-driven cells receiving Poisson background inputs, as in many statistical models. We next seek to characterize spike entropy in the joint responses of multiple cells.

MULTI-CELL VARIABILITY
Our network is connected -- albeit sparsely (κ ≪ N) -- and it is not clear in advance how coupling interactions will impact the entropy rate of groups of cells. As a first step, we repeat the noise entropy estimate described above for a randomly selected pair of connected cells up to L = 10, and extrapolate lim_{L→∞} H^{2L}_noise from this data. The black lines in Figure 1 (c) show H^{2L}_noise/2, normalized to units of bits per time-unit per neuron for comparison with H^{1L}_noise. Due to the combinatorial explosion of possible spike patterns as more neurons are considered, we were unable to compute such estimates for K greater than 2. Nevertheless, it appears from the K = 2 case that interactions between neurons conspire to lower the response noise entropy per neuron, if only by a small margin.
However, this margin could easily be missed. For a given neuron pair (i, j), consider the difference between the sum of the single-cell entropy rates and the joint pair rate:

δ_ij = lim_{L→∞} [ H^{1L}_noise(i) + H^{1L}_noise(j) − H^{2L}_noise(i, j) ].

From 45 random pairs of neurons, we obtain the average δ_ij = 0.012 bits/tu. This implies a relative difference of order O(10^−2) when estimating the entropy rate of pairs of cells from their marginal, single-cell response distributions. We will see later that these small differences compound significantly when considering the network as a whole (cf. [12]).
To quantify the extent of these interactions over space and time, we compute the Pearson correlation coefficient c_ij(t_l) between the spiking probability of two cells i and j in time bin t_l. That is, we measure the cells' instantaneous noise correlation. Figure 2 (a) shows a typical histogram of c_ij(t_l) across all neuron pairs of a network with N = 500 for a fixed t_l, where pairs with zero spiking probability were discarded. We see that at any fixed moment, correlations are weak and most cells are uncorrelated. Moreover, these correlations are not static: a high correlation between two cells in one time bin does not guarantee that they will be correlated in another. This is illustrated by Figure 2 (b), showing a histogram of c_ij(t_l) across 10,000 time-units for two randomly chosen connected cells.
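The instantaneous noise correlation c_ij(t_l) can be computed directly from trial-indexed binary responses; a sketch (pairs with zero spiking variability in the bin are returned as NaN and discarded, as in the text):

```python
import numpy as np

def instantaneous_noise_corr(Si, Sj):
    """Pearson correlation across trials between the binary responses of
    cells i and j in one time bin. Si, Sj: 1D arrays of 0/1 values, one
    entry per trial, all for the same bin t_l and the same frozen input.
    Returns NaN if either cell shows no spiking variability in this bin."""
    Si = np.asarray(Si, dtype=float)
    Sj = np.asarray(Sj, dtype=float)
    if Si.std() == 0 or Sj.std() == 0:
        return float('nan')
    return float(np.corrcoef(Si, Sj)[0, 1])
```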
We emphasize that this weak and highly dynamic correlation structure might easily be dismissed as negligible experimentally. If one chooses a single pair of cells and measures the temporal average of c_ij(t_l) over 500 time units, one obtains values of order 10^−5 on average, with a standard deviation of order 10^−2 (across the 4950 cell pairs tested). In other words, each individual cell pair appears to be almost completely uncorrelated, at least on average. Below, we will show that the weak, transient dependencies that are in fact present among neurons nevertheless have a very strong impact on network-wide noise entropy. To summarize, measures of entropy and correlations indicate that there are noticeable but weak dependencies in the spiking activity of connected pairs of cells. Scaling up from such pairwise dependencies to accurately describe the joint activity of an entire network is a notoriously difficult problem. We take an approach based on RDS theory in what follows, quantifying the entropy H^{NL}_noise of the network as a whole as network size N grows.

A BENCHMARK FOR NETWORK ENTROPY
To benchmark H^{NL}_noise, we first describe the joint network entropy that would be naively predicted by direct extrapolation from single cells. In other words, this is the estimate one would obtain by ignoring statistical interactions between neurons. Notice that unlike cell pairs, the spiking statistics of single neurons are expected to be unchanged by network size N when the in-degree κ is fixed. Moreover, the entropy of a multivariate distribution is always less than or equal to the sum of its marginal distributions' entropies. If H_1 denotes the average of lim_{L→∞} H^{1L}_noise over all neurons, it follows that N H_1 ≥ lim_{L→∞} H^{NL}_noise. Figure 2 (c) shows this estimate as a function of N. The slope H_1 was sampled over 20 neurons in an N = 500 network using the same extrapolation technique as in Figure 1 (c). We verified by spot checks that single cell activity in networks of different sizes agrees with this extrapolation (see markers in Figure 2 (c)). Next, we leverage dynamical properties of our network to estimate how much reduction in entropy can be expected from the joint activity of entire networks in comparison to this naive extensive bound.

DYNAMICAL ENTROPY PRODUCTION
In what follows, we use symbolic dynamics to map between the phase space of our network and the set of binary spike trains. Consider trajectories θ(t) = (θ_1(t), ..., θ_N(t)) of model (1), evolving on the N-dimensional torus T^N. Recall that a spike from cell i occurs when θ_i(t) = 1, and will lead to S^i_l = 1 in the corresponding time bin. Notice that the phase response curve Z(θ_i) modulates the effect of any input on neuron i -- whether that input comes from the signal I(t) or from network activity -- and that it vanishes at θ_i(t) = 1. This implies that a neuron becomes insensitive to any inputs when it is about to spike. Indeed, the Taylor expansion of neuron i's dynamics about θ_i = 1 is constant up to quadratic order:

dθ_i/dt = 2 + O((θ_i − 1)²).

Based on this observation, we make the approximation that for ∆t small enough, neuron i spikes in the time bin [t, t + ∆t] if and only if θ_i(t) ∈ [1 − 2∆t, 1) (see Appendix for verification).
Thus equipped, consider the partition of T^N given by Γ* = {γ_0, γ_1}^N, built of Cartesian products of the intervals γ_0 = [0, 1 − 2∆t) and γ_1 = [1 − 2∆t, 1) across all θ_i's. At any time t_l = l∆t, the Γ*-address of θ(t_l) determines the binarized spiking state of the network in time bin [t_l, t_l + ∆t]: θ_i(t_l) ∈ γ_0 ⇒ S^i_l = 0 and θ_i(t_l) ∈ γ_1 ⇒ S^i_l = 1. In order to describe L-long spike trains in terms of Γ*-addresses, we must understand how solutions θ(t) evolve with respect to Γ*. To this end, consider the discretized dynamics given by the transition maps Φ_{t;I} that send T^N onto itself according to the flow of (1) from t to t + ∆t. If θ(t) is a solution of (1), then Φ_{t;I}(θ(t)) = θ(t + ∆t), where ∆t refers to the resolution of our binary spike trains S_NL. Note that the maps Φ_{t;I} depend on both t and I, are generally smooth with smooth inverses (diffeomorphisms) [25], and together form a discrete RDS. For detailed geometric properties of the RDS defined by system (1), we refer the reader to [8].
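The Γ*-address map from network state to binarized spiking state is straightforward to express in code (a sketch with ∆t = 0.05 as in the text):

```python
import numpy as np

def gamma_address(theta, dt=0.05):
    """Binary spiking state of the network over the next time bin:
    S_i = 1 iff theta_i lies in gamma_1 = [1 - 2*dt, 1)."""
    theta = np.asarray(theta, dtype=float)
    return (theta >= 1.0 - 2.0 * dt).astype(np.uint8)
```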
For what follows, it is convenient to reverse time and study spike trains and trajectories starting in the distant past and leading up to t = 0. This representation is statistically equivalent to forward time since our networks have statistically stationary dynamics [8]. Consider now the refined partition ∨_{l=0}^{L−1} Φ^{−l}_{0;I} Γ*, obtained by intersecting the pullbacks of Γ* under l applications of the inverse transition maps. This new partition offers a one-to-one correspondence between its member sets and the space of all S_NL spike trains. Note that many sets in this partition will be empty, since not all spike sequences are accessible by the network. In fact, the number of non-empty sets remaining in ∨_{l=0}^{L−1} Φ^{−l}_{0;I} Γ* as L → ∞ represents the number of allowed infinite spike sequences. Furthermore, for a given S_NL and its associated set B(S_NL) ∈ ∨_{l=0}^{L−1} Φ^{−l}_{0;I} Γ*, the probability of observing spike pattern S_NL can be stated as an initial state probability in the distant past: P(S_NL | I) = P(θ(−L∆t) ∈ B(S_NL)).
As discussed above and in [8], we assume that our RDS possesses an ergodic stationary probability measure µ. Recall that the random ICs defining our distinct trials are drawn from µ. It follows that lim_{L→∞} P(S_NL | I) = µ(B(S_NL)). Thus, if we let

H_{Γ*} = lim_{L→∞} − (1/(L∆t)) Σ_{S_NL} µ(B(S_NL)) log₂ µ(B(S_NL)),   (4)

it follows that

lim_{L→∞} H^{NL}_noise(I, t_l) = H_{Γ*}.   (5)

For any dynamical system, the expression (4) measures the amount of uncertainty produced by chaotic dynamics if we can only observe the system with the precision given by the partition Γ*. This concept is generalized by the Kolmogorov-Sinai entropy h_µ, also called dynamical or metric entropy [26,27], defined by

h_µ = sup_Γ H_Γ,   (6)

where the supremum is taken over all finite partitions Γ; in particular, H_{Γ*} ≤ h_µ. This quantity is related to the Lyapunov spectrum λ_1 ≥ λ_2 ≥ ... ≥ λ_N of a dynamical system, which measures rates of exponential divergence or convergence between trajectories. Lyapunov exponents λ_i are expected to be well defined for our RDS in the sense that they generally depend on system parameters such as coupling strength and the mean and variance of inputs, but not on specific realizations of the inputs I(t) [28]. The authors of [29] showed that although the join of a partition Γ depends on I, h_µ does not, and that under some ergodicity assumptions, the following entropy formula holds:

h_µ = Σ_{i : λ_i > 0} λ_i.   (7)

If the λ_i are the Lyapunov exponents of the original system (1) computed over time-units instead of ∆t time-steps, we get from (4), (5), (6) and (7) the following upper bound for the noise entropy rate:

lim_{L→∞} H^{NL}_noise ≤ H_KS ≡ (1/ln 2) Σ_{i : λ_i > 0} λ_i,   (8)

which has units of bits per time-unit.
To evaluate this bound, we numerically compute the exponents λ_i of system (1) and find that, as originally observed in [4,5] for autonomous networks, our driven system has a size-invariant Lyapunov spectrum (see Figure 2 (d)), which is insensitive to the particular choice of random coupling matrix A (see Appendix for details). This leads to spatially extensive behaviour of the bound H_KS, as shown in Figure 2 (c).
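Once the Lyapunov spectrum is in hand, evaluating the bound (8) amounts to summing the positive exponents and converting from nats to bits; a sketch:

```python
import numpy as np

def H_KS(lyapunov_exponents):
    """Upper bound (8) on the network noise entropy rate, in bits per
    time-unit: the sum of positive Lyapunov exponents (per time-unit,
    natural-log convention) divided by ln 2."""
    lam = np.asarray(lyapunov_exponents, dtype=float)
    return lam[lam > 0].sum() / np.log(2.0)
```

Because the spectrum is size-invariant, this bound grows linearly in N when evaluated on the spectra of networks of increasing size.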
Intriguingly, H KS is much smaller than estimates from H 1 . This reveals a central result for our driven chaotic networks: joint spike patterns are (at least) an order of magnitude less variable than what would be predicted by observing the spike train statistics of single cells, despite averaged noise correlations across neurons that are very low.

NOISE ENTROPY PRODUCTION AS A FUNCTION OF INPUT STATISTICS
Previous studies showed that the level of sensitivity emerging from chaotic network dynamics can be controlled by carefully chosen inputs (see [6,7] for different contexts). We ask whether this is the case for our network.
We first identify a range of input statistics -- the mean η and fluctuation amplitude ε -- that are comparable in that they all produce the same firing rate as the "standard" parameter set used above (η = −0.5, ε = 0.5). These parameters lie along the level curve in Figure 3 (a). Note that the curve is parametrized so that η grows while ε decreases; thus, as we travel along it, we gradually shift the dynamics from the excitable, fluctuation-driven regime (η < 0) to an oscillatory, mean-driven one (η > 0). In particular, the last point evaluated corresponds to a purely autonomous regime (ε = 0), where the input I has no fluctuating component. Figure 3 (d) shows that, for the two extremes of the level curve, network noise entropy continues to be much smaller than that predicted from single cells, and that single-cell noise entropy appears to follow the same trends as H_KS. We conclude that spike pattern variability emerging from chaos is not a fixed property of a network, but can be strongly modulated by the mean and variance of network inputs.

DISCUSSION
Biological neural networks may operate in a chaotic regime, with irregular activity driven by a balance of fluctuating excitatory and inhibitory interactions. This network chaos is under vigorous study, fueled in part by possible roles for chaos in generating "target" spatiotemporal patterns [30] and in enabling useful temporal processing of inputs [31,32]. Here, we address a complementary question -how much variability (or "noise") will chaotic dynamics add to network responses?
We compute bounds on network spike-response entropy that give novel answers. In particular, we show that the noise entropy of multi-cell spike responses is at least an order of magnitude lower than would be naively extrapolated from single-cell measurements under the assumption that spike variability is independent from cell to cell. The direction of this comparison agrees with intuition provided by the shape of the Lyapunov spectrum, which indicates time-dependent chaotic attractors of lower dimension than the phase space; thus, the phase space dynamics of individual neurons are not independent. What we quantify explicitly is the order-of-magnitude size of the effect, as it is manifested in the binary spiking outputs of the system -- a fact which might seem especially striking given that pairs of spike trains appear to be very weakly correlated on average.
If one considers the level of noise entropy as an indicator of the potential information contained in spike patterns, our results show that balanced networks may be able to encode input stimuli using spike timing, provided these inputs contain strong enough temporal structure. This mechanism takes root in the complex noise interactions that chaos induces between neurons, and its extensive nature suggests that it is scalable with network size. Moreover, the strong dependence of entropy on the input signal's mean and variance indicates that a network can operate in different "regimes" modulating the repeatability of spike patterns. This is in addition to known advantages of balanced networks, such as efficiently tracking changes in common, mean inputs with firing rates [1] -- which may encode coarser statistics about inputs at the population level.
To formalize these notions, future work could seek to compute the mutual information between an input ensemble and a system's response. In order to estimate this quantity, one needs to compute the total entropy [24] of spike patterns, which captures how many distinct spike outputs can be produced by the network, for any input I. This quantity can be thought of as noise entropy marginalized over the set of possible inputs, and therefore depends on both network connectivity and single neuron attributes.
Finally, we expect that the H KS bound can be adapted to other neuron models provided a state space partition linking dynamics to spike patterns can be derived. This could prove to be a powerful tool to enquire about potential encoding schemes as a function of many network attributes such as spike-generating dynamics, connectivity, learning rules and input correlations.

ACKNOWLEDGMENTS
The authors thank Fred Wolf, Yu Hu and Kevin K. Lin for helpful insights. We also thank two anonymous reviewers for comments and suggestions that improved the manuscript. This work was supported in part by an NSERC graduate scholarship, an NIH Training Grant from University of Washington's Center for Computational Neuroscience, the Burroughs Wellcome Fund Scientific Interfaces, the NSF under grant DMS CAREER-1056125 and NSERC Discovery and CIHR operating grants. Numerical simulations were performed on NSF's XSEDE supercomputing platform.

Numerical simulations
Throughout the main text, we use data from numerical simulations of the network model described by (1). All simulations were implemented using a standard Euler-Maruyama solver with time-steps of 0.005 time-units. We found that using smaller time-steps did not alter our results. The solver was developed in Python/Cython using the Mersenne Twister random number generator, and postprocessing (spike binning and empirical noise entropy estimates) was carried out in MATLAB. Large simulations were performed on the NSF XSEDE Science Gateways supercomputing platform.

Lyapunov spectrum estimates
Although the Lyapunov exponents λ_1 ≥ λ_2 ≥ ... ≥ λ_N of (1) do not depend on a particular choice of I or initial conditions (IC), computing them analytically is a very hard, if not impossible, problem. Therefore, we use numerical estimates. While numerically integrating a solution of (1), we simultaneously evolve the linear variational equation Ṁ(t) = J(t)M(t), where J(t) is the Jacobian of (1) evaluated along the simulated trajectory. Here, M is an N × N matrix with M(0) the identity. M(t) is orthonormalized at each time-step, and the growth factors of each orthogonal vector obtained from the process are accumulated to build estimates that converge toward the λ_i's, as described in [33]. This process was repeated for ten random choices of the input I and the initial states; trajectories were integrated for 5000 time-units. We verified that all reported λ_i's have a standard error less than 0.002 using the method of batched means [34] (batch size of 100 time-units). Figure 4 (a) shows converging estimates of the first 60 Lyapunov exponents over the initial 50 time-units. In addition, we find that distinct realizations of the connectivity matrix A = {a_ij} did not significantly affect the Lyapunov exponent estimates -- and hence the sum of all positive ones, leading to the Kolmogorov-Sinai entropy h_µ. To illustrate this, Figure 4 (b) shows estimates of three λ_i's for three distinct systems, where the input I, the ICs and A are all different.
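The reorthonormalization procedure described above is the standard QR method of [33]; a generic sketch (not the authors' implementation), checked here on a toy linear map whose exponents are known in closed form:

```python
import numpy as np

def lyapunov_spectrum(jacobian, x0, step, n_steps):
    """Estimate Lyapunov exponents by evolving a tangent basis M and
    re-orthonormalizing it with a QR decomposition at every step; the
    log of each diagonal growth factor accumulates into lambda_i."""
    x = np.asarray(x0, dtype=float)
    M = np.eye(len(x))
    log_growth = np.zeros(len(x))
    for _ in range(n_steps):
        M = jacobian(x) @ M          # push tangent vectors through the flow
        x = step(x)                  # advance the trajectory itself
        M, R = np.linalg.qr(M)       # re-orthonormalize the basis
        log_growth += np.log(np.abs(np.diag(R)))
    return np.sort(log_growth / n_steps)[::-1]

# toy check: the fixed diagonal map x -> diag(2, 0.5) x has exponents
# log(2) and log(0.5), recovered exactly by the QR iteration
A = np.diag([2.0, 0.5])
lam = lyapunov_spectrum(lambda x: A, np.ones(2), lambda x: A @ x, 50)
```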

Relationship between state space partitioning and spiking patterns
The derivation of the H_KS bound relies on the simple assumption that neuron i will spike within ∆t time-units if and only if θ_i(t) ∈ γ_1 = [1 − 2∆t, 1). We found that for simulated trajectories of 1000 time-units from network (1), only about 0.01% of all spikes violated this spiking assumption for ∆t = 0.05. This number dropped to zero for ∆t = 0.01. Such values are evidence that errors in relating spike train entropy estimates to entropy production in state space will be slight. As an additional check, we next compare the spiking statistics and entropy estimates for the main model (1) with those for an analogous dynamical system, for which our partition-based spiking assumption holds exactly, by design. Consider the piecewise analogue of system (1), obtained by replacing the functions F and Z with the piecewise-defined terms

F̃(θ) = 2, Z̃(θ) = 0 for θ ∈ [1 − 2∆t, 1);   F̃(θ) = F(θ), Z̃(θ) = Z(θ) otherwise,   (10)

and otherwise identical to (1). With these terms, a neuron in γ_1 travels at constant speed 2 and ignores all inputs, so it necessarily spikes within the next ∆t; the partition-based spiking assumption therefore holds exactly for the network defined by (10). However, notice that for ∆t > 0, both F̃ and Z̃ are discontinuous functions on S¹ and, as a result, the Jacobian of (10) is ill-defined. Nevertheless, for practical purposes, we can simulate system (10) and approximate its Lyapunov spectrum, since there is only one discontinuity point per neuron and the probability of a finite-duration, discretized trajectory landing on such points is nil. The purpose of model (10) is to assess the differences arising between the dynamics of our full ("normal") model, given by Eqn. (1), and the alternate ("piecewise") model above for which the spiking assumption is exact. We fix ∆t = 0.05 as in the main text and begin by comparing single cell dynamics for the "normal" and "piecewise" models. Figure 5 shows a simulated single cell trajectory from each model, with identical input I_i and identical incoming spike trains (extracted from a separate network simulation). This setup mimics the activity a single cell would receive when embedded in a network.
Apart from small discrepancies that occasionally arise between spike times, the two trajectories agree almost perfectly, and when differences do arise they are quite small: from a simulation yielding about 3000 spikes from both models, most corresponding spikes from the normal and piecewise models were indistinguishably close, down to the numerical solver's time-step, and the maximal difference was about 0.02 time-units, smaller than a single ∆t time-bin. Figure 5 (b) shows the first 60 Lyapunov exponents of a network of size N = 500, simulated with both the normal (1) and piecewise (10) models. Since Lyapunov exponents depend on the Jacobian of a system, we expected the piecewise model to yield smaller exponents: its derivative is zero on the intervals [1 − 2∆t, 1). Nevertheless, this discrepancy is minimal and amounts to a difference of about 0.002 bits per neuron per time-unit in the slope of the H_KS estimates shown in Figure 5 (c). Finally, we empirically estimate the noise entropy bound H_1, as described in the main text, for the piecewise model (10). Its value differed from the normal model estimate by about 0.01 bits per neuron per time-unit, well below the standard error of the mean of estimates from both models, as can be seen in Figure 5 (c).
In light of these tests, we are confident that the main result of the paper -a computable bound on spike-train noise entropy that is much lower than what would be extrapolated from single cells -is a robust phenomenon for networks of the type modeled by (1), rather than a consequence of a (seemingly tiny) approximation error.