Excitatory/inhibitory balance emerges as a key factor for RBN performance, overriding attractor dynamics

Reservoir computing provides a time and cost-efficient alternative to traditional learning methods. Critical regimes, known as the “edge of chaos,” have been found to optimize computational performance in binary neural networks. However, little attention has been devoted to studying reservoir-to-reservoir variability when investigating the link between connectivity, dynamics, and performance. As physical reservoir computers become more prevalent, developing a systematic approach to network design is crucial. In this article, we examine Random Boolean Networks (RBNs) and demonstrate that specific distribution parameters can lead to diverse dynamics near critical points. We identify distinct dynamical attractors and quantify their statistics, revealing that most reservoirs possess a dominant attractor. We then evaluate performance in two challenging tasks, memorization and prediction, and find that a positive excitatory balance produces a critical point with higher memory performance. In comparison, a negative inhibitory balance delivers another critical point with better prediction performance. Interestingly, we show that the intrinsic attractor dynamics have little influence on performance in either case.


Introduction
Reservoir Computing (RC) is a promising field for Machine Learning (ML), as the nonlinear reservoir requires no learning and the readout layer only needs linear regression [Maass et al., 2002, Jaeger andHaas, 2004], reducing time and computational cost [Schrauwen et al., 2007].Furthermore, it has potential for real-world implementations as physical reservoirs and dedicated Neuromorphic chips do not always possess the ability to adapt [Benjamin et al., 2014, Merolla et al., 2014, Tanaka et al., 2019].Around the same time in 2002, two models were developed: the Liquid State Machine (LSM) [Maass et al., 2002] and the Echo State Network (ESN) [Jaeger, 2001] (rectified version).These approaches differ in their neural models, with LSM using time-event-based neurons and ESNs using Artificial Neural Networks (ANN) with continuous activation functions [Jaeger, 2001].The Random Boolean Network (RBN) [Bertschinger and Natschläger, 2004], with binary neurons, is a particularly promising model for LSM and allows for a direct relationship between the reservoir design and its performance in a task [Bertschinger and Natschläger, 2004, Natschläger et al., 2005, Snyder et al., 2012].It is widely used to model and implement reservoirs [Rosin, 2015, Burkow and Tufte, 2016, Echlin et al., 2018, Komkov et al., 2021].
Studies on the RBN have demonstrated the existence of a phase transition in the dynamics of the reservoir for specific connectivity parameters.
Close to the critical regime, an increase in performance in solving various tasks has been reported (boolean logic operations [Bertschinger and Natschläger, 2004], bit-parity check [Bertschinger and Natschläger, 2004], prediction of Mackey-Glass time series [Canaday et al., 2018]).As of now, most studies in the field of RC rely on phase diagrams to exhibit a statistical relationship between connectivity, dynamics, and performance [Bertschinger and Natschläger, 2004], [Büsing et al., 2010], [Snyder et al., 2012], [Krauss et al., 2019a], [Metzner and Krauss, 2022].These results have been obtained by considering a limited number of reservoirs (from one [Metzner and Krauss, 2022], to 10 [ Bertschinger and Natschläger, 2004], up to 100 [Krauss et al., 2019b]), and with a limited resolution in terms of the control parameter, due to the computational cost of these phase diagrams.
While phase diagrams are essential to comprehend the full range of the computational capabilities these systems can offer, one crucial point is rarely discussed.Since the reservoirs are randomly generated, there might be huge differences between them even though the statistics of their connectivity are the same.Indeed, close to the critical point, reservoir steady-state activities exhibit a wide range of dynamics as discussed by statistical studies [Kinouchi and Copelli, 2006], [Del Papa et al., 2017], [Krauss et al., 2019b], and attractor classification [Seifter and Reggia, 2015], [Bianchi et al., 2016], [Krauss et al., 2019b], [Krauss et al., 2019a], [Metzner and Krauss, 2022].
This article aims at studying the variability of reservoir dynamics, performance, and their correlation.We consider randomly generated RBNs with a single control parameter related to the inhibitory/excitatory balance [Krauss et al., 2019a], tuned with high resolution to perform reliable statistical analysis.We study the excitatory/inhibitory balance, attractor dynamics, and performance, and show that the relationship between the three is more complex than previously thought.In line with the work of [Metzner and Krauss, 2022] on ESN, our research reveals that the RBN also possesses two critical points.Depending on whether the balance is in the majority excitatory or inhibitory, we show that reservoirs respectively exhibit optimal performance in either memory or prediction.
The article is organized as follows: in section 2, we describe the model and prove that it is controlled by the ratio of the standard deviation and mean of the weight distribution (noted σ ‹ ), which we use to perform all subsequent analyses.In section 3.1, we show that the sign of σ ‹ produces two critical regimes.In section 3.2, we classify the activity of free-running reservoirs into four classes according to their attractor dynamics for these two critical regimes.We show that each reservoir can be associated with its most dominant attractor.In section 4.1, we evaluate the relationship between connectivity, dominant attractor, and performance in memory and prediction tasks.We then investigate the relationship between the performances of the two tasks, critical regimes, and dominant attractors in section 4.2.This allows us to derive specific recommendations for simplifying the random generation process of reservoirs.Finally, we discuss our findings in section 5 and their implications for future works in section 6.

Model
The model consists of one input node, the reservoir itself, and an output node (Fig. 1).Half of the neurons inside the reservoirs are connected to the input, and the other half to the readout.Thus information between the input and the readout has to pass through the reservoir.The following subsections describe each component and how they are interconnected.
FIG. 1 -Schematics of the model.The input node (left) randomly projects synaptic weights to half of the reservoir (center) (green); the reservoir is composed of random recurrent connections (blue); the readout (right) receives input from the other half of the reservoir (orange).

The reservoir
Phase transitions occur stricto sensu only in infinite systems, and critical phenomena are easier to observe in large systems [Lavis et al., 2021].As such, we use an RBN model of size N " 10, 000 neurons, which is considerable compared to similar studies in the literature [Natschläger et al., 2005, Büsing et al., 2010, Metzner and Krauss, 2022].The binary state x i ptq P t0, 1u of the neuron i at the time-step t (with t P N), is given by: where θ is the Heaviside step function: θpxq " 1 if x ą 0 and θpxq " 0 otherwise.Each neuron receives the same number of non-zero connections K " 16, in the range of values shown to display sharp phase transitions [Büsing et al., 2010].The non-zero recurrent weights of the reservoir w ij (blue arrows in Fig. 1) are i.i.d. and drawn from the Normal or Gaussian density probability function N pµ, σq.u i ptq is the external input of the neuron i at times t.

Input node
The input layer reduces to one node, receiving the time series uptq.The input at times t of a given neuron i is: Where the input weight w in i , of neuron i (green arrows in Fig. 1) is drawn from a uniform distribution within r´0.5, 0.5s, and half of the weights are set to zero.According to Eq. 1, if the amplitude of the input far exceeds the total contribution of the recurrent weights, then the input mostly controls the dynamics.Our choice of parameters corresponds to an input of zero average and " 0.14 the standard deviation, which is rather low compared to the recurrent weights.We show in part 4.1 that this choice makes the dynamics mostly controlled by the recurrent weights, which is the intended behaviour.

Readout
The adaptation mechanism is in the output layer only, which reduces here to one linear node with a sigmoid activation function f pxq " 1 1`e ´x .As such, the output of the network is given by: Since all experiments consist in reproducing a unidimensional time series, the output y is a scalar as well.The column vector ⃗ x represents the state of the reservoir neurons, while the output weights W out (orange arrows in Fig. 1) are stored in a row vector of size N , with half of them set to zero.Lastly, the scalar c is the bias.The training is performed with a mean square error (MSE) loss function.Since we had a focus on collecting high-quality data regarding the link between connectivity and performance, we chose the ADAM optimizer [Kingma and Ba, 2015] over the more standard Ridge regression [Burkow and Tufte, 2016] often used in the literature.
The implementation is made with the PyTorch library, and parameters α " 0.001, and 4000 epochs (See Supplementary Materials 6.3.4 for more details).

Connectivity: the control parameter σ ‹
To study the reservoir dynamics, one needs the proper definition of a control parameter.Previous work on the RBN often focuses on the average and variance of the recurrent weight matrix ( [Bertschinger and Natschläger, 2004], [Natschläger et al., 2005]).In the following, we demonstrate the existence of only one control parameter σ ‹ defined by: Where µ is the mean of the weights and σ their standard deviation.Here we study the reservoir in the absence of external excitation, u i ptq " 0 in Eq. ( 1).Let us consider two reservoirs with the same architecture, the same initial state, and with respective weights matrices W and λW with the scalar λ ą 0. Since λ ą 0, then θpλxq " θpxq, @x.Thus, according to Eq. ( 1) for u i ptq " 0, the two networks are always in the same state.Thus pλµ, λσq gives rise to the same time evolution as pµ, σq.The two corresponding reservoirs are totally equivalent.We face two cases depending on µ: • When µ " 0, choosing λ " 1{σ leads to the conclusion that all reservoirs p0, σq are strictly equivalent to the reservoir p0, 1q.Hence reservoirs with µ " 0 are independent of σ.
• When µ ‰ 0, choosing λ " 1{|µ| leads to the weights of the second reservoir distributed with a mean of ˘1 and a standard deviation σ{|µ|.Hence we define the control parameter of the RBN as in Eq. (4).
Eq. ( 4) characterizes the distribution of the weights: the mean is the sign of σ ‹ , and the standard deviation is its absolute value.Other distribution characterizations directly relate to σ ‹ .For instance, it is controlling the balance b between excitation and inhibition, defined by [Krauss et al., 2019a] as: b " pS `´S ´q{S (5) With S " S ``S ´" KN the total number of synapses, S ´the number of inhibitory synapses (w ij ă 0), and S `the number of excitatory synapses (w ij ą 0).By taking a normal weight distribution, the number of excitatory synapses is given by: By substituting Eq. (7) in Eq. ( 6), we find b " Erfr1{p ?2σ ‹ qs, with Erf the error function.Thus, by controlling the weight distribution, our control parameter σ ‹ drives the excitatory to inhibitory balance and thus the reservoir dynamics, in line with [Krauss et al., 2019b] and FIG. 2 -Excitation/inhibition balance b as a function of the absolute value of the connectivity parameter σ ‹ , as defined in Eq. 4, for σ ‹ ă 0 ( ) and σ ‹ ą 0 ( ).When the average of weights is positive (σ ‹ ą 0), whereas the reverse is true when the average of weights is negative.[Metzner and Krauss, 2022].Fig. 2 shows the relationship between b and σ ‹ .The case µ " 0 corresponds to b " 0 (perfect balance between excitation and inhibition) and σ ‹ Ñ 8, for any value of σ.For σ ‹ ă 0, b P r´1, 0s, i.e. there is a majority of inhibitory synapses while for σ ‹ ą 0, b P r0, 1s, hence a majority of excitatory synapses.
To finish, the spectral radius ρpW q of the weight matrix W is a particularly relevant quantity in the context of the ESN, a continuous version of reservoirs, where the critical point corresponds to a spectral radius of one.However, in the case of discontinuous activation functions, such as the one we have with the RBN, it has been shown that the Echo State Property (ESP) cannot be achieved.The spectral radius alone fails to characterize the dynamics and performance of these reservoirs [Oztuik et al., 2007], [Alexandre et al., 2009], [Tieck et al., 2018], [Balafrej et al., 2022].In supplementary material 6.3.1, we explicitly discuss the link between ρ and the mean and variance of the weight matrix and show ρ is of no particular interest in the study of the dynamics.
As a consequence, in the following, we will use σ ‹ as the unique control parameter (in the range of values displayed in supplementary material S6.3.2).

Statistics of dynamics at the critical points
The importance of neural networks dynamics in understanding their performances has been widely explored ([Bertschinger and Natschläger, 2004], [Büsing et al., 2010], [Krauss et al., 2019a], [Metzner and Krauss, 2022]).The purpose of this part is to investigate the relationship between connectivity and dynamics through two statistical analyses: 1. Activity statistics: In section 3.1, we analyze the statistics of the neural activity as a function of the control parameter σ ‹ .We demonstrate the existence of two critical points and characterize them.
2. Reservoirs attractors: In section 3.2, we classify the steady state time evolution of the network activity into four distinct attractors and study the influence of the initial state and random weight generation.We show that reservoirs possess a dominant attractor independent of initial conditions.
3.1 Statistics of the activity of free-evolving reservoirs

Methodology:
The first experiment is a free evolution of reservoirs in the absence of input, i.e. u i " 0. We define the network activity as Aptq " ř i x i ptq{N , with N the number of neurons in the reservoir, and A P r0, 1s.A is also the proportion of excited neurons: A " 0 if the network is extinguished (x i " 0 @i), A " 1 if the network is saturated (x i " 1 @i).At the initial state, we randomly force 20% of neurons to an up state (x i " 1), i.e.Apt " 0q " 0.2.After a transient regime of 1000 time steps, the reservoir reaches a steady state where we perform statistics.In the following, A will refer to the activity measured in that steady state (see supplementary material S6.4 for a more formal definition).For each value of σ ‹ , we perform statistics on 100 randomly generated reservoirs (see supplementary material S6.2 for more details on the experiment).

Analysis:
In the following, a bar over a variable p.q represents an average over time for a given reservoir, while the brackets x.y represents an average over different randomly generated reservoirs.In the first analysis, we calculate the time-average steady activity Ā for a given reservoir and its time-variance δA 2 , where we define δA " A ´Ā.We average these quantities over the reservoirs to give x Āy and xδA 2 y for each value of σ ‹ .Next, we evaluate the average and variance over reservoirs of the binary entropy H b , or BiEntropy [Croll, 2014] of the timedependent activity.Compared to the Shanon entropy, the advantage of this metric is that it can discriminate ordered from disordered strings of binary digits.It has been used in machine learning [Mamun et al., 2016], [Zhou and Zeng, 2022], but to our knowledge, this is the first time in reservoir computing.The binary entropy varies between 0 for fully ordered bit-streams and 1 for fully disordered ones.We compute the BiEntropy of the binarized time dependence of the steady activity for each reservoir (for the exact definition of all the metrics, see S6.5).

Results:
The time-average activity x Āy as a function of σ ‹ is shown in Fig. 3A, for both signs of σ ‹ (Fig. 2).The green dashed line represents the value obtained for µ " 0, i.e. σ Ñ 8.The perfect balance in excitation (b " 0) results in half of the neurons being activated x Āy " 0.5.The variance xδA 2 y vs σ ‹ is shown in Figs.3C.For the lowest values of |σ ‹ |, the reservoirs are frozen (zero variance) either extinguished (for σ ‹ ă 0) or saturated (σ ‹ ą 0).This corresponds to reservoirs being respectively purely inhibitory (b " ´1) or excitatory (b " 1).Already at the level of the statistics of the activity, there is a clear difference between both signs of σ ‹ : for σ ‹ ă 0 there is a threshold in σ ‹ (vertical dashed line at σ ‹ " ´0.7) above which the average activity and its variance rise abruptly and simultaneously.In contrast, for σ ‹ ą 0, there is a wide region where no dynamic is detected (zero variance), yet the network is not saturated but its activity decays continuously.The variance starts rising at σ ‹ ą 4 (vertical dash-dotted line).
The average BiEntropy xH b y vs. |σ ‹ | is plotted in Figs.3B for σ ‹ ă 0 and 3D for σ ‹ ą 0 (left scale, black stars on both plots).These two plots zoom in the vicinity of the phase transition, as statistics are stationary elsewhere.There is a continuous transition between a fully ordered phase (xH b y " 0) and a fully disordered one (xH b y " 1).Since the BiEntropy is a measure of order, these results suggest that the transition we observe is related to the apparition of chaos in the reservoir above a critical value of σ ‹ ( [Lewin and Bak, 1993], [Bertschinger and Natschläger, 2004], [Seifter and Reggia, 2015], [Kusmierz et al., 2020]).The variance of the BiEntropy xδH 2 b y is shown in Figs.3B for σ ‹ ă 0 and 3D for σ ‹ ą 0 (right scale, orange circles on both plots).It is zero when either in the ordered or disordered phase and spikes at the transition.Its maximum coincides with xH b y » 0.5: the variance of BiEntropy captures the edge of chaos as a balance between order and disorder.More striking, it also coincides with the position at which the variance of the activity rises (vertical dashed lines).The peak of xδH 2 b y thus provides a clear definition of the position of two critical points: σ ‹ c » ´0.66 and σ ‹ c » 4.0, which correspond respectively to critical balances b c » ´0.87 (94% of inhibitory synapses) and b c » 0.19 (60% of excitatory synapses).Moreover, the transition between order and disorder is much wider for σ ‹ ą 0. This asymmetry between both signs of σ ‹ is a property of our model since θp´xq ‰ ˘θpxq in Eq.( 1).From now on, we will refer to the critical points as the point where the maximum of BiEntropy variance is obtained, and we will define the critical regions as the regions with xδH 2 b y ‰ 0.

Discussion:
Similar to [Krauss et al., 2019b] and [Metzner and Krauss, 2022], the existence of phases separated by critical points as a parameter is varied is reminiscent of the phase diagrams drawn in thermodynamics.If we associate the state of a neuron, 0 or 1, to the state of an Ising spin, either down or up, then x Āy corresponds to the average magnetization per spin of the network and xδA 2 y to the variance of its fluctuations, i.e. magnetization noise.At equilibrium, it is proportional to the magnetic susceptibility according to the fluctuation-dissipation theorem [Callen and Welton, 1951].The total magnetization plays the role of an order parameter, and the transition order is obtained by considering discontinuities, as a function of temperature, of the order parameter and its derivatives with respect to the external field [L.D. Landau, 1980].Here we observe that the average activity is always continuous as a function of σ ‹ .At the same time, xδA 2 y is continuous for σ ‹ ą 0 but shows a discontinuity at the critical point for σ ‹ ă 0. This strongly suggests that the two "phase transitions" are of a different type.

Dominant attractor of reservoirs
In the previous section, we considered the average behaviour of reservoirs: for a given value of σ ‹ we averaged over many realizations of the distribution of synaptic weights.However, from a practical point of view, one wants to use one network to work with different inputs.This raises two questions: that of the reservoir-to-reservoir variability (do all reservoirs behave similarly?)and that of the sensitivity of a given reservoir to initial conditions.We address these questions in this section.

Methodology:
We submitted our reservoirs again to a free evolution without input (u i ptq " 0).For each value of σ ‹ , we created 100 reservoirs with randomly tossed weight matrices.Each reservoir is run 100 times, with a different random initial state, of activity Apt " 0q " 0.2 (for more details, see Statistics of reservoirs in S6.1).

Analysis:
We classify the attractor obtained in the steady-state activities, as proposed in [Krauss et al., 2019b].We categorize the activity signals into one of the four types of attractors (see S7 for a grounded justification of each category): • Extinguished activity: The steady-state activity Aptq is always zero.This means that the initial activity died during the transient phase and that the reservoir could not propagate it further in time.For simplicity, we will sometime refer to it as dead attractor.
• Cyclic attractor: Aptq is periodic with a periodicity larger than one time-step.
• Irregular attractor: Aptq is neither constant nor periodic within the duration of the simulation.Note that since the RBN is finite, discrete, and deterministic, given enough time, any sequence of states should eventually repeat, taking at most 2 N time steps.
We determine the attractor obtained at the steady state for each reservoir and initial condition.
We then compute the distribution of attractors for each value of σ ‹ obtained overall the initial conditions of all reservoirs.The statistics are thus computed on 10,000 steady activities for each σ ‹ .Away from the critical point, a dominant colour is observed, meaning that reservoirs exhibit a dominant attractor.Steady activities are predominantly extinguished and fixed on the left side of the critical point (A, D) and irregular at the right (C, F).Close to the critical points (B and E), there is an increase in the diversity of attractors, as previously observed [Karimipanah et al., 2017].c |, attractors are irregular, while the ordered phase is characterized by fixed or dead attractors.We note an asymmetry between both sides of the transition: irregular attractors appear only in the disordered phase.From the point of view of the attractors, both signs of σ ‹ lead to similar behaviours, except again, that the transition region is much wider for σ ‹ ą 0.

Discussion:
Our results suggest that the critical points enhance sensitivity to the initial states and configuration of the weights, explaining the reservoir-to-reservoir variance and increase in dynamic diversity.Reservoirs around the negative σ ‹ c (Fig. 4.B) possessed a distribution of attractors with far more variety than the one with positive σ ‹ c (Fig. 4.E), further reinforcing the idea that the sign of σ ‹ produces two distinct types of critical regimes.We quantified this in the supplementary material S7.1 by computing the entropy of reservoir attractor distributions plotted in Fig. S3.We interpret that result by suggesting that inhibition might be a key factor for enhancing dynamic diversity.
For the purpose of reservoir design, our findings suggest that with both critical points, most reservoirs possess an attractor obtained predominantly in most trials, independent of the initial state.The statistics of dominant reservoir attractors are presented in supplementary Fig. S3, and found to be similar to the one in Fig. 3.The presence of vertical colour lines in Fig. 4A-F means that, in most cases, the behaviour of the reservoirs does not depend on the initial state, even in the critical region (this is more thoroughly shown in the supplementary material S7.1).As a consequence, a dominant attractor can be associated with each reservoir, irrespective of the initial condition.

What drives performances
In this section, we examine whether there is a relationship between reservoir dynamics in the absence of input, as explored in the previous section, and its ability to perform two demanding tasks: memory and prediction.This is done in two steps: • Connectivity, attractor, and performance: In section 4.1, we analyze the performance obtained separately in each task, depending on the control parameter, for each domi-nant attractor category.We show that the key factor driving performance is the excitatory/inhibitory balance.
• Attractor and cross-task performance: In section 4.2, we analyze all reservoir performances independently of the control parameter.For each reservoir, we study the relationship between the performance obtained in each task and their dominant attractor.This allows us to deduce how to generate a reservoir for the best general purpose.
From now on, and for ease of notation, a reservoir with a dominant attractor obtained during free evolution (defined in previous section 3.2) will be referred to as either: a extinguished, fix, cyclic, or irregular reservoir (e.g., an extinguished reservoir refers to a reservoir with an extinguished dominant attractor).

Performance in memory and prediction tasks
Close to the critical points, we obtained various dominant attractors for a single value of σ ‹ .This raises an important question regarding the relationship between the dominant attractor of a reservoir and its performance.Specifically, it is worth investigating whether the dominant attractor influences the reservoir's performance.If this is the case, grouping attractor categories by discrete performance levels may be possible based on a single value of σ ‹ .

Methodolody:
We evaluate the performance of the networks to execute two fundamental tasks: memory and prediction.Each reservoir receives an input uptq, and the readout target is T ptq " upt `δq, equal to the input shifted in time by δ time steps.δ ă 0 corresponds to a memory task, and δ ą 0 to a prediction task.For each value of σ ‹ , we use 100 reservoirs, and each reservoir is run five times, with a different random tossing of the input weight matrix (more detail on the training procedure in S6.3.4).
The first task consists of memorizing a purely random signal (i.e., uncorrelated white noise), and since there is absolutely no correlation in the input, only memorization is involved.Fig. 5.A illustrates this task for one value of δ " ´6, with white noise as input u, and the target T .
For the second task, we explore the ability of the reservoir to predict a time series, δ " 10 time steps in the future.The input is the well-known Mackey-Glass time series, as it is a common benchmark of this type of task ( [Hajnal and Lörincz, 2006], [Goudarzi et al., 2016], [Canaday et al., 2018], among others), notably testing the ability to infer non-linear dynamics.The signal regularity is controlled by the parameter τ , see Fig. 5.B, ranging from periodic with τ " 5, to chaotic for τ " 28 (more information on the experiments in S6.3.3).

Analysis:
The performance of a reservoir is measured by computing the correlation product Corrpy, T q between the output y and the target T .A perfect match corresponds to a correlation of one, while a random output gives a zero correlation.An individual reservoir performance score is then obtained by averaging over the initial conditions.Each individual reservoir is associated with its dominant attractor, and the statistics of the performance of reservoirs are performed separately for each attractor.

Results:
The average performance is plotted as a function of |σ ‹ | in Fig. 5C,E for the memory task and in Fig. 5D,F for the prediction task.The left column (C and D) corresponds to σ ‹ ą 0, and the right column (E and F) to σ ă 0. The colour of the lines corresponds to the attractor.
For σ ‹ ă 0 (Fig. 5.C and D), performance increases over a very wide range of σ ‹ , both for memory and prediction.This range includes the critical region (gray hatched area) but is vastly broader.Thus, being within the critical region is absolutely not mandatory to perform well.A shaded area in Figs. 5 indicates the spreading of the results.There is none in plots C and D, meaning that all reservoirs perform exactly the same for a given σ ‹ .Moreover, as σ ‹ is increased through the critical region, the dominant attractors change (see zooms in plot C and D), but surprisingly, there is no discontinuity in the performance.Indeed, even though the four attractor categories are present inside the gray area, their respective performance all align.This strongly suggests that the dynamics of the reservoir, as measured in the absence of input, is irrelevant for the performance.Only the value of σ ‹ matters.In both tasks, the average performance decreases monotonically with increasing difficulty via τ and δ.In the memory task, we register a dip in performance close to the critical point.This goes against the common assumption that the edge of chaos is optimal for memory [Natschläger et al., 2005].In the prediction task, the peak of performance roughly coincides with the critical region, except for the greater difficulty, where the peak is slightly on the left.
The picture is very different for σ ‹ ą 0 (Figs.5E and F).First, the region in which some level of performance is observed is comparable to the critical region observed in free-running reservoirs.Second, there is substantial variability in performance across different reservoirs, as indicated by the large shaded areas.Despite this variability, there is an overall dependence of performance on σ ‹ .The average performance of distinct dominant attractor categories is much noisier.However, despite being more noisy, the average performance of the distinct attractor categories aligns again, so there is still no evidence that the attractor category has any significant impact on the reservoir's performance.We observe that performance decreases as the difficulty of the memorization task increases, but interestingly, this trend appears to be inverted in the prediction task.
Once again, the two signs of σ ‹ give rise to different behaviour.In particular, networks with σ ‹ ą 0 memorize better and are less reliable than those with σ ‹ ă 0 but have poorer prediction capability.Yet, in all cases, attractors do not seem to be correlated to performance, as the top performance can be found in any of the four attractor categories.

Discussion:
Our results somewhat challenge the common assumption that the edge of chaos is optimal for performance and suggest that this is true for reservoirs with a majority of excitation but not necessarily with a majority of inhibition.Reservoirs with negative σ ‹ exhibit very reliable performances with very low reservoir-to-reservoir variability over a range in σ ‹ much broader than the critical region.Since reservoirs behave the same, in practice, it is sufficient to generate one, with σ ‹ at the left of the critical region.However, if the goal is optimal memorization, it is wiser to choose σ " 0.4 in the critical region and try different reservoirs until finding a good one, which requires training and testing.

Cross-task performance
Beyond studying the performance in memorization and prediction separately, as often done ( [Bertschinger and Natschläger, 2004], [Büsing et al., 2010]), here we aim at answering the following question: are the reservoirs intrinsically good or bad, or does it depend on the task?In other words, are there general-purpose reservoirs and specialized ones?

Analysis:
We analyze the absolute value of the performance of all reservoirs independently of the control parameter.For each reservoir, we study its performance in the memory task as a function of its performance in the prediction tasks (see section 4.1 methodology).For this, we fixed values of δ (memory) and τ (prediction), and we chose three levels of difficulty: simple (τ " 5, δ " ´2), average (τ " 20, δ " ´6) and difficult (τ " 28, δ " ´10).

Results:
Fig. 6 displays the performance of reservoirs as coloured dots.As in the previous section, each colour corresponds to the dominant attractor of the reservoir.The vertical axis represents the performance in the prediction task (Mackey-Glass), and the horizontal axis that of the memory task (white noise).The columns correspond to the sign of σ ‹ , negative on the left, positive on the right.The rows correspond to the degree of difficulty, from simple (top) to difficult (bottom).
For σ ‹ ă 0 (left column), there is an apparent relationship in performance between the two tasks.The degree of difficulty roughly acts as a scaling factor on the curves, but the correlation is always clear.The reservoirs which are good at predicting do memorize well as well.There is, however, a reentrant region of irregular reservoirs (red) which are the best at memorizing but are not optimal for predicting, in particular for the intermediate degree of difficulty (see the red loop on Fig. 6B).This point corresponds to the maximum on the right of the dip observed in Fig 5C .Interestingly, in all difficulties, reservoirs at the critical point (encircled dots) create a narrow area with a good overall performance.This picture refines our previous analysis of the impact of attractors on performance, as it seems that extinguished reservoirs can perform slightly better at memory than the others, and the same for the irregular reservoirs at prediction.Nonetheless, the difference between attractors is rather small, and it could be argued that it is insignificant.
The picture is very different for σ ‹ ą 0 (right column).Performances are distributed as clouds of points.In Fig. 5, we observe a large reservoir variability in both tasks.Some reservoirs are suitable for one task and bad for the other one.For intermediate and difficult tasks, some reservoirs outperform the best ones with σ ‹ ă 0 in both memorization and prediction.Reservoirs at the critical point are found on the right of the plots, i.e. they promise good memorization but are nonetheless widely spread, especially in prediction (vertical axis).Overall, the distinct attractors occupy the space in overlapping and indistinguishable clouds; this confirms the previous analysis that attractors do not play a role in performance.

Discussion:
Correlations in performances are very different for both signs of σ ‹ .Choosing a reservoir with σ ‹ " ´0.66 ensures a good, general-purpose reservoir but with suboptimal performance.In contrast, going into the positive side of σ ‹ may lead to the best reservoirs in a given task or even FIG.6 -The performance of all reservoirs in the prediction tasks (Mackey-Glass) as a function of their performance in the memory tasks (white-noise), for σ ‹ ă 0 (A, B, C) and σ ‹ ą 0 (D, E, F).Reservoirs are again classified according to their dominant attractor (see supplementary material S7.2).Each dot represents an individual reservoir performance averaged over 5 initial conditions.In all plots, dots with black edges display reservoirs taken at the critical regimes.We chose three different pairs of values for the parameters of the tasks, τ and δ, each representing a different difficulty level: 1. Simple (A and D) the lowest difficulty in both tasks, τ " 5 and δ " ´2; 2. Average (B and E) average difficulty for intermediary values of τ " 20 and δ " ´6; 3. Difficult (C and F) difficult task for higher values of τ " 28 and δ " ´10.
better general purpose reservoirs, but this comes at a price: those gems cannot be found by the statistical analysis we have performed on their free running activity.

Conclusion
One of the main issues in the field of RC is the lack of principled methodology [Rodan and Tino, 2011] for reservoir design.This article aimed to quantify the impact of the random weight generation process to better understand the relationship between connectivity, dynamics, and performance.We demonstrated that the only control parameter is the ratio σ ‹ " σ{µ through a Gaussian weight distribution, which indirectly regulates the excitatory/inhibitory balance.We found two critical points and observed that reservoirs typically possess a dominant attractor, regardless of their initial states.
We investigated the relationship between the performance, the control parameter, and the preferred attractor in memory and prediction tasks.Our results reveal that σ ‹ , hence the excitatoryinhibitory balance b, has a strong impact on performance in the two considered tasks while the attractor dynamics have none.We showed how to select a control parameter region that ensures good performance, thus providing a very efficient way to obtain high-performance reservoirs.This region corresponds to high attractor diversity.For σ ‹ ă 0, the critical region is narrower and does not necessarily coincide with the top of performance, while for σ ‹ ą 0, the critical region corresponds to the performing region.
For the tasks, we showed that negative σ ‹ values produced superior results in prediction, with reliable performance and low reservoir-to-reservoir variability.Therefore, it is sufficient to perform free-running and pick a single value of σ ‹ , preferably close to the critical point σ ‹ c .In contrast, positive σ ‹ values were found to have higher performance in memory tasks but with greater volatility.Since a given σ ‹ value can lead to diverse performance outcomes, generating random reservoirs and testing them during training to select the best performers is still necessary.Given enough trials, however, our findings suggest that σ ‹ ą 0 can generate the bests general-purpose reservoirs.

Future work
We tested the impact of dynamics on performance in two types of tasks: memory and prediction, for various time series.It would be interesting to test if and how the balance and attractor dynamic impact other types of tasks and inputs, notably classification, as it is also a standard task in machine learning.Moreover, extending the cross-task performance analysis to classification could potentially reveal interesting insights about its performance.
Still, one surprising result is the limited impact of the intrinsic attractor dynamics on performance.One could test the robustness of this result by refining the attractor category, and future work may reveal greater performance sensitivity to attractor dynamics.For example, the extinguished category included all reservoirs with activities dying before 1000 time steps.Refining the analysis could involve correlating performance with the average time before free-running reservoir activity dies out.Similarly, cyclic reservoirs could be refined by analyzing their period [Kinoshita et al., 2009], while some irregular activities may be considered cyclic when run for more extended periods.Moreover, it is possible that combining other types of analysis, such as correlation in space and time [Metzner and Krauss, 2022], avalanche distribution size [Siddiqui et al., 2018], basins of attraction [Del Giudice et al., 1998, Chinarov and Menzinger, 2000, Kinoshita et al., 2009], the number of attractors [Cabessa and Villa, 2018], and study of the reservoir topology [Kinoshita et al., 2009, Masulli andVilla, 2015], could provide better categorization of dynamics, with ultimately better predictive power of performance.
Finally, it would be of particular interest to see if our finding regarding the impact of the excitatory/inhibitory balance and dominant attractors also applies to other models, such as the quantum Ising spins, also used in the context of RC, which exhibit analogous phase transitions [Martínez-Peña et al., 2021], and improved memory and prediction of time series in its vicinity [Kutvonen et al., 2020].
FIG. S1 -The spectral radius ρ, of Gaussian weight matrices, computed for different values of the mean µpW q, and as a function of the standard deviation σpW q.Each dot represents the average computed on 10 matrices, and errorbars represent two standard deviations.
The property ρpλW q " |λ|ρpW q for any real λ leads to ρpλµ, λσq " |λ|ρpµ, σq.In the case µ " 0, taking λ " 1{σ gives ρp0, σq " α|σ| with α " ρp0, 1q a parameter independent of the weight matrix.We thus find that while the dynamics of the network is independent of σ for µ " 0 (see Fig. 3), ρ can take any value.In this case, there is no link between ρ and the dynamics of the network.
While it may be tempting to replace ρ by its renormalized value β, both quantities suffer from the same weakness: they do not depend on the sign of σ ‹ , whereas the dynamics of the network strongly do.TAB.S2 -Values of σpW q for the first experiment, µ " 0.1.

Performance of tasks
All experiments in this section test the performance in a task, where learning is required for the readout layer.For all tasks, the following framework is applied.Assuming that F is the reservoir, the relationship between the input time series uptq and the output layer y is yptq " F puptqq.The goal here is to learn the target T ptq, such that yptq " T ptq and the target is set to: Here, δ is an integer which represents a number of time steps.This parameter serves the general purpose of setting the type of task: • Memory task for δ ă 0, the reservoir output must reproduce the input received in the past.
• Prediction task for δ ą 0, the network output must produce an input not yet seen by the reservoir.
The (integer) parameter δ also sets the difficulty of the task: the higher in absolute value, the more demanding it is.White noise memory (section 4).uptq is a zero mean, unit variance white i.i.d noise.Successive inputs are uncorrelated, so prediction is not involved, only memory.We report results for δ P t´14, ´10, ´6, ´2u (see figure 4.A for an illustration).
Prediction of Mackey-Glass series (section 4).uptq is the Mackey-Glass time series [Hajnal and Lörincz, 2006], which is common to benchmark reservoir computational capabilities.It is given by the following dynamical equation: We choose a " 0.9, b " 0.2, c " 0.9, d " 10 and x 0 " 0.1.The dynamic of this equation can be controlled by varying τ : as τ increases, the time series evolves from periodic (τ " 5) to chaotic (τ " 28), with a continuous increase in complexity in between.Results are for a fixed δ " 10 and τ " t5, 15, 20, 28u (see figure 4.B for an illustration).

Training
The two tasks follow the same protocol for training the readout: • The network receives the input u for duration D " 2000 time steps.
• The first 500 time steps are discarded and considered transients.This value is empirically obtained, adapted from the signal at hand, and coupled to a convergence test for permanent regime detection.
• The training is performed on the following 1500 time steps on the concatenated in-time reservoir outputs, using the optimization procedure described in 2.3.
Each experiment consists in 40 values of σ ‹ , 100 reservoirs per σ ‹ value, and each network is run 5 times with different randomly tossed inputs (i.e., 40, 000 simulations).Each training is performed for 4000 epochs (with a total of 640, 000, 000 training epochs).

Performance
The metrics of performance are given by the Pearson correlation coefficient between the output and the target (each of length 1500).A perfect match corresponds to a correlation of 1 while 0 means an output of the network is not better than random.

Measure of networks dynamics
To evaluate the reservoir dynamics, we used the most straightforward measure one can imagine: the activity, which we defined as the sum of all spikes at each given time step, see supplementary material 6.5.We will take some time to make the reader appreciate the usefulness of such an approach.The sum of binary spikes is mathematically equivalent to how the magnetic field of spins in the ISING models [Ising, 1925] is computed.In ISING, the electron spins are modelled as miniature magnets that can take binary values r´1, 1sq, which flip with some probability depending on temperature and the coupling with other spins.The ISING model is one of the closest physical models to neural networks.In fact, it has been widely used to design neural networks, for example, in the famous Hopfield network [Hopfield, 1982] or the Boltzmann machine [Sherrington and Kirkpatrick, 1975].So one could argue that our activity signal is close to what Magnetoencephalography (MEG) is for brain activity [Hämäläinen et al., 1993].Since this analysis does not depend on the micro-constituents, we make the case that the methodology performed in this article is easily transferable to other neural models and to other fields, such as neurosciences, physics, and the study of complex dynamical systems.Activity: we define Aptq, the averaged activity of the network at time t, and for simplicity we will refer to it as activity.It is the normalized sum of all neural states x i ptq: Steady state of reservoir: In the main text (part 3.1), we defined the steady state as A for ease of notation, here for the sake of clarity, we define the steady activity A s of free-running reservoirs during D " 2000 time steps, as A s " Apt ě D{2q, the activity of the last D ´D{2 " 1000 time steps of free running simulations.
6.5 Metrics 6.5.1 Permanent regime statistics: here are two statistical analyses we perform on the measure of the steady state activity A s of a given reservoir, where p.q represents the time-average over a quantity, and δAptq " Aptq ´Ā: • Permanent regime time-average:

Aptq
• Permanent regime time-variance: Average over reservoirs of permanent regime statistics: in the first experiment (see results in section 3.1), for each value σ ‹ , we compute the average (over reservoirs generated with 100 • Average over reservoirs of the permanent regime BiEntropy: • Variance over reservoirs of the permanent regime BiEntropy: NB: given that neuron states are binary, one could wonder why employing a binarized version of a continuous variable like A. The reason is twofold: 1. the number of neurons in the network is N " 10000.Therefore, computing H b on all neurons would be extremely costly.As such, reducing the number of neurons would be crucial, though it would necessitate a criterion for selection.This poses the risk of missing important information or introducing biases.2. This would not be applicable in many real-world applications where access to the micro-constituents of the reservoir is difficult or even impossible.As such, we ensure our methodology is easily transferable to other areas comprising non-invasive studies of the brain.

Classification of attractors
In this section, we provide a more grounded explanation of the choice of attractor categories.
In the main text (part 3.2), we developed a scheme to categorize activities by their respective attractor.Fig. 4 showed the histograms of attractors as a function of σ ‹ , and in this section, we provide a refined version of these statistics by adding two more categories of attractor: • Saturated attractors: The steady activity is saturated when all neurons are active at all times.
• Non-trivial: Any signal whose category of dynamics changes over time and that does not fit in previously mentioned types is considered non-trivial.This comprises cases where during the time window considered, activity suddenly changes from one type of dynamic to another.
We show in Fig. S2 examples of the various activities belonging to each category, No-Activity and Fix (A), Cyclic (B), Irregular (C), and lastly, Non-Trivial (D).In the following paragraphs, we explain in more detail some specificities of the fixed point attractor category and the reason why did not treat the Non-trivial case separately from the irregular ones.First, we must mention that the fixed point attractor category could, in theory, encompass the extinguished and saturated cases as well.This is because they all fit inside the definition of a time derivative of zero.The reason why we separated inactive reservoirs from the two others comes from percolation theory [Coniglio et al., 1976]: formally, a reservoir has not percolated if the activity does not spread to infinity in space and/or time.While at the percolation threshold, activity will start to propagate indefinitely.We make the case that the percolation threshold, which does not coincide here with the critical points, constitutes another type of transition [Cohen et al., 2010], from inactive, to active, hence the distinction.This is visible in Fig. S2.E, as the fixed point attractors appear a bit before the cyclic ones, around σ ‹ " 0.5.Second, in Fig. S2.F, we can see that the transition from Saturated to Fix attractor happens very early (in terms of σ ‹ ), compared to the rest of the attractors.So early, in fact, that it is not part of the phase transition.As you might recall, the analysis performed in part 3.1 revealed that this region has a zero variance of both activity and BiEntropy.We conclude that the transition from Saturated to Fix is not related to a change of dynamics but only to a change in amplitude.As a result, we have chosen not to differentiate the two.
Thirdly, as one can clearly see in both Fig. S2.E and F, the Non-trivial dynamics are very rare.As such, it is worth mentioning that in the analysis performed in S7.2, where we categorize reservoirs depending on their dominant attractors, not a single reservoir exhibits a dominance of Non-Trivial dynamics.This is important because it means this category of attractors is irrelevant for finding correlations between dominant attractors and performance.

Diversity of reservoir attractor distributions
We compute the Shanon entropy H s , with the goal of quantifying how varied are the attractor distributions of given reservoirs.Typically, if a reservoir activity always falls into one attractor, irrespective of the initial state, the Shanon entropy will give 0. On the other hand, the maximum entropy is obtained for a uniform distribution where each attractor is obtained 1{4 of the time, and H max s " ´logp1{4q.The Shanon Entropy is therefore normalized by its maximum value and averaged over the 100 reservoir of each value of σ ‹ .
To quantify reservoir dynamics diversity, we plotted the average normalized entropy of reservoir attractor distributions, xH s {H max s y, against the control parameter for σ ‹ ă 0 (A) and σ ‹ ą 0 (B).The shaded area represents one standard deviation.Lower values of this quantity indicate a stronger dominance of one attractor.Outside of the critical region, initial conditions are irrelevant, and reservoirs always converge to a single attractor.The region where xH s y increases correspond precisely to the region where the BiEntropy, shown in Fig. S3, is nonzero.In this region, different reservoirs may lead to different attractors.There is significantly more entropy, i.e. more competition between attractors, for σ ‹ ă 0.
The higher variance in H s also implies increased reservoir-to-reservoir variability.This variability is more pronounced for σ ‹ ă 0, with the average entropy exceeding 0.25 near the critical point and the shaded area approaching 0.5.For σ ‹ ą 0, reservoirs generally exhibit stronger dominant attractors, as the standard deviation never surpasses 0.25.Despite these differences, the normalized entropy remains consistently below 0.5 for all values of σ ‹ , irrespective of the sign.This value approximately corresponds to an attractor distribution where one attractor dominates 80% of the initial conditions, indicating that most reservoirs possess a peaked distribution with a predominant attractor.

FIG. 3 -
FIG. 3 -Statistics of the activity of free running reservoirs in the steady state as a function of |σ ‹ |.Each dot represents the statistics over 100 reservoirs ran once.Average over reservoirs of time average activity x Āy (A), and average over reservoir of time variance xδA 2 y (C), for σ ‹ ă 0 (İ) and σ ‹ ą 0 (▲).In all plots, the gray vertical lines represent the critical values of the control parameter for σ ‹ c ă 0 ( ) and σ ‹ c ą 0 ( . .).B and D zoom on the region of interest close to the critical points: average over reservoirs of BiEntropy xH b y (‹, left scale) and BiEntropy variance xδH 2 b y (‚, right scale), for σ ‹ ă 0 (B) and σ ‹ ą 0 (D).

Fig
Fig. 4A-F provide examples of attractors, encoded in the colours, for each reservoir (x-axis) and each initial condition (y-axis) for different values of σ ‹ .The left column (blue-bordered boxes) corresponds to values below the critical point (vertical blue lines on Figs.4G,H), the center column (gray-bordered boxes) to values at the critical point (vertical gray lines), and the right column (red-bordered boxes) to values above the critical points (vertical red lines).The upper row displays negative σ ‹ values, while the lower row features positive σ ‹ values.Away from the critical point, a dominant colour is observed, meaning that reservoirs exhibit a dominant attractor.Steady activities are predominantly extinguished and fixed on the left side of the critical point (A, D) and irregular at the right (C, F).Close to the critical points (B and E), there is an increase in the diversity of attractors, as previously observed[Karimipanah et al., 2017].Fig.4.G and H show the statistical distribution of all obtained attractors versus |σ ‹ |.As expected from the previous analysis, there is no attractor diversity on the far left and right of the plots, as we obtain one primary attractor.Dead (blue line) or fixed (green line) attractors are found for low values of |σ ‹ |, and their proportion decays slowly across the transition.Within the critical region coexist all attractors in various proportions.Chaotic attractors start to appear precisely at the transition (vertical gray lines), while the domain where cyclic attractors exist Fig. 4A-F provide examples of attractors, encoded in the colours, for each reservoir (x-axis) and each initial condition (y-axis) for different values of σ ‹ .The left column (blue-bordered boxes) corresponds to values below the critical point (vertical blue lines on Figs.4G,H), the center column (gray-bordered boxes) to values at the critical point (vertical gray lines), and the right column (red-bordered boxes) to values above the critical points (vertical red lines).The upper row displays negative σ ‹ values, while the lower row features positive σ ‹ values.Away from the critical point, a dominant colour is observed, meaning that reservoirs exhibit a dominant attractor.Steady activities are predominantly extinguished and fixed on the left side of the critical point (A, D) and irregular at the right (C, F).Close to the critical points (B and E), there is an increase in the diversity of attractors, as previously observed[Karimipanah et al., 2017].Fig.4.G and H show the statistical distribution of all obtained attractors versus |σ ‹ |.As expected from the previous analysis, there is no attractor diversity on the far left and right of the plots, as we obtain one primary attractor.Dead (blue line) or fixed (green line) attractors are found for low values of |σ ‹ |, and their proportion decays slowly across the transition.Within the critical region coexist all attractors in various proportions.Chaotic attractors start to appear precisely at the transition (vertical gray lines), while the domain where cyclic attractors exist

FIG. 5 -
FIG. 5 -Performances for two tasks: white noise memory (C, E); and Mackey-glass prediction (D, F).A and D: examples of signals for each task with their respective parameters.A: White noise memory task, which consists in remembering the input (gray), to reproduce it in output (dark red) with a negative delay δ (shown example corresponds to δ " ´6).B: Mackey glass is controlled by the parameter τ (see methodology 6.3.3 for more details), ranging from periodic to chaotic.C, D, E, F: the average performance Corrpy, T q between the output y and target T , plotted over |σ ‹ |, for each dominant attractor category: no-activity, fix, cyclic or irregular.For each value of σ ‹ we have 100 reservoirs.The solid line then represents the average over reservoirs belonging to the same attractor category; individual reservoir performances are averaged over 5 initial conditions.The shaded area represents one standard deviation.Higher correlations indicate better performance.The hatched gray area represents the critical regions, as defined in section 3.1.C and D: the performance in the white-noise memory task; three values of δ are tested: ´2 (‹), ´6 (‚), ´10 (İ).E and F: the performance of Mackey-Glass prediction (δ " `10); three values of τ are tested: 5 (‹), 20 (‹), 50 (İ).C and D: Performance for σ ‹ ă 0, with inside each plot a zoom on the critical region.E and F: performance for σ ‹ ą 0.

Fig
Fig. S4 displays the statistics of the dominant attractors of each of the 100 reservoirs generated by σ ‹ values.As one can note, the statistics are also unchanged from Fig. 4. Taken together, these results indicate strong statistical robustness, as averaging over reservoirs is almost equivalent to averaging reservoirs themselves.
Below are the parameter values for the experiment of free running (see methodology in S6.2):