Extending Stability Through Hierarchical Clusters in Echo State Networks

Echo State Networks (ESN) are reservoir networks that satisfy well-established criteria for stability when constructed as feedforward networks. Recent evidence suggests that stability criteria are altered in the presence of reservoir substructures, such as clusters. Understanding how the reservoir architecture affects stability is thus important for the appropriate design of any ESN. To quantitatively determine the influence of the most relevant network parameters, we analyzed the impact of reservoir substructures on stability in hierarchically clustered ESNs, as they allow a smooth transition from highly structured to increasingly homogeneous reservoirs. Previous studies used the largest eigenvalue of the reservoir connectivity matrix (spectral radius) as a predictor for stable network dynamics. Here, we evaluate the impact of clusters, hierarchy and intercluster connectivity on the predictive power of the spectral radius for stability. Both hierarchy and low relative cluster sizes extend the range of spectral radius values, leading to stable networks, while increasing intercluster connectivity decreased maximal spectral radius.

presented to a network where the reservoir consisted of laterally inhibited clusters. They observed that each cluster attenuated to a subcomponent of the input signal, improving performance. The Scale-Free Hierarchical ESNs (SHESN) of Deng and Zhang lack explicit decoupling and instead employ a hierarchically clustered reservoir architecture. These SHESNs perform better in predicting the Mackey-Glass sequence, which is often used as a benchmark for non-linear systems.
Deng and Zhang further noted that SHESNs appeared to be more robust to the choice of spectral radius as they remained stable for a larger range of spectral radius values. They observed stable behavior for r = 6, much higher than in similar ESNs that only displayed stable responses for r < 1. While this does not imply that the echo state property has been violated by increasing r > 1, it does, however, indicate that clustered ESNs may extend the range of permissible values for stable and transient states. In their report, only one set of reservoir parameters was examined for SHESNs and so how r varied with reservoir architecture was not quantified. Thus, it is unclear how the transition from a homogeneous reservoir to a hierarchically clustered network affects the stability of the system; in particular, is stability affected more by the presence of hierarchy or of clusters, or by a combination of both? Furthermore, is the transition from stable to unstable behavior independent of the degree of clustering and does the activity transition through an intermediate regime, such as those observed elsewhere (Ozturk and Principe, 2005)?
To address these issues, we use hierarchical ESNs (HESNs), a modified version of SHESNs, as a case study. We begin with a conventional ESN and introduce hierarchical clusters, charting how stability depends on both reservoir architecture and the spectral radius

IntroductIon
Echo State Networks (ESN) are a type of reservoir networks that have been demonstrated to be successful at predicating non-linear signals, especially those with strong spatiotemporal components (Skowronski and Harris, 2007;Tong et al., 2007;Verstraeten et al., 2007). Proposed by Jaeger (2001), they exploit a reservoir of analog units with random but fixed connections where training affects only weights that project from the reservoir to the output population units. This makes reservoir networks computationally cheap to train in comparison to methods such as backpropagation. The stability of ESNs is assured by fulfilling the echo state property, which ensures that the initial conditions are washed out at a rate independent of the input and prevents the accumulation of activity (Jaeger, 2001). Criteria for fulfilling the echo state property are outlined for a specific subclass of ESNs in Buehner and Young (2006) and for leaky integrator units .
Although their strength derives from the homogeneity of the reservoir and its ability to generate rich non-linear dynamics, the connectivity of ESN reservoirs is random and therefore suboptimal. Previous reports investigated the idea of optimizing the network by modifying the reservoir connections, either by pruning connections (Dutoit et al., 2009) or training connections within the reservoir (Sussillo and Abbott, 2009). Other work has focused instead on modifying network architecture by either including a hierarchy of ESNs to extract features (Jaeger, 2007) or introducing reservoir substructures (Deng and Zhang, 2007;Xue et al., 2007). Both of the latter papers reported an improvement in performance against conventional ESNs for specific tasks with interesting additional properties: For the decoupled ESNs (DESN) presented by Xue et al., an input signal composed of sinusoidal terms was to determine the relative influence of each. To identify the relative contribution of clusters and hierarchy, we compare reservoirs that are clustered but lack hierarchy against those that have both.

MaterIals and Methods network Model
We first consider a generic ESN (Jaeger, 2001), which consists of an input population u, reservoir population x and output population y. These populations are coupled by connection weight matrices W in for input to reservoir units, sparse connection matrix W res for connections between reservoir units and W out for reservoir to output units. Note that all matrices are directed, so generally W ij ≠ W ji . Additionally, ESNs can include feedback connections projecting from y back to x as well as connections directly from the input to output populations. Here, we chose to exclude both as we wanted to consider only the dynamics of the reservoir and their impact on the output population.
The governing equations, modified from the original description in Jaeger (2001), are given by where f and g represent activation functions which are typically monotonic and bounded, such as sigmoidals. We chose both f and g as tanh and scaled and shifted the incoming signal from the input population in order to place it into the optimal operating range of the network. Here, the input signal always consisted of a sequence of impulses, each of unit amplitude. Noise n a is added to the reservoir units in order to stabilize the network during the training process. Unless otherwise stated, noise was uniformly distributed for | | . n a ≤ − 10 6 The input and output populations were chosen to consist of 5 units each. The reservoir populations were set to be of size 50-1000 units (see following subsection for more details). These sizes were chosen for computational tractability. Connections in both W in and W res were randomly and independently assigned with connection probabilities conn in and conn res , respectively. The connectivities were set as W in being fully connected with weights drawn with uniform probability from [−1, 1]. Since this study is concerned with reservoirs for feedforward ESNs, training was only performed for tasks where the output units were relevant, such as the calculation of memory capacity. As for all ESNs, W in and W res remain fixed for the duration of the network simulation so training, when performed, did not affect the structure of the reservoir.

IntroductIon of hIerarchy
Different models of hierarchical networks exist which focus on different aspects of their structure: a clustered or modular architecture (Ravasz et al., 2002), existence of hubs (Sporns et al., 2007;Mueller-Linow et al., 2008), repetition of motifs across different scales (Ravasz and Barabasi, 2003;Sporns, 2006) or a combination of these features. For this study, we focus on a network model based on the SHESNs (Deng and Zhang, 2007), where sigmoidal units were replaced by ESN reservoirs. The resulting reservoir architecture differs from flat modular networks due to the presence of backbone units, which are units within a cluster that connect to backbone units in other clusters, in contrast to intracluster (local) units that only make connections within their own cluster ( Figure 1A). Although no universally accepted definition of hierarchy exists, the majority of models Each cluster contains four local units (white circles) that only project within the cluster and one backbone unit (dark gray circles) that provide connections between clusters. Dashed lines indicate trainable weights. Each unit has a sigmoidal transfer function, bounded on [−1, 1]. (B) Non-hierarchical HESN with similar topology and two backbone units per cluster. By increasing the number of backbone units per cluster, the HESN becomes non-hierarchical and increasingly more homogeneous. (C) Average out-degree distribution P out (k) for HESN with arbitrary configuration (200 units, 20 clusters, conn inter = conn intra = 0.7) against different values for backbone unit per cluster (bbpc), calculated over 50 independent realizations of reservoir. The connectivity does not follow a power law distribution and so is not scale-free, unlike the original SHESN model. a fixed total number of intercluster connections while changing their distribution. This was possible by specifying a larger number of backbone units per cluster and then decreasing the intracluster connectivity to compensate.
In conventional ESNs, the largest eigenvalue of W res (spectral radius r) is used as an indicator of stability of network dynamics. Traditionally, r less than unity is sufficient for zero-input networks to ensure stable dynamics; Buehner and Young (2006), however, outlines an additional criterion necessary to ensure stability, as stability is also observable for ESNs when r > 1 (Ozturk and Principe, 2005;Verstraeten et al., 2007). We examined the effect of increasing r on stability and during our simulations the spectral radius was varied from 0.05 to 2 in increments of 0.05. For each cluster configuration, we determined r max , which we defined as the largest permissible r for which the network remained stable under the criteria outlined in the following subsection.

IdentIfIcatIon of network behavIor
Different possibilities exist for characterizing a specific reservoir configuration (see Lukosevicius and Jaeger, 2007;Schrauwen et al., 2007 for discussion). Of specific interest to us was how sensitive the system was to noise and how reproducible its dynamics were. Here, we considered an approximation of the maximal Lyapunov exponent to determine stability, and the distribution of reservoir activity values upon multiple stimulation with the same sparse stimulus to assess reproducibility. We also calculated the network memory capacity to provide a measure for the functional performance.
A measure of stability is the Lyapunov exponent λ, which relates the divergence ∆ between two trajectories after introducing a small perturbation δ to the elapsed time via ∆ = e λt . For stable systems, the deviation between the two trajectories should converge to 0, corresponding to a negative Lyapunov exponent. The existence of a positive Lyapunov exponent signifies that the trajectories exponentially diverge from each other with time, indicating that the system is sensitive to initial conditions and is thus likely to be chaotic. While the Lyapunov exponent cannot be easily calculated analytically for non-autonomous systems, different approaches for determining approximations of Lyapunov exponents for reservoir networks have been outlined, either based on experimental observation of a trajectory's divergence following a perturbation (Skowronski and Harris, 2007) or theoretically by considering the deformation of an infinitesimally small hypersphere as the center follows a trajectory . Here, we implemented an approximate numerical method, using the former approach (Wolf et al., 1985;Skowronski and Harris, 2007) and estimated the pseudo-Lyapunov exponent by introducing a perturbation δ to the reservoir state at time t δ . Two identical networks, f and p, were created and driven with identical input and n a = 0. At t δ , only the second network p was perturbed, after which the simulation continued to run until t end . The resulting divergences was then measured as the Euclidean distance d x . The networks were always driven by input consisting of a train of impulses, whose average rate was set to be equal to the rate used for the reservoir activity distribution task below. The perturbation δ was applied to the entire reservoir state, uniformly distributed with |δ| = 10 −6 with t end = 100 timesteps. As this technique does not lend itself to estimating the examined in the literature focus on structures with a minimum of two levels of hierarchy. However, our aim was to establish the impact of hierarchy within a reservoir and contrast this effect against a structured but non-hierarchical architecture. Thus our modified SHESNs are minimally hierarchical networks, a property that is lost when the number of backbone units per cluster exceeds 1.
Four important changes were made in our networks as compared to the generation process outlined originally by Deng and Zhang to avoid some of the limitations present in their model with respect to the purpose of this case study: (1) Backbone units were originally detailed as being fully connected to one another. To enable us to decouple the intercluster and intracluster connectivities and examine the effect of varying them independently, we allowed the network of backbone units to have sparse connectivity; (2) To ensure that our network was topology independent, we did not assign a spatial location to the reservoir units; (3) To predetermine the network connectivities and cluster sizes, we kept the number of units per cluster constant within a network, using a seeding method outlined in more detail below; (4) To examine the transition from highly clustered to homogeneous architectures, we allowed more than one backbone unit per cluster which additionally allowed us to inspect non-hierarchically clustered networks ( Figure 1B). These changes introduced three new parameters: intracluster connectivity conn intra , intercluster connectivity conn inter and number of backbone units per cluster bbpc.
The degree-distributions P out (k) and P in (k) (Albert and Barabási, 2002) followed a bimodal ( Figure 1C) rather than the scale-free distribution observed for SHESNs. This can be attributed to our restriction of uniform cluster sizes and the fixed values for conn intra and conn inter . For this reason, we differentiate from SHESNs and instead refer to our model as hierarchically clustered ESNs (HESN).
A network was generated by first specifying the total reservoir size R, the number of clusters n, and the number of backbone units per cluster b. Each cluster was then created for size R/n and connectivity probability conn intra , followed by the generation of the intercluster connectivity matrix. This was performed by identifying the first b units within each cluster as backbone units and then defining a matrix of size (bR/n) with connection probability conn inter . The connectivity weights were drawn randomly from a uniform probability on [−1, 1]. The connectivity matrix of the reservoir W res was then rescaled to set the largest eigenvalue to be equal to the defined spectral radius.
To establish the effect of reservoir size on cluster configuration, we defined several configurations of cluster number and size while keeping the total number of units within the reservoir constant. Reservoir size was limited to 1000 units for computational tractability. Note that a conventional ESN is obtained when cluster size is equal to either 1 or the total reservoir size. As a default configuration for HESNs, we assumed bbpc = 1 and conn inter = conn intra = 0.7. Connection probabilities were always incremented in steps of 0.05. bbpc was generally set to be multiples of 5, except if cluster size was below 10. In these instances, all possible values for bbpc were tested. These additional criteria allowed us to manipulate the reservoir structure and independently vary its degree of hierarchy and clustering. Importantly, we were able to test hierarchically clustered reservoirs against non-hierarchically clustered reservoirs by retaining sparse input sequence repeated every 50 timesteps, and observed three response types across all simulations: (i) the network showed some activation, but this quickly died with exponential decay (Figure 2A). The corresponding reservoir activity values had both low mean and low standard deviation. (ii) The network either never stabilized or was unstable after the presentation of the first stimulus ( Figure 2C), with large mean reservoir activity values with low standard deviation; or (iii) transient behavior, which was similar to stable behavior except Lyapunov exponent by combining estimates of the same system (Wolf et al., 1985), we classified each network configuration by calculating the Lyapunov exponent for 100 independent realizations and determining the mean.
To relate this somewhat abstract notion of stability to a quantitative description of reservoir dynamics, the distribution of reservoir activity values in response to a sparse stimulus was also considered. This was determined by first defining a stimulus consisting of fixed pattern of input impulses presented multiple times at a constant interstimulus interval of either 10, 50, 100, or 1000 timesteps. We calculated the total reservoir activity for each timestep and normalized it by the number of reservoir units before computing the mean and standard deviation. This was repeated for 100 independent realizations of the same network configuration with n a = 0.
The memory capacity MC for different HESN configurations was determined by presenting input signals to the network to be reproduced with increasingly longer delays between presentation and retrieval. MC is obtained by summing the correlation coefficient between the output signal and the input signal for each delay var( ( ))var( ( )) , 2 as described in Jaeger (2002) and Verstraeten et al. (2007). Here, we calculated MC for k = 1-20 timesteps for an input signal consisting of train of exactly one delta input present at each timestep, distributed equally across all five input channels.

results
The aim of the current study was to establish the impact of reservoir substructures -particularly hierarchical clustering -on the stability of ESNs. We begin by analyzing the effect of hierarchical clustering within reservoirs for a fixed spectral radius, considering underlying factors such as the number of backbone units per cluster and intercluster connectivity. We then test whether clustering by itself is sufficient to alter the upper bound of permissible values for the spectral radius. Lastly, we consider the memory capacity for various network configurations.

Impact of clusters on stabIlIty
We aim to determine the effect of hierarchical clustering on the limits for spectral radius values for stability and began by systematically examining the impact of cluster on dynamics for different cluster configurations while the spectral radius remained constant. We considered a simple HESN with a fixed r = 1.2, conn inter = conn intra = 0.7 and assumed only one backbone unit per cluster.
Various network configurations were examined by first fixing the total reservoir size and then increasing the number of clusters ( Table 1). We determined the reservoir activity distribution, using very For n = 2 clusters, stable dynamics occur where responses fade within a highly reproducible and fixed time-period, illustrating that the dynamics are input driven only. Shown is for both the normalized total activity rate (top) and reservoir activity (bottom). (B) With n = 20, transient dynamics were similar to stable but were characterized by activity decaying to a non-zero baseline. Baseline activity values for identical network configurations displayed high variability across trials. (C) Unstable dynamics for n = 50, where the system never returns to zero activity after the first input.  2,4,5,8,10,20,25,40,50,100,125,200,250,500 The pseudo-Lyapunov exponent λ was also calculated for the same set of network configurations ( Figure 3C). Only negative values were observed, indicating that all configurations were nonchaotic for r = 1.2. However, the magnitude of the exponent varied and revealed that configurations of intermediate cluster size had the lowest value for λ . This result also demonstrates that the responses of networks with a negative maximum pseudo-Lyapunov exponent do not necessarily decay to 0, and that ongoing activity in feedforward ESNs (Figures 2B,C) does not necessarily imply a chaotic regime.
To examine how these trends for stability and chaoticity developed as the spectral radius increased, we analyzed different cluster configurations for a HESN with a fixed reservoir size and considered both distribution of reservoir activity values and the largest pseudo-Lyapunov that the reservoir activity does not return to 0, but rather to some nonzero baseline activity ( Figure 2B). Although the non-zero baseline remained more or less constant within a trial, its value varied between trials, leading to a standard deviation of normalized reservoir activity values that was far higher than of either stable or unstable reservoirs, where the standard deviation of reservoir activity was <0.05.
The stability of a network did not depend on absolute reservoir size, absolute cluster size or the number of clusters alone, but rather on the combination of these factors (Figures 3A,B). Unstable responses occurred in large networks with a small number of larger clusters, while stable responses were observed for configurations with smaller clusters. Transient responses occurred in a broad region between stable and unstable responses and for configurations of intermediate cluster size. Jarvis et al.
Hierarchy in Echo State Networks exponent r max for three different parameters related to reservoir architecture: reservoir size R, the number of backbones per cluster bbpc and intercluster connectivity conn inter .
To determine how the span of r for stable networks was affected by total reservoir size, the simulation was repeated for R = 100, 200, and 500 ( Figure 4A). As the reservoir size increased, r max strongly decreased. The location of the peak value for r max , however, remained relatively constant when plotted against relative cluster size (n/R) and occurred for clusters sizes 2-5% of the total reservoir size.
Increasing the number of backbone units per cluster makes the reservoir more homogeneous, which should have a greater impact on the spectral radius range for networks with smaller clusters. Increasing the number of backbone units per cluster to bbpc = 5 for R = 200 ( Figure 4B) yielded a decrease in r max and caused the distribution to become progressively more uniform. The location of the peak value for r max appeared to remain unchanged for bbpc ≤ 2. To test whether increasing bbpc affects reservoir activity and the pseudo-Lyapunov exponent, we examined the various size networks previously tested using a fixed r = 1.2 and bbpc = 5 (Figures 5A-C). We observed that many previously stable networks now displayed higher values for both mean reservoir activity and the pseudo-Lyapunov exponent, confirming that this effect was a general property of increasing bbpc. exponent λ (Figures 3D-F). We set R = 200 and kept all other parameters identical to those used in the previous task, but increased r from 0.05 to 4 in 0.05 increments. Our results clearly demonstrate that both the mean reservoir activity value and the pseudo-Lyapunov exponent were lower for the same spectral radius value when clusters were present, with the lowest values occurring for reservoirs with 20-50 clusters. This confirms that the addition of hierarchical clusters extends the permissible range of spectral radius values and is consistent with the trend observed in Figure 3C, with cluster configurations of intermediate size displaying the largest range of negative Lyapunov exponents.
Increasing the input population from 5 to 50 units for bbpc = 1 led to more network configurations with unstable responses (not shown), corresponding to an extension of the region of instability in Figure 3A toward networks with a larger number of clusters. We hypothesize that the larger input population provides more activation to the reservoir, which saturates network activity. Thus, networks with larger clusters are able to resist network activity saturation better than networks with smaller clusters.

spectral radIus range
To establish the effect of other network parameters on the range of permissible spectral radius values, we determined the maximal spectral radius value that resulted in a negative pseudo-Lyapunov (B) Increasing bbpc drastically decreases the peak value of r max , while (C) decreasing conn inter only slightly increases r max across different relative cluster sizes. Note that the default network configuration for this task is signified by red in all plots. The cumulative memory capacity ∑ = k t k MC 1 was determined for increasing delays t from 1 to 20 while r was fixed at 0.7 (Figure 8). MC increased with decreasing the number of clusters, with a maximum of 8.04 for n = 1. The higher plateau level indicates that there was no contribution to MC for delays higher than 10 timesteps.
It has been observed for ESNs that sparser connectivity results in lower error rates. If we treat each cluster as being comparable to a single unit within an ESN reservoir, then the performance of HESNs should increase with lower intercluster connectivity conn inter . Since high performance requires stable network configurations, we expect some dependence of r max on conn inter . Our findings reflect this, with r max increasing for network configurations with larger numbers of clusters as conn inter was decreased ( Figure 4C). The location of peak r max at n = 20, however, remained unchanged by decreasing conn inter .
As varying the spectral radius only changed the values but not the structure of the distribution of eigenvalues, we expected that any change in r max for increasingly clustered networks is likely due to differences in the eigenvalue distribution, rather than the spectral radius itself. We therefore analyzed the eigenvalue spectra of 20 realizations of each cluster configuration previously examined. Eigenvalue spectra for configurations with a small number of clusters were homogeneous (Figure 6A), similar to those obtained for ESNs. The spectra became increasingly more focused around the origin as clustering was increased ( Figure 6B). However, the eigenvalue spectra partially dispersed again for reservoirs with a high number of clusters, e.g., n = 500, and appears similar to Figure 6A, but now with approximately 40% of the eigenvalues located at the origin. The distribution of eigenvalues sorted by distance from the origin summarizes this for all cluster configurations ( Figure 6C). The distributions increasingly deviated from a homogeneous distribution as the number of clusters increased, due to the progressively more significant contribution of the intercluster connections.

hIerarchIcal clusters
Hierarchy is determined by the degree distribution while clustering is determined by the degree count. To establish the relative influence of each on network dynamics, we compared four network configurations with varying bbpc and conn inter values. Specifically, we chose values for both parameters that would lead to identical degree counts but different degree distributions ( Table 2).
Our results in the previous section demonstrate that r max is decreased by increasing bbpc (Figure 4B) or decreasing the intercluster connectivity. These trends were confirmed while testing the degree counts of networks (Figure 7). While we observed no significant difference for reservoirs with five clusters or less, as the number of clusters increased, so did the difference between r max for hierarchical and non-hierarchically clustered networks. The range of r max was higher for the hierarchically clustered networks than for the nonhierarchically clustered networks. We can conclude therefore that it is the degree distribution and thus hierarchy, rather than degree count and clusters, that exerts the larger influence on the range of r max .

MeMory capacIty
So far, we considered only measures related to the stability using measures related to reservoir dynamics. To reconcile the affect of clustering with measures related to the performance of a ESN, we calculated the memory capacity MC of HESNs while varying the number of clusters n, intercluster connectivity conn inter , and bbpc separately to establish their individual influence as r is increased (R = 200, conn intra = conn inter = 0.7, and bbpc = 1, unless otherwise specified). For each network configuration, MC was calculated as the mean of twenty independent realizations. To visualize how the distribution of eigenvalues varied as clustering was increased, eigenvalues were sorted by their distance from the origin and plotted for r = 1 (C), with each curve representing the mean of 20 realizations for each cluster configuration. Reservoirs with n > 25 have a non-uniform eigenvalue distribution with more than 30% of their eigenvalues concentrated at the origin. Non-hierarchical to 9.8 (bbpc = 5) for n = 40. Overall, increasing bbpc led to a more homogeneous distribution of MC values, reflecting that a network with n = 1 can be approximated by networks with high n and high bbpc values. MC was compared for conn inter = 0.2, 0.3, 0.7, and 1, which were chosen for comparison with previous simulations. Here we examined MC across the four cluster configurations with r increasing from 0.05 to 4 in increments of 0.05 (Figure 9B). MC was maximal for both n = 1 and 5 when r was slightly larger than 1. As expected, MC was independent of conn inter , since conn inter has little to no influence in networks with few clusters. While MC was low in networks with n = 25 and 40, their MC was also maximal for r > 1 but, importantly, decreased as conn inter increased. This is likely because intercluster connections became more significant as the number of clusters increased and thus the dominance of intracluster connections decreased in favor of intercluster connections.

dIscussIon
ESNs with structured reservoirs have been suggested to perform better in predicting non-linear signals (Deng and Zhang, 2007;Xue et al., 2007). Furthermore, clustered reservoirs appear to extend the range of permissible spectral radius values and result in more robust reservoir networks. Our results demonstrate that the addition of clusters within a reservoir does increase the range of spectral radius values that can be chosen, but that this increase does not occur uniformly. Specifically, we charted how the upper limit for the range of the spectral radius r that resulted in stable dynamics depended on the reservoir structure and showed: (1) r max varied with the number and size of clusters; (2) the range of r max was flattened as the number of backbones per cluster was increased and the reservoir becomes more homogeneous; (3) decreasing the intercluster connection probability conn inter increased r max but did not affect the location of the peak value for r max . These findings demonstrate that several factors interact to set the upper limit for spectral radius within HESNs. As we could examine only a small subset of all possible combinations of parameter values, however, varying multiple parameters simultaneously may have unpredicted effects on reservoir dynamics.
The criterion for stability within ESNs is that the echo state property is fulfilled; for purely feedforward ESNs, r < 1 is a sufficient but not necessary condition as it ensures that the eigenvalues are scaled such that the system always contracts for any possible input (Jaeger, 2001). This property is evident in our plots, as the estimated Lyapunov exponent and the activity levels are independent of clusters when r < 1. However, increasing r beyond 1 does not directly violate the echo state property and there has been recent interest in methods to identify the upper bound (Buehner and Young, 2006). As our results and those from others (i.e., Ozturk and Principe, 2005;Verstraeten et al., 2007) demonstrate, it is easily possible to choose r to be larger than 1 and still avoid unstable states. Understanding the interaction between different structural properties should assist in the identification by quantifying bounds for the spectral radius for different reservoir structures, although we were unable to determine a universal criterion for the echo state property in the networks we examined, We examined the effect of varying bbpc and conn inter for four cluster configurations with n ≤ 40 to ensure we could use the same cluster configurations with bbpc = 5. Based on curves obtained for Figure 8, we selected n = 1, 5, 25, and 40 as a representative cross-section. For 0.05 ≤ r ≤ 4 using bbpc = 1, 2, and 5 and the cluster configurations as described above (Figure 9A), increasing bbpc had no difference for networks where n ≤ 5. The effect of increasing bbpc, however, was most obvious for networks with larger n, with the maximum MC increasing from 7.4 (bbpc = 1) The corresponding increases in r max are consistent with trends described above. The relative influence of hierarchy and clustering can be observed by considering the two networks with the same degree count but different distributions: bbpc = 1, conn inter = 1, and bbpc = 5, conn inter = 0.2. For n > 5 clusters, the former configuration has higher values of r max , indicating a greater dependence of r max on hierarchy rather than clustering. hierarchically clustered reservoirs against cluster configurations that had the same number of intercluster connections that were, however, distributed over more than one backbone unit per cluster. Our findings indicate that, while the choice of cluster configuration does influence the value of r max , the range of spectral radius values for stable ESNs is affected more by the distribution of intercluster connections onto cluster units, rather than their number. Therefore, we conclude that the presence of hierarchy, rather than clustering per se, is responsible for the extended range of permissible r values. This characteristic is likely due to the ability of hierarchically clustered networks to reduce the impact of unstable behavior: if one cluster becomes unstable, its influence on other clusters is minimized by a low number of backbone units, which act as bottlenecks. This behavior has been observed for tri-state networks (Kaiser et al., 2007). Consistent with this, increasing the number of intercluster connections decreased network stability. This is supported by analysis of reservoir unit activity traces in networks with transient responses, where the activity trace showed one cluster that was strongly active.
One question of special interest was whether there was "golden ratio" of cluster sizes: Given a total reservoir size, is there an optimal cluster size for that will maximize r max ? For the reservoir sizes we examined, the configuration that resulted in the maximal r max for R = 200 was 25 clusters of 8 units each, corresponding to cluster sizes of 4% of reservoir size. For R = 100 and 500, these values were 20 clusters of 5 each (5%) and 50 clusters of 10 units (2%), respectively. These results suggest that cluster sizes of 2-5% of the total reservoir size are optimal. The benefit of having such small cluster sizes can be seen by considering a network with only a few large clusters: if one cluster becomes unstable, the instability of that cluster drives the baseline activity rate higher, reducing the elasticity of the reservoir. As successive clusters become unstable, the baseline rises and the computational potential of the reservoir decreases. For large cluster sizes, this increase happens quite quickly; in contrast, the loss of a small cluster results in smaller increments and therefore a slower transition to instability. Importantly, this interpretation holds true only when clusters have little impact on the activity of other clusters and so is In homogeneous ESNs, larger r values result in a slower decay of the network's response to an impulse (Jaeger, 2001) and strongly affect performance (Venayagamoorthy and Shishir, 2009). It is thus important to choose r appropriately. For example, a HESN with 200 units and four clusters was unstable when r = 1.2, but was stabilized when the number of clusters was increased. Thus, configurations that would previously have been dismissed due to instability can now be reconsidered, allowing greater flexibility in the design of clustered reservoirs. We were also able to demonstrate that the presence of clusters is reflected in the memory capacity MC. Clustered reservoirs displayed lower peak MC values; importantly however, MC values decreased more slowly as r increased for clustered reservoirs when compared to non-clustered reservoirs for r > 1.5. These factors together imply that clustered ESNs are more robust over a broader range of spectral radius values than traditional ESNs.
We were also able to map the transition between stable to unstable responses through an intermediate regime, where the reservoir activity decayed to a non-zero baseline after the first stimulus was presented, resembling the transient activity identified by Ozturk and Principe (2005). The baseline of reservoir activity was generally constant for each realization of an HESN. For all reservoir configurations, the mean reservoir activity increased as r grew larger. The level of ongoing activity can be thought of as corresponding to the elasticity of the system: as the ongoing activity increases, the potential energy of the system decreases. The gradient at which this transition to instability occurs as the spectral radius is increased, therefore, greatly affects the maximal spectral radius that can be chosen. Our findings indicate that this transition was slowest for 40-50 clusters in reservoirs with 200 units (Figures 4D,E), which is reflected in the slow decrease in MC for the corresponding network configurations.

effect of hIerarchIcal clusters on dynaMIcs
One question raised implicitly following the original observation of an extended range of r max was whether it was increased by the introduction of hierarchy, or whether merely a clustered topology was sufficient. As hierarchically clustered reservoir are characterized by having only one backbone unit per cluster, we compared allows us to conclude that, although the location of the transition to a chaotic regime depends on input and perturbation magnitude, reservoir stability remains strongly determined by not only the presence but also the size of clusters within the reservoir.

conclusIon
The total reservoir size, number of clusters, backbone clusters per unit, and intercluster connectivity all affect the range of permissible spectral radius values and we demonstrated that their interplay determines the upper boundary for the spectral radius. Specifically, the range of permissible spectral radius values strongly depended on the ratio of cluster size to total reservoir size. Hierarchy, rather than clustering alone, had the largest impact on the range of spectral radius values for stable networks. Furthermore, increasing the intercluster connectivity extended the range of spectral radius values for stable ESNs, while increasing the number of backbone units per cluster had the opposite effect.
The transition from stable to unstable dynamics was characterized by responses with varying levels of ongoing activity, even in the absence of any stimulus. The amount of ongoing activity increased as the spectral radius was raised, leading to progressively more unstable reservoir dynamics. Importantly, the rate at which this transition occurred was slowest for hierarchically clustered reservoirs with clusters of size in the range 2-5% of the total reservoir size. We suggest that this is due to backbone units acting as bottlenecks that partition the reservoir, which minimizes the influence of any unstable units by limiting their impact to their own cluster. This effect is optimal when clusters sizes are small enough that the loss of any single cluster does not greatly affect the overall reservoir dynamics, leading to a graceful degradation of reservoir performance, but large enough for them to still contribute to reservoir dynamics. We conclude that hierarchy is a crucial feature for extending the range of permissible spectral radius values within feedforward ESNs.

acknowledgMents
We would like to thank H. Jaeger for extensive and valuable feedback and T. Gürel for helpful comments and discussion. This work was supported by the German BMBF (FKZ 01GQ0420, FKZ 01GQ0830) and by the EC (NEURO, No. 12788).
only observed in hierarchically clustered networks, due to the presence of bottlenecks which allow the HESN's performance to degrade gracefully.

QuantIfyIng esns
One problem that has often been discussed is how to measure the "goodness" of a given ESN . While more primitive measures, such as the error rate, can be easily applied, they are task dependent and therefore highly subjective. Most importantly, they do not supply any additional information as to how the reservoir structure impacts on performance. Other measures that examine reservoir activity, such as cross-correlation of reservoir unit activity or entropy of reservoir states (Ozturk et al., 2007), have been used to relate performance to reservoir dynamics. Here, we used the normalized reservoir activity, a metric based on mean reservoir activation following a sparse input applied at a fixed frequency. While this measure has limited use, it was well-suited to indicate the level of ongoing activity, which was of particular interest to us.
A more sophisticated measure is the Lyapunov exponent, which indicates how chaotic a system is. However, this measure has strong shortcomings associated with it. Generally, Lyapunov exponents are difficult to estimate for non-autonomous systems, such as stable feedforward ESNs, which must be externally driven. Correspondingly, any estimate of the Lyapunov exponent is sensitive to the input used, making generalization difficult. On the other hand, there is no clearly better method to estimate the Lyapunov exponent: while the theoretical approach used by Verstraeten et al. (2007) allows for simultaneous consideration of all trajectories in the vicinity of the operating point, it can only estimate positive Lyapunov exponents. As we explicitly wanted to observe how quickly a network configuration moved from stable to chaotic behavior, we chose to approximate the Lyapunov exponent numerically. However, the disadvantage of numerically approximating the Lyapunov exponent is that it is inexact and time-consuming to calculate. Furthermore, determining the Lyapunov exponent by applying a reservoir perturbation is also subjective to the magnitude of the applied perturbation. We were able to demonstrate that for perturbations both two orders of magnitude larger and smaller than δ = 10 −6 , the same dependency of stability on the number of clusters was still present and with the same structure. This observation