Optimal Hierarchical Modular Topologies for Producing Limited Sustained Activation of Neural Networks

An essential requirement for the representation of functional patterns in complex neural networks, such as the mammalian cerebral cortex, is the existence of stable regimes of network activation, typically arising from a limited parameter range. In this range of limited sustained activity (LSA), the activity of neural populations in the network persists between the extremes of either quickly dying out or activating the whole network. Hierarchical modular networks were previously found to show a wider parameter range for LSA than random or small-world networks not possessing hierarchical organization or multiple modules. Here we explored how variation in the number of hierarchical levels and modules per level influenced network dynamics and occurrence of LSA. We tested hierarchical configurations of different network sizes, approximating the large-scale networks linking cortical columns in one hemisphere of the rat, cat, or macaque monkey brain. Scaling of the network size affected the number of hierarchical levels and modules in the optimal networks, also depending on whether global edge density or the numbers of connections per node were kept constant. For constant edge density, only few network configurations, possessing an intermediate number of levels and a large number of modules, led to a large range of LSA independent of brain size. For a constant number of node connections, there was a trend for optimal configurations in larger-size networks to possess a larger number of hierarchical levels or more modules. These results may help to explain the trend to greater network complexity apparent in larger brains and may indicate that this complexity is required for maintaining stable levels of neural activation.


INTRODUCTION
Complex systems operate within a critical functional range (Bak et al., 1987), sustaining diverse dynamical states on the basis of their intricate system architecture. Criticality is associated with the phase transition between ordered and chaotic dynamics, and systems tuned to the critical point produce power-law distributions in their dynamics. Recent studies indicate that brain networks also operate close to a critical point. Evidence for this comes, for example, from the observation of neuronal avalanches (i.e., bursts of activity separated by longer periods of relative rest) with a power-law size distribution in cortical slices (Beggs and Plenz, 2003), and from time series analysis of EEG data (Freeman et al., 2000) showing that the power spectral density of background activity follows a power law. Critical activity has also been demonstrated in human brain functional networks (Kitzbichler et al., 2009). While its functional signifi cance is still not well understood, it has been suggested that critical dynamics may enhance information processing capabilities of neuronal networks (e.g., Bertschinger and Natschläger, 2004). This idea is supported by work showing that the dynamic range in an excitable network is optimized at criticality (Kinouchi and Copelli, 2006). Given these fi ndings, it is desirable to obtain a better understanding of the conditions for criticality in complex excitable networks.
Two central topological features of brain networks, in particular of the cerebral cortex, are their modular and hierarchical organization. A modular hierarchical organization of cortical architecture and connections is apparent across many scales, from cellular microcircuits in cortical columns (Mountcastle, 1997;Binzegger et al., 2004) at the lowest level, via cortical areas at the mesoscopic scale, to clusters of highly connected brain regions at the global systems level (Hilgetag et al., 2000;Breakspear and Stam, 2005;Kaiser, 2007). The precise organization of these features at each level is still sketchy, and there is exists controversy about the exact organization or existence of modules even at the level of cortical columns (Rakic, 2008;Smith, 2010). Nonetheless, current data and concepts suggest that at each level of neural organization clusters arise, with denser connectivity within than between the modules. This means that neurons within a column, area or cluster of areas are more frequently linked with each other than with neurons in the rest of the network.
The spreading of activity has been modeled for cortical networks (Kötter and Sommer, 2000) and other complex networks with a non-random organization (Pastor-Satorras and Vespignani, 2001). In a previous study of activity spreading through different topologies of excitable networks (Kaiser et al., 2007a), we showed that patterns of limited but sustained activity are well supported by the organization of hierarchical multi-modular networks, but not random or simple small-world networks (Watts and Strogatz, 1998) of the same size. In addition, such properties arose without the need for explicit inhibitory feedback or external input, demonstrating the signifi cant role of network topology in sustaining and limiting neural activation (Latham and Nirenberg, 2004;Roxin et al., 2004).
While our previous study (Kaiser et al., 2007a) demonstrated that a wide range of initial parameters in hierarchical modular networks could result in LSA, it did not clarify whether this range was due mainly to the multi-modular organization of the network or its hierarchical structure. In the present study, we investigated the relation of different hierarchical network confi gurations to the range of LSA more extensively. The principal type of hierarchical structure, an interconnected set of modules with encapsulated sub-modules without explicit hub nodes, as well as the settings for the dynamic mechanisms were preserved from our previous model. A fi xed number of nodes was activated at the beginning of each dynamical simulation. Other nodes became activated when at least k of their directly connected node neighbors were active at the same time. Each active node deactivated in the following time step with probability v. Note that this model only assumes initial activation at time step 0, but no ongoing external input or internal random activation.
Our hierarchical topological model refl ects general features of brain connectivity at the large and mesoscopic scale, in particular the modularity of neural networks across scales. Nodes in the model are intended to represent cortical columns (Mountcastle, 1997) rather than individual neurons. Connections between columns are modeled as exclusively excitatory, since it is appears to hold that there are no long-distance inhibitory connections within the cerebral cortex (Latham and Nirenberg, 2004). However, nodes can also deactivate (controlled via the model's deactivation probability) due to intrinsic inhibition or exhaustion, as observed for cortical tissue after prolonged fi ring, for instance in epileptic seizures (Milton and Jung, 2003).
Two main parameters were explored in the hierarchical networks, the number of hierarchical levels and the number of sub-modules at each level (cf. Figure 1). These parameters were varied, while other topological features, such as the probability that any two nodes are connected, or alternatively, the average number of connections per node, were kept constant. We explored whether optimal hierarchical confi gurations existed, in which the proportion of tested cases with LSA was at a maximum. In addition, we tested whether the parameters for such optimal confi gurations changed with network size; that is, whether small networks, representing the approximate number of columns as in a small rodent (rat) brain, had different optimal settings than larger cortical networks that might refl ect the number of column nodes in larger mammalian (cat) and primate (macaque) brains. The network was generated beginning with the highest level and adding modules to the next lower level with random connectivity within modules. The resulting networks were similar to the ones produced by an alternative procedure (Sporns, 2006), but differed in the generating algorithm (we present pseudocode here, the actual Matlab algorithm is available online at http:// www. biological-networks.org): for i from 0 to h − 1 for all hierarchical levels proportion of the adjacency matrix occupied by one module at the current hierarchical level number of nodes in a module N c = N/N i number of modules at the current level for j from 1 to N c c i = random graph with N i random N i ×N i graph with and edge density p i edge density p i r 0 = 1 + (j − 1) N i fi rst node of the module r 1 = r 0 + N i − 1 last node of the module CIJ(r 0 to r 1 , r 0 to r 1 ) = c i part of the matrix CIJ is replaced with c i for i from 1 to N CIJ(i, i) = 0 remove connections across the diagonal (loops) The algorithm involves the following steps: Starting with an empty adjacency matrix CIJ, modules at hierarchical level i are added starting with modules at the hierarchical level h = 0 (global network). The ratio of matrix elements that represent potential connections within any module A i depends on the hierarchical level i and the number of sub-modules per module m. Based on A i and the number of edges at that hierarchical level, E i , we can determine the probability p i that any two nodes within a module are connected (edge density within modules). Next, the number of nodes in each module N i is calculated leading to the total number of modules N c at that level. Then, each module given as a random graph with edge density p i is inserted in the adjacency matrix CIJ. Finally, edges along the diagonal (loops) are removed from the network. Due to this removal, the total number of edges might be slightly lower than the desired number of edges. In these cases, additional edges are added randomly to the network to generate the desired total number of edges and edge density (not shown in the pseudocode; however, see Matlab routine online).
We explored hierarchical networks with different numbers of hierarchical levels, h (scales), and numbers of sub-modules at each level, m (granularity). A network without hierarchical levels forms a random network, with one level a "fl at" modular network, two levels a network with modules and sub-modules, and so on (Figure 1).

SPREADING MODEL
A basic spreading model (Newman, 2005) was modifi ed to simulate the propagation of activity through the network. This dynamic model was identical to the one used in Kaiser et al. (2007a).
The simulation operated in discrete time steps, with nodes being in one of two states, active or inactive.

MATERIALS AND METHODS
Calculations were performed on a 16-core HP ProLiant server using the Linux version of Matlab R2009a (Mathworks Inc., Natick, MA, USA). Scripts are available at http://www. biological-networks.org and are part of CARMEN (http://www.carmen.org.uk).

ANATOMICAL CONSTRAINTS
We investigated if the topology of optimal hierarchical networks, leading to a maximum parameter range of LSA, varied with brain size. For this approach, the number of nodes was set to the number of columns estimated to exist in one cortical hemisphere in different species. The number of columns was estimated from the surface size of one cortical hemisphere in rat (Nieuwenhuys et al., 1998), cat (Nieuwenhuys et al., 1998), and macaque (Felleman and van Essen, 1991) under the assumption of each (macro-)column occupying 1 mm 2 . Real columns might be smaller and we elaborate on the role of column size differences across areas and species in the Section "Discussion". We explored three networks with different surface sizes for one hemisphere, rat-like (300 nodes; 3 cm 2 surface), cat-like (4,150 nodes; 41.5 cm 2 surface), and macaque-like (11,000 nodes; 110 cm 2 surface). Note that these are very simple estimates based on the assumption that columns in different species are comparable in the basic circuit layout even though the absolute number of neurons may vary (Herculano-Houzel et al., 2008). Either edge density or average number of edges per node (average degree 〈k〉) was kept constant across network sizes. The edge density was set to 1.2%, corresponding to the one chosen in a previous study (Kaiser et al., 2007a) and is close to values of 0.48% for a model of the rat cortex.
The constraint of a constant average node degree was motivated by comparative studies showing largely constant numbers of connections per neuron across many species (Hellwig, 2000;Schüz and Braitenberg, 2002;Binzegger et al., 2004;Striedter, 2004;Changizi and Shimojo, 2005;Schüz et al., 2006). At the column level, it can also be reasoned that columns are mostly connected with adjacent columns on the cortical surface (short-distance; intra-areal) and only a few columns in different areas (long-distance; inter-areal connections). Due to this presumed homogenous arrangement of cortical networks, the number of connections per column should be independent of the total number of columns in the network. The average degree was set to the arbitrary but fi xed number of 50. Note that the actual values for edge density or average degree might differ from the ones chosen here without changing the principal conclusions of this study, as results across different networks were compared qualitatively.

GENERATING HIERARCHICAL NETWORKS
Alternative approaches exist for generating a hierarchical network with m sub-modules per module and a total number of levels h. As a default, we settled on a strategy in which the total number of edges E was distributed to the different levels (see Figure 1) with E i edges on level i, so that each level received the same number of edges: E i = E/(h + 1). This model, which was used throughout the study, preserved a constant number of edges when the number of levels or sub-modules within modules was varied.
We used a simple threshold model for activity spreading where a number i of randomly selected nodes was activated in the fi rst step. At each subsequent time step, inactive nodes became activated if at least k neighbors were currently active (neighbors of a node are nodes to which direct connections exist). Activated nodes could become inactive with probability v in the next time step, or otherwise stayed active.
An additional parameter was the extent of localization of the initial activation, i 0 . For initialization, i (i ≤ i 0 ) nodes among the nodes 1 to i 0 were selected randomly and activated in the fi rst time step. The networks nodes were numbered consecutively. For instance, for a network where the largest modules at the highest level contained 100 nodes and where each module contained 10 sub-modules with 10 nodes each, by setting i 0 to 10, 20 or 100, the fi rst sub-module, the fi rst two sub-modules, or the fi rst module, respectively, were activated during initialization. Thus, i determined the number of initially active nodes while i 0 controlled the localization of initial activations, with smaller values resulting in more localized initial activity.

CALCULATING THE AVERAGE RANGE OF LIMITED SUSTAINED ACTIVITY
We systematically explored the network activation resulting from different settings of the initial node activation and localization parameters. Persistent contained activity in hierarchical networks (e.g., intermediate-level trace in Figure 2A) existed for a wide range of initial localization and activation parameters (indicated by gray fi lled circles in Figure 2B).
We also explored if the results were robust for variations in the dynamic model parameters k and v, by using a Monte Carlo approach in which, for each pair of k and v, spreading simulations with randomly chosen parameters i (number of initially activated nodes) and i 0 (localization) were tested ( Figure 2B). A trial was considered to show sustained activity if at least one but at most 50% of the nodes were activated at the end of the simulation (after 200 steps). In our experience, activity did not further die out or spread through the whole network if such an activity level was reached at the end of the simulation. For each pair of spreading parameters k and v, the average proportion of cases for which sustained activity occurred was charted ( Figure 2C). This proportion is specifi ed by the ratio of gray fi lled circles relative to all data points in Figure 2B. The threshold k ranged from 1 to 9 (step size 2), while the deactivation probability v ranged from 10 to 90% (step size 20%). Therefore, the average ratio over all entries in Figure 2C refl ected the size of the parameter space for a given network topology which could give rise to LSA, taking into account the initialization parameters i and i 0 as well as the dynamic model parameters v and k.
We tested the proportion of cases with sustained activity for different hierarchical confi gurations. These confi gurations varied in the number of hierarchical levels, from 0 for random networks to 4, and the number of sub-modules into which each module was divided for creating the next-lower hierarchical level. For each confi guration, different values for the threshold k and the deactivation probability v were tested, in that for each (k, v) pair, 200 runs were performed and the network state was observed after 200 time steps leading to a classifi cation as dying-out, sustained, or spreading activity. For these 200 runs, the number of initially activated nodes i and the localization parameter i 0 was chosen randomly (see range in Figure 2B). The average proportion of sustained activity cases for each confi guration was plotted as gray-scale value for Figure 4 and the subsequent fi gures.

INACCESSIBLE PARAMETER RANGE
For all network sizes, a variety of hierarchical confi gurations could not be realized, due to the limited network size (regions indicated by horizontal lines in Figure 3 and subsequent fi gures). Inadmissible confi gurations were those where the smallest module at the bottom level would have contained more edges than there were edges possible between module members; that means where N c (N c − 1) < E c (N c : number of nodes; E c : number of edges in the smallest module). Note that the number of sub-modules per module was varied in steps of two (2, 4, 6, 8,…). Variations by a different step size might have produced more clearly apparent differences in inaccessible confi gurations between small (300 nodes) and large (11,000 nodes) networks.

FIGURE 2 | Determining the parameter range of limited sustained activity (schematic overview). (A)
For several trials (shown here: 30 runs), it was tested whether activity spread through the whole network (here: activating 80% of all nodes), died out (all nodes becoming inactive), or was sustained at an intermediate level (here: activating 10 or 20% of all nodes). Note that even during complete spreading, not all the nodes were constantly active, due to the inactivation probability v specifi ed in the dynamic model. (B) Simulations were run for different combinations of the number of initially activated nodes i and the localization parameter i 0 . For each run, the simulated activity died out (•), spread through the whole network (o) or was sustained within a limited compartment of the network (•). (C) The parameter space of simulations was further explored for different combinations of deactivation probability v and activation threshold k. Gray levels for each parameter combination in the diagram refl ect the percentage of cases giving rise to LSA (from subplot B). The average value across all entries was taken as the fi nal measure of the parameter range of LSA for a particular network topology. It refl ects the average proportion of limited sustained activation cases obtained across all parameter settings for a given hierarchical modular network.

EXPIRING, LIMITED SUSTAINED AND COMPLETELY SPREADING ACTIVITY PATTERNS
How does activity change over time for different parameter settings? In Figure 3 we give examples for different outcomes in a network with 512 nodes, two hierarchical levels, and eight modules with eight sub-modules per module. Each sub-module contains eight nodes. Modules are represented by gray shading where the individual gray levels represent sub-modules. Blue dots indicate that a node is active at a certain time step.
For expiring activity ( Figure 3A), initial activity quickly died out as active nodes became de-activated and not enough active neighbors existed to sustain the activity. For LSA (Figure 3B), modules and sub-modules became activated indicating that a critical number of neighbors of a node were active and able to (re-) activate a node. For completely spreading activity ( Figure 3C), activity that was initially contained in one module or several sub-modules managed to spread to other parts of the network and quickly led to complete network activation. This time-course of an early focus of activity with a rapid spread to the whole network may be compared to the generalizations of seizures in epilepsy patients. Note that the blue lines in Figure 3A as well as the large blue areas in Figures 3B,C also contain nodes which are not active (see inset of Figure 3B); however, these nodes are not visible in the fi gure due to the dot size and image resolution.

TOPOLOGICAL AND SMALL-WORLD PROPERTIES OF HIERARCHICAL NETWORKS
For all tested network sizes, the generated hierarchical networks (h ≥ 1) possessed characteristics of small-world networks (Watts and Strogatz, 1998), in that the clustering coeffi cients (the average frequency with which neighbors of a node are directly connected) were much higher than for same-size Erdös-Rényi random networks (Erdös and Rényi, 1960), whereas the characteristic path lengths (the average number of connections on the shortest path between any two nodes) remained comparable to those for random networks of the same size (Tables 1 and 2). Note that networks with only one hierarchical level represent the special case of simple modular networks.
The characteristic path length for the case of constant edge density ( Table 1) was particularly high for the 300-node network. This is due to the low edge density of 1.2%; small networks with low edge density exhibit fewer alternative pathways than larger random networks with the same edge density. Therefore, the path length decreases when more edges are added, as for   Figure 1C). C and C rand : clustering coeffi cients of the hierarchical and random networks. L and L rand : Characteristic path lengths of the hierarchical and random networks. SW: small-world index (C/C rand )/(L/L rand ). the larger networks. This behavior resembles the behavior of random networks where the characteristic path length L ∼ ln N/ln 〈k〉, where N is the number of nodes and 〈k〉 is the average node degree (Albert and Barabási, 2002;Costa et al., 2007). All networks, however, show features of small-world networks (Watts and Strogatz, 1998). The clustering coeffi cient for random networks, C rand , was the same for all network sizes. For random networks, the clustering coeffi cient is the same as the edge density; that means, the probability that neighbors of a node are connected is the same as the probability that any two nodes are connected. As the edge density is kept constant for all network sizes, C rand remains constant at that value as well. The extent of a small-world organization can be characterized by the smallworld coeffi cient SW = (C/C rand )/(L/L rand ) (Humphries et al., 2006;Humphries and Gurney, 2008). The index SW is around 2 indicating a small-world organization of these networks. Whereas SW is 2.2 for a small network size of 300 nodes, it remains at a lower level of 1.8 for larger networks. For constant average node degree 〈k〉 (Table 2), the average path length increases with network size. Smaller networks with 300 and 512 nodes show a considerably lower path length compared to constant edge density. Again, all networks displayed features of small-world networks (Watts and Strogatz, 1998). The small-world index SW, however, was lower for small network sizes of 300 and 512 nodes compared with the scenario of constant edge density.

Variation of sustained activity and topological measures
As a fi rst test, we explored the link between hierarchical organization and the parameter range of LSA for different confi gurations of a network with 512 nodes and, on average, 50 connections per node ( Figure 4A). The parameter range for LSA tended to increase with the number of sub-modules at each hierarchical level (along rows in Figure 4A). The maximum range of LSA occurred for one hierarchical level and the largest possible number of sub-modules per module.
In this and all following plots (Figures 5 and 6), regions with horizontal lines indicate hierarchical confi gurations that cannot be realized, as some modules would need to contain more edges than can be fi tted between members of that module. These cases were detected whenever N c (N c − 1) < E c (N c : number of nodes; E c : number of edges in the smallest module); that means the number E c of edges that needed to be established was higher than the number of possible edges in a module, N c (N c − 1). Note also that the gray levels indicating proportion of cases were normalized so that white regions represent the minimum and black regions the maximum value for each plot.
Due to the network generation algorithm, modules at the lowest level of the hierarchy had the largest edge density (cf. Figure 1). We used this effect to test if LSA patterns were facilitated by more densely connected bottom modules in the network ( Figure 3B). Interestingly, there existed no clear relation of the density with sustained activation: whereas both maximum edge density and sustained activity probability increased with the number of submodules for a network with two hierarchical levels (Figures 4A,B), the relation was less clear for larger numbers of hierarchical levels.
How are small-world properties linked to the different hierarchical confi gurations? The characteristic path length ( Figure 4C) appeared to show lower values when two or more hierarchical levels existed in the network, but the values were in a narrow range of 1.91-1.92 for one hierarchical level. The clustering coeffi cient ( Figure 4D) increased with the number of levels and the number of sub-modules per module. The characteristic path lengths of the hierarchical networks were comparable to those of Erdös-Rényi random networks (Figure 4E) whereas the clustering coeffi cient was higher than in random networks ( Figure 4F). As the normalized path length is around 1, the SW index has a similar value as the normalized clustering coeffi cient. Given large SW indices, the networks possessed features of small-world networks (Watts and Strogatz, 1998). higher hierarchical levels ("decreasing parcellation") or (b) creating more sub-modules for higher hierarchical levels ("increasing parcellation"). Confi gurations with a high proportion of LSA cases for decreased as well as increased numbers of sub-modules for each hierarchical level remained comparable with the original calculation. However, for the "increasing parcellation" type, the overall proportion of LSA cases increased, extending the maximum probability of sustained activity from 0.23 to 0.42 (cf. Figure 8).

Number of cases close to 50% activation threshold for classifi cation as sustained.
In additional simulations, we tested how close the fi nal activity was to the threshold used for classifi cation as a case of LSA. Indeed, fi nal activity levels close to the 50% threshold could occur. However, fi nal activity levels were around 10-20% for most confi gurations producing a high number of LSA cases. This indicates that confi gurations leading to a high proportion of LSA cases were not substantially affected by the threshold (cf. Figure 9).
Topologies leading to expiring, limited sustained, and completely spreading activity. As a default, we investigated the distribution of LSA cases depending on the hierarchical network organization. In additional simulations, we also explored the distribution of the other two possible simulation outcomes: activity dying out before

Control calculations
We tested several parameters that were used for generating hierarchical networks. Networks consisted of 512 nodes and, on average, 50 connections per node. The Appendix contains a full description of these control calculations including additional fi gures.

Varying the number of edges for different hierarchical levels.
By default, the number of edges for each hierarchical level was set to be equal, that means, E i was the same for each hierarchical level i. Here, we tested sustained activity patterns for varying numbers of edges per level. We considered two cases: (a) a decrease of the number of edges with each hierarchical level or (b) an increase of the number of edges with each hierarchical level. The absolute level of sustained activity was lower in case (a) and higher in case (b) compared to the original setting (cf. Figure 7).

Varying the parcellation for different hierarchical levels.
By default, we used the same parcellation of modules at each level; that means if a module consisted of two sub-modules for the highest level, this condition would be the same for all other levels of the network hierarchy as well. Here, we also tested varying the parcellation into sub-modules depending on the hierarchical level. Again, we tested two cases: (a) creating fewer sub-modules for  the end of the simulation and fi nal activation of more than 50% of the nodes, which was classifi ed as complete spreading. For a random network (zero hierarchical levels), both dying-out and complete spreading occurred in 50% of the cases. However, when more than one hierarchical level was present, complete spreading activity occurred more often than dying-out. For one hierarchical level, outcomes depended strongly on the number of sub-modules per module. The cases with expiring activity formed 50% of the cases for two sub-modules, but decreased with the number of submodules (cf. Figure 10).
Varying the edge density. The pattern of sustained activity remained comparable to the default settings even when the edge density differed from the original value of 10% for a network with 512 nodes and an average node degree of 50. The maximum proportion of cases with LSA varied between 0.172 for decreased edge density (5%) to 0.238 for increased edge density (20%). The relative distribution of case for networks with one hierarchical level was similar across edge densities (cf. Figure 11).

SCALING OF OPTIMAL HIERARCHICAL CONFIGURATIONS FOR LSA WITH NETWORK SIZE
Two different scaling scenarios were explored. In the fi rst one, the global edge density of the networks was kept constant (at 1.2%) while the average number of connections per node varied; in the second scenario, the average node degree was kept constant (at 50 connections per node) while the networks' edge density varied.

Constant edge density
In the fi rst approach, the probability that any two nodes (representing cortical columns) in the network were connected was, on average, 1.2%. Connection density was larger within modules and lower between modules; however the global average remained constant, independent of the hierarchical confi guration.
Given a constant setting for testing neural activation across network sizes, the "rat-size" network showed sustained activity for a wide variety of hierarchical confi gurations ( Figure 5A). Surprisingly, for the larger "cat-size" network, a smaller variety of hierarchies existed that could generate LSA ( Figure 5B). However,  for these confi gurations, sustained activity occurred in up to 25% of the explored parameter settings, whereas it occurred only in up to 3% of the tested settings for the rat-like network (cf. scale of activation range). For the even larger "macaque-size" network, the maximum range of LSA (up to 4% of tested parameter settings) was as low as for the "rat-size" network ( Figure 5C), while overall, the variety of hierarchical confi gurations that resulted in LSA was also lowest for the macaque-like network.
These results indicate that the number of possible hierarchical confi gurations (resulting from combinations of the number of levels and number of sub-modules) leading to LSA decreased with increasing network size. Only a few hierarchical confi gurations appeared suitable for producing LSA in all network sizes. Such confi gurations typically combined an intermediate number of levels with a large number of sub-modules (Figures 4C and 5B). Interestingly, the combination of a large number of hierarchical levels with a small number of sub-modules proved ineffective for supporting LSA in the larger-size network ( Figure 5C). For large networks the best strategy for achieving sustained activity was provided by an arrangement of two hierarchical levels containing the largest possible number of modules and sub-modules.

Constant average node degree
The results obtained under the constraint of a constant edge density (see section Constant edge density) suggested that confi gurations for LSA were harder to attain in large as well as small networks. Only networks of an intermediate size appeared to result in a large variety of hierarchical networks possessing a wide parameter range for LSA. In an alternative approach, we also tested optimal confi gurations of networks of different sizes under the constraint that the average number of connections per column, rather than the probability that any two columns are connected (edge density), was kept constant. For this approach, the number of edges was set to 50 times the number of nodes, leading to an average node degree of 50 in all networks, albeit with variation for individual nodes.
Under these conditions, LSA in the "rat-size" networks arose mostly in networks with one hierarchical level, and for an increasing number of sub-modules ( Figure 6A). Both the "cat-size" and the "macaque-size" networks possessed a similar, large range of hierarchical confi gurations showing sustained activity (Figures 6B,C). All networks demonstrated that cases of LSA increased with the number of sub-modules per module. Whereas the range of hierarchical confi gurations differed between the "rat-size" and the "cat-size" or "macaque-size" networks, the maximum range of sustained activity was comparable for all sizes with 15-30%. A constant number of connections per node, therefore, permitted a wide range of optimal hierarchical confi gurations for LSA even if the network size increased.

DISCUSSION
This study investigated an essential precondition of criticality in neural systems, the capability of neural networks to produce LSA patterns following an initial activation. We addressed this question by simulating the spreading of neural activity and systematically varying model parameters and network topology in hierarchical modular networks, which are inspired by the organization of biological neural networks across scales. Our previous study (Kaiser et al., 2007a) demonstrated that hierarchical cluster networks possess a large parameter range leading to LSA, in contrast to random and non-hierarchical small-world networks. Here we expanded this analysis by varying the number of levels and sub-modules in hierarchical networks and scaling their size within two alternative scenarios, constant edge density or constant average node degree. This study demonstrated, fi rst, that LSA patterns are supported by a variety of parameter settings for hierarchical modular networks, combining different numbers of hierarchical levels with varying numbers of sub-modules per level; second, that for the same network size and the same number of sub-modules, networks with a larger number of levels resulted in a wider range of LSA, while for the same number of hierarchical levels a larger activity parameter range was produced by increasing the number of sub-modules; and third, that a high level of sustained activity was attainable across network sizes for a constant average node degree, but not for constant edge density.
The present results provide a proof of concept for three points. First, hierarchical network confi gurations lead to different levels of sustained activity independent of global topological properties, such as characteristic path length or clustering coeffi cient. Therefore, the identifi cation of an optimal network confi guration associated with a maximum level of LSA is a suitable target for evolutionary graph optimization. Second, only specifi c hierarchical network arrangements can be realized for a limited network size. Even for the human brain with an estimated number of 125,000 columns per hemisphere (Jones and Peters, 1984) under the assumptions made in section "Anatomical constraints", only a small fraction of potential hierarchies can be realized within the current framework. For the "human-size" network, three hierarchical levels with up to 18 sub-modules and four levels with up to 8 sub-modules are possible. These limits are beyond the ones of the "macaque-size" network which maximally allowed six and four sub-modules for three and four hierarchical levels, respectively. Therefore, simple combinatorics suggest that it is easier to vary the number of modules at each level than to increase the number of levels for larger brains. Third, the number of confi gurations which lead to sustained activity decreases with network size if the edge density remains constant, but remains large even for large network sizes if the average number of connections per node is kept constant. This model fi nding corresponds to the observation from comparative studies that the number of connections of a neural node (e.g., the number of synapses of a neuron) rather than the ratio of connections (e.g., being connected to 10% of all neighbors) largely remains constant across species with different brain sizes (Ringo, 1991;Schüz and Demianenko, 1995;Zhang and Sejnowski, 2000;Striedter, 2004). Moreover, for constant average node degree, the optimal confi gurations in larger networks tended to possess more hierarchical levels, suggesting a benefi cial contribution of the more intricately structured topology in larger neural networks to dynamical stability. There are indications from compilations of biological neural connectivity (e.g., www.cocomac.org) that support this model fi nding. For example, cluster analyses suggest that primate cortical connectivity is structured on more levels than connectivity in smaller brains, if one considers that there exist primate visual "streams" (Young, 1992;Hilgetag et al., 2000), that is, subdivisions of the visual network module that are apparently absent 2008), potentially leading to additional constraints on hierarchical network organization. Moreover, the internal organization of columns (self-loops) was not explicitly part of the modeled networks. However, it was represented through the node activation rule: an active node could remain active for the following time step given a suffi cient number of active neighbors and potential collaterals going back to the neuron itself. For each time step, the deactivation probability determined whether an active node became inactive. The strength of self-loops was therefore implicitly represented in this deactivation probability with lower likelihood of deactivation for more frequent self-loops. Second, specifi cally organized populations of inhibitory neurons within columns might additionally infl uence global network dynamics. Thus, future models could incorporate more detailed biologically realistic mechanisms for reducing activity at the neuronal level, instead of the presently employed phenomenological deactivation probability. Third, the parcellation of modules into sub-modules for each level was treated as symmetric, that means, when a module is split into sub-modules, each module has the same size. It will be important to test asymmetric parcellation of a module into smaller and larger sub-modules in future studies. Finally, the model considered neural network behavior in the absence of external inputs, except for the initial activation. Therefore, the current fi ndings may particularly apply to situations where there is limited external input to the brain, such as during sleep or early development. The results also relate to the organization of neural dynamics associated with the "default mode" or "resting state" of the brain (e.g., Raichle and Snyder, 2007). The role of external inputs should be addressed in future studies, which could also investigate if there is a difference in the processing of external stimuli by networks that are optimal for LSA and those that are not. In addition, the edge density at the lowest level ( Figure 4B) could in some cases be higher than 50% which is unlikely in biological neuronal networks.
In this study, we varied the network size to represent networks of columns of a hemisphere in a "rat-size" (300 nodes), "cat-size" (4,150 nodes), and "macaque-size" brain (11,000 nodes). For large networks and constant edge density, two hierarchical levels with the maximum possible number of sub-modules per module appeared to provide the best strategy for achieving sustained activity (cf. Figure 5). These multi-level confi gurations often resulted in a high density of connectivity within modules at the lowest level. Such high edge densities are theoretically possible, but only realized to some extent in biological neural systems. At the global level of human fi ber tract connectivity between brain regions, for example, edge densities around 46% can be reached . Within columns, the connection frequency between any two neurons is around 16% (Douglas and Martin, 2007) but around 35% for neurons from the same cell lineage (Yu et al., 2009). Given constant edge density, the range of feasible hierarchical confi gurations -that is, the degrees of freedom for evolving neural network architectures -appeared to decrease with larger network sizes. This was due to the fact that the number of neighbors of a node increased with network size. Since the probability for connecting a node to other nodes (the edge density) remained constant, nodes were connected to a larger number of nodes when the number of potential neighbors increased with network size. The larger number of in the rat  or cat network (Scannell et al., 1999;Zamora-López et al., 2010) Moreover, there are generally more modules in larger brains, if cortical areas can be considered as modules.
The fi nding of increased hierarchical structure in larger networks may appear counterintuitive given that there are limits on the number of hierarchical levels even in large networks, as discussed above. However, an appropriately large number of levels may be a necessary constraint for sustaining activity. If the number of modules in a large network was increased without increasing the number of levels, then, in principle, it would be easy to activate each module. However, activation of the global network may be prevented by dispersion of the activity across the entire network, which means that there may not exist enough projections into each of the individual modules to activate them. Similarly, if there are few large modules, activity may be dispersed within the modules. In order to establish a balance between the number and size of modules in large networks, additional levels need to be created, as confi rmed by the modeling results.
The hierarchical network topology we explored refl ects the distributed multi-level modularity that is considered a central feature of biological neural networks. Neural networks show strong modularity across many levels of scale, ranging from cellular neuronal circuits and neural populations organized in cortical columns (Mountcastle, 1997) to communities of closely linked areas at the systems level (Hilgetag et al., 2000;Breakspear and Stam, 2005). Smaller modules are nested within larger ones, such as columns within an area, which itself is a module in a large-scale brain division, such as the visual system. Another important feature of complex networks that has been discussed widely is the existence of hub, that is, nodes possessing a signifi cantly larger number of links than the majority of nodes in a network (Albert and Barabási, 2002;Ravasz et al., 2002). However, it is diffi cult to identify nodes in the brain that integrate modules across scales (with the potential exception of unspecifi c neuromodulatory systems, such as the serotonergic system) and act as global hubs. While there are hub-like nodes in neural networks (Kaiser et al., 2007b;Sporns et al., 2007), they may not act globally, such that most projections in the network originate from, or converge on, a central node. This topology is different from "centralistic" networks where most nodes are linked to hubs (Ravasz et al., 2002) and which may be more suitable for representing large-scale biochemical networks. However, the detailed investigation of biological neural topologies needs to be continued, since modeling studies have shown a strong impact of topology on network dynamics. For instance, networks which contain hubs may support different modes of activity propagation than hub-less modular networks (Müller-Linow et al., 2008;Hütt and Lesne, 2009).
The present study was set up under several simplifying assumptions, in order to provide general insights into the relationship between hierarchical neural topology and activation patterns. This approach resulted in a number of model limitations. First, nodes representing columns were assumed to be uniform building blocks, whereas actual column organization (layer structure and number of neurons) in the brain might differ across regions (Hutsler et al., 2005) as well as species (Herculano-Houzel et al., connected neighbors in larger networks also meant that the (absolute) threshold for activating a node was more easily reached. In such cases, activity was harder to contain and more likely spread through the whole network.
Using the constraint of a constant average degree, on the other hand, enabled a wider range of hierarchical confi gurations with up to 30% sustained activity cases across network sizes (cf. Figure 6). Such scalability with network size might be benefi cial both for ontogenetic and phylogenetic development. Using a constant number of connections per node, rather than a constant edge ratio, across species appears to have several benefi ts (Changizi, 2001). First, reducing edge density is necessary due to the limited volume available for white matter fi ber tracts (Ebbesson, 1980;Karbowski, 2001;Striedter and Northcutt, 2006). For constant edge density, a brain with two times as many columns would contain four times as many connections, quickly increasing brain volume. Second, the setting of a constant number of connections per node provided a setup for sustained activity in different brain network sizes. This might mean that sustained activity can occur for different brain sizes during evolution, if they are appropriately hierarchically structured. Third, hierarchical structuring may also provide the functional stability of LSA in the developing brain. At early stages of ontogenetic development, neural networks generally have few modules and few nodes. During development, more modules, nodes, and hierarchical levels are established (Robinson et al., 2009). Therefore, sustained activity can occur continuously through different stages of development and brain network growth. However, the hierarchical organization is not the only mechanism that can sustain activity during development; early neuronal mechanisms include, for example, spontaneous activity such as retinal waves (Sernagor et al., 2006;Hennig et al., 2009) or the early excitatory function of gamma aminobutyric acid (GABA).
Finally, whereas several earlier studies have explored spatial (e.g., brain volume) or topological (e.g., characteristic path length, Kaiser and Hilgetag, 2006) constraints on brain organization, the present study focused on dynamic constraints, specifi cally the necessity of brain dynamics to subsist at a sustained yet limited level of activity. Alternative or additional dynamic constraints that may be relevant for this phenomenon could be synchronous activity (König et al., 1995;von der Malsburg, 1995;Masuda and Aihara, 2004), or functional attributes such as multi-modal integration, functional complexity (Sporns et al., 2000), information propagation, or processing speed. An accessible parameter range for sustained limited activity is a necessary condition for criticality, but does not in itself guarantee it. Criticality has been interpreted as an abolishing of length scales, that is, the coexistence of dynamical processes at all scales. We saw examples for this phenomenon in activation patterns at LSA where modules and sub-modules of different sizes were activated together. Non-LSA conditions, by contrast, produced only the trivial states of activating all or none of the network nodes. It will be particularly interesting to see how networks optimized with respect to functional diversity are related to networks having optimal range for LSA. A possible link between these two properties was suggested by an earlier analysis showing that the number of signifi cantly repeating activation patterns is maximized at the critical point (Haldeman and Beggs, 2005).

APPENDIX CONTROL EXPERIMENTS
We tested several additional simulation parameters in the following sections. As in section "Optimal hierarchical confi gurations for LSA in a small network", networks consisted of 512 nodes and 50 connections per node.

Varying the number of edges for different hierarchical levels
In the default settings, the number of edges E i was set to be the same for each hierarchical level i. Additionally, we tested sustained activity patterns where the number varied. We considered two cases: (a) a decrease of the number of edges with each hierarchical level or (b) an increase of the number of edges with each hierarchical level. The change followed a function where the number of E i at level i was given by E i = s i E c /C, where s is a scaling factor of 2/3 for decreased and 3/2 for increased number of edges per level. The parameter E c = E/L is the number of edges in a network with E edges and L levels, which was used for the original calculation and C = L (s L+1 − s)/(s − 1) is a normalization factor to ensure that the total number of edges remains E.
As shown in Figure 7, confi gurations with a high proportion of LSA for decreased as well as increased numbers of edges per hierarchical level remained comparable with the original calculation. The absolute level of sustained activity was lower in case (a) and higher in case (b) compared to the original settings. In addition, sustained activity cases also occurred for two or more hierarchical levels when the number of edges was increased ( Figure 7C).

Varying the parcellation for different hierarchical levels
In the default settings, we used the same split-up of modules for each level; that means, if a module consisted of two sub-modules at the highest network level, this condition would be the same for all other levels of the network hierarchy as well. Here, we tested varying the parcellation into sub-modules depending on the hierarchical level. Again, we tested two cases: (a) creating fewer sub-modules for higher hierarchical levels ("decreasing parcellation") or (b) creating more sub-modules for higher hierarchical levels ("increasing parcellation"). The change followed a function where the number of parcellations, sub-modules per module, m i at level i was given by m i = s i m where s is a scaling factor of 0.9 for decreased and 1.1 for increased parcellation. The parameter m was the same as for the original calculation.
As can be seen from Figure 8, confi gurations with a high probability of sustained activity for decreased as well as increased number of sub-modules for each hierarchical level remained comparable with the original calculations. However, whereas the absolute proportions for decreased parcellation were similar, for increased parcellation the overall probabilities increased, extending the maximum proportion of sustained activity cases from 0.23 to 0.42. Therefore, the absolute parameter range of sustained activity was the same for case (a) and almost twice as high for case (b), compared to the original setting.
Note that both an increased parcellation and a larger number of edges led to a higher edge density of modules at the highest (fi ne-grained) level (maximum edge density, Figure 3B). This could explain why the ratio of sustained activity cases was higher for these confi gurations.

Number of cases close to 50% activation threshold for classifi cation as sustained
In these simulations, we tested how close the fi nal activity after 200 time steps came to the threshold used for classifi cation as sustained activity case.
As seen in Figure 9, levels close to the 50% threshold did occur. However, fi nal activity levels were around 10-20% in all cases for which a high number of sustained activity cases was reported. This observation indicates that confi gurations with high proportions of sustained activity cases were not affected by the threshold. On the other hand, confi gurations with a high proportion of cases with complete spreading only had few cases of sustained activity. Due to higher activity levels for such confi gurations, sustained activity cases were close to the 50% threshold for classifying sustained activity (black regions in Figure 9).

Topologies leading to expiring, limited sustained and completely spreading activity patterns
The main simulations of this project investigated the proportion of LSA cases depending on hierarchical network organization. Here, we also considered the distribution of the other two possible simulation outcomes: activity dying out before the end of the simulation and fi nal activity in more than 50% of the nodes, which was classifi ed as complete spreading. Figure 10 shows the dependence of all three outcomes on hierarchical network organization.
For confi gurations resulting in a small number of LSA cases (white regions in Figure 10B), both dying-out and complete spreading occurred in 50% of the cases (note the different gray level setting due to re-scaling). However, when more than one hierarchical level was present, complete spreading occurred more often than dyingout. For one hierarchical level, outcomes depended strongly on the number of sub-modules per module. The cases in which activity expired formed 50% of cases for two sub-modules, but decreased with the number of sub-modules. The maximum proportion of complete spreading, around 70% of the cases, occurred for 4-10 sub-modules per module. The maximum values for LSA occurred for 12-20 sub-modules per module.

Varying the edge density
In the main simulations, the edge density for a network with N = 512 nodes and an average node degree 〈k〉 of 50 was 0.098, that means, around 10%. Both parameters, edge density and average node degree are related: the edge density in a directed network is given by d = E/[N (N − 1)] whereas the average node degree is 〈k〉 = E/N, meaning that d = 〈k〉/(N − 1). How does variation of edge density for the same network size infl uence the range of LSA? To answer this question, we compared edge densities which were half (5%) of or twice (20%) of the value of the original calculations.
As shown in Figure 11, the pattern of sustained activity remained comparable to the original calculations even when the edge density varied. The maximum level of LSA varied between 0.172 (for decreased edge density) to 0.238 (for increased edge density; original maximum level: 0.238). The relative distribution for one hierarchical level was similar across different edge densities. For two and three hierarchical levels, two additional confi gurations with high levels of sustained activity occurred for decreased edge density. These additional confi gurations were impossible for higher numbers of edges to be realized when edge densities were around 10% or above.