Clustering Predicts Memory Performance in Networks of Spiking and Non-Spiking Neurons

The problem we address in this paper is that of finding effective and parsimonious patterns of connectivity in sparse associative memories. This problem must be addressed in real neuronal systems, so that results in artificial systems could throw light on real systems. We show that there are efficient patterns of connectivity and that these patterns are effective in models with either spiking or non-spiking neurons. This suggests that there may be some underlying general principles governing good connectivity in such networks. We also show that the clustering of the network, measured by Clustering Coefficient, has a strong negative linear correlation to the performance of associative memory. This result is important since a purely static measure of network connectivity appears to determine an important dynamic property of the network.

the mammalian cortex. Thirdly, the connections are directed, so if the connection from i to j exists, this does not imply that the reciprocal connection from j to i also exists.
With this configuration, two extreme cases are widely known and commonly studied. The first case is a completely local network, or lattice, whose nodes are connected to those nodes that are closest to it. An example of a local network is the cellular neural network (CNN), where units are connected locally in 2-D (Brucoli et al., 1996). Alternatively the network can be connected in a completely random manner, where the probability of any two nodes being connected is k/N, independently of their position. Locally connected networks have a minimum wiring length, but perform poorly as associative memories, since pattern correction is a global computation and full local connectivity does not allow easy passage of information across the whole network. In randomly connected networks, on the other hand, information can readily move through the network, and consequently pattern correction is much better and in fact cannot be improved with any other architecture. However, in real biological networks such as cerebral cortex, completely random connectivity suffers from the restriction of high wiring cost and therefore may not be the desirable choice. An optimized connectivity is one that gives a performance comparable to random networks, but with more economical wiring. The existence of such connectivity, as well as its construction strategy, has been of great interest in recent research. One family of network connectivities, the so-called small-world networks (Watts and Strogatz, 1998), has been suggested to exhibit this optimization.
Our first connection strategy is adapted from the method proposed by Watts and Strogatz (1998). The Watts-Strogatz smallworld network starts from a local network, where each unit has incoming connections from its k nearest neighbors (to simplify results, networks studied in this paper have no self-connectivity). The local network is then progressively rewired by randomly reassigning a fraction (p) of the afferent neurons. The network is transformed into a completely random network when p reaches 1 (Figure 1). In the rest of the paper we refer to networks constructed in this way as small-world networks.
Another connection strategy investigated by us exhibits Gaussian connectivity, where each unit has k incoming connections from neurons whose distances form a Gaussian distribution (see Figure 7). The connectivity of such a network is parameterized by the SD of its distance distribution, σ. We have shown in earlier work (Calcraft et al., 2007) that both small-world and Gaussian networks can perform optimally as associative memories and that tight Gaussian distributions give very parsimonious networks with efficient use of connecting fiber.
The biological context of this connectivity comes from the mammalian cortex, which is thought to have a similar connectivity between individual neurons (Hellwig, 2000).
The modularity of mammalian cortex, commonly referred to as the hypothesis of cortical columns (Mountcastle, 1997), is also considered in our investigation. Three different types of modular connectivity are studied. The first one, named the Fully Connected Modular network, contains m internally fully connected subnetworks, defined as modules. Initially there is no interconnection between modules, thus each module can be treated as an isolated fully connected associative memory model. The subnetworks establish connections by randomly rewiring the intra-modular connections to form random connections from anywhere across the whole network. A fraction p denotes the proportion of rewired connections. The network has the same type of connectivity as the Watts-Strogatz small-world network, when p reaches 1. Figure 2 gives an example of networks with this connectivity.
Another modular network we have investigated is called the Gaussian-Uniform Modular Network. In such a network, the intra-modular connections within each module have a Gaussiandistributed connectivity with a SD of σ internal , which is proportional to the number of internal connections per unit, k internal . Each unit in the network also has a number of external connections (defined by k external ) from units in other modules. The connectivity of the external network is uniformly random. Although k internal and k external vary in different configurations, the total number of connections per unit, k = k internal + k internal , is maintained so that the performances of different networks can be compared. In this example, the network contains 30 units and four afferent connections to each unit. Initially all units are locally connected, as p = 0. Then a proportion of connections of each unit are randomly rewired (p = 0.5). As the rewiring rate increases, the network becomes a completely random network (p = 1). Note that the connections are formed on a one dimensional line but are drawn in 2-D figures for better visualization. Chen et al. Connectivity in associative memories Frontiers in Computational Neuroscience www.frontiersin.org learning threshold T increases, irrespective of the value of k. However, increasing T also means a longer training time, and the improvement is insignificant for very high values of T. Here we set T = 10, based on results from our previous study (Davey and Adams, 2004).
For threshold unit models we use the standard bipolar +1/−1 representation. For the spiking networks we use 1/0 binary patterns, as they can be easily mapped onto the presence or absence of spikes.
Threshold unit models and integrate-and-fire spiking neuron models require different update dynamics. For threshold unit models, we use the standard asynchronous dynamics: units output +1 if their net input is positive and −1 if negative. As the connectivity is not symmetrical there is no guarantee that the network will converge to a fixed point, but, in practice these networks normally exhibit straightforward dynamics (Davey and Adams, 2004) and converge within hundreds of epochs (one epoch is a full update of every unit in the network in a fixed order). We arbitrarily set a hard limit of 5000 epochs, at which we take the network state as final even if it may not be at a fixed point. Networks with spiking neurons, on the other hand, were numerically integrated, and their state was evaluated after 500 ms (see below).
Due to the intrinsic model complexity, associative memory models with spiking units require careful tuning so that the performance is comparable to threshold unit models. We use a leaky integrate-and-fire neuron model which includes synaptic integration, conduction delays, and external current charges. The membrane potential V of each neuron in the network is set to a resting membrane potential of 0 mV if no stimulation is presented. The neuron can be stimulated and change its potential by either receiving spikes from other connected neurons, or by receiving externally applied current. If the membrane potential of a neuron reaches a fixed firing threshold, T firing , which is set to 20 mV, the neuron emits a spike and the potential is reset to the resting state (0 mV) for a certain period (the refractory period, set to 3 ms). During this period the neuron cannot fire another spike even if it receives very strong stimulation.
A spike that arrives at a synapse triggers a current; the density of this current (in A/F), I ij (t), is given by where i refers to the postsynaptic neuron and j to the presynaptic neuron. τ s = 2 ms is the synaptic time constant, t spike is the time when the spike is emitted by neuron j, and t spike + delay ij defines the time when the spike arrives at neuron j. Two delay modes were implemented in our study. The fixed-delay mode gives each connection a fixed 1 ms delay. In the second mode, the delay of spikes in a connection is defined by delay ms where d ij is the connection distance which is defined as the minimum number of steps between neuron i and j along the ring, since the distance between neighboring neurons is 1, as described in Section "The Network Model for Associative Memory." The formula is a rough mapping from a one dimensional structure to a realistic 3-D system.

The neuron model, learning, and dynamics
The best connection strategy for an associative memory network may be dependent on details of the neuron model. For instance, the wiring length of a connection may have no significant impact on simple threshold neuron models, but is important in a spiking model simulating geometry-dependent time delays. To reveal the intrinsic principles that may govern network construction, this paper investigates connection strategies in various associative memory networks with either the traditional threshold unit models or the more biologically realistic, leaky integrate-and-fire spiking neuron models.
Each model needs to be trained before any measure of performance can be obtained. Canonical associative memory models with threshold units, for example the Hopfield net, commonly use a one-shot Hebbian learning rule. This, however, does not perform well if the networks are sparse and non-symmetric . In this paper, we adapt a more useful approach using standard perceptron learning.
A set of random, bipolar, or binary vectors is presented as training patterns, where the probability of any bit on the pattern being on (+1), referred to as the bias of the pattern, is 0.5. The patterns can be learned by the following learning rule: Begin with zero weights Repeat until all units are correct Set state of network to one of the training pattern ξ p For each unit, i, in turn: Calculate its net input, h i p .
If  Each network has 30 units and four afferent connections to each unit. The network is initialized as six discrete modules with fully connected internal networks (left, with rewiring rate p = 0). To connect these modules, internal connections are then rewired randomly across the whole network (right, with rewiring rate p = 0.5). Note that the regularity of the network is maintained during the rewiring (each node always has four afferent connections). The EC of the network is therefore the highest pattern loading for which a 60% corrupted pattern has, after convergence, a mean similarity of 95%, or greater with its original value.
The EC measure needs modification to suit the spiking model. We adopt the concept of memory retrieval from Anishchenko and Treves (2006). For p training patterns, the memory retrieval, M, is defined by is the overlap of the network activity and pattern μ at time t, more particularly the cosine of the angle between both vectors, The memory retrieval measure was designed for sparse patterns where the chance overlap is important. In our case the patterns are unbiased so the chance overlap is close to 0. The memory retrieval M ranges between 1 and −1, in which a high value indicates better performance. The 95% pattern similarity in the threshold unit model corresponds to a 0.9 memory retrieval value M in the spiking neuron model, thus we use these two values as the EC criteria in this investigation.

connectivity Measures
In recent years connectivity measures from Graph Theory have been used in the investigation of both biological and artificial neural networks. We use two of these measures to quantify our connectivity strategies.
A common measure of network connectivity is Mean Path Length. The Mean Path Length of a network G, L(G), is defined as: The change of membrane potential is defined by In this equation, −(V/τ m ) is the leak current density, while τ m = 50 ms is the membrane time constant. I external is the external current density which will be discussed later. The internal current is summed by indicates the presence of a connection from j to i, and C ij = 0 otherwise. W ij is the weight of the connection from neuron j to i trained by the above learning rule.
In networks trained with a learning threshold T = 10, a single spike from an excitatory (positive-weight) connection generates a postsynaptic potential (EPSP) of about 3 ∼ 4 mV on average, and inhibitory (negative-weight) connections generate similar voltage deflections of negative sign (IPSPs). Since the proportion of excitatory and inhibitory connections is balanced (the network is trained with unbiased patterns), it requires the excess of six to seven spikes at excitatory connections to trigger the firing of a unit. Scaling down the connection weights to reduce the EPSPs also reduces the IPSPs proportionally, resulting in little or no improvement on the network's associative memory performance. The whole network is silenced, however, when all weights are scaled down to less than 20% of their original trained values, due to the current leak in the neuron model.
External currents are injected into the network in order to trigger the first spikes in the simulation. Each current injection transforms a static binary pattern to a set of current densities. Given an input pattern, unit i receives an external current if it is on in that pattern, otherwise the unit receives no external current. An external current has a density of 3A/F and is continually applied to the unit for the first 50 ms of simulation. This mechanism guarantees that the first spiking pattern triggered in the network reflects the structure of the input pattern. After the first spikes (about 7-8 ms from the start of a simulation), both internal currents caused by spikes, and the external currents, affect the network dynamics. Spike activity continues after the removal of external currents, as the internal currents caused by spike trains become the driving force. The network is then allowed to run for 500 ms, after which time the network activity has stabilized, before its spiking pattern is evaluated.

Measures of MeMory perforMance
The associative memory performance of the threshold unit network is measured by its effective capacity (EC; Calcraft, 2005). EC is a measure of the maximum number of patterns that can be stored in the network with reasonable pattern correction still taking place. In other words, it is a capacity measure that takes into account the dynamic ability of the network to perform pattern correction. We take a fairly arbitrary definition of reasonable as the ability to correct the addition of 60% noise (redrawing 60% of the pattern bits) to within a similarity of 95% with the original fundamental memory, that is, 95% of the bits are identical. Varying these 2% figures gives differing values for EC but the values with these settings are robust for comparison purposes. For large fully connected networks the EC value is about 0.1 of the conventional capacity of the network, but for networks with sparse, structured connectivity EC is dependent upon the actual connectivity pattern. of such a network with threshold units is known to be 2k = 500 for unbiased random patterns, although in practice the capacity reduces due to pattern correlations. As mentioned in Section "The Neuron Model, Learning, and Dynamics," we set the learning threshold at T = 10 and the maximum number of updating epochs at 5000. For each network connectivity, results are averages over 10 runs. Errors bars are too small to be visible and therefore are omitted from the figures. The characteristics of different networks and their EC performances are tabulated in Section "Appendix." The first column of each table contains the values of the independent variable that was varied to construct different networks with the same connection strategy.

path length as a predictor of perforMance
As pattern correction is a global computation intuitively one might expect networks with shorter mean path lengths to perform better than those with longer mean path lengths. Our results, shown in Figure 3A to some extent confirm this. The two networks with only local connections (to the right of the graph) and thereby long path lengths, do perform poorly. However path length does not discriminate amongst the majority of network architectures. This is mainly due to the fact that mean path length saturates very quickly once a few long-range connections, which act as global shortcuts, are introduced to a local network. The Tables in Section "Appendix" show that almost all the networks examined here have very similar mean path lengths of around two steps, in spite of the great variation of the independent variables.

clustering coefficient as a predictor of perforMance
Once again intuition suggests that performance could be related to clustering. In a highly clustered network global computation could be difficult: with information staying within clustered subnetworks and not passing through the whole network. This is confirmed by our results in Figure 3B. The networks plotted on the left side of the graph, which are highly clustered, show poor performance. But perhaps what is most noticeable about these results is the obvious linear correlation between the clustering coefficient and performance. So the clustering of a network, regardless of the details of its architecture (small-world, modular, Gaussian) completely predicts its ability to perform as an effective associative memory. A purely static property of the connection matrix appears to determine the ability of a sparsely connected recurrent neural network to perform pattern completion. We will return to why this might be the case in the discussion.

the integrate-and-fire network Model
We now examine how well our results generalize to the more complicated integrate-and-fire model. Because the EC evaluation of this network takes much longer to compute than for the threshold unit model, we reduced the number of incoming connections per unit, k, to 100. For the same reason, only the Watts-Strogatz Small-World network, the Gaussian network, and the Fully Connected Modular network were simulated using integrate-and-fire models. It is interesting to study whether the non-spiking network provides a qualitative and/or quantitative prediction of the much more complex spiking model. Figure 4 compares the performance of three different types of dynamics: non-spiking, spiking with fixed-delay, where p ij is the length of the shortest connecting path from unit j to unit i. It is important to note here that the length of a path in the graph G is the number of edges that make up the path -it has no relation to the physical distance between the nodes, d ij . The Mean Path Length was originally used to define the "smallworld" phenomenon found in social science (Milgram, 1967). This refers to the idea that, if a person is one step away from each person they know and two steps away from each person who is known by one of the people they know, then everyone is an average of six steps away from each person in a region like North America. Hence everyone is fairly closely related to everyone else giving a "small-world." A fully connected network has the shortest and unique Mean Path Length of 1, whilst in sparse networks the measure varies for different types of connectivity (see Tables in Appendix). A locally connected network has a high Mean Path Length, since each unit is only connected to its nearest neighbors and it is difficult to reach distal units. On the other hand, completely random networks usually have short Mean Path Lengths. Intermediate cases, for example the Watts-Strogatz small-world network, have Mean Path Lengths similar to completely random networks, but significantly lower than the ones of lattices.
Although commonly used as a connectivity measure of neural networks, our investigation reveals that the Mean Path Length is insensitive to connectivity changes if the network is far from local. Another measure, named the Clustering Coefficient, is found to be more sensitive to connectivity changes, and exhibits a good correlation with the associative memory performance.
The definition of Clustering Coefficient is as follows. First we define neighbors of a node in a graph as its directly connected nodes.
Then we define G i as the subgraph of the neighbors of node i (excluding i itself). C i , the local Clustering Coefficient of node i, is defined as: The Clustering Coefficient of a fully connected network is 1, since each node is connected to all others directly. Locally connected sparse networks have high Clustering Coefficients whilst in a completely random networks it is usually low. Interestingly, the so-called small-world networks usually have high Clustering Coefficients, similar to local networks, but short Mean Path Lengths, like completely random networks. Such types of connectivity (short Mean Path Length, high Clustering Coefficient) are also observed in natural networks, for instance, the mammalian cortex.

results
All networks in our study have 5000 units. The first set of results are for the non-spiking model with 250 incoming connections per unit, that is, N = 5000 and k = 250. The theoretical maximum capacity patterns of connectivity is very similar, whilst in the cube-root delay models, the performance of the Gaussian connectivity varies more linearly than that of the other two connectivities. This is because the other two types of connectivity have more distal connections due to their uniform rewiring, consequently increasing the distance-dependent delay of the networks. Overall the Clustering Coefficient can still be used as a reasonable predictor of associative memory performance. and spiking with distance-dependent delays. The comparison is illustrated for Gaussian connectivity patterns. Similar comparisons can be made for other connection strategies from the data given in Section "Appendix." The connectivity of the networks furthest to the left has a very tight Gaussian distribution (σ = 0.4). Interestingly all three networks give a similar EC value of about 25. In all three networks performance then improves as some distal connections are introduced, until the non-spiking network attains its best performance of EC = 42. However the spiking networks show further improvements reaching EC values of over 50. But perhaps the most striking feature of this result is that the simple threshold model provides a good approximation of the much more complex spiking neural network. Figure 5 presents the results of EC against Clustering Coefficient for spiking models with fixed time delay (1 ms) and cube-root delay. The linear correlation between EC and Clustering Coefficient is not absolutely maintained in the spiking models, however the results still show that the EC increases as the Clustering Coefficient decreases. In the fixed-delay models, the performance of all three  prediction is independent of the connection rule used. We further demonstrated that when the cost of wiring is taken into account, by penalizing long-range connections, networks with local Gaussian-distributed connections perform best, and even outperform small-world networks. Finally, these findings also hold when memory is stored and retrieved in networks of spiking neurons with connection delays. These results were obtained with random uncorrelated patterns, hence without taking into account the statistics of a natural environment. The patterns were stationary (no sequences) and so was largely the dynamics of the network (no oscillations or temporal patterns). Our study was conducted on a ring of neurons, but the results can be extended to 2-D networks (Calcraft et al., 2008).
In the network of spiking neurons, the resulting feedback excitation and inhibition forced the neurons into persistent UP or DOWN states. The network lacked, however, dynamical behavior like gain control, synchronization, or rhythm generation, for which separate populations of excitatory and inhibitory neurons would be required (Sommer and Wennekers, 2001;de Almeida et al., 2007). In addition, the spiking network used the same synaptic weights as obtained with the perceptron-rule for bipolar neurons, and was only tested for memory retrieval. Learning in such networks would require a more advanced algorithm like spike-timing-dependent plasticity.

clustering as a negative predictor of MeMory perforMance
Clustering (the occurrence of connections between the targets of the same neuron) may make some connections functionally redundant or induce loops. Cycling through loops may hamper convergence to an attractor. In addition, Zhang and Chen (2008) formally demonstrated that loops also induce correlations between states, or equivalently, they induce noise in the form of an interpattern crosstalk term to a neuron's input, and hence affect pattern separation. Note that this does not directly imply that inducing clustering by adding local connections would worsen memory performance, since the added connections would increment k and hence the maximum capacity as well.

MeMory storage with local connectivity
An unexpected finding was that for a given wiring cost, Gaussianconnected networks performed better than small-world networks. Although very-long-range connections speed up the convergence to an attractor during memory retrieval (Calcraft et al., 2007), they do not enhance the steady-state performance of the network. Moreover, the width (SD) of the Gaussian connection kernel can be small (Figure 4), and does not scale with network size.

relation to brain connectivity
The connectivity of the brain, and more particularly of neocortex, is sparse, but certainly not random (see Laughlin and Sejnowski, 2003). Counting the close appositions between axonal and dendritic trees of reconstructed pyramidal cells assembled in a computer network, Hellwig (2000) estimated that connection probability decreased with inter-neuron distance as a Gaussian function of about 500 μm width (the order of magnitude of a cortical column; see Figure 7). Recent experiments using distant release of caged glutamate by photo-stimulation extend this range, indicating a still considerable excitation from distances up to 2 mm (Schnepel et al., 2010).

connection strategies for optiMizing wiring cost and perforMance
In nature, the construction of associative memory networks is restricted by resource, thus a connection strategy that optimizes both wiring cost and performance would be preferable. We define the wiring cost of two connected nodes in the network simply as the distance between them, and average it over all connected nodes in the network. Note that this is a quite different measure to mean path length, which measures steps along the connection graph. This measure is also different from the connection delay used in the spiking network models, where the cube-root of d was taken to avoid unrealistically long delays. For simplicity this section focuses on the performance of the threshold unit models. Figure 6 plots the EC against the Mean Wiring Cost for different connectivity patterns. The Gaussian-distributed network has the most efficient connectivity, that is, for a given mean wiring cost, it has the highest EC. The fully connected Modular network is the least efficient one, whilst the Watts-Strogatz small-world network, and Gaussian-uniform Modular network lay between them. Effective Gaussian networks can be constructed with very low wiring cost because the width of the distribution can be quite small. In fact a width of just 2k (here 500) gives good performance at low cost. We also found that the distribution does not need to increase in width as the network gets bigger (Calcraft et al., 2007).

discussion
Our study of memory performance in networks with varying connection strategies extends previous work on the effect of connectivity on network dynamics (Stewart, 2004;Percha et al., 2005), wiring economy and development (Chklovskii et al., 2004), and disease (Belmonte et al., 2004;Lynall et al., 2010;Stam, 2010).

Main findings and liMitations
The main finding of the present study is that for networks with a fixed number k of connections, and hence a theoretical maximum storage capacity of 2k random patterns, the degree of clustering can be regarded as a negative predictor of memory performance. This  of a genuine small-world connectivity, the observed clustering of connections may also have a function in the dynamics of the network, or be a consequence of the multidimensional representation of the visual field onto neocortex, mapping not only stimulus position but also higher-order stimulus attributes like orientation. Nevertheless, neocortex is assumed to have a small, but significant, small-worldness (Gerhard et al., 2010), as based on a metric using the ratio of clustering over relative path length (Humphries and Gurney, 2008).

conclusion
We studied the capacity of memory storage and retrieval in networks of bipolar and spiking neurons, and compared different connection strategies and connection metrics. The single metric best predicting memory performance, for all strategies, was the clustering coefficient, with performance being highest when clustering was low. In large networks, the best connection strategy was a local Gaussian probability function of distance, both in terms of avoiding clustering and of minimizing the cost of wiring.
Within a column, Song et al. (2005) found an overrepresentation of bi-directional connections and local three-neuron clusters in rat visual cortex. Whereas such clustering may be a sign  (2000) with permission of Springer-Verlag. The best-fitting Gaussian has a SD of 296.7 μm.