Learning and stabilization of winner-take-all dynamics through interacting excitatory and inhibitory plasticity

Winner-Take-All (WTA) networks are recurrently connected populations of excitatory and inhibitory neurons that represent promising candidate microcircuits for implementing cortical computation. WTAs can perform powerful computations, ranging from signal-restoration to state-dependent processing. However, such networks require fine-tuned connectivity parameters to keep the network dynamics within stable operating regimes. In this article, we show how such stability can emerge autonomously through an interaction of biologically plausible plasticity mechanisms that operate simultaneously on all excitatory and inhibitory synapses of the network. A weight-dependent plasticity rule is derived from the triplet spike-timing dependent plasticity model, and its stabilization properties in the mean-field case are analyzed using contraction theory. Our main result provides simple constraints on the plasticity rule parameters, rather than on the weights themselves, which guarantee stable WTA behavior. The plastic network we present is able to adapt to changing input conditions, and to dynamically adjust its gain, therefore exhibiting self-stabilization mechanisms that are crucial for maintaining stable operation in large networks of interconnected subunits. We show how distributed neural assemblies can adjust their parameters for stable WTA function autonomously while respecting anatomical constraints on neural wiring.


INTRODUCTION
Competition through shared inhibition is a powerful model of neural computation (Maass, 2000;Douglas and Martin, 2007). Competitive networks are typically composed of populations of excitatory neurons driving a common set of inhibitory neurons, which in turn provide global negative feedback to the excitatory neurons (Amari and Arbib, 1977;Douglas and Martin, 1991;Hertz et al., 1991;Coultrip et al., 1992;Douglas et al., 1995;Hahnloser et al., 2000;Maass, 2000;Rabinovich et al., 2000;Yuille and Geiger, 2003;Rutishauser et al., 2011). Winner-takeall (WTA) networks are one instance of this circuit motif, which has been studied extensively. Neurophysiological and anatomical studies have shown that WTA circuits model essential features of cortical networks (Douglas et al., 1989;Mountcastle, 1997;Binzegger et al., 2004;Douglas and Martin, 2004;Carandini and Heeger, 2012). An individual WTA circuit can implement a variety of non-linear operations such as signal restoration, amplification, filtering, or max-like winner selection, e.g., for decision making (Hahnloser et al., 1999;Maass, 2000;Yuille and Geiger, 2003;Douglas and Martin, 2007). The circuit plays an essential role in both early and recent models of unsupervised learning, such as receptive field development (von der Malsburg, 1973;Fukushima, 1980;Ben-Yishai et al., 1995), or map formation (Willshaw and Von Der Malsburg, 1976;Amari, 1980;Kohonen, 1982;Song and Abbott, 2001). Multiple WTA instances can be combined to implement more powerful computations that cannot be achieved with a single instance, such as state dependent processing (Rutishauser and Douglas, 2009;Neftci et al., 2013). This modularity has given rise to the idea of WTA circuits representing canonical microcircuits, which are repeated many times throughout cortex and are modified slightly and combined in different ways to implement different functions Martin, 1991, 2004;Rutishauser et al., 2011).
In most models of WTA circuits the network connectivity is given a priori. In turn, little is known about whether and how such connectivity could emerge without precise pre-specification. In this article we derive analytical constraints under which local synaptic plasticity on all connections of the network tunes the weights for WTA-type behavior. This is challenging as high-gain WTA operation on the one hand, and stable network dynamics on the other hand, impose diverging constraints on the connection strengths (Rutishauser et al., 2011), which should not be violated by the plasticity mechanism. Previous models like Jug et al. (2012) or Bauer (2013) have shown empirically that functional WTA-like behavior can arise from an interplay of plasticity on excitatory synapses and homeostatic mechanisms. Here, we provide a mathematical explanation for this phenomenon, using a mean-field based analysis, and derive conditions under which biologically plausible plasticity rules applied to all connections of a network of randomly connected inhibitory and excitatory units produce a functional WTA network with structured connectivity. Due to plastic inhibitory synapses, convergence of the model does not rely on constant, pre-defined inhibitory weights or other common assumptions for WTA models. We prove that the resulting WTA circuits obey stability conditions imposed by contraction analysis (Lohmiller and Slotine, 1998;Rutishauser et al., 2011). This has important implications for the stability of larger networks composed of multiple interconnected WTA circuits, and thus sheds light onto the mechanisms responsible for the emergence of both local functional cortical microcircuits and larger distributed coupled WTA networks.
This article is structured as follows: We first define the network and plasticity models in sections 2.1 to 2.3. Our main analytical results are given in sections 2.4 and 2.5, and illustrated with simulation results in section 2.6. The results are discussed in section 3, and detailed derivations of the analytical results can be found in section 4.

NETWORK TOPOLOGY
In its simplest abstract form, a WTA circuit ( Figure 1A) consists of a number of excitatory units that project onto a common inhibitory unit. This unit, in turn, provides recurrent inhibitory feedback to all excitatory units. Given appropriate connection strengths, such inhibition makes the excitatory units compete for activation in the sense that the unit receiving the strongest input signal will suppress the activation of all other units through the inhibitory feedback loop, and "win" the competition.
We design a biologically plausible network by taking into account that inhibitory feedback is local, i.e., it only affects cells within a cortical volume that is small enough such that the relatively short inhibitory axonal arbors can reach their targets. We assume excitatory and inhibitory neurons in this volume to be connected randomly (see Figure 1B). Furthermore, we assume that there are a finite number of different input signals, each activating a subset of the excitatory cells in the volume. We construct a mean-field model by grouping the excitatory neurons for each driving input stimulus, summarizing the activity of each group of cells by their average firing rate. This results in a simplified population model of the network which-in the case of two different input signals-consists of two excitatory populations (one for each input), and one inhibitory population (see Figure 1C). We assume full recurrent connectivity between all populations. This scheme can easily be extended toward more input groups. In particular, if an excitatory group receives multiple inputs, it can be modeled as a new class.
Since inhibitory axons are (typically) short-range, distant populations can communicate only via excitatory projections. We combine multiple local circuits of the form shown in Figures 1B,C by introducing excitatory long-range connections between them, as illustrated in Figure 1D. Specifically, we add projections from the excitatory populations of one local group to all excitatory and inhibitory populations of the other group. A similar connectivity scheme for implementing distributed WTA networks has been proposed by Rutishauser et al. (2012). Unlike their model, our network does not require specific wiring, but rather targets any potential cell in the other volume. We will FIGURE 1 | Illustration of the network model. (A) Abstract representation of a WTA circuit, where several excitatory units project onto a common inhibitory unit, and receive global inhibitory feedback from that unit. (B) Example volume of the generic cortical structure that is assumed, consisting of (initially randomly connected) excitatory (pyramidal) cells and inhibitory interneurons. The color of the cells indicates the input channel they are connected to: some cells only receive input from the blue, others from the orange source. It is assumed that the volume is sufficiently small, such that all excitatory cells can be reached by the (short-ranged) axons of the inhibitory cells. If the connection strengths are tuned appropriately, the population receiving the stronger input signal will suppress the response of the weaker population via the global inhibitory feedback. (C) shows the mean field model of the same network that we construct by grouping excitatory neurons by their input source. The three resulting excitatory and interneuron populations are connected in an all-to-all fashion. (D,E) show multiple, distant volumes which are connected via long-range excitatory connections. Projections from one volume to another connect to all cells of the target volume. In (E), the two subgroups are approximated by networks of the type shown in (C), consisting of one inhibitory and several excitatory populations. The black, solid arrows represent exemplary excitatory connections from one population of one group to all populations of the other group. Equivalent connections, indicated by dotted arrows, exist for all of the excitatory populations.
show in section 2.4.4 that this is sufficient to achieve competition between units of spatially distributed WTA circuits.

NETWORK DYNAMICS
The activation of a neural populations x i , which can be excitatory or inhibitory, is described by where τ i is the time constant of the population, w ij is the weight of the incoming connection from the jth population, I ext,i (t) is an external input given to the population, and T i is the activation threshold. Furthermore, [v] + := max (0, v) is a half-wave rectification function, preventing the firing rates from taking negative values. Assuming identical time constants for all populations, i.e., τ i = τ for all i, the dynamics of the full system can be written as where x = (x 1 , . . . , x N ) are the firing rates of the respective populations (excitatory and inhibitory), W is the connectivity matrix (describing local excitatory, local inhibitory, and long-range excitatory connections), I ext (t) is a vector of external inputs, and T = (T 1 , . . . , T N ) are the activation thresholds of the populations. For the single local microcircuit shown in Figure 1C, for example, W would be a 3-by-3 matrix with all entries w ij nonzero except for the inhibitory to inhibitory coupling. For two coupled microcircuits as in Figure 1E, the connectivity matrix consists of 4 blocks, with the diagonal blocks describing local connectivity, and the off-diagonal blocks describing long-range projections from excitatory units to the other circuit.

PLASTICITY MECHANISMS AND WEIGHT DYNAMICS
In our model, we assume that all connections w ij in Equation (2) are plastic, and are subject to the following weight update rule: Here, x pre and x post are the pre-and postsynaptic firing rates, respectively, w max is the maximum possible weight value, and w , A w , and τ s are positive constants, which we set to values that are compatible with experimental findings (see Table 1). The learning rate is determined by τ s , and w and A w determine the point at which the rule switches between depression (LTD) and potentiation (LTP). We will show that in a plastic network, global stability and circuit function are determined exclusively by those plasticity parameters. The plasticity rule is derived from the mean-field approximation of the triplet STDP rule by Pfister and Gerstner (2006), which we augment with a weight-dependent term, effectively limiting the weight values to the interval [0, w max ]. A more detailed derivation of the learning rule can be found in the Methods (section 4.1). The parameters w and A w are set differently for excitatory and inhibitory connections, leading to two types of simultaneously active plasticity mechanisms and weight dynamics, even though the same learning equation is used. We set w = exc and A w = A exc for all excitatory connections, and w = inh and A w = A inh for all inhibitory connections. In particular, we assume A inh to take very low values and set A inh = 0 in our analysis, effectively eliminating any dependence of the fixed point of inhibitory weights on the presynaptic rate. According to fits of the parameters to experimental data (see Table 1), this is a plausible assumption. For the sake of simplicity, we also assume the maximum possible weight value w max to be the same for all excitatory and inhibitory connections. Figure 2 illustrates the weight change as a function of the pre-and postsynaptic activity.

STABILITY ANALYSIS
The WTA circuit is assumed to function correctly if it converges to a stable state that represents the outcome of the computation it is supposed to perform. Conditions under which these networks converge to their (single) attractor state exponentially fast were previously derived by Rutishauser et al. (2011). Here, we extend those results to plastic networks and express stability criteria in terms of global learning rule parameters, rather than individual weight values. We first describe criteria for the stabilization of the network and learning rule dynamics, then derive from them conditions on the learning rule parameters. Our analysis leads to very simple sufficient conditions that ensure the desired stable WTA behavior.
The dynamics of the network activation and the weights are given by Equations (2) and (3), respectively. In the following, we will denote them by f and g, so the full dynamics can be written as a coupled dynamical systeṁ where f corresponds to the right hand side of Equation (2), and g combines the update rules for all weights (with different sets of parameters for excitatory and inhibitory connections) in one vector-valued function. We first restrict our analysis to the simplest case of a single winning excitatory population and derive conditions under which the plastic network converges to its fixed point. Later, we extend our analysis to larger systems of multiple coupled excitatory populations.

Analysis of single-node system
Let us first consider a simplified system, in which only one excitatory population is active, e.g., because one population receives much more external input than all others, and the inhibitory feedback suppresses the other populations. As silent populations neither contribute to the network dynamics nor to the weight dynamics, they can be excluded from the analysis. We can therefore reduce the description of the system to a single excitatory population x E , and an inhibitory population x I , together with the connections w E → E , w E → I , and w I → E between them.  Pfister and Gerstner (2006). In our simulations, we use parameters very similar to the "all-to-all" parameters for inhibitory connections, while for excitatory connections we use ones that are close to the "nearest spike" parameters.

Frontiers in Computational Neuroscience www.frontiersin.org
For a given set of (fixed) weights w c , Rutishauser et al. (2011) have shown by means of contraction theory (Lohmiller and Slotine, 1998) that the system of network activationsẋ = f (x, w c ) converges to its fixed point x * exponentially fast if its generalized Jacobian is negative definite. In our case, this condition reduces to If condition (6) is met, the system is called contracting and is guaranteed to converge to its attractor state exponentially fast for any constant input I ext , where the contraction rate is given by the left hand side of (6), divided by 2τ . Here, corresponds to the network gain. A more detailed derivation of the fixed point can be found in section 4.2. Note that we have set the activation threshold T equal to zero and provide external input I ext to the excitatory population only. This simplifies the analysis but does not affect our results qualitatively.

Decoupling of network and weight dynamics
In the following, we assume that the population dynamics is contracting, i.e., that condition (6) is met, to show that the plasticity dynamics Equation (5) drives the weights w to a state that is consistent with this condition. Essentially, our analysis has to be self-consistent with respect to the contraction of the activation dynamics. If we assume f and g to operate on very different timescales, we can decouple the two systems given by Equations (4) and (5). This is a valid assumption since neural (population) dynamics vary on timescales of tens or hundreds of milliseconds FIGURE 2 | Illustration of the learning rule. The weight change dw/dt is plotted as a function of the post-and presynaptic firing rate for fixed pre-(left) or postsynaptic (right) rates. The gray, dashed line shows the rule that we use for inhibitory connections and whose threshold for LTP, in contrast to excitatory connections, does not depend on the presynaptic rate. The black line marks the transition between LTD and LTP. In this example, the parameters of the learning rules were set to exc = 6 Hz, inh = 18 Hz, A exc = 2, w max = 4, and the weight value was fixed at w max /3 for excitatory and w max /2 for inhibitory connections.
(see Figure 5 for typical timescales of our system), while synaptic plasticity typically acts on timescales of seconds or minutes. This means that from the point of view of the weight dynamics g the population activation is at its fixed point x * almost all the time, because it converges to that point exponentially fast. We can thus model the activation dynamics as a quasi-static system, and approximate the learning dynamics as a function of the fixed point of the activation instead of the instantaneous activation.
The fixed point of this simplified system is found by setting g(x * , w) = 0, and according to Equation (3) is given by Combining this expression with Equations (7) and (8) leads to a system of non-linear equations that can be solved for the fixed point weights w * E → E , w * E → I , w * I → E , and activations x * E , and x * I . These values solely depend on the learning rule parameters w , A w , w max , and the external (training) input I ext . Figure 3 shows the fixed points of the weight dynamics as a function of exc , and the input strength I ext . Notably, w * E → E and w * E → I lie on a fixed line in the w E → E -w E → I plane for all parameters w and A w . As the weight values are bounded by 0 and w max , the weights converge to a finite value for I ext → ∞. This is also illustrated in Figure 4, which shows the final weight values as a function of w max , both for a finite training input and in the limit I ext → ∞.
Importantly, the function of a WTA circuit critically depends on the strength of the recurrent connection w E → E (Rutishauser et al., 2011). If w E → E > 1, the network operates in "hard" mode, where only one unit can win at a time and the activation of all For I ext → ∞ the weights converge to a limit point, as is illustrated in Figure 4. For these plots, the parameters A exc and inh were set to 2 and 18 Hz, respectively, and w max was set to 4. other units is zero. On the other hand, if w E → E is smaller than 1, the network implements "soft" competition, which means that multiple units can be active at the same time. From Equation (27) (Methods) it follows that w E → E > 1 is possible only if w max > A + 1. As we will show in the following section, this condition is necessarily satisfied by learning rules that lead to stable WTA circuits.

Parameter regimes for stable network function
We can now use the fixed points found in the previous section to express the condition for contraction given by condition (6) in terms of the learning rule parameters. In general, this new condition does not assume an analytically simple form. However, we can find simple sufficient conditions which still provide a good approximation to the actual value (see Methods section 4.2 for details). Specifically, as a key result of our analysis we derive the following sufficient condition: Convergence to a point in weight space that produces stable network dynamics is guaranteed if where b is a parameter of the order 1, which is related to the minimum activation x E (or the minimum non-zero input I ext ) during training for which this condition should hold. If the minimum input I min that the network will be trained on is known, then b can be computed from the fixed point x * E,min = x * E (I ext = I min ), and set to b = exc /x * E,min . This will guarantee contracting dynamics for the full range of training inputs I ext ∈ [I min , ∞). In typical scenarios, b can be set to a number of the order 1. This is due to the fact that the network activation is roughly of the same order as the input strength. Setting exc to a value of similar order leads to b = exc /x * E,min ≈ 1. Note that condition (11) is independent of exc and inh . This is due to a simplification that is based on the assumption A exc + b 1, which can be made without loss of generality. If b and A exc are set to very low values, the full expressions given by 38 and (39) (see Methods section 4.2) apply instead. Figure 5 shows the the region defined by (11) for different b together with the exact FIGURE 5 | Regions in learning rule parameter space that lead to a stable, contracting network. All panels show the regions of stability in w max -A exc space for different training input strengths. Colored lines correspond to exact solutions, while black, dotted lines correspond to the sufficient condition (11) for different values of b. The top panels illustrate that relatively small values of b (e.g., 2) roughly approximate the exact solution even for very small inputs (e.g., I ext = 1 Hz; left), whereas b can be set to lower values (e.g., b = 0; right) if the input is larger. The gray-scale value represents the convergence rate |λ| (in units of s −1 ) of the activation dynamics for τ = 10 ms. The bottom panel shows in color the exact regions of contraction for inputs I ext = 1, 10, 100 Hz and the approximation given by condition (11) for b = 0, 1, 2. Some of the colored regions (and dotted lines) correspond to the ones shown in the upper panels. It can be seen that for higher input strengths the upper bound on A exc (or equivalently, the lower bound on w max ) quickly converges to the b = 0 diagonal, which represents the asymptotic condition for I exc → ∞. For these plots, the learning rule parameters were set to exc = 6 Hz and inh = 18 Hz.

Frontiers in Computational Neuroscience
www.frontiersin.org July 2014 | Volume 8 | Article 68 | 5 condition for contraction, indicating that (11) is indeed sufficient and that b can safely be set to a value around 1 in most cases.

Extension to multiple units
So far, we have only studied a small network that can be regarded as a single subunit of a larger, distributed WTA system. However, our results can be generalized to larger systems without much effort. In our model, as illustrated in Figures 1D,E, different localized WTA circuits can be coupled via excitatory projections. These projections include excitatory-to-inhibitory connections, as well as reciprocal connections between distant excitatory units. In order to demonstrate the effects of this coupling, we consider two localized subsystems, x = (x E , x I ) and x = (x E , x I ), consisting of one excitatory and one inhibitory unit each. Furthermore, we add projections from x E to x E and x I , as required by our model. We denote by w E → E the strength of the long-range excitatory-to-excitatory connection, while we refer to the longrange excitatory-to-inhibitory connection as w E → I . Note that for the sake of clarity we only consider the unidirectional case x → x here, while the symmetric case x ↔ x can be dealt with analogously.
We first look at the excitatory-to-inhibitory connections. If only x E is active and x E is silent, then x I and x I are driven by the same presynaptic population (x E ), and w E → I converges to the same value as w E → I . Thus, after convergence, both inhibitory units are perfectly synchronized in their activation when x E is active, and an equal amount of inhibition can be provided to x E and x E .
Besides synchronization of inhibition, proper WTA functionality also requires the recurrent excitation w E → E (between the excitatory populations of the different subunits) to converge to sufficiently low values, such that different units compete via the synchronized inhibition rather than exciting each other through the excitatory links. As pointed out by Rutishauser et al. (2012), the network is stable and functions correctly if the recurrent excitation between populations is lower than the recurrent self-excitation, i.e., w E → E < w E → E .
We now consider the case where x E and x E receive an external input I ext . Whenever x E alone receives the input, there is no interaction between the two subunits, and the recurrent selfconnection w E → E converges to the value that was found for the simplified case of a single subunit (section 2.4.2). The same is true for the connection w E → E if x E alone receives the input. However, in this case x E and x E might also interact via the connection w E → E , which would then be subject to plasticity. As x projects to x , but not vice versa, we require x E > x E if both x E and x E receive the same input I ext , because x E should suppress x E via the long-range competition mechanism. In terms of connection strengths, this means that w * E → I w * I → E > w * E → E , i.e., the inhibitory input to x E that is due to x E must be greater than the excitatory input x E receives from x E . In the Methods (section 4.3), we show that a sufficient condition for this to be the case is which alters our results from section 2.4.3 only slightly, effectively shifting the lower bound on w max by an offset of 1, as can be seen by comparing conditions (11) and (12). On the other hand, making use of the fact that x E < x E , it can be shown that w E → E converges to a value smaller than w E → E (see Methods section 4.3), as required by the stability condition mentioned above.

GAIN CONTROL AND NORMALIZATION
In the previous section, we showed how synaptic plasticity can be used to drive the connection strengths toward regimes which guarantee stable network dynamics. Since the actual fixed point values of the weights change with the training input, this mechanism can as well be used to tune certain functional properties of the network. Here we focus on controlling the gain of the network, i.e., the relationship between the strength of the strongest input and the activation of the winning excitatory units within the recurrent circuit, as a function of the training input.
In the case of a single active population, the gain is given by as can be inferred from Equation (7). Depending on the gain, the network can either amplify ( > 1) or weaken ( < 1) the input signal. Figure 6 shows how the gain varies as a function of the learning rule parameters and the training input strength I ext . Low average input strengths cause the weights to converge to values that lead to an increased gain, while higher training inputs lower the gain. This can be regarded as a homeostatic mechanism, acting to keep the network output within a preferred range. This provides a mechanism for the network to adapt to a wide range of input strengths, while still allowing stable WTA competition.

SIMULATION RESULTS
As a final step, we verify the analytical results in software simulations of a distributed, plastic WTA network, as illustrated in Figures 1D,E. Note that here we consider the case where two subgroups are coupled bidirectionally via excitatory long-range projections, while in section 2.4.4, for the sake of clarity, we focus on the unidirectional case. The desired functionality of FIGURE 6 | Control of the network gain. The network gain is plotted for different learning rule parameters exc , A exc as a function of the training input strength I ext . inh was set to 18 Hz and w max was set to 4.

Frontiers in Computational Neuroscience
www.frontiersin.org July 2014 | Volume 8 | Article 68 | 6 the resulting network is global competition between the excitatory populations, i.e., the population that receives the strongest input should suppress activation of the other populations, even if the excitatory populations are not directly competing via the same, local inhibitory population. We consider a network with two groups, each consisting of two excitatory populations and one inhibitory population (see Figure 1E). While the excitatory populations are connected in an all-to-all manner, inhibitory populations can only target the excitatory populations within their local groups, but do not form long-range projection. Initially, all connection weights (excitatory and inhibitory) are set to random values between 0.3 and 1.8. Note that those values could potentially violate the conditions for contraction defined in (6), but we will show empirically that the plasticity mechanism can still drive the weights toward stable regimes. As training input, we present 1000 constant patterns for 2 s each. In every step, four input values in the ranges 5 ± 2 Hz, 10 ± 2 Hz, 15 ± 2 Hz, and 20 ± 2 Hz are drawn from uniform distributions and applied to the four excitatory units. The different input signals are randomly assigned to the populations in every step, such that a randomly chosen population receives the strongest input. Thereby, each population only receives one of the four inputs. Figure 7 shows the activation of the different populations before and after learning. Before learning (left), the network does not necessarily implement stable competition between the different excitatory populations. Instead, it may end up in an oscillating state or amplify the wrong winning unit. However, after training (Figure 7, right), the network always converges to a stable state representing the winner of the competition. Furthermore, it can be seen that the inhibitory populations perfectly synchronize, as described in section 2.4.4.
The change of weights is illustrated in Figure 8: Initially (top), all weights were set to random values in the range [0.3, 1.8]. Since all populations receive the same average input, the weight matrices should converge to symmetric states. For the specific set of learning rule parameters we chose in this example, and the specific input rates described above, w E → E converges to a value around 1, which means that the network is at the edge of the transition between hard and soft WTA behavior. The weights w E → I , connecting excitatory to inhibitory units, converge to values around 2. Furthermore, the weights w I → E , which connect inhibitory to excitatory units all converge to very similar values (around 1.1), such that inhibition is synchronized across the whole network. Note that not all connections between excitatory populations have converged to the same value. This is because as soon as the network is close to the hard WTA regime, some connections cannot change anymore as only one excitatory unit is active at a time, and the weight change is zero if either the pre-or the post-synaptic unit is inactive.

DISCUSSION
We have shown how neural circuits of excitatory and inhibitory neurons can self-organize to implement stable WTA competition. This is achieved through an interplay of excitatory and inhibitory plasticity mechanisms operating on all synapses of the network. As a key result, we provide analytical constraints on the learning rule parameters, which guarantee emergence of the desired network function. FIGURE 7 | Simulated evolution of the plastic network. 1000 input patterns were applied for 2 s each. Two local subsystems, consisting of two excitatory units and one inhibitory unit each, are coupled via all-to-all excitatory connections, while inhibitory feedback is provided only locally. The first three rows show the populations of the first group, where the first two rows correspond to the two excitatory populations, and the third row shows the activation of the inhibitory population. The last three rows show the activations of the second group. The left panel shows the first 20 s after initialization (before learning), while the right panel shows the network activity during the last 20 s (after learning). Solid blue and orange lines correspond to the firing rate of the respective population, whereby the highlighted segments (orange) mark the winner among the four excitatory populations. The input given to the individual units is plotted as a dotted black or solid magenta line, where the highlighted segments (magenta) correspond to the strongest signal among the four. Thus, the network operates correctly if magenta and orange lines are aligned (the one population that receives the strongest input wins the competition), while misaligned lines (the population receiving the strongest input does not win the competition) indicate incorrect operation. Initially, the network frequently selects the wrong winning unit and even starts oscillating for some input patterns (around 18 s). After training (right), the network converges to a stable state with only the winning unit active for different input patterns.

Frontiers in Computational Neuroscience
www.frontiersin.org Although constraints on the weights for stable competition in recurrent excitatory-inhibitory circuits have been derived before (Xie et al., 2002;Hahnloser et al., 2003;Rutishauser and Douglas, 2009;Rutishauser et al., 2012), it has remained unclear how a network can self-tune its synaptic weights to comply with these conditions. The presented model achieves this and provides important insights regarding the mechanisms responsible for this self-tuning. Our results predict a relationship between the maximum synaptic weight w max in a circuit and the learning rule parameter A exc , which controls the contribution of the presynaptic rate to the shifting of the threshold between potentiation and depression. Furthermore, our model predicts a relationship between the network gain and the amount of excitatory input into the circuit during development or training (see Figure 6), indicating that high gain (amplification) should be expected for weak inputs, and low gain for strong inputs, which is in accordance with common assumptions about homeostasic mechanisms (Turrigiano, 2011).
From a developmental perspective, the self-configuration of functional WTA circuits through plasticity has the advantage of requiring a smaller number of parameters to be encoded genetically to obtain stable and functional network structures (Zubler et al., 2013). With self-tuning mechanisms like the ones suggested here, only the parameters for the two different types of plasticity in excitatory and inhibitory synapses, rather than the strengths of all synaptic connections, need to be specified, and the network can adapt to the statistics of inputs it receives from its environment and from other brain regions.
Besides guaranteeing stability, it is also desirable to control functional properties of the circuit, such as its gain. Experimental data suggests that cortical recurrent circuits often operate in a high gain regime and with strong (larger than unity) recurrent excitatory feedback (Douglas et al., 1995). The strength of this feedback determines whether the WTA is "soft" (multiple excitatory units can be active at the same time) or "hard" (only one unit can be active at a time, i.e., the network operates in a nonlinear regime) (Rutishauser et al., 2011). Many interesting computations that can be realized with these types of networks rely on the non-linearities introduced by such strong recurrent excitation (e.g., Vapnik, 2000), therefore it is important that similar conditions can be achieved with our model. In addition, various forms of learning rely on balanced WTA competition (Masquelier et al., 2009;Habenschuss et al., 2012;Nessler et al., 2013), which requires an adaptation of the gain as the excitatory connections into the circuit undergo plasticity. In our network, the resulting network gain is a function of both the learning rule parameters and the strength of the training input signals. As a consequence, our system can switch between high and low gain, and hard or soft WTA behavior simply by receiving input stimuli of different (average) strengths. Thus, different parts of the network might develop into different functional modules, depending on the inputs they receive.
Our model does not specifically address the question of how the network structure, which leads to our results (essentially random all-to-all connectivity) might develop in the first place. For instance, if certain long-range connections between multiple subcircuits do not exist initially, they will never be established by our model, and the units of the different subcircuits can never compete. On the one hand, this might be a desired effect, e.g., to construct hierarchies or asymmetric structures for competition, in which some parts of the network are able to suppress other parts, but not vice-versa. On the other hand, structural plasticity could account for the creation of missing synaptic connections, or the removal of ineffective connections if the desired stable function cannot be achieved with the anatomical substrate. There is increasing evidence for activity dependent synapse formation and elimination in both juvenile and adult brains (Butz et al., 2009), in particular a coordinated restructuring of inhibitory and excitatory synapses for functional reorganization (Chen and Nedivi, 2013). Another approach, recently investigated in simulations by Bauer (2013), is to set up the right network topology by developmental self-construction processes in a first step, and the tune the network using synaptic plasticity in a second step.
Our model is based on a weight-dependent variation of the learning rule proposed by Pfister and Gerstner (2006), but this is by no means the only learning rule capable of the self-calibration effect we describe in this article. By changing its parametrization, the rule can subsume a wide variety of commonly used Hebbian, STDP-like, and homeostatic plasticity mechanisms. Indeed, further experiments, which are not presented in this manuscript, indicate that a whole class of learning rules with depression at low and potentiation at high postsynaptic firing rates would lead to similar results. We chose the triplet rule to demonstrate our findings as its parameters have been mapped to experiments, and also because it can be written in an analytically tractable form. We have assumed here a specific type of inhibitory plasticity, which analytically is of the same form as the simultaneous excitatory plasticity, but uses different parameters. With the parameters we chose for the inhibitory plasticity rule, we obtain a form that is very similar to the one proposed by Vogels et al. (2011). By introducing inhibitory plasticity it is no longer necessary to make common but biologically unrealistic assumptions, like pre-specified constant and uniform inhibitory connection strengths (Oster et al., 2009), or more abstract forms of summing up the excitatory activity in the circuit (Jug et al., 2012;Nessler et al., 2013), because inhibitory weights will automatically converge toward stable regions. Inhibitory plasticity has received more attention recently with the introduction of new measurement techniques, and has revealed a great diversity of plasticity mechanisms, in line with the diversity of inhibitory cell types (Kullmann and Lamsa, 2011;Kullmann et al., 2012). Our model involves only a single inhibitory population per local sub-circuit, which interacts with all local excitatory units. Not only is this a common assumption in most previous models, and greatly simplifies the analysis, but also is in accordance with anatomical and electrophysiological results of relatively unspecific inhibitory activity in sensory cortical areas (Kerlin et al., 2010;Bock et al., 2011). However, recent studies have shown more complex interactions of different inhibitory cell types (Pfeffer et al., 2013), making models based on diverse cell types with different properties an intriguing target for future studies. The assumption of a common inhibitory pool that connects to all excitatory units is justified for local circuits, but violates anatomical constraints on the length of inhibitory axons if interacting populations are far apart (Binzegger et al., 2005). Our results easily generalize to the case of distributed inhibition, by adapting the model of Rutishauser et al. (2012) (see Figure 1E). Our contribution is to provide the first learning theory for these types of circuits.
Since our model is purely rate-based, a logical next step is to investigate how it translates into the spiking neural network domain. Establishing similar constraints on spike-based learning rules that enable stable WTA competition remains an open problem for future research, although  have shown empirically that WTA behavior in a circuit with topologically ordered input is possible under certain restrictions on initial synapse strengths, and in the presence of STDP and shortterm plasticity. Spiking WTA circuits can potentially utilize the richer temporal dynamics of spike trains in the sense that the order of spikes and spike-spike correlations have an effect on the connectivity.
Potential practical applications of our model, and future spiking extensions, lie in neuromorphic VLSI circuits, which have to deal with the problem of device mismatch (Indiveri et al., 2011), and can thus not be precisely configured a priori. Our model could provide a means for the circuits to self-tune and autonomously adapt to the peculiarities of the hardware.

DERIVATION OF THE PLASTICITY MECHANISM
The learning rule given by Equation (3) is based on the triplet STDP rule by Pfister and Gerstner (2006). Since we are interested in the rate dynamics, we use the mean-field approximation of this rule, which is provided by the authors and leads to an expected weight change oḟ where x pre , x post are the pre-and postsynaptic activations and A ± 2 , A ± 3 , τ ± , τ x,y are parameters that determine the amplitude of weight changes in the triplet STDP model. All of the parameters are assumed to be positive. Through a substitution of constants given by the rule in Equation (13) can be written in the simpler forṁ where w is in units of a firing rate and A w is a unitless constant. The terms in parentheses on the right of Equation (17) can be divided into a positive (LTP) part that depends on x post , and a negative (LTD) part that depends on x pre . In order to constrain the range of weights, we add weight-dependent terms m + (w) and m − (w) to the two parts of the rule, which yieldṡ Throughout this manuscript, we use a simple, linear weight dependence m + = w max − w and m − = w, which effectively limits the possible values of weights to the interval [0, w max ]. We chose this form, which is described by a single parameter, for reasons of analytical tractability and because it is consistent with experimental findings (Gütig et al., 2003). In Pfister and Gerstner (2006), values for the parameters τ x,y , τ ± , and A ± 2,3 of the rule Equation (13) were determined from fits to experimental measurements in pyramidal cells in visual cortex (see Table 1) and hippocampal cultures Poo, 1998, 2001;Sjöström et al., 2001;Wang et al., 2005). We used these values to calculate plausible values for w , A w , and τ s using Equations (14) to (16). In our simulations, we use parameters very similar to the experimentally derived values in Table 1. Specifically, for inhibitory connections we use parameters very similar to the ones found from fits of experimental data to the triplet STDP model with all-to-all spike interactions. On the other hand, we choose parameters for the excitatory plasticity rules which are close to fits of the triplet STDP rule with nearest-neighbor spike interactions. The parameters that were used in software simulations and to obtain most of the numeric results are listed in Table 2. Note that for the weight-dependent rule in Equation (18) we have assumed that the parameter w influences only the LTD part. According to the definition in Equation (15), this is the case if A − 2 A + 2 , or w ≈ A − 2 τ − /τ s , respectively. Otherwise w contains both a potentiating (A + 2 ) and a depressing (A − 2 ) component, and Equation (18) should be replaced with a more complex expression of the form of Equation (13).

DERIVATION OF THE STABILITY CRITERIA
In section 2.4, we outlined how the fixed points and stability criteria for the WTA system can be found. In this section, we provide the detailed derivations that led to these results.
As described in section 2.4, we first consider a simplified system of one excitatory and one inhibitory population, x E and x I , which yield an activation vector x = (x E , x I ) T . They are coupled recurrently through a weight matrix W = w E → E w I → E w E → I 0 , receive external inputs I ext (t) with weights μ E and μ I respectively, and have thresholds T E , T I . Assuming that both units are active, i.e., their total synaptic input is larger than their thresholds, their dynamics are described by where τ exc , τ inh are the population time constants. The fixed points of the activations can be found by settingẋ E =ẋ I = 0. If we assume, for simplicity, that T E = T I = 0 this yields the fixed points is the network gain. Furthermore, we can make the assumption that μ I = 0 and μ E = 1, effectively disabling the external input to the inhibitory population. This reduces Equations (21) and (22) to These simplifications do not change the results of our analysis qualitatively and can be made without loss of generality. Approximating x pre and x post by their fixed point activities (as described in section 2.4), and settingẇ = 0 in the learning rule Equation (18), the fixed point of the weight dynamics (with w > 0) takes the form Note that this fixed point in weight space always exists for any given x pre and x post , and is stable for the weight dependence m + (w) = w max − w; m − (w) = w that we chose in Equation (18). In fact, this is true for all choices of the weight dependence satisfying ∂m + /∂w < 0 and ∂m − /∂w > 0, as can be shown by means of a linear stability analysis. We now derive the fixed points for the weights w E → E , w E → I , and w I → E of the simplified system. For w E → E , Equation (26) can be simplified by noting that x * pre = x * post = x * E , leading to an expression that depends on the activation of the excitatory population x * E : Similarly, we can compute the fixed point of w E → I as a function of x * E , noting that x * post = x * I = w E → I x * E [see Equations (24) and (25)]: Finally, using the relationship x * I = w E → I x * E from Equations (24) and (25), and the previously computed value of w E → I from Equation (28) with the fixed point equation for w I → E , we obtain In the following, we set A inh = 0, as described in section 2.3. An exact solution for the activation x * E at the fixed point of the system is obtained by inserting w * E → E , w * E → I , and w * I → E into Equation (24), and solving the resulting fixed-point prob- . This corresponds to finding the roots of the third order polynomial a 1 = − exc inh + exc I ext + inh I ext + inh A exc I ext The activation of the excitatory population x E at the fixed point is then given by the positive, real root of Equation (30). The fixed point of the activation x * E , and thus the fixed points of the weights, are monotonic functions of the training input strength I ext (see Figure 3, for example). In the following, we investigate the behavior of the fixed point weight values for very large and very small external inputs during training, respectively. This helps us to find conditions on the learning rule parameters that lead to stable dynamics (of the network activation) for any training input strength. We define a positive constant b := exc /x * E , and plug it into Equations (27)-(29). This yields Inserting Equations (35)-(37) into the condition for contraction of the activation dynamics given by (6), we can describe the condition in terms of the learning rule parameters, and a new constant˜ := exc /( exc + b inh ): Assuming A exc + b 1 (note that we can always set A exc to a sufficiently large value), the conditions reduce to 0 <˜ < 1, whereby the first condition can be dropped, since˜ ∈ [0, 1] always holds. The second condition still depends on b, and therefore on x * E . We will illustrate how to eliminate this dependence under very weak assumptions. First, in the limit of very large inputs x * E also takes very large values, leading to b → 0 for I ext → ∞. In that case, condition (41) becomes independent of b and can be written as A exc < w max < 2(1 + A exc ).
On the other hand, in the case of very small inputs we have to include the effects of b, as b can in principle take very large values.
In typical scenarios the output of the network can be assumed to be roughly of the order of its input. If exc is chosen to be of the same order, then b ≈ 1. For any finite b, we can express the stability condition that is valid for all inputs as the intersection of the conditions for large inputs, condition (42), with the one for arbitrarily small inputs, condition (41), leading to Note that this condition can be met for any finite b by choosing sufficiently large A exc and w max . However, as discussed above, choices of the parameter b of the order 1 should be sufficient for typical scenarios, whereas higher values would guarantee stable dynamics for very low input strengths (e.g., I ext exc ). This is illustrated in Figure 5, where the exact regions of stability as a function of w max and A exc are shown for different training input strengths, together with the sufficient conditions given by (43). In practice, a good starting point for picking a value b for which the stability conditions should hold is to determine the minimum non-zero input I min encountered during training for which this condition should hold, and setting b = exc /x * E,min , where x * E,min is the fixed point activation for I ext = I min .

EXTENSION TO MULTIPLE UNITS
In this section, we illustrate how multiple subunits, as analyzed in the previous section, can be combined to larger WTA networks with distributed inhibition. For the sake of simplicity, we only consider the unidirectional case, where a subunit x = (x E , x I ) projects onto another subunit x = (x E , x I ) via excitatory connections w E → E and w E → I . The bidirectional case x ↔ x can be analyzed analogously. If x E and x E receive the same input, the response of x E should be weaker, such that activation of x E causes suppression of x E rather than excitation. This means that must hold. We assume that both subsystems have been trained on inputs of the same average strength, such that their local connections have converged to the same weights, i.e., w * E → I = w * E → I and w * I → E = w * I → E . Furthermore, we assume that condition (44) is true initially. This can be guaranteed by setting the initial value of w E → E to a sufficiently small number. Our task then is to show that condition (44) remains true for all time. The values of w * E → I and w * I → E , or w * E → I and w * I → E respectively, are described by Equations (36) and (37). On the other hand, according to Equation (26), the value of w * E → E is given by Plugging all this into condition (44) and simplifying the expression, leads to the condition which can be replaced by the sufficient condition that guarantees x * E < x * E if both excitatory populations receive the same input. On the other hand, this result implies w * E → E < w * E → E , which is required for stable network dynamics (Rutishauser et al., 2012), and can be verified by comparing the respective fixed point equations

SOFTWARE SIMULATION
Software simulations of our model were implemented using custom Python code based on the "NumPy" and "Dana" packages, and run on a Linux workstation. Numerical integration of the system dynamics was carried out using the forward Euler method with a 1 ms timestep.