Short-Term Facilitation may Stabilize Parametric Working Memory Trace

Networks with continuous set of attractors are considered to be a paradigmatic model for parametric working memory (WM), but require fine tuning of connections and are thus structurally unstable. Here we analyzed the network with ring attractor, where connections are not perfectly tuned and the activity state therefore drifts in the absence of the stabilizing stimulus. We derive an analytical expression for the drift dynamics and conclude that the network cannot function as WM for a period of several seconds, a typical delay time in monkey memory experiments. We propose that short-term synaptic facilitation in recurrent connections significantly improves the robustness of the model by slowing down the drift of activity bump. Extending the calculation of the drift velocity to network with synaptic facilitation, we conclude that facilitation can slow down the drift by a large factor, rendering the network suitable as a model of WM.


INTRODUCTION
Working memory (WM) is an attentional function that enables the online holding and manipulation of information. WM is crucial to higher cognitive functions such as planning, reasoning, decision-making, and language comprehension (Baddeley, 1986;Fuster, 2008). The persistent activity recorded in specific groups of neurons in several cortical areas during WM tasks is thought to be the main neuronal correlate of WM (Fuster and Alexander, 1971;Miyashita and Chang, 1988;Funahashi et al., 1989Funahashi et al., , 1990Romo et al., 1999Romo et al., , 2002. For example, a substantial fraction of neurons in the prefrontal cortex (PFC) elevate their activity selectively and persistently during the delay period of visuo-spatial WM tasks in which a monkey has to remember the direction of a visual cue for several seconds (Funahashi et al., 1989;Constantinidis and Steinmetz, 2001;Constantinidis and Wang, 2004).
This selective persistent activity must be created internally since during the delay period the sensory cue is absent. A classical view is that this activity reflects the existence of many intrinsic states of the PFC circuits, each of which is an attractor of the network dynamics (Hebb, 1949;Amit et al., 1997;Seung, 1998;Wang, 2001). Each state is characterized by a specific group of active neurons that sustain their activity through their recurrent interactions. In this framework, the transiently presented cue triggers the selection of a specific attractor in which network remains after the cue has disappeared until the behavioral task is performed (the delay period). Once the cue feature has lost its behavioral relevance the network returns to its baseline state. Therefore, during the delay, the neuronal activity encodes a memory trace of the cue feature.
The cue features that are kept in WM could be either of discrete nature (categorical), such as a visual object or a spoken word, or continuous, such as the frequency of a vibration or the spatial location of a visual stimulus. In the first case, a paradigmatic theoretical framework is a Hopfield-like network in which the dynamics display a discrete set of attractors (Hopfield, 1984;Amit, 1989). In the second case, on which the present paper focuses, the relevant framework is of a network with a continuous set of attractors.
To be specific we consider the case of visuo-spatial WM task. A model to account for the delay activity observed in these tasks is a recurrent network made of identical neurons with the geometry of a circle and a connectivity pattern such that the interaction strength between two neurons is fine-tuned to depend only on their distance on the ring (Amari, 1977). Therefore the network architecture is invariant to rotation. With sufficiently strong and spatially modulated recurrent excitation the network dynamics possess an infinite and continuous set (ring) of attractors (Ben-Yishai et al., 1995;Hansel and Sompolinsky, 1998). In an attractor the activity is persistent and its profile has the shape of a "bump" localized at an arbitrary location. A transient stimulus tuned to a specific location on the ring, corresponding to the direction to be memorized, selects the bump attractor which peaks at that location. After the stimulus is withdrawn, the network remains in this attractor (Figure 1). Therefore the network is able to encode in its activity the memory of the cue location.
While this mechanism is an attractive general proposal for a network basis of parametric WM, it suffers from an essential limitation. Indeed, it is structurally unstable: any deviation from the rotational symmetry, e.g., spatial heterogeneities in the connectivity or in intrinsic properties of the neurons, destroys the continuity of the attractor set (Tsodyks and Sejnowski, 1995;Zhang, 1996;Seung et al., 2000;Renart et al., 2003). In terms of WM, this means FIGURE 1 | Working memory via persistent activity. A recurrent network of neurons (black circles) is ordered by a one-dimensional variable θ . (A) During spontaneous activity, the network receives a spatially homogeneous weak input (red), which results in a homogeneous profile of neural activity (blue). (B) During the cue period an inhomogeneous input (red), is received, peaked at a particular position θ cue . The resulting profile of neural activity (blue) isa "bump" centered at the same position. (C) During the delay period the network receives a spatially homogeneous input, while the profile of neural activity remains inhomogeneous and centered at the same position.
that during the delay period, the network state drifts toward an attractor that is only very weakly correlated with the cue position, leading to the loss of memory about the precise position of the cue. Therefore the ring model can be a biologically realistic mechanism of parametric WM only if the recurrent connections are sufficiently fine-tuned to satisfy the following two conditions: (1) to guarantee the stability of the bump activity pattern and (2) to prevent its fast drift away from the cue position during the delay period. Moreover, the latter condition requires much tighter tuning than the former as indicated by previous analysis of this and other continuous attractor models (Tsodyks and Sejnowski, 1995;Zhang, 1996;Koulakov et al., 2002;Renart et al., 2003). Since there is little experimental evidence for the degree of tuning of neocortical connections, we address this issue theoretically by considering the ring model with strongly inhomogeneous connections that satisfy the first condition.
Connections between PFC pyramidal neurons exhibit increased level of short-term synaptic facilitation compared to primary sensory areas (Wang et al., 2006). Synaptic facilitation could be involved in WM by providing a memory trace lasting for up to several seconds (Mongillo et al., 2008). Here we show that short-term synaptic facilitation in recurrent interactions in PFC may be mitigating the inherent structural instability of the ring model by slowing down the degradation of the memory trace during the delay period. To this end we consider a rate model with "ring architecture" in which the symmetry of the network is broken due to random spatial fluctuations in the connectivity. We derive analytical one-dimensional approximations of the "bump" dynamics. This allows us to estimate the velocity of the drift of the activity bump and hence the rapidity at which the memory of the cue feature fades during the delay period. Subsequently, we argue that without facilitation, the drift is too fast and the number of attractors is too small for the memory trace to be accurate over the typical duration of the delay period used in WM experiments, which is of several seconds. However, synaptic facilitation temporarily modifies effective connection strengths, selectively amplifying connections between neurons that are activated by the cue. We demonstrate that this amplification tends to pin down the bump at its initial position. We show analytically that this slows down the drift dramatically (up to a factor of 100), so that no additional tuning of connections beyond that required for the stability of the bump is needed. We conclude that short-term facilitation may be essential to the stability of memory of continuous variables over delay duration up to a few tens of seconds.

The model
We consider a recurrent network which models a local circuit in dorsolateral PFC. It consists of a set of N units, or mini-columns, each of which corresponds to a group of PFC neurons with similar functional properties. The network has the architecture of a ring, a unit being labeled by its location θ i on a circle (i = 1,.., N ; −π ≤ θ i < π). The population average firing rate of the i-th unit, denoted as m i = m(θ i ), satisfies the rate dynamics: where [y] + = max(0,y), and the synaptic matrix J is given by: The first term in (2) depends only on the distance between the i-th and j-th units and therefore is invariant by translation along the ring. In the second term the elements n ij are random and independently distributed with zero mean and unit variance. This represents random spatial fluctuations in the connection strengths between mini-columns and breaks the rotational symmetry on the ring. The two terms in (2) scale with the network size N so that each of their contributions to the total input N j=1 J ij m j is finite (= O(1)) in the limit of large N. Note that the scaling N −1/2 of the heterogeneity term results in preservation of the "bump" attractor dynamics for ε = O(1) in the limit of large N (see next section).

Frontiers in Computational Neuroscience
www.frontiersin.org The external input I i (t ) to the i-th mini-column, is the sum of two terms, I i = I b + I cue i . The background input I b reflects the attentional state that we assume to depend on the epoch of the task. We take I b ≤ 0 during pre-stimulus period whereas I b = I 0 > 0 during the delay period. The "cue" input I cue i is received during presentation of a transient sensory-related stimulus that is directionally selective (Figure 1).
Without synaptic heterogeneity (ε = 0), the interaction pattern is rotationally invariant and it is well-known (Ben-Yishai et al., 1995;Hansel and Sompolinsky, 1998;Blumenfeld et al., 2006) that if J 1 > 1 (and for J 0 sufficiently negative) the dynamics of the network in the presence of spatially homogeneous supra-threshold external input (i.e., I i = I 0 > 0), possess N stable attractors. At the attractor, the activity profile has the shape of a localized bump of activity given by where ψ is the center of the bump and θ c is its half-width that satisfies (see Appendix) Upon presentation of the stimulus in direction θ cue , during the cue period, the attractor with ψ = θ cue is selected and the network remains in this state until the end of the delay period (Figures 1B,C). Therefore the cue direction is kept in the population activity of the network during the delay period.

Continuous attractor falls apart in the presence of synaptic heterogeneity
The continuity of the attractor set is broken in the presence of heterogeneities in the interactions, i.e., when ε = 0 (Tsodyks and Sejnowski, 1995;Zhang, 1996;Seung et al., 2000;Renart et al., 2003). This happens even for arbitrary small ε. Although in this case the shape of the "bump" is very weakly perturbed by the interaction heterogeneities, there is only a small number of locations on the circle where it is stable. During the delay period the bump drifts to one of these locations as shown in the simulation results displayed in Figure 2A. As the interactions are more and more heterogeneous (increasing ε), the drift becomes even faster and the perturbation in the bumps shape becomes more severe, until it disappears completely. With the scaling of Eq. (2) this happens at ε that is on the order of one. Note that the Figure 2A used the value ε = 0.5; at this value the magnitude of the synaptic heterogeneities is significantly larger than the magnitude of the synaptic efficacies of the "unperturbed" Ring model ( Figure A1 in Appendix). This is due to the different scalings w.r.t. N of these two terms in Eq.
(2). Nevertheless, the approximate shape of the bump attractor was still roughly preserved.
In order to gain a better understanding of the attractors of the dynamics and the robustness of the memory trace when the interactions are heterogeneous we derived an analytical approximation FIGURE 2 | (A) Evolution of firing rates in a network with synaptic heterogeneity (ε = 0.5). Black curve indicates ψ(t ) for the m(θ,t ) plotted in color-scale. Magenta curves indicate ψ(t ) for the solutions m(θ,t ) with an initial bump centered at different angles. (B) Comparison of ψ(t ), computed for the solution of the Ring model (black) and the reduced one-dimensional evolution according to Eq. (4) (red). (C) Plot ofψ ≈ ετ −1 F nf (ψ) as a function of ψ for the same instance of n ij as in (A). Notice that two attractors from the (A) are at the angles where F nf (ψ) is zero with negative slope. (D) The function g nf (θ c ) (black trace), that describes the dependence of the average speed on the bump width, is very well approximated by the linear function 2/5θ c (gray line).

Frontiers in Computational Neuroscience
www.frontiersin.org for the drift velocity under the assumption that N is large and that ε is small so that the shape of the bump is approximated by Eq. (3), i.e., m(θ,t ) = m 0 (θ − ψ(t )) + O(ε), in the presence of constant spatially homogeneous input I 0 . A singular perturbation analysis of the dynamics (see Methods and also the Appendix) shows that the bump location ψ approximately follows the one-dimensional dynamics: with where m 0 (θ ) is the "bump" profile (3) and θ c is its half-width (see derivation in Appendix). Although Eq. (4) was derived in the limit of small ε, it captures well the evolution of ψ(t ) even if ε is relatively large. This is shown in the simulation results in Figures 2B,C (ε = 0.5). The fixed points of dynamics (4) are given by the solutions to the equation F nf (ψ 0) = 0. If the condition F nf (ψ 0 ) < 0 is also satisfied, then this solution corresponds to a stable fixed point of (4), and therefore to the peak of a "bump" attractor of the dynamics (1). It should be noted that with our choice of the bump width (≈90˚) there were typically two or three attractors of the dynamics of (1), and therefore these discrete attractors did not approximate the circle satisfactorily ( Figure A2 in Appendix).
Equation (4) demonstrates that the dynamics of the firing rates in the ring model exhibit a structural instability, i.e., even a very small perturbation of the structure of the synaptic matrix causes a drift of the WM trace. However, this drift might not be a concern if it were happening on time-scales too slow to be relevant to the WM. For this reason we estimate the drifts average velocity.

Average drift velocity
The precise number of attractors and their location on the ring depend on the specific realization of the interaction pattern n ij = n(θ i ,θ j ). For a given realization, the trajectory of the drift of the bump toward one of them depends also on its initial location at the beginning of the delay period. To get an estimate of the drift speed that is independent of the network realization we computed the square of the drift velocity averaged over the network realizations. Using Eq. (4) we found (see derivation in the Appendix) that The mean square speed of the drift is thus inversely proportional to the square root of the number of mini-columns and to the time constant τ . It is also approximately proportional to the width of the bump ( Figure 2D). Note that the mean square velocity <ψ 2 > does not depend on the strength I 0 of the background input because we have assumed a threshold linear input-output transfer function. For a more general transfer function the background input may affect the speed of the drift.
The size of a typical memory field measured in PFC (Funahashi et al., 1989(Funahashi et al., , 1990 indicates that a possible range of values for the bump width θ c is 60-100˚. As for the timescale τ , its value in our simplified model should be taken as the typical time constants of the synapses in PFC (Shriki et al., 2003). Therefore if one assumes that interactions between PFC neurons is essentially mediated by AMPA and GABA A synapses, it is reasonable to take τ on the order of 5-10 ms. For the network size we take N = 720; this corresponds to a discretization of the representation of the cue feature with a precision of 0.5˚. With these parameters and our choice 1 of the heterogeneity strength ε = 0.5 Eq. (5) thus predicts a drift velocity on the order of many tens of degrees per second (with τ = 10 ms, the mean square velocity is in the range 71-119 deg/s for θ c in the range of 60-100˚). This is much larger than the drift velocity measured experimentally, which is on the order of 1-2 deg/s (White et al., 1994;Ploner et al., 1998). One would need to reduce ε by two orders of magnitude (e.g., ε = 0.005) to move the estimate (5) into the experimentally observed range. A drift velocity closer to, although still significantly faster than, experimental data can also be obtained if one assumes that in the PFC the synaptic time constants of the vast majority of the excitatory interactions are mediated by NMDA currents, with characteristic time constant in the range 50-100 ms.

WORKING MEMORY IN THE PRESENCE OF SHORT-TERM SYNAPTIC FACILITATION
While bump attractor dynamics are preserved for values of the heterogeneity strength ε up to order O(1), the above analysis indicates that the drift of the activity bump may be too fast unless ε is chosen to be much closer to zero -i.e., unless the synaptic efficacies have much finer tuning than necessary for maintaining bump attractor dynamics. In the following we investigate an alternative mechanism which does not necessitate fine-tuned synapses, but instead relies on facilitation to slow down the drift.

Drift velocity of the "bump" slows down in the presence of synaptic facilitation
We assume now that the recurrent interactions display short-term plasticity properties that we model as in (Barak and Tsodyks, 2007;Mongillo et al., 2008). Therefore the dynamics of the network are given by where two synaptic variables (u,x) describe the activity-dependent dynamics of synaptic efficacy: x stands for availability of synaptic resources and u stands for utilization factor, so that their product scales the synaptic strength. The variable u is up-regulated by presynaptic activity above its baseline value u = U (facilitation) and x is down-regulated by the activity below its baseline value 1 We have chosen ε = 0.5 as a "large" heterogeneity strength of order O(1) that still preserves the bump attractor dynamics.

Frontiers in Computational Neuroscience
www.frontiersin.org x = 1 (depression). In the absence of firing, synaptic facilitation, and depression variables decay to their baseline levels with time constants t f and t d respectively.
Simulations of these dynamics reveal that if the overall behavior of a single synapse is facilitating (e.g., U is small enough and t f t d ), the drift velocity of the bump is much slower for comparable shapes and heights of the bump (Figure 3B, cf. Figure 2A, see also Figure A3 in Appendix). To understand how this dramatic reduction in drift velocity results from synaptic facilitation we consider the limit in which the depression time constant is much faster than the facilitation time constant. We also assume that U is small, so that we can use a linear approximation for the dynamics of the facilitation variable u. In this limit, we can substitute x = 1 and approximate the dynamics of u by a linear equation, resulting in a system In the absence of heterogeneities (ε = 0), the model with facilitation also exhibits a ring attractor in a certain range of synaptic parameters. However, unlike in the case of a network with constant synapses, transition from uniform state to ring attractor now depends on the strength of the supra-threshold external input I 0 > 0, since recurrent connection effective strength increases with activity due to facilitation. In particular, for our choice of parameters, the uniform solution is stable for weak inputs, and activity bump appears for increasing input ( Figure 3A). This scenario implies that if the baseline state of the network is below the transition, the considered circuit cannot keep the parametric memory trace, as a stable "bump" solution is needed for such a function. An additional external (non-specific) excitation, which could come, for example, from attentional regulation of the network, yields a ring of stable bump solutions and thus enables the circuit to perform its parametric WM function (Figure 1; note that both in the case with or without facilitation we assume that during the delay period there is a non-selective excitatory attentional signal).
As above, assuming that ε is small, we can derive an analytic expression for the evolution of ψ(t ). One finds (see derivation in the Appendix) thaṫ with and the functions m 0 (θ ) and u 0 (θ ) are the attractors of the unperturbed (ε = 0) system (7). Equation (8) approximates well the dynamics of the center of the drifting bump ( Figure 3B).

Attractors in the presence of synaptic facilitation
While the speed of the drift is significantly slowed in the presence of the synaptic facilitation, the attractors of the drift appear to be very similar in Figures 2A and 3B. In fact, the functions F f (ψ) and F nf (ψ), while on a different scale have very similar set of zeros ( Figure 3C), and F f (ψ) is close to "rescaling" of the function F nf (ψ) (Figure A4 in Appendix).
It can be shown (see Appendix) that in the absence of facilitation the location of attractors of the Ring model does not depend on the strength of the feedforward input I 0 . This is no longer true in the presence of facilitation. In order to investigate how the location and the number of attractors are affected by the presence of synaptic facilitation we performed simulation in which we varied the strength of the constant feedforward input I 0 fivefold. Figure 4B shows that although the function F f (ψ) gained new attractors, these new attractors are not deep, i.e., small perturbations would easily let ψ(t ) escape that attractor. The "deep" attractors, on the other hand, hardly change their location (see Figure 4A for the evolution of the attractors). This feature of the considered circuit ensures that the attractors during the delay period are not sensitive to realistic changes in input magnitude (such as ramping of rates from the previous layer). As the strength I 0 of the feedforward input increases several-fold, the profile of Frontiers in Computational Neuroscience www.frontiersin.org F f (ψ) eventually diverges from being close to a rescaled version of F nf (ψ) (Figure A4 in Appendix).

Synaptic facilitation slows down substantially the drift of the bump
How does the drift velocity depend on synaptic facilitation? To answer this question we have derived an analytical expression for the average of the mean square velocity of the drift. One finds that with See Appendix for the derivation and the explicit formulas for g f (θ c , m 0 , t f ).
In contrast to what we found in the absence of facilitation, the drift velocity depends on the spatial mean m 0 of the "bump," due to non-linear dependency of the synaptic input on the presynaptic activity. Note also that the time constant τ of the rate dynamics does not affect the drift velocity given by Eq. (9). This is because we have assumed here that this dynamics is much faster than the facilitation dynamics. This factor contributes substantially to the reduction in the velocity of the drift in presence of facilitation.
The ratio between the mean drift velocity without and with facilitation Figure 5 as a function of θ c and m 0 for values of U = 0.05, t f = 1 s, and τ = 10 ms. One sees that, facilitation may result in more than 100-fold slow down of the drift (Figure 5).

DISCUSSION
Shortly after continuous attractors networks were proposed as a model of processing sensory stimuli with continuously varying features, the issue of structural instability of this model was identified (Tsodyks and Sejnowski, 1995;Zhang, 1996;Seung et al., 2000;Wang, 2001;Renart et al., 2003). In particular, when the network is not perfectly tuned, the continuous attractor is splitting into a small number of discrete attractors. This issue is especially relevant in case of WM networks, where information about the sensory cue is carried in the activity pattern of the network. In the case of ring model considered in this paper, the location of the activity bump encodes the stimulus feature and thus has to be stable on the time-scale of several seconds relevant for WM. The problem of structural stability is therefore of a quantitative nature -how long does it take for the bump to significantly drift away from its initial position determines the stability of the memory.
In this work, we considered a perturbation of the ring model obtained by adding a random component to the connectivity pattern that breaks its rotational invariance. We were then able to derive an analytic expression for the drift velocity of the bump under the condition that the perturbation is weak enough so that its shape is not significantly altered. This solution shows that, unless parameters are chosen such that synaptic efficacies are significantly more fine-tuned than what is necessary to preserve bump attractor dynamics, the bump may drift far away from the cue position very quickly, at a speed inconsistent with experimental observations. Whether such a precise tuning can be achieved biologically is an open question. It was addressed theoretically in (Renart et al., 2003) where ring model with inhomogeneities in neuronal thresholds was considered. The authors proposed a mechanism of synaptic tuning based on adiabatic plasticity, which significantly slowed down the drift velocity of the bump to realistic levels. We believe however that the ability of plastic cortical circuits to precisely tune themselves for one particular task can be limited due to constant bombardment by spiking inputs from other cortical areas that are not necessarily related to the particular WM task.
Motivated by recent experimental studies of synaptic connectivity in the PFC (Wang et al., 2006), we considered the effect

Frontiers in Computational Neuroscience
www.frontiersin.org FIGURE 5 | The ratio ψ 2 nf / ψ 2 f of mean square velocities without and with facilitation as a function of the bump width θ c and the average firing rate m 0 . The black dot indicates the same parameters (θ c ,m 0 ) as in Figure 3B. Note that the average velocity of the drift is slower for lower m 0 (and thus lower external input I 0 ). of short-term synaptic facilitation on the ring model, and found that it significantly slows down the drift velocity, rendering the network suitable for WM. This solution eliminates the need for an additional fine tuning of the network connections. The reason facilitation makes the memory more stable is that when the activity bump is triggered by the input at a certain position, recurrent connections between the corresponding neurons facilitate and hence make it harder for the bump to drift away. This type of stability is complementary to the one demonstrated in (Mongillo et al., 2008), where it was shown that WM with facilitation is stable to brief suppression of spiking due to external perturbation.
In the present work, we have focused on the case where heterogeneities are in the connectivity pattern. However, our analysis can be extended to other sources of heterogeneities, e.g., in the intrinsic properties of the neurons or in the background inputs. For instance, in the latter case a similar approach as above shows that facilitation in the recurrent synaptic interactions can also slow down the drift velocity by roughly two orders of magnitude as it is the case when the heterogeneities are in the connectivity pattern (see Appendix.) Our results have been derived in the framework of a singlepopulation effective ring model (Amari, 1977;Ben-Yishai et al., 1995). In this framework excitatory and inhibitory neurons are lumped into one population of neurons and excitatory and inhibitory interactions are described by one effective interaction function with a "Mexican hat" profile. As a consequence, when facilitation is introduced it affects excitation as well as inhibition. However, our approach can be generalized straightforwardly to a two population rate models in which the four (EE, EI, IE, II) interactions are described separately. One can show that in this case, the drift velocity is also reduced considerably when the synapses of the excitatory neurons exhibit facilitation (Figure A5 in Appendix).
Finally, we would like to point out that in our analysis we assumed that rate dynamics is controlled by fast synaptic currents having the time-scale of several milliseconds, such as the case of AMPA-mediated glutamate current. At least some of the synaptic transmission between pyramidal cells in the PFC is mediated by NMDA receptors, which are much slower (τ ( 50-100 ms; see, e.g., Myme et al., 2003). Depending on the dominance of NMDA-mediated transmission, the bump stability will be slowed down proportionally to NMDA time constant, as can be seen, e.g., from Eq. (4) above. Whether PFC is characterized by increased prevalence of NMDA receptors compared to other cortical areas is controversial: while increased levels of mRNA for NMDARs were measured in a human postmortal immunohistological study (Scherzer et al., 1998), direct electrophysiological recordings found equal NMDA/AMPA current ratios in PFC and the visual cortex (Myme et al., 2003). What is the dominant mechanism for WM stabilization needs to be addressed in future experimental studies.

NEURAL DYNAMICS
We approximated the discrete system (1) by While all analytic derivations (see Appendix) were performed using this continuous model and also system (7), all simulations were performed using the appropriate discrete models with N = 720 mini-columns. This corresponds to a spatial resolution of half a degree. Throughout the paper we used the following constants: the time-scale of the firing rate dynamics was assumed to be τ = 10 ms, while the time-scales of synaptic facilitation and depression were assumed to be t f = 1 s, and t d = 0.1 s, cf (Barak and Tsodyks, 2007). We also assumed that the baseline level of the utilization of synaptic facilitation is U = 0.05; this corresponds to a maximal-possible 20-fold facilitation of an effective strength of a synapse. The magnitude of the synaptic heterogeneity used in the simulations was ε = 0.5.

SYNAPTIC PARAMETERS AND BUMP ATTRACTORS
We have used the following parameters for the model with facilitation and depression (6): J 0 = −10, J 1 = 8, I 0 = 10. With these parameters, and the other constants defined as above, the unperturbed system has a stable family of bump attractors (3) with m 0 ≈ 4.1 Hz, and θ c ≈ 90˚. We have chosen the parameters for the other two models, the simplified facilitation model (7) (J 0 = −8.04, J 1 = 3.48, I 0 = 18.2) and for the ring model (1) (J 0 = −10, J 1 = 2.13, I 0 = 40.4) so that they resulted in a stable bump attractor (3) of approximately the same width and height as in the model (6).

SCALING OF THE SYNAPTIC HETEROGENEITY
We have chosen the scaling ε in Eq. (2) as ε = 0.5 for the model without facilitation. We have chosen the appropriate scaling ε f in the other two models with facilitation so that ε f J f 1 = 0.5 J nf 1 ,

Frontiers in Computational Neuroscience www.frontiersin.org
where J f 1 and J nf 1 are the values used for the parameter J 1 in Eq. (2) for the model with and without facilitation respectively. We chose this particular scaling in order to compensate for different values of J 1 in different models. For example, if we assume that spatial fluctuations in the connectivity result from anatomical randomness in the number of connections between mini-columns, the amplitude of fluctuations will scale with the square root of the average number of connections, which in turn will be proportional to J 1 . We checked that choosing the fixed value of ε = 0.5 does not change the results significantly.

EVOLUTION OF THE BUMP CENTER
For a function q(θ ,t ) we denote its first Fourier coefficient as Z (t ) = q 1 e iψ(t ) = π −1 π −π dθe iθ q(θ , t ) and its phase as ψ(t ). For models without facilitation, we used q = m to compute the center of bump ψ(t ); for the model with facilitation and depression the effective synaptic input q(θ ,t ) = u(θ,t )x(θ ,t )m(θ,t ) was used; for the approximated model (7), q(θ ,t ) = u(θ ,t )m(θ,t ) was used. We used singular perturbation analysis to derive the evolution Eqs. (4) and (8) describing the dynamics of ψ(t ); these derivations, as well as the derivations of mean square velocities are provided in Appendix.

ACKNOWLEDGMENTS
Vladimir Itskov was partially supported by NSF DMS 0967377 and the Swartz Foundation, David Hansel was supported by ANR-09-SYS-002-01, and Misha Tsodyks by the Israeli Science Foundation. The work of David Hansel and Misha Tsodyks was also done in the framework of the France-Israel Lab. for Neuroscience (FILNe).

The non-perturbed ring model
We begin with some basic facts about the Ring model (Amari, 1977;Amit, 1989). Recall that the unperturbed (i.e., ε = 0) discretized Eq. (1) in the main text approximates its continuous analog where [y] + = max(0,y) denotes the threshold-linearity and J (θ ) = 1 2π J 0 + 1 π J 1 cos(θ ). Denote then it can be easily shown that We model the delay period by assuming that during the delay period the feedforward input is constant: I (θ) = I 0 = const. It is well-known that if J 1 > 1 then there is a continuous family of steady states ("bumps"), centered at an arbitrary angle ψ: where the bump's half-width θ c is related to the synaptic parameters via while the Fourier coefficients m 0 and m 1 can be found from Note that here the width of the bump depends only on J 1 , while the amplitude of the bump is proportional to I 0 for large enough positive I 0. It can be shown that for J 1 < 1 the only fixed point is straight line (no bump solution), and that for J 1 > 1 the only steady state is the "bump" -no stable straight-line solution. It particular, system (1) does not have a multi-stable regime with constant feedforward inputs (Amit, 1989).

The dynamics of ψ(t) in the presence of synaptic inhomogeneity
It follows from (2) that the time-dependent coefficients ψ(t ) and m 1 (t ) satisfy the following equations: where J (θ, θ ) = 1 2π J 0 + 1 π J 1 cos(θ − θ ) + εn(θ, θ ). As simulations suggest, during the delay period with constant feedforward input, the evolution of m(θ,t ) can be described by where m 0 (θ ) is the steady state (3) centered at zero.
Using (5) and keeping in mind that (2) implies that π −π dθ sin(θ − ψ)m(θ ) = 0, and also that π −π dθ m(θ )( 1 2π J 0 + 1 π J 1 cos(θ − θ )) = J 0 m 0 + J 1 m 1 cos(ψ − θ), we obtain the following: Similarly, we obtain Assuming the approximation of small ε we can see that the dynamics of m 0 , m 1 , and θ c are on a faster timescale than the dynamics of ψ. Therefore, for the purpose of understanding the dynamics of ψ(t ), we can approximate m 1 (t ) −π m 0 (θ) cos θdθ , is the first Fourier component of the bump attractor 1 m 0 (θ) of the unperturbed (ε = 0) ring model with constant feedforward input I 0 . Taking into account the approximation m(θ,t ) = m 0 (θ − ψ(t )) + O(ε) we obtain the following one-dimensional dynamics of ψ(t ): This in particular means that the attractors of the dynamics of ψ(t ) do not depend on the scaling ε of the "noise" n(θ,θ ) as long as ε is small enough. These attractors can be found as the zeroes of F nf (ψ) at which F nf (ψ) < 0.

Computation of the mean square velocity without facilitation
Given a periodic function function n(θ,θ ) we define Furthermore, we assume that the circle is discretized into N equal size bins with centers at θ 1 ,. . .,θ N , and that the values of n(θ i ,θ j ) is independently identically distributed, moreover n(θ i , θ j ) = √ N 2π n ij , where n ij are independent random variables with zero mean and unit variance. Note that the scaling of n(θ i ,θ j ) is chosen so that for large N and approximating the integrals by sums we obtain n ij m j in agreement with Eqs.
(1) and (2) in the main text. Lemma 1. Under the above assumptions and in the limit of large N, the following identity for the variance of F f,g holds: 1 Here we always assume that m 0 (θ) is centered at θ = 0.
Frontiers in Computational Neuroscience www.frontiersin.org proof. Observe that g θ j n i+k,j+k , thus denoting f i = f(θ i ), g j = g (θ j ), and also using that n i,j n k,l = δ ik δ ij we obtain this approximates the integral (12) in the limit of large N. Now we can use this computation with f(θ) = sinθ and g (θ ) = m 0 (θ ) to calculate the average square velocity (11) as Note that the mean square velocity depends only on the half-width θ c of the bump attractor m 0 (θ ) and thus, thanks to identity (4), does not depend on I 0 , or J 0 . Moreover, g nf (θ c ) def = N F 2 nf is well approximated by a linear function y(θ c ) = √ 2/5θ c (see Figure 2F the main text) for most values of θ c and is bounded by √ 3.

COMPUTATIONS FOR THE SIMPLIFIED FACILITATION MODEL
We use the following system, whose solutions approximate the solutions of the system with both facilitation and depression ( Figure 3B in the main text) where u(θ ,t ) is the facilitation variable, and U is the constant that determines the "depth" of facilitation (U = 0.05 in all simulations.) As before, we assume that J (θ, θ ) = 1 2π J 0 + 1 π J 1 cos(θ − θ ) + εn(θ , θ ). We denote q(θ, t ) def = u(θ, t )m(θ , t ) and define the "location" ψ(t ) of the "bump" as the phase of the first Fourier transform of q(θ,t ): Because the time constant of the rate dynamics (τ = 0.005 s throughout this paper) is much smaller than the time constant (t f = 1 s) of the dynamics of the facilitation variable u(θ), we can assume separation of time-scales, i.e., assume that the "fast" variable m(θ ) is always near its attractor that is determined by the dynamics of the "slow" variable u. This implies that −m + π −π dθ J (θ , θ )q(θ ) + I 0 + ≈ 0, and thus Here Frontiers in Computational Neuroscience www.frontiersin.org are the attractor 2 of the unperturbed (ε = 0) system (13). Using this and also the identity Similarly to the case of the drift with no-facilitation [see The Dynamics of ψ(t ) in the Presence of Synaptic Inhomogeneity, (8)- (10)] it can be shown that the variables q 0 (t ), q 1 (t ), and θ c (t ) evolve on much faster time-scale than ψ(t ). Since we are interested only in the dynamics of ψ(t ), we therefore can consider the evolution of ψ(t ) assuming that those variables are already near their respective attractors, i.e., Using (16), observing that π −π dθ sin(θ − ψ)q(θ) = 0, using the approximations (14) and also that m 2 ≈ (m 0 (θ − ψ)) 2 + 2εm 0 (θ − ψ)η(θ,ψ) + O(ε 2 ) we can discard the integrals of odd functions, we then derive and thus obtaiṅ where q 0 1 = π −1 π −π dθ cos(θ)q 0 (θ) and the functions q 0 (θ) = u 0 (θ)m 0 (θ ) and m 0 (θ) are the attractors of the unperturbed (ε = 0) system (11).

Frontiers in Computational Neuroscience www.frontiersin.org
While above formulae provide a method for computing the mean square velocity, what we really need to know is the dependence of the mean square velocity on the "observable" parameters m 0 and θ c . For this reason we need to find the dependence of the parameters J 1 , and q 0 1 on those observables. We consider the attractor m 0 (θ) and u 0 (θ) (13) of the unperturbed system (11) in the case of constant I (θ) = I 0 = const. In this case J (θ, θ ) = J (θ − θ ) = 1/2πJ 0 + 1/πJ 1 cos(θ − θ ) and thus where q 0 0 def = 1 2π ∫ π −π m 0 (θ)u 0 (θ)dθ, q 0 1 def = 1 π ∫ π −π cos(θ)m(θ )u(θ )dθ , and j i def = q 0 1 J i . These yield the following system of equations: We observe from the first equation that q 0 0 = −J −1 0 j 1 cos (θ c ) + I 0 , plug this into the second equation and then obtain 2π j 1 cos (θ c ) From here we obtain the following: , This furnishes the explicit relationship between J 1 , q 0 1 on one hand and the observables m 0 and θ c on the other. Thus the formula (2.1) combined with where the function can be computed from the following formulae: Frontiers in Computational Neuroscience www.frontiersin.org = 1 2π J 0 + 1 π J 1 cos(θ) is the unperturbed connectivity of the Ring model and ν(θ ) is the heterogeneity in the feedforward inputs.

Frontiers in Computational Neuroscience www.frontiersin.org
Similarly to Lemma 1, it can be shown that if we assume that the circle is discretized into N equal size bins with centers at θ 1 , .., θ N , and the values of ν i = ν(θ i ) are independently identically distributed with zero mean and unit variance, then approximating the integrals by sums in the limit of large N one obtains where for any continuous function f (θ). Therefore, by taking f (θ ) = sin θ t −1 f + 2m 0 (θ ) we obtain that where the function can be evaluated using Eqs. (17), (18), and (22).
Frontiers in Computational Neuroscience www.frontiersin.org FIGURE A2 | Discrete set of attractors of the Ring model with synaptic heterogeneity does not approximate a continuous attractor. Black trace indicates the average (over 200 trials) ratio of the total circle length covered by 3˚intervals around each putative attractor. Gray area indicates 1 SD around the average. When the rotational invariance of the interactions is perturbed the set of discrete attractors does not provide a good approximation to a continuum ring attractor manifold even in the limit of large N. To show that we first found the points ψ 0 on a circle that satisfy F nf (ψ 0 ) = 0 and F nf (ψ 0 ) < 0. These points are not guaranteed to be the exact actual centers of the bump attractors of the system (1), however every center of the attractor bump of (1) is in an O(ε) -neighborhood of one such point. We therefore overestimate the actual number of attractors. The mean number of putative attractors slowly grows, with increasing N, although for a realistic number of mini-columns it is rather low. However, it is not the number of attractors per se that makes it possible to approximate the continuous attractor by a collection of discrete attractors: it is rather the distribution of how the attractors are placed on the circle that matters. To quantify the proportion of the entire circle covered by putative attractors, we draw an interval around each of them (3˚in our simulations) to determine the percentage of the total area covered. Numerical study revealed that their coverage either does not grow or grows slowly for the range of tested N; it only reaches up to 9.9% for N = 1000.

FIGURE A3 | (A)
Evolution of m(θ , t ) in the full facilitation-depression model with the same matrix of heterogeneities n ij as in Figure 2A. Black curve indicates ψ(t ). For the first 3 s (brown bar at the bottom) the network is presented with the feedforward cue input, while after the cue the network is presented with constant feedforward input. (B) Evolution of m(θ ,t ) according to the simplified model with facilitation with the same matrix n ij as in (A) and Figure 2A. Black curve indicates ψ(t ). For the first 3 s (brown bar at the bottom) the network is presented with the feedforward cue input, while after the cue the network is presented with constant feedforward input.
Frontiers in Computational Neuroscience www.frontiersin.org FIGURE A4 | Similarity of F nf ( ψ) and F f ( ψ). We have used the cosine of the angle as a measure of similarity between the functions F nf (ψ) and F f (ψ). Note that − 1 ≤ cosγ ≤ 1. The value cosγ = 1 implies that F f (ψ) = constF nf (ψ). (A) Histogram of cosine angles for I 0 = 10. For each instance (N trials = 1000) of the synaptic heterogeneity n ij the value of cosγ was computed. For comparison, Figure 3C had cosγ 0.936. (B) Same angles as in panel (A) were computed now as a function of the strength I 0 (5 ≤ I 0 ≤ 35). Mean of cosγ is the black trace, while the error bars mark 1 SD.
Frontiers in Computational Neuroscience www.frontiersin.org