Re-encoding of associations by recurrent plasticity increases memory capacity

Recurrent networks have been proposed as a model of associative memory. In such models, memory items are stored in the strength of connections between neurons. These modifiable connections or synapses constitute a shared resource among all stored memories, limiting the capacity of the network. Synaptic plasticity at different time scales can play an important role in optimizing the representation of associative memories, by keeping them sparse, uncorrelated and non-redundant. Here, we use a model of sequence memory to illustrate how plasticity allows a recurrent network to self-optimize by gradually re-encoding the representation of its memory items. A learning rule is used to sparsify large patterns, i.e., patterns with many active units. As a result, pattern sizes become more homogeneous, which increases the network's dynamical stability during sequence recall and allows more patterns to be stored. Last, we show that the learning rule allows for online learning in that it keeps the network in a robust dynamical steady state while storing new memories and overwriting old ones.


INTRODUCTION
Memories are based on synaptically induced changes of intrinsically generated brain activity. Examples of such intrinsic activity are the recurring sequences of neuronal activity patterns in the hippocampus (Wilson and McNaughton, 1994; Nadasdy et al., 1999; Lee and Wilson, 2002; Davidson et al., 2009); see Buhry et al. (2011) and Wikenheiser and Redish (2012) for reviews. Classically, these sequences were interpreted as replaying previous activity patterns. Since then, they have also been found to preplay future behavior (Diba and Buzsaki, 2007) or to replay past behavior in reverse (Foster and Wilson, 2006; Diba and Buzsaki, 2007). More recently, it has been shown that they even predict future behavior (Gupta et al., 2010; Dragoi and Tonegawa, 2011, 2013; Pfeiffer and Foster, 2013). The diversity of these sequences has generated an equally diverse set of possible functional explanations, ranging from memory consolidation (Nakashiba et al., 2009; Jadhav et al., 2012) to memory deletion (Hoffman et al., 2007) and path planning (Azizi et al., 2013; Ponulak and Hopfield, 2013).
In this paper, we specifically address one variant of the memory consolidation and deletion hypothesis, viz. whether these sequences can be used to drive a learning rule that efficiently re-encodes memories, and thereby solve the problem of catastrophic forgetting. The basic idea of this hypothesis is that new memories might be encoded by assemblies that are not optimally sparse and thus allow secure retrieval. A retrograde learning rule that propagates long-term depression (LTD) will be shown to reduce these assemblies toward a level of sparseness that is optimal from the retrieval point of view and, at the same time, allows the network to operate in a stable regime of online learning, in which old memories are overwritten by new ones. This learning rule operates on a time scale that is slower than the fast time scale of initial imprinting. As a result, new memories are represented by a larger number of neurons (and synapses) than old memories, which are encoded more efficiently and will eventually be forgotten.

MATERIALS AND METHODS
Here, we investigate memory consolidation and retrieval in a network that stores sequential associations of binary patterns (Nadal, 1991; Gibson and Robinson, 1992; Hirase and Recce, 1996; Leibold and Kempter, 2006; Kammerer et al., 2013). As in these previous papers, the dynamics is formulated in discrete time. The individual time steps can be biologically interpreted as the cycles of a collective network oscillation (e.g., hippocampal ripple oscillations; Maier et al., 2011). The employed network model is identical to that described in Medina and Leibold (2013) and places particular emphasis on handling heterogeneous pattern sizes, i.e., the number of active neurons may differ from one time step to the next. Formally, this is expressed by the vector of coding ratios $\varphi = (f_0, f_1, \ldots, f_P)$, where $M_k = f_k N$ is the number of active neurons in the $k$-th binary pattern $\xi^k \in \{0, 1\}^N$, $N$ is the number of neurons in the network, and the indices $k = 0, \ldots, P$ label the $P + 1$ patterns that are connected by the $P$ pairwise directed associations. Unless mentioned otherwise, the coding ratios $f_k$ are randomly drawn from a gamma distribution (to avoid negative pattern sizes) with mean coding ratio $\varphi_0$ and standard deviation $\sigma_\varphi$.
The associations between the individual patterns of the sequence $\xi^0, \xi^1, \xi^2, \ldots$ are stored in the synaptic weight matrix, which is chosen according to a clipped Hebbian rule (Willshaw et al., 1969): a synapse from neuron $j$ to neuron $i$ has weight $s_{ij} = 0$ only if a spike of neuron $i$ never follows a spike of neuron $j$ in any of the $P$ associations; otherwise $s_{ij} = 1$. In addition to this Willshaw rule, we also allow for a morphological connectivity, i.e., a synapse from neuron $j$ to neuron $i$ only exists with probability $c_m$ (Gibson and Robinson, 1992; Leibold and Kempter, 2006). This implies a second set of binary synaptic variables $w_{ij}$, with $w_{ij} = 1$ if the respective synapse exists and $w_{ij} = 0$ otherwise. For such a learning rule and heterogeneous pattern sizes, it was shown in Medina and Leibold (2013) that the probability $c$ of a potentiated synaptic connection ($s_{ij} = 1$) equals

$$c = c_m \left[ 1 - \prod_{k=1}^{P} \left( 1 - f_{k-1} f_k \right) \right]. \qquad (2)$$

In this and related models, the choice of binary synapses facilitates the mathematical tractability of the theory, although, in biology, synaptic weights generally follow long-tailed distributions (Song et al., 2005). The long tail, however, allows one to subdivide synapses into weak and strong ones, so that a noisy binary description can serve as an approximation.
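The storage rule and Equation (2) can be checked numerically. The following Python sketch (standard library only; the function names `store_willshaw` and `connectivity_eq2` are ours, and patterns are assumed to be drawn uniformly at random with a fixed size) stores a random sequence with the clipped Hebbian rule, draws the morphological connectivity with probability $c_m$, and compares the measured fraction of connections with $w_{ij} s_{ij} = 1$ with the prediction of Equation (2).

```python
import random


def store_willshaw(patterns, n, c_m, seed=0):
    """Clipped Hebbian (Willshaw) storage of the sequence xi^0 -> xi^1 -> ... -> xi^P.

    patterns: list of sets of active neuron indices.
    Returns (s, w) as nested lists indexed [post][pre]:
      s[i][j] = 1 if neuron i fires right after neuron j in at least one association,
      w[i][j] = 1 if the morphological synapse j -> i exists (probability c_m).
    """
    rng = random.Random(seed)
    w = [[1 if rng.random() < c_m else 0 for _ in range(n)] for _ in range(n)]
    s = [[0] * n for _ in range(n)]
    for pre, post in zip(patterns, patterns[1:]):
        for i in post:
            for j in pre:
                s[i][j] = 1  # clipped: repeated use leaves the weight at 1
    return s, w


def connectivity_eq2(f, c_m):
    """Equation (2): c = c_m * [1 - prod_{k=1}^{P} (1 - f_{k-1} f_k)]."""
    prod = 1.0
    for f_prev, f_next in zip(f, f[1:]):
        prod *= 1.0 - f_prev * f_next
    return c_m * (1.0 - prod)


# Example: P = 20 associations between random patterns of M = 30 out of N = 300 neurons.
rng = random.Random(1)
n, m, P, c_m = 300, 30, 20, 0.5
patterns = [set(rng.sample(range(n), m)) for _ in range(P + 1)]
s, w = store_willshaw(patterns, n, c_m)
c_sim = sum(s[i][j] * w[i][j] for i in range(n) for j in range(n)) / n**2
c_th = connectivity_eq2([m / n] * (P + 1), c_m)
print(f"simulated c = {c_sim:.4f}, Equation (2) gives c = {c_th:.4f}")
```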

SYNAPTIC METAPLASTICITY
According to Willshaw's learning rule, a synapse is in the potentiated state ($s_{ij} = 1$) if it connects two neurons that fire in sequence at least once. However, some neuron pairs may fire in sequence multiple times if they are part of the representation of consecutive patterns more than once. Although disregarded so far, the number of times a neuron pair fires in sequence is important, since it tells us how many associations rely on this connection being potentiated. In order to conserve this information while using binary synapses, we consider synaptic meta levels with serial state transitions, a model similar to those proposed by Amit and Fusi (1994) and Leibold and Kempter (2008). A state diagram of our plasticity model is shown in Figure 1A. After a synapse has been potentiated once, every further occurrence of sequential firing in the sequence activation schedule increments the meta level by one, leaving the synaptic weight $s_{ij}$ unchanged. Figure 1B shows the distribution of synaptic states in the network for three different pattern loads $P$. At higher loads, synapses are more likely to reach higher meta levels.
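A minimal sketch of the meta-level bookkeeping, again in standard-library Python with hypothetical helper names, counts for every neuron pair how many stored associations use it. The binary Willshaw weight is recovered as `level >= 1`, and the histogram of levels gives the state distribution $\rho(s)$ used in the Methods on online learning; for simplicity, the morphological connectivity is ignored here, so every neuron pair is treated as an existing synapse.

```python
from collections import Counter


def store_with_meta_levels(patterns):
    """Serial meta levels for the sequence xi^0 -> ... -> xi^P.

    level[(i, j)] counts the associations xi^k -> xi^(k+1) with neuron j in xi^k and
    neuron i in xi^(k+1). The first count potentiates the synapse; every further count
    only raises the hidden meta level, so s_ij = 1 exactly when level >= 1.
    """
    level = Counter()
    for pre, post in zip(patterns, patterns[1:]):
        for i in post:
            for j in pre:
                level[(i, j)] += 1
    return level


def state_distribution(level, n):
    """rho(s): fraction of the n*n neuron pairs at meta level s.

    Morphology is ignored here (every pair treated as an existing synapse), so
    1 - rho(0) plays the role of c / c_m.
    """
    hist = Counter(level.values())
    rho = [0.0] * (max(hist, default=0) + 1)
    rho[0] = 1.0 - len(level) / n**2
    for s, count in hist.items():
        rho[s] = count / n**2
    return rho


# Toy sequence in which the association {0, 1} -> {2} is stored twice.
patterns = [{0, 1}, {2}, {0, 1}, {2}]
level = store_with_meta_levels(patterns)
binary_s = {pair: int(v >= 1) for pair, v in level.items()}  # Willshaw weights
rho = state_distribution(level, n=3)
print(dict(level), rho)
```

The repeated association leaves the binary weights untouched but lifts the synapses $0 \to 2$ and $1 \to 2$ to meta level 2.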

NETWORK DYNAMICS
Following Medina and Leibold (2013), neurons are modeled using a simple threshold dynamics that translates the synaptic matrix $J_{ij} = s_{ij} w_{ij}$ into an activity sequence: a neuron $i$ fires a spike at cycle $t + 1$ if its postsynaptic potential $h_i(t) = \sum_{j=1}^{N} (w_{ij} s_{ij} - b)\, x_j(t)$ at time $t$ exceeds the threshold $\theta$. Here, $x_j(t) \in \{0, 1\}$ represents the binary state of neuron $j$ at time $t$, and $b$ denotes the strength of a linear instantaneous feedback inhibition (Hirase and Recce, 1996; Kammerer et al., 2013). The negative feedback constant is set to $b = c$ for all subsequent simulations (Medina and Leibold, 2013). To save computational time, most of the upcoming results are derived in a mean field approximation. To this end, in each time step, neurons are subdivided into two populations: an On population, which is supposed to fire according to the sequence schedule, and an Off population, which is supposed to be silent (Leibold and Kempter, 2006). The number of active neurons at time step $t$ can thus be divided into a number $m_t$ of correctly activated neurons (hits) and a number $n_t$ of incorrectly activated neurons (false alarms). Using these conventions yields the mean field dynamics (Medina and Leibold, 2013)

$$m_{t+1} = M_{t+1}\, \Phi\!\left(\frac{\mu_\mathrm{On} - \theta}{\sigma_\mathrm{On}}\right), \qquad n_{t+1} = (N - M_{t+1})\, \Phi\!\left(\frac{\mu_\mathrm{Off} - \theta}{\sigma_\mathrm{Off}}\right),$$

with $\Phi(z) \equiv [1 + \mathrm{erf}(z/\sqrt{2})]/2$ denoting the cumulative distribution function of the standard normal distribution. Here, the mean number of synaptic inputs $\mu \equiv \langle h(t) \rangle$ and the variance $\sigma^2 \equiv \langle h(t)^2 \rangle - \langle h(t) \rangle^2$ for the On population are given by Equations (6) and (7), with $\varsigma = c/c_m$; see Equation (2). The analogous expressions for the Off population are Equations (8) and (9). Finally, the variability coefficient $V^2_\varsigma$ used in Equations (7) and (9) is given by Equation (10).
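The threshold update and the Gaussian mean field step can be written in a few lines. The sketch below assumes the form $m_{t+1} = M_{t+1}\,\Phi[(\mu_\mathrm{On} - \theta)/\sigma_\mathrm{On}]$ and $n_{t+1} = (N - M_{t+1})\,\Phi[(\mu_\mathrm{Off} - \theta)/\sigma_\mathrm{Off}]$; the input moments $\mu$ and $\sigma$ are treated as given numbers rather than computed from Equations (6) to (10). Function names are illustrative.

```python
import math


def threshold_step(J, x, b, theta):
    """One cycle of the threshold dynamics.

    Neuron i fires at t+1 iff h_i(t) = sum_j (J_ij - b) x_j(t) > theta,
    with J_ij = w_ij s_ij and b the linear feedback inhibition.
    """
    active = [j for j, xj in enumerate(x) if xj]
    return [1 if sum(J[i][j] - b for j in active) > theta else 0
            for i in range(len(x))]


def normal_cdf(z):
    """Phi(z) = [1 + erf(z / sqrt(2))] / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mean_field_step(M_next, N, mu_on, sigma_on, mu_off, sigma_off, theta):
    """Expected hits m_{t+1} and false alarms n_{t+1} for given input moments.

    The moments (mu, sigma) are supplied by the caller (assumption of this sketch).
    """
    m_next = M_next * normal_cdf((mu_on - theta) / sigma_on)
    n_next = (N - M_next) * normal_cdf((mu_off - theta) / sigma_off)
    return m_next, n_next


# Three neurons; neurons 0 and 1 both project onto neuron 2.
J = [[0, 0, 0],
     [0, 0, 0],
     [1, 1, 0]]
print(threshold_step(J, x=[1, 1, 0], b=0.1, theta=1.5))
print(mean_field_step(M_next=100, N=1000, mu_on=5.0, sigma_on=1.0,
                      mu_off=5.0, sigma_off=1.0, theta=5.0))
```

When the mean input sits exactly at threshold, half of each population is expected to fire, which is the value the second call returns.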

RETROSYNAPTIC LTD
The replay model in Medina and Leibold (2013) assumed the synaptic matrix $J_{ij}$ to remain constant. Synaptic plasticity may, however, take place on a slower time scale and change the network dynamics between consecutive replay events. In this paper, we investigate the idea that replay evokes a retrosynaptic LTD that achieves a more efficient utilization of synaptic resources, thereby increasing storage capacity. We therefore assume that the stored patterns are initially too large and are reduced over time by learning, such that the coding ratios $f_k$ converge to an optimal value. This idea is implemented as shown in Figure 2. During replay of association $\xi^t \to \xi^{t+1}$ (Figure 2A), active cells that receive excessive synaptic input send a retrosynaptic LTD signal to all presynaptic cells that were active in the previous time step. The emission of such a signal is modeled as a stochastic process in which the emission probability $\psi(h)$ increases with the number $h$ of synaptic inputs received by the cell. The emission probability depends on a parameter $h_0$, which defines a minimal pattern size $M = h_0/c_m$ beyond which plasticity signals can occur. Its choice determines the optimal memory capacity of the network, as this minimal pattern size can become a stable fixed point of the dynamics of pattern sizes.

FIGURE 2 | Retrosynaptic LTD. (A) During sequence replay, excessive depolarization $h$ at time $t + 1$ triggers a retrosynaptic LTD signal that is propagated with probability $\psi(h)$ to all presynaptic cells that were active at time $t$ (black squares denote hits, gray squares denote false alarms). (B) Each cell receiving an LTD signal responds with probability $q$ by decrementing the state of all its input synapses from cells that fired at time $t - 1$ and all its output synapses to cells that fired at time $t + 1$.
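A cellular version of the rule in Figure 2 might look as follows. This is a sketch under several assumptions of ours: synapses are stored as a dictionary of meta levels keyed by (post, pre), a signal travels only along existing potentiated synapses, meta levels are clipped at zero, and the emission probability `psi` is supplied by the caller because its functional form is not specified here. The function name `retrosynaptic_ltd` is hypothetical.

```python
import random


def retrosynaptic_ltd(level, w, prev, cur, nxt, h_next, psi, q, rng):
    """One replay step xi^t -> xi^(t+1) of retrosynaptic LTD (cf. Figure 2).

    level:  dict (post, pre) -> meta level; synapse pre -> post is potentiated iff level >= 1
    w:      set of (post, pre) pairs for which a morphological synapse exists
    prev, cur, nxt: sets of neurons active at t-1, t and t+1
    h_next: dict neuron -> synaptic input h received at t+1
    psi:    emission probability as a function of h (form supplied by the caller)
    q:      probability that a cell receiving at least one signal responds
    Modifies `level` in place; levels are clipped at 0 (assumption).
    """
    def connected(post, pre):
        return (post, pre) in w and level.get((post, pre), 0) >= 1

    # (A) each cell active at t+1 emits a retrograde signal with probability psi(h)
    emitters = [i for i in sorted(nxt) if rng.random() < psi(h_next[i])]
    # the signal reaches every cell active at t that has a synapse onto an emitter
    receivers = {j for i in emitters for j in cur if connected(i, j)}
    # (B) each receiver responds with probability q
    for j in sorted(receivers):
        if rng.random() >= q:
            continue
        for k in prev:  # input synapses k -> j from cells active at t-1
            if level.get((j, k), 0) > 0:
                level[(j, k)] -= 1
        for i in nxt:   # output synapses j -> i onto cells active at t+1
            if level.get((i, j), 0) > 0:
                level[(i, j)] -= 1


# Chain 0 -> 1 -> 2; synapse 0 -> 1 is at meta level 2, synapse 1 -> 2 at level 1.
level = {(1, 0): 2, (2, 1): 1}
w = {(1, 0), (2, 1)}
retrosynaptic_ltd(level, w, prev={0}, cur={1}, nxt={2}, h_next={2: 10.0},
                  psi=lambda h: 1.0, q=1.0, rng=random.Random(0))
print(level)
```

With `psi` and `q` set to 1 the outcome is deterministic: the input synapse $0 \to 1$ drops from level 2 to 1 and keeps its weight, whereas the output synapse $1 \to 2$ drops from level 1 to 0 and is depotentiated.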
To combine this learning rule with the mean field network dynamics, we have to find an expression for the probability $P_s$ that a presynaptic cell receives at least one retrosynaptic signal. The number of inputs received at time $t + 1$ is on average $\mu_\mathrm{On}$ for an On cell and $\mu_\mathrm{Off}$ for an Off cell. Thus, for an On cell, the probability of receiving no retrograde LTD signal from any active cell in the On population is $[1 - \psi(\mu_\mathrm{On})\, c_m]^{m_{t+1}}$, and the probability of receiving no retrograde LTD signal from the Off population is $[1 - \psi(\mu_\mathrm{Off})\, c_m]^{n_{t+1}}$. Similarly, for an Off cell, these probabilities are $[1 - \psi(\mu_\mathrm{On})\, c_m]^{m_{t+1}}$ and $[1 - \psi(\mu_\mathrm{Off})\, c_m]^{n_{t+1}}$, and thus $P_s^\mathrm{On}$ and $P_s^\mathrm{Off}$ each equal one minus the product of the corresponding two no-signal probabilities (Equations 12 and 13). As illustrated in Figure 2B, upon receiving one or more LTD signals, the presynaptic cell, with probability $q$, decrements by one the meta state of all its input synapses from the $m_{t-1} + n_{t-1}$ cells that were active in the previous time step, as well as the meta state of all its output synapses to the $m_{t+1} + n_{t+1}$ cells that are active in the following time step. Each synapse in the subset of decremented synapses therefore takes part in one fewer association. As a result, the cell no longer takes part in the neural representation of pattern $\xi^t$, although it might still be spuriously activated during replay. On average, the size of pattern $\xi^t$ is therefore updated according to Equation (14), in which $M_t$ is rescaled by the factor $1 - q\, (m_t/M_t)\, P_s^\mathrm{On}$.
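The mean field quantities above translate directly into code. The sketch assumes, as in the text, that the $m_{t+1}$ hits and $n_{t+1}$ false alarms emit independently with probabilities $\psi(\mu_\mathrm{On})$ and $\psi(\mu_\mathrm{Off})$, and it reads the size update as a rescaling of $M_t$ by the factor $1 - q\,(m_t/M_t)\,P_s^\mathrm{On}$, i.e., an expected loss of $q\, m_t P_s^\mathrm{On}$ neurons. The function names and the numeric values of $\psi$ in the example are placeholders.

```python
def no_signal_prob(psi_on, psi_off, c_m, m_next, n_next):
    """Probability of receiving no retrograde signal from the m_next hits and
    n_next false alarms at t+1: (1 - psi_on c_m)^m_next * (1 - psi_off c_m)^n_next."""
    return (1.0 - psi_on * c_m) ** m_next * (1.0 - psi_off * c_m) ** n_next


def prob_signal(psi_on, psi_off, c_m, m_next, n_next):
    """P_s: probability of receiving at least one LTD signal."""
    return 1.0 - no_signal_prob(psi_on, psi_off, c_m, m_next, n_next)


def update_pattern_size(M_t, m_t, q, P_on):
    """Mean field size update: of the m_t hits, a fraction P_on receives a signal and
    a fraction q of those leaves the pattern, i.e. M_t -> M_t (1 - q (m_t / M_t) P_on)."""
    return M_t * (1.0 - q * (m_t / M_t) * P_on)


# Example with placeholder emission probabilities psi(mu_On) = 0.02, psi(mu_Off) = 0.
P_on = prob_signal(psi_on=0.02, psi_off=0.0, c_m=0.1, m_next=1200, n_next=50)
print(P_on, update_pattern_size(M_t=1500, m_t=1450, q=0.1, P_on=P_on))
```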

ONLINE LEARNING
If associations are stored in the network one after another (online learning), new memories overwrite old memories (Nadal et al., 1986; Amit and Fusi, 1994), which is also known as palimpsest learning, and the connectivity between On neurons of old associations is thereby increasingly diluted. The remaining signal strength of an association $k$ depends on the probability $y_k = 1 - p(0|k)$ that a synapse is not in state zero, given that it participates in association $\xi^k \to \xi^{k+1}$ (i.e., it connects neurons that fired in sequence during the storage of that association). To account for overwriting, the dynamical Equations (6) and (7) are modified such that the On population receives the diluted signal connectivity $c_m y_k$ instead of $c_m$. In our model framework, the synaptic connectivity is changed in two ways. First, during imprinting of a new association, synapses increment their meta level. Second, synaptic states are decremented via retrosynaptic LTD. To capture these changes, we define the average state distribution $\rho(s)$, which describes the probability that an arbitrarily chosen synapse is in state $s$; thus $c/c_m = 1 - \rho(0)$.

Effect of synaptic potentiation on state distribution
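If a new association is added that links pattern $\xi^k$ to pattern $\xi^{k+1}$, a random synapse increases its state by one with probability $f_k f_{k+1}$, and thus the change in the state distribution is

$$\Delta\rho = f_k f_{k+1}\, (U - \mathbb{1})\, \rho,$$

where $U$ is the shift matrix with entries $U_{s s'} = \delta_{s,\, s'+1}$, which moves probability mass from state $s$ to state $s + 1$, and $\mathbb{1}$ is the unit matrix.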
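In code, the potentiation step is a one-level shift of probability mass. Starting from an empty network and storing $P$ associations with a constant coding ratio $f$ makes the levels binomially distributed, so $1 - \rho(0) = 1 - (1 - f^2)^P$ reproduces $c/c_m$ from Equation (2). The sketch below uses a hypothetical function name and does not cap the number of meta levels.

```python
def potentiate(rho, f_k, f_k1):
    """State distribution after storing xi^k -> xi^(k+1).

    A random synapse moves up one meta level with probability a = f_k f_(k+1):
    rho'(s) = (1 - a) rho(s) + a rho(s - 1). The list grows by one level so that
    no probability mass is lost.
    """
    a = f_k * f_k1
    new = [0.0] * (len(rho) + 1)
    for s, p in enumerate(rho):
        new[s] += (1.0 - a) * p
        new[s + 1] += a * p
    return new


# Start from an empty network (all synapses in state 0) and store P = 20 associations
# between patterns with coding ratio f = 0.1.
rho = [1.0]
for _ in range(20):
    rho = potentiate(rho, 0.1, 0.1)
mean_level = sum(s * p for s, p in enumerate(rho))
print(f"1 - rho(0) = {1 - rho[0]:.4f}, mean meta level = {mean_level:.3f}")
```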

Effect of synaptic depression on state distribution
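Conversely, retrograde LTD is described by a matrix multiplication, which in components reads

$$\rho(s) \to (1 - p_s - \tilde p_s)\, \rho(s) + p_{s+1}\, \rho(s+1) + \tilde p_{s+2}\, \rho(s+2),$$

where $p_s$ and $\tilde p_s$ are the probabilities that a replay event decreases the state $s$ of a random synapse by 1 and 2, respectively.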
To obtain $p_s$ and $\tilde p_s$, we define the probability $p(s, \downarrow) = \rho(s)\, p_s$ that a synapse is in state $s$ and receives the signal to go down one meta level. Similarly, $p(s, \downarrow\downarrow) = \rho(s)\, \tilde p_s$ is the probability that a synapse is in state $s$ and receives the signal to go down two meta levels.
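A depression event $\downarrow$ during replay of the association $\xi^k \to \xi^{k+1}$ can have two origins: (1) the depression signal $\downarrow^{(k+)}$ that is sent by a neuron of pattern $\xi^k$ to its output synapses, and (2) the depression signal $\downarrow^{(k-)}$ that the neuron sends to its input synapses. Since a subpopulation of synapses may be part of both the inputs and the outputs of pattern $\xi^k$, a synapse may be depressed twice and thus go down two levels. Since the patterns are statistically independent, both depression events are independent, and thus the probability that a synapse is in state $s$ and goes down by two levels upon replay of association $\xi^k \to \xi^{k+1}$ is given by

$$p(s, \downarrow\downarrow) = \frac{p(s, \downarrow^{(k+)})\; p(s, \downarrow^{(k-)})}{\rho(s)}.$$

Similarly, the probability $p(s, -)$ that a synapse stays in state $s$ is

$$p(s, -) = \frac{\left[\rho(s) - p(s, \downarrow^{(k+)})\right] \left[\rho(s) - p(s, \downarrow^{(k-)})\right]}{\rho(s)}.$$

A synapse either stays in state $s$, goes down by one state, or goes down by two states, and thus

$$\rho(s) = p(s, -) + p(s, \downarrow) + p(s, \downarrow\downarrow).$$

This normalization condition then yields the probability $p(s, \downarrow)$ that a synapse is in state $s$ and is decreased by one, viz.

$$p(s, \downarrow) = p(s, \downarrow^{(k+)}) + p(s, \downarrow^{(k-)}) - \frac{2\, p(s, \downarrow^{(k+)})\, p(s, \downarrow^{(k-)})}{\rho(s)}.$$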
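The depression bookkeeping can be sketched as follows. Given $\rho(s)$ and the joint probabilities $p(s, \downarrow^{(k+)})$ and $p(s, \downarrow^{(k-)})$ for each state, the function splits each state into the stay, one-step, and two-step fractions implied by the conditional independence of the two signals; $p_s$ and $\tilde p_s$ then follow by dividing by $\rho(s)$. Clipping at level 0 is our own boundary choice, the function names are hypothetical, and the numeric inputs in the example are arbitrary.

```python
def depression_split(rho_s, p_plus, p_minus):
    """Split rho(s) into (stay, down one, down two) for one replay of xi^k -> xi^(k+1).

    p_plus  = p(s, down^(k+)): in state s and hit by the output-side signal,
    p_minus = p(s, down^(k-)): in state s and hit by the input-side signal.
    The two signals are treated as independent given s.
    """
    if rho_s == 0.0:
        return 0.0, 0.0, 0.0
    two = p_plus * p_minus / rho_s
    stay = (rho_s - p_plus) * (rho_s - p_minus) / rho_s
    one = rho_s - stay - two  # normalization: rho(s) = p(s,-) + p(s,down) + p(s,down down)
    return stay, one, two


def depress(rho, p_plus, p_minus):
    """Apply one replay event to rho; p_s = one / rho(s) and p~_s = two / rho(s).

    Levels below 0 are clipped to 0 (a boundary choice made for this sketch).
    """
    new = [0.0] * len(rho)
    for s, r in enumerate(rho):
        stay, one, two = depression_split(r, p_plus[s], p_minus[s])
        new[s] += stay
        new[max(s - 1, 0)] += one
        new[max(s - 2, 0)] += two
    return new


rho = [0.5, 0.3, 0.2]
p_plus = [0.0, 0.03, 0.02]
p_minus = [0.0, 0.06, 0.04]
stay, one, two = depression_split(rho[1], p_plus[1], p_minus[1])
print(stay, one, two, depress(rho, p_plus, p_minus))
```

The one-step fraction computed from the normalization agrees with the closed form $p(s, \downarrow^{(k+)}) + p(s, \downarrow^{(k-)}) - 2\, p(s, \downarrow^{(k+)})\, p(s, \downarrow^{(k-)})/\rho(s)$, and the total probability is conserved.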
The probabilities $p(s, \downarrow^{(k\pm)})$ can be further split up into contributions from two non-overlapping subsets of synapses: one (called $k$) that connects the On populations of association $k$, and another one (called $\bar k$) comprising all other synapses. Since the LTD signal $\downarrow$ is independent of the synapse state $s$, each contribution factorizes into a signal probability and a conditional state occupancy, and analogously for $k - 1$. The signal probabilities are obtained from Equations (12) and (13). What remains to be obtained in Equations (27) and (28) are the conditional probabilities $p(s|k)$ and $p(s|\bar k)$. From heuristic considerations, we approximate

$$p(s|k) = p(s-1|\bar k)\, r_k + p(s|\bar k)\, (1 - r_k). \qquad (33)$$

Equation (33) assumes that the presence of association $\xi^k \to \xi^{k+1}$ can affect the conditional state distribution $p(s|k)$ in two ways: either it increases the state by one ($p(s-1|\bar k)\, r_k$), or it has no effect on the state ($p(s|\bar k)\, (1 - r_k)$). The constants $r_k$ can be interpreted as the fraction of synapses for which association $\xi^k \to \xi^{k+1}$ contributes to the next meta level.
We will refer to them as the remaining memory strength of association k.
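Combining Equation (33) with the overall state distribution $\rho(s)$, we can recursively compute $p(s|k)$ and $p(s|\bar k)$ and, in particular, obtain a connection between $r_k$ and $y_k$: for $s = 0$, Equation (33) yields $p(0|k) = (1 - r_k)\, p(0|\bar k)$, and hence $y_k = 1 - (1 - r_k)\, p(0|\bar k)$.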

Effect of synaptic depression on signal connectivity
In addition to changes in the state distribution $\rho$, which describes the noise connectivity during associations, retrosynaptic LTD also specifically influences the synapses between the On populations according to $y_l = 1 - p(0|l)$. For more recent associations, $y_l$ will be large, whereas for older associations, $y_l$ will be small. The change in $y_l$ that results from retrosynaptic LTD while replaying association $\xi^k \to \xi^{k+1}$ is computed from the change in $p(0|l)$. For associations $l \notin \{k-1, k\}$, the conditional probabilities of depression are independent of the association $l$, i.e., $p(\downarrow|s, l) = p(\downarrow|s) = p_s$ and $p(\downarrow\downarrow|s, l) = \tilde p_s$. The conditional state occupancies are obtained via the r-factors from Equation (36) as $p(s|l) = p(s-1|\bar l)\, r_l + p(s|\bar l)\, (1 - r_l)$.
For associations $k - 1$ and $k$, synapses can only experience by-chance LTD from one of the two signals (association $k - 1$ from $\downarrow^{(k+)}$ and association $k$ from $\downarrow^{(k-)}$), since LTD from the other signal would result in a decrease of the pattern size (with undiluted connectivity). Likewise, there is no double decrement for these associations. As a result, the update rule for association $k - 1$ is given by Equation (38) and, replacing $k - 1$ by $k$, the update rule for association $k$ by Equation (39). The probabilities $p(1|k, k-1)$ and the corresponding probabilities for the complementary subsets $\bar k$ and $\overline{k-1}$ that enter Equations (38) and (39) can be obtained analogously, owing to the statistical independence of the patterns.

Effect of synaptic depression on subthreshold variance
The dynamics of sequence replay depends not only on the mean connectivities $c$ and $c_m y_k$ but also on the second moment of the connectivity matrix, as captured by $V^2_\varsigma$ from Equation (10). Retrosynaptic LTD will also affect this second moment. As an approximation, we again use the r-factors from Equation (36), which estimate the fraction of presynaptic On neurons that contribute to the meta level of association $k$. Thus, we can replace the coding ratio $f_{k-1}$ in Equations (2) and (10) by the diluted coding ratio $r_{k-1} f_{k-1}$ and obtain the corresponding diluted expressions. Since the definition of the r-factors in Equation (33) implements only an approximation, the two ways of computing the mean connectivity, via $c/c_m = 1 - \rho(0)$ and via $c/c_m = 1 - \prod_{k=1}^{P} (1 - f_k f_{k-1} r_{k-1})$, give slightly different results. To achieve numerical robustness, we obtain $\rho(0)$ by applying Newton's method to solve the resulting implicit equation.
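The diluted connectivity $c/c_m = 1 - \prod_{k=1}^{P} (1 - f_k f_{k-1} r_{k-1})$ can be evaluated directly; with all $r_k = 1$ it reduces to Equation (2), and smaller remaining memory strengths lower the connectivity. A sketch with a hypothetical function name and illustrative values:

```python
def diluted_connectivity(f, r, c_m):
    """c = c_m [1 - prod_{k=1}^{P} (1 - f_k f_{k-1} r_{k-1})].

    f: coding ratios f_0..f_P; r: remaining memory strengths r_0..r_{P-1}.
    With all r_k = 1 this reduces to Equation (2).
    """
    prod = 1.0
    for k in range(1, len(f)):
        prod *= 1.0 - f[k] * f[k - 1] * r[k - 1]
    return c_m * (1.0 - prod)


f = [0.1] * 21
print(diluted_connectivity(f, [1.0] * 20, 0.1),  # undiluted
      diluted_connectivity(f, [0.5] * 20, 0.1))  # every association half-overwritten
```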

RETROSYNAPTIC LTD DURING SEQUENCE REPLAY SPARSIFIES LARGE PATTERNS
The mean field description for the pattern size changes from Equation (14) can be interpreted as a dynamical system itself, since it constitutes a discrete-time iterated map on the pattern sizes. The time scale of this dynamics is slower than the time scale of sequence replay since, during the replay of a sequence, the pattern sizes change only by a small amount. Figures 3A,B show the temporal evolution of the sizes of some example patterns and of the full distribution of pattern sizes for q = 0.1 that results from the mean field Equation (14). The simulations show that the pattern sizes converge to a common fixed point and, as a result, the pattern size distribution becomes delta-like. For such homogeneous pattern sizes the memory capacity is maximized (Medina and Leibold, 2013).
To analyze plasticity on the slow time scale more systematically, we revisit dynamical Equation (14) and interpret it as a one-dimensional iterated map of the pattern size $M_t$ (Figure 3C). Condition (43) means that the minimum of the iteration function for $M > h_0/c_m$ is smaller than or equal to $h_0/c_m$. The critical value $q_c$ is the smallest value of $q$ for which condition (43) is fulfilled and is indicated by the kink of the graphs in Figure 3E. For larger $q$, the iterated map can produce pattern sizes below $h_0/c_m$, which are then marginally stable fixed points, but the resulting pattern sizes may be too small for successful replay. The critical value $q_c$ is not universal and depends on parameters. Most importantly, it decreases with $c_m$ and $a$ (Figure 3F). The critical value remained above a few percent for a wide range of parameters. Specifically, in sparsely connected networks ($c_m \ll 1$), the choice $q \approx 0.05$ is generally subcritical and thus allows for optimal storage capacity.
The dynamics of pattern sizes is paralleled by a dynamics of the mean network connectivity from Equation (2) (Figure 3D). A reduction of the pattern sizes leads to a corresponding decrease in connectivity. The rate of this decrease is higher for higher values of $q$. For subcritical values of $q$ ($0 < q \le q_c$), the average connectivity converges to a fixed point that is independent of $q$. For supercritical $q$ ($q_c < q < 1$), the connectivity converges to a lower fixed point, indicating a substantial fraction of pattern sizes that are too small. In the extreme case $q = 1$, all synapses are depotentiated and the connectivity converges to 0.
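The qualitative distinction between sub- and supercritical $q$ can be illustrated with a deliberately simplified toy version of the map, which is not the mean field model used for Figure 3. It assumes perfect replay of a homogeneous sequence ($m_t = M_t = M$, no false alarms), takes the On input as $\mu_\mathrm{On} = c_m M$ (ignoring inhibition), and uses the hypothetical emission probability $\psi(h) = \max(0, 1 - h_0/h)$, which vanishes below $h_0$. The helper `undershoots` checks whether any size above $h_0/c_m$ is mapped below it; all names are ours.

```python
def psi_toy(h, h0):
    """Hypothetical emission probability: zero up to h0, then rising toward 1."""
    return max(0.0, 1.0 - h0 / h) if h > 0 else 0.0


def size_map(M, q, c_m, h0):
    """Toy slow map for a homogeneous sequence under perfect replay.

    Assumes m_t = M_t = M_{t+1} = M, On input mu_On = c_m M (inhibition ignored) and
    no false alarms, so P_s = 1 - (1 - psi(c_m M) c_m)^M and M -> M (1 - q P_s).
    """
    P_s = 1.0 - (1.0 - psi_toy(c_m * M, h0) * c_m) ** M
    return M * (1.0 - q * P_s)


def undershoots(q, c_m, h0, M_max_factor=3.0, step=0.5):
    """True if some M > h0/c_m is mapped below h0/c_m (the supercritical case)."""
    M_star = h0 / c_m
    M = M_star + step
    while M <= M_max_factor * M_star:
        if size_map(M, q, c_m, h0) < M_star:
            return True
        M += step
    return False


def iterate(M0, q, c_m, h0, steps):
    """Apply the toy map `steps` times, starting from pattern size M0."""
    M = M0
    for _ in range(steps):
        M = size_map(M, q, c_m, h0)
    return M


c_m, h0 = 0.1, 100.0  # toy fixed point h0 / c_m = 1000
print(iterate(2000.0, 0.005, c_m, h0, 300), undershoots(0.005, c_m, h0))
print(iterate(2000.0, 0.6, c_m, h0, 300), undershoots(0.05, c_m, h0))
```

In this toy, small $q$ leads to monotonic convergence onto $h_0/c_m$ from above, whereas large $q$ overshoots below $h_0/c_m$, where $\psi = 0$ freezes the pattern at a size that is too small. The critical value of the toy need not match the values reported for the full model.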

PLASTICITY DURING SEQUENCE REPLAY INCREASES DYNAMIC STABILITY
The changes in connectivity due to retrosynaptic LTD are paralleled by changes in the fast dynamics of sequence replay according to Equations (3), as illustrated in Figure 4 for three plasticity stages (initial, after 5 and after 10 iterations) and different firing thresholds. As plasticity proceeds and the pattern size distribution in the sequence becomes more homogeneous, the activity fluctuations during replay are reduced and, eventually, the whole sequence can be retrieved successfully.
In the example of Figure 4, learning extends the range of thresholds under which the network successfully replays the full sequence if the network was perfectly initialized ($m_0 = M_0$, $n_0 = 0$). For a large threshold (e.g., $\theta = 55$), learning allows for the emergence of ongoing sequence replay in a regime where initially no self-sustained network activity was possible. Before any plasticity takes place, the pattern sizes are highly inhomogeneous and the network falls silent almost immediately. After 5 plasticity iterations, fluctuations are reduced and the network is able to successfully retrieve more items in the sequence. Near-perfect pattern retrieval ($m_t/M_t = 1$ and $n_t/(N - M_t) = 0$) is made possible after 10 iterations. Similarly, for low thresholds (e.g., $\theta = 25$), replay initially drives the network into an epileptic state ($m_t/M_t \approx n_t/(N - M_t) \approx 0.5$). The reduction of pattern sizes due to learning, again, allows for ongoing sequence replay.
Defining the retrieval quality (Leibold and Kempter, 2006) as the relative difference between hit ratio and false alarm ratio allows a better comparison of the replay performance across a large set of parameter choices. Formally, this is done via the replay success rate, which is the fraction of runs for which the replay quality at time $t$ is above 0.5 (Medina and Leibold, 2013). Figure 5A shows the evolution of replay success rates for three plasticity stages and three different memory loads $P$. Initially, the pattern sizes are large and inhomogeneous, and ongoing sequence replay is not possible. Only for small loads ($P = 2500$) and within a narrow firing threshold range ($\theta \approx 45$) can the first items be retrieved with high probability. As plasticity reduces inhomogeneity and sparsifies the patterns, the range of firing thresholds $\theta$ for which the full sequence can be retrieved expands. This is made possible by a decrease in the noise connectivity $c$, shown in Figure 5B and verified through cellular simulations. In a modified model without synaptic meta states, repeated learning steps brought no improvement, since synapses were switched to an inactive state too quickly (Figure 5C).
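The success-rate criterion can be sketched in a few lines. Here "relative difference between hit ratio and false alarm ratio" is read as $m_t/M_t - n_t/(N - M_t)$; this reading is an assumption, as are the helper names and the toy trajectories. Under this reading, perfect retrieval scores 1 and the epileptic state ($m_t/M_t \approx n_t/(N - M_t) \approx 0.5$) scores 0.

```python
def replay_quality(m_t, n_t, M_t, N):
    """Assumed reading: hit ratio minus false-alarm ratio, m_t / M_t - n_t / (N - M_t)."""
    return m_t / M_t - n_t / (N - M_t)


def success_rate(runs, M, N, t, threshold=0.5):
    """Fraction of runs whose replay quality at cycle t exceeds the threshold.

    runs: list of trajectories, each a list of (m_t, n_t) pairs;
    M: pattern sizes M_t of the sequence; N: number of neurons.
    """
    good = sum(1 for traj in runs
               if replay_quality(traj[t][0], traj[t][1], M[t], N) > threshold)
    return good / len(runs)


N, M = 100_000, [1000]
runs = [[(1000, 0)],    # perfect retrieval
        [(400, 0)],     # too few hits
        [(900, 990)]]   # mostly correct, a few false alarms
print(success_rate(runs, M, N, t=0))
```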

ONLINE LEARNING
So far, the initial distribution of pattern sizes was centered at mean values far above the fixed point $M = h_0/c_m$. However, once the pattern size distribution has reached this optimal value, retrosynaptic LTD will only take place if a new association with an oversized pattern is added to the synaptic matrix. In our model, this can be simulated as a homogeneous sequence with one pattern larger than $M = h_0/c_m$, as illustrated in Figure 6: for low firing thresholds ($\theta = 26$), the excess synaptic drive generated by an oversized pattern initially leads to sequence termination by setting the network into an epileptic state. Plasticity via retrosynaptic signals gradually reduces the size of the problematic pattern, eventually allowing for successful replay of the full sequence. This shows that retrosynaptic LTD in principle makes it possible to integrate new associations into the network, and therefore provides a possible basis for online learning, i.e., the ongoing storage of new memories.

FIGURE 5 | (A) Replay success rate over time for different firing thresholds $\theta$ and a sequence of length $Q = 100$. These plots were obtained using the mean field model and were verified using cellular simulations. Left to right: increasing plasticity iterations. Top to bottom: increasing pattern load $P$. The initial pattern size distribution had parameters $\varphi_0 = 0.02$ and $\sigma_\varphi/\varphi_0 = 10\%$. Other parameters were: $N = 10^5$, $c_m = 0.1$. (B) Connectivity decreases as the network sparsifies its stored associations. This plot was obtained by simulating the actual neural network with three different pattern loads $P$ and a randomly generated coding ratio vector $\varphi$. The connectivity was calculated both using the mean field Equation (2) (blue) and by counting the actual number of potentiated synapses (red dots), showing a perfect match. (C) Advantage of metaplasticity (left) over simple binary synapses (right) during retrosynaptic LTD. Blue and red traces indicate hits and false alarms (as in Figure 4) for 0, 5, and 10 learning steps. The bottom row depicts the replay quality of the 100th pattern in the sequence as a function of the number of learning steps. Only with metaplasticity does the replay remain stable for many learning steps.
Of course, adding new associations (increasing $P$) will consequently also increase the mean connectivity $c$, up to a point at which, in classical models, memories can no longer be retrieved (Willshaw et al., 1969; Nadal, 1991; Kammerer et al., 2013). For such large connectivities $c$, the false alarms contribute considerable synaptic input, such that a neuron is no longer always able to decide correctly whether it should fire or not. With our present model of retrograde LTD, however, neurons could detect such over-excitation and subsequently depress synapses.
To investigate whether this mechanism allows for self-organized sequence replay in a steady state, we set up a simulation in which we add a new sequence of 7 patterns before each plasticity episode and monitor the retrieval quality as well as the mean connectivity. The dynamics of the connectivities $c$ and $c_m y_k$ is thereby simulated according to Section 2.4.
The result of one such simulation is summarized in Figure 7. The simulation starts with an empty network, i.e., all synapses are in state 0. Each time after storing a new sequence, the newest 60 sequences (if already available) are replayed, starting with perfect initialization of the first pattern, $m = M$ and $n = 0$. These replays induce retrosynaptic LTD. Before the network has reached a steady state, replay is generally successful for all sequences (Figure 7A) but is worse for the last recalled patterns in younger sequences, because there the pattern sizes have not yet converged to their optimum $h_0/c_m = 1000$; oversized patterns tend to evoke dynamical instabilities that lead to many false alarms and poor replay quality. After the network has reached a steady state, the first of the 60 replays (the oldest sequences) generally fail, whereas the younger sequences can be replayed at high quality (Figure 7B). Interestingly, the mean connectivity $c$ converges to its steady state more quickly than the replay dynamics (Figure 7C). The pattern sizes are slightly above their optimum $h_0/c_m$ (Figure 7D; note that every seventh pattern is the final pattern of a sequence and does not shrink according to the learning rule, so these final patterns stay at their initial size). Only for the newest patterns do the sizes reflect the initial distribution (here a uniform distribution between 1000 and 2000).
The r values, which measure the remaining memory strength of an association (see Methods), provide an additional view on the memory capacity of the network (Figure 7E). Their convergence to zero for old memories reflects the memory time scale of the network. Additionally, the approach to the steady state becomes visible if one monitors the r value of the oldest memory ($r_1$) over time (Figure 7F). The convergence of $r$ is much slower than the convergence of the mean connectivity $c$ (Figure 7C), explaining why the replay dynamics continues to change long after the mean connectivity has reached its steady state.

DISCUSSION
Fast hippocampal activity sequences have been hypothesized to underlie memory consolidation (Ego-Stengel and Wilson, 2010;Mölle and Born, 2011;Jadhav et al., 2012). On the cellular level, the associated re-encoding of episodic memories can either occur at the synapses between hippocampus and neocortex (Buzsaki, 1996;Frankland and Bontempi, 2005) or within the hippocampus itself. So far, hypotheses for hippocampus-intrinsic consolidation were mainly focusing on synaptic mechanisms (Frey and Morris, 1997;Milekic and Alberini, 2002;Päpper et al., 2011). The present paper provides a mechanistic model of memory re-encoding on the circuit level whereby associations between assemblies of neurons are not strengthened over time, but assemblies are reduced in size to utilize the hippocampal resources more efficiently.
Retroaxonal learning affecting both input and output synapses of a neuron has previously been suggested to aid the stabilization of recent memories (Harris, 2008), although only in the context of synaptic potentiation. There, neurotrophins have been hypothesized to constitute a plausible underlying biochemical pathway. Here, we suggest a specific functional role for a retroaxonal spread of depression and show that it may allow a network to operate in an online mode in which old memories are overwritten by new ones. Moreover, the suggested retrograde LTD predicts that depression in output synapses should be correlated with depression in input synapses. A different mechanism suggested to reduce the overall excitatory drive in a network is synaptic scaling, whereby all synapses of an overexcited neuron undergo LTD (Turrigiano et al., 1998; Watt et al., 2000; Turrigiano, 2008; Savin et al., 2009). Retroaxonal learning is a more content-specific mechanism than synaptic scaling, since it only affects synapses that have been active in the recent past, and thus generally accounts for longer retention times.
Previous models of online learning (Amit and Fusi, 1994;Fusi et al., 2005;Ben Dayan Rubin and Fusi, 2007;Leibold and Kempter, 2008;Amit and Huang, 2010;Huang and Amit, 2011) usually do not explicitly take into account the network dynamics underlying the induction of plasticity. This paper presents a hypothesis of how LTD can be derived from network dynamics. The initial imprinting of the memories by LTP is still ad-hoc since we assume it to be occurring via extra-hippocampal signals.
Several other theoretical explanations for sequence replay and the sharp-wave ripple state have been suggested. (1) Sequences can be seen as avalanche-like activity patterns that are amplified by dendritic non-linearities (Memmesheimer, 2010; Jahnke et al., 2012, 2013). (2) CA1 pyramidal cell spike patterns may be triggered by strong feedforward excitation from CA3 inputs that are temporally coordinated by fast recurrent inhibition (Ylinen et al., 1995; Geisler et al., 2005; Taxidis et al., 2012). (3) The ripple oscillation may result from a network of gap-junction coupled axons (Traub et al., 1999; Traub and Bibbig, 2000; Vladimirov et al., 2013). (4) Sequences may result from a few overlapping attractor states in a recurrent network of neurons (Azizi et al., 2013). So far, these models have hardly been evaluated with respect to their memory capacity (although coding capacity was probed in Azizi et al., 2013).
High memory capacities have been found in classical models of memory networks, developed independently of hippocampal physiology, that assume neuronal sequences to result from attractor networks with asymmetrically biased synaptic matrices (Dehaene et al., 1987; Buhmann and Schulten, 1988). One major drawback of these classical theories, as well as of the model presented here, is their formulation in discrete time, which makes them hard to connect to cell-physiological properties of pyramidal cells. On the cellular level, sequence replay is most likely associated with the presence of large, precisely timed excitatory and inhibitory synaptic conductances (Maier et al., 2011). Whether and how a neuron can fire under such conditions and, more specifically, select to fire at one specific oscillation cycle during a ripple remains to be shown.
Sparsification of the hippocampal code may be an important intermediate step to prepare consolidation of memories in the hippocampal-neocortical loop, since generally storage capacity increases with sparseness (Nadal, 1991;Leibold and Kempter, 2006) and associating large hippocampal assemblies with neocortical states might be too costly. On the other hand, initially large assemblies might have the advantage that new associations can be retrieved more robustly. Optimal sparseness cannot be obtained by translating from one brain area to another via a random connectivity matrix, since then associations get lost as they may fall in the lower tail of the statistical distribution of the number of synaptic connections and thus do not give rise to sufficient excitation in the downstream brain area. Optimally sparse codes, hence, always require additional plasticity rules that carve out the subset of neurons that can fire reliably. The activity-driven increase in sparseness could also explain the prevalence of a few dominant preplay sequences (Dragoi and Tonegawa, 2013) that may provide an easily addressable substrate for future associations. Our model predicts that, once these sequences are connected with a memory item, the internal representation becomes more sparse and the sequences are no longer spontaneously visible. However, they are nevertheless stored within the hippocampal synaptic matrix and can be retrieved upon presentation of appropriate cue patterns.