Adaptation of short-term plasticity parameters via error-driven learning may explain the correlation between activity-dependent synaptic properties, connectivity motifs and target specificity

The anatomical connectivity among neurons has been experimentally found to be largely non-random across brain areas. This means that certain connectivity motifs occur at a higher frequency than would be expected by chance. Of particular interest, short-term synaptic plasticity properties were found to colocalize with specific motifs: an over-expression of bidirectional motifs has been found in neuronal pairs where short-term facilitation dominates synaptic transmission among the neurons, whereas an over-expression of unidirectional motifs has been observed in neuronal pairs where short-term depression dominates. In previous work we found that, given a network with fixed short-term properties, the interaction between short- and long-term plasticity of synaptic transmission is sufficient for the emergence of specific motifs. Here, we introduce an error-driven learning mechanism for short-term plasticity that may explain how such observed correspondences develop from randomly initialized dynamic synapses. By allowing synapses to change their properties, neurons are able to adapt their own activity depending on an error signal. This results in more rich dynamics and also, provided that the learning mechanism is target-specific, leads to specialized groups of synapses projecting onto functionally different targets, qualitatively replicating the experimental results of Wang and collaborators.


INTRODUCTION
It is the current belief that experiences and memories are registered in long-term stable synaptic changes. Long-term plasticity, and in particular Hebbian learning or Spike-Timing-Dependent-Plasticity (STDP), is a form of unsupervised learning that captures correlations in the neuronal input. Hence, their involvement in, for instance, the development of receptive fields (e.g., Song et al., 2000;Clopath et al., 2010) or memory and associations is long-standing knowledge. However, the variety of different longterm plasticity rules , indicates that the precise synaptic prescriptions of long-term plasticity mechanisms remain unclear.
The investigation of the brain wiring diagram known as connectomics has recently made spectacular progress and generated excitement for its perspectives (Seung, 2009). Novel discoveries in molecular biology (Wickersham et al., 2007;Zhang et al., 2007;Lichtman et al., 2008), neuroanatomical methods (Denk and Horstmann, 2004;Chklovskii et al., 2010), electrophysiology (Song et al., 2005;Hai et al., 2010;Perin et al., 2011), and imaging (Friston, 2011;Minderer et al., 2012;Wedeen et al., 2012) have pushed forward the technological limits for ultimate access to neuronal connectivity. The comprehension of this level of organization of the brain (Kandell et al., 2008) is pivotal to understanding the richness of its high-level cognitive, computational and adaptive properties, as well as its dysfunctions.
At the microcircuit level (Binzegger and Douglas, 2004;Grillner et al., 2005;Silberberg et al., 2005;Douglas and Martin, 2007a,b), the non-random features of cortical connectivity have recently raised a lot of interest (Song et al., 2005;Perin et al., 2011). The occurrence of stereotypical connectivity motifs was experimentally demonstrated and, in some cases, accompanied by physiological information on neuronal and synaptic properties (Song et al., 2005;Wang et al., 2006;Silberberg and Markram, 2007;Perin et al., 2011), on activity-dependent short-term and long-term plasticity (Buonomano and Merzenich, 1998) and rewiring (Chklovskii et al., 2004;Le Be' and Markram, 2006). Recent experimental findings obtained in young ferret cortices (Wang et al., 2006) indicate that short-term facilitation and depression correlate to specific connectivity motifs: neurons connected by synapses exhibiting short-term facilitation form predominantly reciprocal (bidirectional) motifs; neurons connected by synapses exhibiting short-term depression form unidirectional motifs. Interestingly, the same overexpression of connectivity motifs has been observed in another brain area, i.e., the excitatory microcircuitry of the olfactory bulb (Pignatelli, 2009).
Earlier work by Giugliano (2012, 2014) attempted to shed light on this correlation between STP and the observed wiring diagram configuration. They demonstrate that all facilitating or all depressing networks, upon receiving the same wave-like stimulation, give rise to the experimentally observed motifs: bidirectional for facilitating synapses and unidirectional for depressing synapses. This was explained both in the context of mean field analysis and microscopic simulations as a frequency-dependent effect. This is a simple consequence of the type of input (wave like) and the choice of the STDP triplet rule (Pfister and Gerstner, 2006). Differently from the classical pair rule, the triplet rule displays a frequency-dependent behavior, which can explain some experimental results (Sjöström et al., 2001): at low frequencies the rule reveals the classic STDP and, given a wave-like input, it results in unidirectional connectivity (Clopath et al., 2010;Vasilaki and Giugliano, 2014). At high frequencies, however, it reveals "classic Hebb" behavior: neurons that fire together, wire together. Hence, the low firing network develops unidirectional connectivity, while the high firing network develops bidirectional connectivity; for details see . However, the observed synaptic development was not associated to any particular type of learning, but was explored as the emerging structure upon receiving a wave like input: what the network learned per se in that context was not clear.
With the present work we aim to complement and extend on Giugliano (2012, 2014). We define a learning model for STP through which a population of neurons can modify its synapses in order to adapt its own activity and then fulfill a given time-varying task. The key idea comes from an optimization perspective: neurons that are able to modify their synapses, for instance making depressing synapses more and more depressing or even turning them into facilitating ones, would allow for much more flexibility and efficacy in signal transmission. A similar argument can be found in Markram et al. (1998a), whereas for earlier but different mechanisms of STP optimization or learning we redirect to Natschläger et al. (2001) and Carvalho and Buonomano (2011).
Then, we construct a typical inverted associative learning problem (Asaad et al., 1998;Fusi et al., 2007;Vasilaki et al., 2009b) where neurons have to learn to respond with high or low frequencies, when presented with the same wave-like input signal. We use this paradigm to show the potential of our model. In particular, not only do we provide an explanation for the correspondence motifs-synaptic properties within the context of learning both STP and STDP (triplet rule) but we also qualitatively capture, for instance, the heterogeneity in synaptic properties observed by Wang et al. (2006).
Moreover, having defined the learning model as a targetspecific mechanism, we are able to obtain variability in the short-term profile of synapses innervating functionally different targets. Finally, we show that the learning model can be reduced to a minimal model where only the time constant of recovery from depression τ rec needs to be learnt in order to obtain neurons firing at high or low frequency. Comparing this finding with the results from Carvalho and Buonomano (2011), we suggest that different parameters of the model describing STP might be related to different types of coding.

SINGLE NEURON MODEL
Each neuron is modeled as in Carvalho and Buonomano (2011): the sub-threshold dynamics of the electrical potential V i of the generic neuron i are described by the equation: where E rev = 30 mV is the reversal potential and g L = 0.1 μS is the leak conductance -both quantities are equal and fixed for all neurons. g ij i,j=1,...N is the matrix of conductances and the generic element g ij represents the conductance of the synapse going from neuron j to neuron i. Upon arrival of a presynaptic action potential elicited by neuron j, each of the conductances g ij with i = 1, . . . N, i = j increases by a quantity w ij , called effective synaptic efficacy, and decays exponentially back to zero with a fixed time constant τ g = 10 ms, equal for all synapses: where t f j is the f -th spike emitted by neuron j. The effective synaptic efficacy depends on both presynaptic and postsynaptic factors: where r ij and u ij are the presynaptic variables representing depression and facilitation in the STP model (see Subsection 2.2) and A ij is the postsynaptic variable for the maximum synaptic strength (or absolute efficacy), which represents the maximum synaptic response (see Subsection 2.7). If V i (t) ≥ 1 mV a spike is elicited by neuron i and V i (t + dt) is set to 0 for the next t ref = 10 ms (refractory period).
the degree of depression and facilitation of the synapse connecting neuron j to neuron i. The time course of r ij and u ij is given by the following kinetic equations (Markram et al., 1998b;Maass and Markram, 2002): U ij , τ rec ij and τ facil ij are the parameters of the model and they represent, respectively: fraction of resources used by the first action potential, time constant of recovery from depression and time constant of synaptic facilitation. A learning rule for STP has to allow changes to (at least one of) these parameters. At each synapse, the product of r ij and u ij determines the presynaptic efficacy.

STDP MODEL
We use the triplet learning rule defined by Pfister and Gerstner (2006) whereas when the neuron elicits a spike they increase by 1: Then, assuming that neuron i fires a spike, the STDP implementation of the triplet rule can be written as follows: where γ is the learning rate, is an infinitesimal time constant to ensure that the values of m 2 i and o 2 i used are the ones right before the update due to the spike of neuron i, and A ij is the maximum strength of the connection from j to i. Values of STDP amplitudes are taken from Pfister and Gerstner (2006) and are listed in Table 1.
In order to set A min we note that if the maximum weights connecting the input neurons to a specific output neuron all collapse to zero in the low firing rate regime, then, in the subsequent high firing rate regime, inputs were not able to "wake up" this neuron: it remained almost silent all the time. To avoid this, we set A min = 10 −3 . With such a small value we can still apply the symmetry measure (Esposito et al., 2014), which assumes A min = 0, see Subsection 2.9, to evaluate the symmetry of the network.

LEARNING TASK
Neurons are divided into different populations, each of them is required to fire at one of the two target firing rates: 30 Hz (high) or 5 Hz (low). To allow the populations to reach their target rate, both short-and long-term plasticity parameters are adapted via error-driven learning (see Subsection 2.6) and, in addition, the maximum synaptic strength is shaped by the STDP triplet rule (see Subsection 2.3).

INPUT SIGNAL AND INPUT NEURONS
In all simulations, the input signal is delivered only to a subset of neurons in the network, which we call input neurons N in . Each of these neurons receives a pulse-like stimulus with a fixed frequency ν in = 10 Hz, whose amplitude (2 mV) is chosen to always elicit an action potential in the corresponding input neuron. The stimulus delivery, however, is not synchronous across the input neurons, but it follows a sequential protocol: neurons are stimulated one after another with a fixed time delay t delay and in a fixed order. We choose t delay = (ν in N in ) −1 so that neurons that belong to input cyclically receive a stimulus. To further explain this, one may imagine labeling the neurons depending on the order they receive the stimulus, and therefore on the firing order, then have the fir- . . . , N N in , N 1 , N 2 , N 3 , . . . , N N in , N 1 , . . . , with each pair of adjacent spikes being separated by a time interval of t delay . We can think of the N in neurons as if they are organized in a ring and the stimulus as a cyclically traveling wave across this ring. To include the effect of noise, a random Gaussian variable with zero mean and standard deviation equal to 0.1 t delay is added to the firing times. The magnitude of the standard deviation is such that there is no inversion in the firing order. With this construction, the stimulus delivered to input neurons can be thought as generated by an external (not explicitly simulated) population of neurons where each external neuron projects only onto one corresponding input neuron. Note that, by construction, in the absence of any other signal, the firing pattern of the input neurons reflects that of the stimulus. This means that the external signal implicitly fixes a level of minimum activation for the N in neurons: their firing rate cannot be smaller than ν in . Due to this constraint, the input neurons, despite being free to change their parameters according to STP learning rules (see Subsection 2.6), are not totally free to regulate their firing activity, which may prevent them from effectively fulfilling the task. The rest of the neurons, instead, are totally free to adapt their activity and are called output neurons. For these reasons, we read out the interesting quantities only from output neurons (we refer to Results and to Figures 1A, 4A for more details on the architecture).

ERROR-DRIVEN LEARNING RULE FOR STP
The task can be formulated as an optimization problem where neurons regulate their own activity in order to minimize the objective function defined as: where ν lim is the maximum allowed frequency due to the refractory period (ν lim = 1/t ref ), ν targ is the target firing rate and ν is the mean firing rate across a single population. To calculate firing rates of single neurons ν i we use an exponential moving average with time constant τ ν = 1 s: whereν i is the current firing rate, which basically reflects if neuron i has fired ν i = 1 Hz or not ν i = 0 Hz . The population mean firing rate is therefore: with N pop being the size of the population.
By following a standard procedure, learning rules can be derived from Equation (9) by applying the gradient descent method (Hertz et al., 1991). Since the task is not based on single neurons but it involves an entire population, we use a mean-field approach for the derivation of the learning rules. Therefore, from now on in this section, we switch from the above single neuron notation to mean-field variables, by dropping the ij indices. It is worth noting that in our formulation the target is achieved not by directly acting on the firing rates, but by tuning the STP parameters, which in turn affects the firing itself. Therefore, ν = ν U, τ rec , τ facil and by using the chain rule we can formally write the following update rule for each parameter p: where η p is the learning rate, which in principle could be different for each parameter. The form of the function ν U, τ rec , τ facil can be derived with a semi-heuristic procedure, following . Whenever possible, for the meanfield variables we use the same symbols as in Vasilaki and Giugliano (2014) for consistency. Thus, we introduce the meanfield variables u, x, U, and A, respectively describing facilitation, depression, synaptic utilization and maximum strength. We assume a threshold-linear gain function between input mean current h and output mean firing rate ν = a [(h − ϑ)] + , for some constants a, ϑ. We can then write the dynamic mean-field equations for a population of neurons recurrently connected by short-term synapses as follows (Chow et al., 2005): where I ext is the mean external current and τ is a decaying constant. By imposing equilibrium conditions,ḣ =ẋ = u = 0, and combining the resulting equations, we can finally write: Now we observe that by taking the limit h → ∞ in F ( ν h ) we obtain an upper bound for the maximum allowed firing rate ν ≤ A τ rec + I ext (for more details see Vasilaki and Giugliano, 2014). We can heuristically turn the above inequality into an equality: so as by plugging Equation (15) into Equation (12) we can finally obtain an explicit form for the learning rule. In particular, since only one of the three parameters appears in Equation (15), we have a single rule for τ rec only: Then, according to the above derivation, the only parameter that needs to be learnt is τ rec . Here we adopt the view (Tsodyks and Markram, 1997;Markram et al., 1998b;Thomson, 2000;Chow et al., 2005) that facilitation/depression corresponds to small/large values of τ rec and U as well. Therefore, assuming that they apparently play a similar role, we can heuristically take a similar dependence of ν upon U: ν = A U + I ext , which leads us to a similar learning rule: With the same heuristic argument we can also write down a relation involving τ facil . Indeed, it is well-know that facilitation/depression corresponds to large/small values of τ facil , so we can hypothesize a linear relation, also including the dependence on the maximum strength for similarity with the other parameters. Thus, ν ∝ Aτ facil + I ext , which gives the following learning rule: Finally, based on the fact that A turns out to appear in Equation (15), and supported by experimental results showing an interaction between STP and STDP Sjöström et al., 2003), we can also introduce a STP-dependent learning rule for the maximum synaptic strength: This synaptic modification clearly does not substitute the traditional STDP, since the two rules come from different mechanisms. Rather, we assume they both contribute to maximum weights changes (see Subsection 2.7).

SINGLE NEURON LEARNING FRAMEWORK: COMBINING STDP AND STP LEARNING MODELS
Equations (16-19) are mean field learning rules for the four parameters τ rec , U, τ facil , A. It is straightforward to turn them into single neuron online learning rules. From now on, we return to a single neuron notation. Similarly to STDP, we hypothesize that modifications of STP are triggered by postsynaptic events: every time neuron i elicits a spike, its current firing rate is updated as well as the mean population firing rate. Neuron i can therefore backwards regulate its incoming synapses, through the following set of equations: The firing event of the neuron i also triggers STDP, according with Equation (8). This contribution sums up with the above STPdependent change, so as the total modification of the maximum synaptic strength is: Note that when we converted mean field population equations into single neuron equations we kept the population mean firing rate ν , instead of turning it into the single rate ν i . This is because the task is defined at a population level. Learning rates of the three STP parameters are chosen to be equal and error-dependent: withη = 0.1. The learning rate for maximum synaptic strength, instead, is fixed in time and it is the same as the one used for STDP, η A ij ≡ γ . Now we have four single neuron rules for the STP learning model, plus an equation for STDP and an equation for combining the different rules for the maximum synaptic strength. All these six rules together, Equations (8, 20-24) form a complete learning scheme for each neuron, which is implemented in our simulations. These rules are now local, since their computation takes place separately in each neuron, but receive a global signal encoding for the task performance error.

INVESTIGATION OF DIFFERENT RULE COMBINATIONS
In the Results section we consider different learning mechanisms: in addition to STDP, that is crucial for the formation of motifs Giugliano, 2012, 2014), different combinations of the four STP rules are taken into account while the remaining parameters are kept fixed. At first we allow only two parameters to change: (i) τ rec , because Equation (15) implies that for high frequencies this is the only critical parameter for adapting the firing rate of the population, and (ii) U, since it was a key parameter adopted in the work in Carvalho and Buonomano (2011). Then, we introduce the STP-dependent rule on the maximum synaptic strength, Equation (23), with the view to observe a more stable learning process. Following this, we also include τ facil in the learning scheme for a full parameter adaptation (full model) and finally we investigate the minimal number of parameters that needs to be adapted (minimal model), based on Equation (15). Looking for other parameter combinations might not be meaningful, as Equation (15) indicates the key parameters that are involved in changing the mean firing of the population.

CONNECTIVITY ANALYSIS
To reveal the type of connectivity in the output population, we use a symmetry index defined as a measure of the symmetry of the connectivity matrix W (Esposito et al., 2014): Here M is the number of instances where both A ij and A ji are zero, i.e., there is no connection between two neurons. Since in our case connections are bounded in the interval 10 −3 , 1 , M = 0 all the time. Equation (26) is able to capture the presence of global non-random structures in a network, returning a value included in [0, 1] . Values of s close to 1 reflect the presence of a global bidirectional motif, whereas when s approaches 0, a unidirectional motif is emerging. Note that, in order to apply the measure Equation (26), we assume that the lower bound for connections is 0. However, the choice of a small value such as 10 −3 does not affect the measure.

DATA SHARING
We provide the scripts that were used to construct the main figures of the paper in the ModelDB database, accession number:169242.

SINGLE POPULATION WITH A TIME-VARYING TASK: A CONTINUUM BETWEEN FACILITATION AND DEPRESSION
First, we apply our learning model to a specific task demonstrating how synapses can change their behavior driven by an external feedback signal. The problem we study is simple: a population of neurons is presented with a stimulus and is required to produce a certain output as a response to that stimulus. Once the learning has been successful, for the same input signal the desirable output changes. In other words, neurons are trained to respond to a change in the associative paradigm (inverted associative learning problem), that can be due to, for instance, changes in the environmental conditions. Let us give a concrete example of an inverted associative learning problem, taken from Asaad et al. (1998). In their work, the authors trained monkeys to associate visual stimuli (pictures) with delayed saccadic movements, left or right, with associations being reversed from time to time. Monkeys had to go beyond learning a single cue-response association: they are required to learn to associate, on alternate blocks, two cue objects with two different saccades. In other words, after having learned the relation object A, go right , and object B, go left , the associations were reversed such that now they needed to learn object A, go left and object B, go right . Similar to the (Asaad et al., 1998) experiment, we assume a binary problem, i.e., environmental conditions can change only between two states, and we measure the neurons' activity in terms of firing rate. This means that neurons are initially asked to fire at some rate and, after learning this task, they are asked to fire at a different rate, while keeping the same input signal all the time. Thus, the problem we defined is a simpler version of the monkey experiment, with only a single input. In order to train the neurons on the current associative paradigm, an external global signal is required, that can be considered as an error signal (see Section Methods 2.6 and 2.7).

Problem description and network architecture
We created a learning network of N = 40 conductance-based integrate-and-fire neurons (see Section Methods 2.1) all to all connected. Synaptic connections are modified by the STDP triplet rule (Pfister and Gerstner, 2006) and STP is implemented by using the Tsodyks and Markram model (TM model) described in Markram et al. (1998b); Maass and Markram (2002). Figure 1A shows the network architecture, composed by two non-overlapping regions: a blue one with N in = 30 neurons receiving the input signal and a red one with N out = 10 neurons from which we read out the quantities of interest. Note that for clarity, only a few neurons (black circles) and connections (black arrows) are drawn. The network is therefore formed by two functionally distinct populations, with the input population delivering the stimulus to the output one. Recursive connections are present within each population and across populations, and they are all plastic, in the sense of both long-term and STP. We refer to this architecture as a first or single population scenario.
The input neurons are stimulated one after the other, following a sequential protocol, and approximately with the same rate, ν in = 10 Hz. The amplitude of the stimulus is such that input neurons release a spike every time they receive an input (see Section Methods 2.5). This external source can be thought as an additional population of neurons, which we are not simulating here, where each "external" neuron is connected only with a corresponding neuron in the input population by means of a fixed feedforward connection (blue dashed arrows).
We hypothesize that the whole learning network (green region in Figure 1A) is presented with a sequence of two tasks while the stimulus pattern is kept fixed. The tasks are firing low (5 Hz) and firing high (30 Hz) and the sequence is low-high-low-high. Therefore, neurons have to repeatedly learn a new association and forget the previous one in a dynamic context divided in four phases of t ph = 100 s. We refer to them as: low 1, high 1, low 2, high 2. As discussed at the beginning of this section, this picture is inspired by a typical inverted associative learning problem: considering the monkey experiment from Asaad et al. (1998) as a metaphor, our scenario provides a simplified version, where instead of having two different inputs, object A and object B, we have a single input. Indeed, we can think we are presenting the network with only object A and while doing this we switch the target association between the two states go right and go left, which correspond to our low and high firing rate targets. We call the desirable context-dependent target rate, ν targ . As described in Methods, the difference between ν targ and the current firing rate of each population ν is the error signal that, according which our rate-dependent STP causes synapses to adapt their activity.
In all simulations, single neuron parameters U, τ rec , τ facil , w ij are initially drawn from uniform distributions (for i = j), respectively in [0.05, 0.95], [100, 900] ms, [1, 900] ms, 10 −3 , 1 . Synaptic variables are initialized at their equilibrium values, i.e., r ij = 1 and u ij = U ij . All the simulations in this subsection use γ = 1 for the high rate regime and γ = 2 for the low rate regime. Values of the parameters are listed in Table 1.

Learning U and τ rec
We initially studied the problem with a learning scheme involving U and τ rec only, Equations (20, 21), so there is no additional change in maximum synaptic strengths due to STP. Indeed, due to Equation (15) and (Carvalho and Buonomano, 2011), we wanted to test the hypothesis that U and τ rec are the only crucial parameters that need to be learnt for adapting the firing rate of a population. The results are displayed in Figures 1B-E, with vertical black arrows marking the beginning of each of the four phases, and in Figure 2. Figure 1B shows the average firing rate of the N out neurons, with shaded area being the standard deviation. Target firing rates are show with gray dotted lines. The adaptation to the new target is fast, except during the low 2 phase, when we switch from high to low rate, where an initial fast decrease of the firing rate is followed by a much slower adaptation. Despite the fact that neurons do not reach the target rate during this phase, we observe a monotonically decreasing activity which would eventually stabilize at 5 Hz if we were allowing the simulation to run for longer. The reason for this double slope adaptation will be further discussed now. Figure 1C shows the evolution of the symmetry index (see Section Methods 2.9). At the beginning, the value reflects the randomness in the connections (the mean value of s for a network with uniform random connections is indeed 0.614, see Esposito et al., 2014), whereas, as learning takes place, we observe the development of unidirectional (low values of s) and bidirectional (high values of s) motifs, depending on the set target. This can also be formalized by applying the p-value hypothesis test obtained by using mean and variance of s on a completely random network with uniform distribution of connections (Esposito et al., 2014). P-values are shown in Table 2. We, again, observe rather slow dynamics during the low 2 phase that, within the fixed simulation time, prevent the system from reaching a clear connectivity configuration. However, the trend of s clearly shows that the connectivity within the output population is approaching unidirectionality. Figures 1D,E depicts the time course of the recovery time constant τ rec and synaptic utilization U averaged across the output neurons, with shaded area representing standard deviation. Both parameters oscillate between high values, which correspond to depressing behavior, and low values, that indicate facilitation. Note that the dynamics of τ rec and U is fast in all phases, the third included. This is not surprising since STP is a fast process and leads to fast adaptation of its parameters. As a result, neurons' response to a change in the target rate takes place in a short time. However, during the low 2 phase, synaptic parameters saturate before the neurons could fulfill the task, with STDP being the only remaining mechanism through which the output population can regulate its own activity. This results in a much slower decrease toward the target rate for two reasons: (i) STDP by itself acts on much longer time scales, (ii) switching from high to low rate is the most challenging part of the entire sequence of tasks due to the saturation of the maximum weights in the previous high 1 phase, which slows down the process even further. Figure 2 provides additional evidence of the alternation between the two different synaptic behaviors. Plots are organized in five rows, with each row displaying information from a different phase of the simulation. Panel A shows the initial uniform condition, panel B the end of low 1 phase, etc. For each stage, we draw the histograms of recovery time constant (Column 1) and synaptic utilization (Column 2). According to the narrow standard deviation observed in Figures 1D,E, distributions peak around extreme values, reflecting two different, synaptic behaviors. Column 3 in Figure 2, displaying the single synapse traces obtained with a TM model, demonstrates the corresponding behaviors: at the end of the phases where neurons are required to fire low we observe a typical depressing response, whereas at the end of the high rate regimes synapses show a typical facilitating trace. To generate the traces, we used a 5 Hz signal to stimulate a synapse with a parameters given by the mean values obtained from the corresponding histograms. Note that the synaptic trace for the initial condition, i.e., before learning shapes the parameters, already shows depression, which explains why the distributions of τ rec and U at the end of the low 1 phase are much broader than in the following phases.
Altogether, the four panels (B-E) in Figure 1 and the five panels (A-E) in Figure 2 show that the properties and activity of the output population oscillate between two states and that the desirable structure is formed depending on the target rate. In particular, we observe that neurons that fire at low frequency turn their synaptic properties into depressing and the connections formed are mostly unidirectional. On the other hand, when the target rate is set at a high frequency, neurons develop facilitating synapses and bidirectional connections.

STP-dependent modification of A enhances performance
Given the speed convergence issue in the low 1 phase, we introduced an additional learning mechanism, i.e., the STP-dependent rule for A, Equations (23, 24). Indeed, this mechanism provides an additional way, besides the STDP, for regulating the long-term plastic synapses. In all the other aspects, the model remains as above. Figure 3 shows simulation results, with panels A-D depicting the same quantities as panels B-E in Figure 1 (symbols as before). A direct panel-by-panel comparison shows that the results are very similar, meaning that with this new learning configuration the output population also learns to adapt its synaptic properties in order to fulfill the current task, with subsequent motifs formation. As expected, due to the additional leaning rule for A, the dynamics are faster: in particular, during the low 1 phase, neurons reach the target rate within the simulation time, and the value of the symmetry measure is much lower than before, confirming the formation of a unidirectional motif; compare with Figures 1C, 3B and see Table 2. Note that the adaptation of the STP parameters is also faster, as they depend on the current value of the maximum synaptic strength. Thus, the STP-dependent modification of A improves the overall performance and introduces an interesting link between STP and STDP.

TWO POPULATIONS WITH A DIFFERENT TASK: SYNAPTIC DIFFERENTIATION
Now we consider a different scenario, which we refer to as the second or double population scenario. The two tasks associated with low and high targets are now simultaneously active and must be learnt by different populations, interacting via lateral connections and receiving the same stimulus source. Reasons are multiple: we want to investigate if our model allows to contemporary encode both associative paradigms, without the need of forgetting one of the two. In addition, we want to study the possibility that targetspecific STP emerges as a consequence of the target-dependent  learning rules we chose for our model. In particular, we want to test whether our model is able to reproduce existing experimental data, specifically that appearing in Table 1 from the paper by Wang et al. (2006).

Network architecture
The new configuration is depicted in Figure 4A and it is obtained by mirroring the structure of the first scenario and by adding recursive connections between functionally homologous populations. This led to a network of N = 80 conductance-based integrate-and-fire neurons, organized in two distinct branches of 40 neurons each, with the first branch required to fire at a high rate (ν = 30 Hz) and the second branch at a low rate (ν = 5 Hz). Both targets remain fixed throughout the entire simulation. Each branch is a replication of the architecture we used previously, i.e., it is formed by an input and an output population recursively connected. Thus, the network is formed by four functionally different populations: ℘ in 1 , ℘ in 2 , ℘ out 1 , ℘ out 2 , with obvious meaning of symbols. Input populations in both branches receive the stimulus from the same source: a single wave-like signal is delivered to the N input = 60 neurons with ν = 10 Hz, stimulating one neuron per time (see Section Methods 2.5), first the neurons in ℘ in 1 and then the neurons in ℘ in 2 . All connections are plastic following the STDP triplet rule and TM model for STP.
Lateral connections are present between the inputs ℘ in 1 , ℘ in 2 and between the outputs ℘ out 1 , ℘ out 2 . To stress that they are functionally different, we drew their initial values from a uniform distribution in 10 −3 , 10 −1 , but, during the evolution, synapses are allowed to grow up to A max = 1 as any other synapse. Furthermore, cross connections between different output and input populations, i.e., between ℘ in 1 , ℘ out 2 and between ℘ out 1 , ℘ in 2 are absent. The rest of the connections -within each population and across populations belonging to the same branch -are drawn from a uniform distribution in 10 −3 , 1 and they are not allowed to exceed this interval during the simulation. STP variables are initialized as in the single population scenario and in all the simulations presented in this subsection we used γ = 2 as the learning rate.

Full model: adaptation of U, τ rec , τ facil and A
We begin by studying the behavior of the full model: all four parameters are modified by our rate-dependent STP, Equations (20-24). Taking into account the modifications of all three STP parameters allows us to make a direct comparison with (Wang et al., 2006). Results are displayed in Figures 4B-C and in Figure 5. Figures 4B,C shows the time course of the mean firing rate and symmetry index in both output populations, black lines for ℘ out 1 and light gray lines for ℘ out 2 . Shaded areas and dark gray dotted lines represent standard deviation and target firing rates. Both populations ℘ out 1 , ℘ out 2 approach the target rate while developing specific connectivity: as expected, a bidirectional motif emerges in the population firing at the high rate whereas the population firing at the low rate develops mostly unidirectional connections.
Figures 5A-C shows the time evolution of the three parameters of the TM model: black lines and gray lines represent the mean value of the synapses projecting from the two output populations ℘ out 1 ℘ out 2 , respectively onto ℘ out 1 and ℘ out 2 . Shaded area is the standard deviation. As expected from the previous simulation, we observe that the two populations develop different synaptic types: high values of τ facil and low values of τ rec and U, as observed in the population firing at the high rate, suggest a facilitating behavior, whereas values as the one observed in ℘ out 2 , characterize depressing synapses. Mean values at the end of the simulation are reported in Table 3 rows 1,4. These results show that our model develops target-specific STP and results in good agreement with the data in Wang et al. (2006). Indeed, although single values are not identical, the qualitative synaptic behavior is represented: recalling the notation used in Wang et al. (2006), two main types of synapses are present. The group projecting from ℘ out 1 ℘ out 2 onto ℘ out 1 can be mapped onto the type E1 and the group projecting from ℘ out 1 ℘ out 2 onto ℘ out 2 that can be mapped onto the type E2.
Following Wang et al. (2006), we can also refine our classification, introducing a further distinction within each class. With this purpose, we show in Figures 5D-F the distributions of τ rec , τ facil and U at the end of the simulation within the entire output population ℘ out 1 ℘ out 2 . For each histogram, data have been divided into four groups, representing the four different subtypes: ℘ out 2 to ℘ out 2 with light gray, ℘ out 1 to ℘ out 2 with medium gray, ℘ out 1 to ℘ out 1 with dark gray, ℘ out 2 to ℘ out 1 with black. While the distinction between the two synaptic types mapping onto E1 and E2 is evident, the difference between two subtypes in the same type cannot be easily seen. However, by looking at the mean values of synaptic parameters in Table 3 rows 2, 3, 5, 6 and in particular the ratio τ rec /τ facil in Table 3 column 5, the distinction into four subtypes becomes more clear. As reported in column 7 of Table 3, we can map the synaptic subtypes as follows: E1a corresponds to the group ℘ out 1 → ℘ out 1 , E1b to ℘ out 2 → ℘ out 1 , E2a to ℘ out 2 → ℘ out 2 and E2b to ℘ out 1 → ℘ out 2 . Finally, similarly to Figure 2, in Figures 5G-J we show single synapse traces for each subtype. We observe that, except for the last trace, different groups effectively show a distinctive response to the same stimulus (12 Hz) and the traces reproduce the ones of the corresponding subtypes in Wang et al. (2006).  Figures 5D-F we present four different histograms for each parameter, we can reason on the overall distribution within the entire output population ℘ out 1 ℘ out 2 as the sample size is the same in all histograms. We can therefore observe that the distribution of τ rec closely matches that in Wang et al. (2006), whereas the distribution of U reproduces the peak at around 0.25 but is less broad. On the other hand, the distribution of τ facil is rather different, being totally shifted toward facilitating values in our case. This may be due to the fact that U is much more peaked around low values. We decided then to discard τ facil from the learning scheme and run a simulation where only U, τ rec and A are learnt, as we did for the single population scenario in subsection 3.1.3. We observed that the behavior of the output populations and all the results remain unchanged. We provide an explanation for this in the Discussion.

A minimal model for rate-dependent STP: adaptation of τ rec and A
Finally, we study the minimal model: a model that suffices to obtain the desired behaviors by adapting as few parameters as possible. The choice of the parameters to be learnt is naturally suggested by the form of the objective function Equation (9): τ rec and A. Interestingly, this minimal model preserves two key features: (i) both a presynaptic parameter, τ rec , and a postsynaptic parameter, A, participate in learning, (ii) STP and STDP are linked to each other through the STP-dependent modification of A.
In Figure 6 we show the results of the minimal model: from A to D, respectively: mean output firing rates, symmetry index, τ rec evolution and τ rec distribution in the four groups of synapses. By comparing these panels with the ones from the full model simulation, we observe that output populations still efficiently fulfill the task while developing the expected connectivity motifs. Also, in Table 4 we report the mean values of τ rec for the four groups of synapses that we identified with the full model: there is still a clear distinction between them. We can therefore conclude that this minimal model is sufficient for qualitatively reproducing the main two types and also the subtypes of Wang et al. (2006).

DISCUSSION
It is well-known that synapses are activity-dependent connections through which neurons propagate information. STP is a mechanism that describes these phenomena in short time scales and FIGURE 4 | Double population scenario: network architecture, activity and connectivity of the output populations with full (U, τ rec , τ facil , A) learning scheme (Part 1). (A) Architecture. The previous network is doubled so that there are now four populations: two input regions (blue) and two output regions ℘ out 1 , ℘ out 2 (red). The four populations are organized in two branches, one required to fire at high rates (30 Hz) and the second at low rates (5 Hz). Within each branch connections are all to all (black arrows) whereas initially weak connections (gray arrows) are present between the two output populations and between the two input populations. Input neurons receive a wave-like stimulus from outside (blue dashed arrows). All synapses obey both Spike-Timing Dependent Plasticity and rate-dependent Short-Term Plasticity. (B) Mean firing rate of the output populations, black line for ℘ out 1 , and gray line for ℘ out 2 . Shaded area represents standard deviation and horizontal dotted gray lines show the two target firing rates (30 Hz for ℘ out 1 , 5 Hz for ℘ out 2 ). (C) Symmetry measure applied on the connectivity of the output population. Color legend as in (B). Connectivity evolves differently in the two populations, leading to a bidirectional motif in ℘ out 1 and to a unidirectional motif in ℘ out 2 .
introduces two typical synaptic behaviors: depression and facilitation. Contrary to long-lasting modifications of maximum synaptic strengths, for example STDP, existing models of STP do not rely on any learning mechanisms, apart from very few exceptions; see for instance (Carvalho and Buonomano, 2011). Motivated by their work, it is our belief that more efficient dynamics would be possible if synapses were allowed to change their short-term behavior by tuning their own parameters, depending on one or more external controlling factors, for example, their current task. Typically, one asks which is the firing regime for which a certain type of synapse performs better (Barak and Tsodyks, 2007), whereas we are looking at the picture from a reverse perspective: we want to obtain some frequency regime, which is the most efficient way to do it from a synaptic point of view? A similar concept can be found in Natschläger et al. (2001), where the authors trained a network with a temporal structured target signal, using optimization techniques. In our work, we developed a learning scheme for STP, and we obtained, with a semi-rigorous argument, a learning rule for only one of the three parameters of the TM model, τ rec . Based on specific experimental results (Tsodyks and Markram, 1997;Markram et al., 1998b;Thomson, 2000) and data fitting (Chow et al., 2005), we used the conjecture that STP behavior of synapses has the same functional dependence on U and τ rec , which allowed us to write a similar rule for the synaptic utilization U. Interestingly, such learning rules depend on the maximum synaptic strength, and they therefore: (i) provided a natural link between STP and STDP and (ii) allowed us to derive an STP-dependent rule for the maximum synaptic strength, to be added to the STDP contribution.
The interaction between short-and long-term plasticity is largely supported by experimental evidence , although the exact mechanisms are still unknown. Some results (Markram and Tsodyks, 1996;Sjöström et al., 2003Sjöström et al., , 2007 suggest that synapses become more/less depressing after long-term potentiation/depression. Our rules incorporate this behavior: long-term potentiation/depression always produces larger/smaller changes in STP parameters. However, whether these modifications bring more facilitation or depression critically depends on whether the population firing rate ν is approaching the target rate ν targ from above or below. Consider, for example, Equation (16): if ν targ − ν < 0, then long-term potentiation will produce a stronger depression, thus reproducing the experimentally observed behavior. In our simulations, this happens to the neurons that are firing at low frequencies. On the other hand, if ν targ − ν > 0, then an increase in A will make τ rec even smaller, resulting in a less depressing synapse. In our simulations, this happens to the neurons that are firing at high frequencies. A similar argument can be formulated for the induction of long-term depression. We note that several mechanisms have been identified to compete during synaptic transmission, resulting in a more complex and less clear relationship between STP and STDP (Sjöström et al., 2007).
In Sjöström et al. (2003Sjöström et al. ( , 2007 the authors link the interaction between short-and long-term plasticity with the frequency of firing: at high rates, synapses tend to become stronger and more depressing, while at lower frequencies they tend to become weaker and less depressing. Our derivation, instead, suggests the opposite: if we rely on the hypothesis that large values of τ rec lead to depression and small values to facilitation (Chow et al., 2005), according to Equation (15), facilitating synapses allow neurons to reach higher frequencies. These findings, together with the STDP triplet rule, from the basis of our work: they provide the theoretical basis for the experimentally observed correspondence between facilitation and bidirectionality, and between depression and unidirectionality. The behavior expressed by Equation (15) is experimentally and computationally based on previous work that relates facilitation with high frequency and rate code, and depression with low frequency and temporal code (Fuhrmann et al., 2002;Blackman et al., 2013). This is because, for example, a facilitating synapse may require several spikes to elicit an action potential, meaning that only high frequency stimulation can generate postsynaptic spikes (Matveev and Wang, 2000;Klyachko and Stevens, 2006). We derived our rules by minimizing an error function that is equal to zero when the target and actual firing rates are equal. Alternatively, we could have defined a reward function opposite to the error function in the sense that for zero error the reward   (Wang et al., 2006) on the basis of the traces only shows that we are able to identify three of the four subtypes.
function has its maximum value, and it is equal to zero for large error. We could have then taken the gradient of the reward function instead, bringing the derived rules into the framework of policy gradient learning methods and reinterpreting the feedback signal as a reward signal (Urbanczik and Senn, 2009;Vasilaki et al., 2009a;Richmond et al., 2011). In biological systems, dopamine is thought to act as reward signal (Schultz et al., 1997;Fiorillo et al., 2003), and its role in the context of learning   Wang et al. (2006), we provide the results in the form mean ± s.m.e.. Column 5: ratio between the two time constants, τ rec /τ facil , in our simulation. Column 6: for a direct comparison, we provide the values of τ rec /τ facil as in Wang et al. (2006) associated with STDP, and more generally with Hebbian learning, has been extensively studied (Tobler et al., 2005;Izhikevich, 2007;Legenstein et al., 2008).
Each of the learning rules we proposed depends, however, on the difference between the target and the actual firing rates, computed at the population level. This implies the presence of: (i) a

Synaptic groups τ rec (ms)
℘ out Symbols are as in Table 3. Similar to Wang et al. (2006), we provide the results in the form mean ± s.m.e.
single feedback signal encoding the population activity, which is processed outside the population and broadcasted to all neurons; (ii) an external signal bringing information about the current paradigm, i.e., the target firing rate. Similar to Urbanczik and Senn (2009), we can assume that synapses receive both signals via ambient neurotransmitter concentrations, leading to an on-line plasticity rule. We initially tested our learning scheme by implementing the rules for τ rec and U on a classical paradigm of inverting associations: keeping the stimulus fixed and varying the associations, the network had to learn to first make choice A and then unlearn it in favor of choice B. This led to a network able to periodically switch its behavior from depressing to facilitating and vice versa, closely following the change in the association paradigm. Throughout the simulation, the network formed motifs similar to those experimentally observed in Wang et al. (2006) and Pignatelli (2009), with facilitating synapses developing bidirectional motifs and depressing synapses developing unidirectional motifs. The desirable motifs were formed due to two factors: (i) the triplet rule that governed long-term potentiation and (ii) the wave-like input stimulus of the network. The form of the plasticity rule guarantees that when neurons fire at high frequency, the synaptic efficacy increases. Hence, synapses will grow up to their bounds, leading to bidirectional connections. On the contrary, when neurons fire at low frequencies, the synaptic efficacy decreases, yet the wave-like input imposes unidirectional connectivity.
We further extended this learning model by adding an STPmotivated rule for the maximum synaptic strength, and we tested it on the same invert association scenario. Results showed the same behavior as before but with faster dynamics due to the joint action of STP and STDP on the absolute efficacy.
In the second part of the paper, we extended our study. First, we considered two populations that have to fire at different frequencies (low, high). Then, we introduced a learning rule for the facilitating time constant, in order to have a full learning model involving all four parameters. The aim was twofold: (i) Comparison of our results with experimental data in Wang et al. (2006). Although the accuracy is not excellent, we were able to qualitatively reproduce the basic differentiation in the ranges of values of the STP parameters, reflecting the existence of four different synaptic subtypes. We believe that by further adapting the model, in particular learning rates and target frequencies or by considering other rule combinations, it is possible to obtain different parameter values (in principle an infinite combination of them), and thus possibly reproduce the results of Wang and collaborators even better. However, we think this may not be critical because, as a recent study  has pointed out, fitting techniques generally used for deriving STP parameters from experimental data may give unreliable results. Given this limitation, we think it is important that our model accounts for a large variety of parameter values in principle, and that in this specific case of Wang et al. (2006) it is able to replicate the basic distinction in the synaptic response.
(ii) Differentiation of synaptic types innervating two functionally different populations. The reason for this lies in the way we constructed the learning model: what triggers the synaptic modification is the spike of the postsynaptic neuron. The firing rate of the population to which this postsynaptic neuron belongs is the information used to tune the values of STP parameters. In other words, we implement a target-specific learning mechanism. This choice is based on an optimization argument: the more direct and efficient way for a neuron to influence its own activity through synaptic changes is to modify incoming synapses rather than outgoing synapses. A second scheme, a source-specific learning mechanism modifying the outgoing synapses, would have probably led to the same results within closed microcircuits, but on a much longer time scale.
Our target-specific learning mechanism is also supported by experimental evidence (see Blackman et al., 2013 for a review). Despite the fact that STP seems to be mainly a presynaptic mechanism, it has been shown that the target cells can also determine the STP dynamics. All the studies we are aware of have established such a target specificity only in the context of excitatory cells innervating other excitatory cells on one hand and inhibitory cells on the other, specially interneurons (Markram et al., 1998b;Reyes et al., 1998;Buchanan et al., 2012). It would therefore be interesting to appropriately modify the double population scenario by incorporating a population of inhibitory neurons and comparing the results with existing data. In addition, some authors (Blackman et al., 2013;Costa et al., 2013) suggested that a similar differentiation might exist within excitatory only populations. Having target-specific STP for excitatory-excitatory connections is still an open possibility that needs to be further explored. Here we show from a theoretical point of view that such a differentiation is possible between fundamentally similar (all excitatory) but functionally different (encoding for different paradigms) targets.
The well-established existence of STP-target specificity provides us with a possible biological explanation for the learning rules we derived. Indeed, this scenario requires that the postsynaptic neuron can regulate specifically its own presynaptic compartment only, by a retrograde signal that does not affect neighboring cells. Thus, diffusive retrograde messengers, for example endocannabinoids and nitric oxide, do not appear to be the most suited agents, whereas synaptic adhesion molecules, for example cadherins (Bozdagi et al., 2004) and neuroligins (Dean and Dresbach, 2006), seem to be better candidates for playing this role. These molecules are responsible for governing the presynaptic transmitter release through many different presynaptic mechanisms (Zucker and Regehr, 2002;Blatow et al., 2003;Deng et al., 2011;Blackman et al., 2013).
We underline that the way we obtained the learning rules is based in part on heuristic evaluation. According to Equation (15), derived from a semi-rigorous argument, the key parameters seems to be τ rec and A. By also including U following Carvalho and Buonomano (2011), we obtain a learning scheme involving τ rec , U and A only, which we used to study the double population problem and evaluate the importance of τ facil . Results remain essentially unchanged from the full model, suggesting that τ facil does not play a critical role in the task we defined. This is not surprising and the reason is that our rules link facilitation with a high firing rate, and depression with a low firing rate. Indeed, even with a small facilitation time constant (small τ facil ), synapses are still able to fire at a high rate, as long as the stimulating frequency is high enough and recovery from depression is fast enough (low τ rec ). Therefore, the time constant of recovery from depression seems to be the only parameter regulating the firing frequency of the neuron for high firing rates, exactly as it comes out from the objective function (we recall that Equation 15 comes from an inequality obtained in the limit of high frequency). With our novel view of allowing synapses to modify their properties from facilitating to depressing and vice versa, we therefore suggest that τ rec is the parameter that is mostly related to rate coding, whereas U to temporal coding.
This conclusion is also supported by Carvalho and Buonomano (2011). In this paper the authors described a simple problem based on temporal synchrony between two inputs that cannot be solved unless STP is learnt, together with STDP. Besides the long-lasting change in A, they introduce a temporal synaptic plasticity for U only and they showed that this indeed solves the problem. Also, they reported that changing U only was the most efficient way to solve the problem. Our work supports the hypothesis that, when dealing with rate coding tasks, the only necessary parameter that has to be learnt is τ rec , whereas, based on Carvalho and Buonomano (2011), when dealing with temporal coding tasks, the only necessary parameter is U.
Another result pointing to a similar direction can be found in Natschläger et al. (2001), where the authors use optimization techniques, rather than explicit learning rules, to train a network of neurons in order to transform a time-varying input into a desired time-varying output. They show that to achieve good performance, one needs to change at least two parameters, either A and τ rec , or A and U. This confirms that learning must involve at least one presynaptic and one postsynaptic parameter, and that τ facil seems not to be relevant in these types of tasks.
We finally presented results from what we call the minimal model, where only τ rec and A were allowed to change, since both their corresponding update rules come directly from the gradient of the objective function we defined. Results confirmed our belief, as we were still able to learn the tasks while obtaining results similar to those from Wang and collaborators. It is in agreement with our conjecture that when we tried to apply learning on U and A only (results now shown here), the network failed to perform its task because the population that was supposed to fire high stabilized at a much lower frequency, i.e., ∼ 15 Hz. Therefore, an alternative minimal model adapting U and A would be able to successfully learn only targets of a lower firing regime. We believe that specialization of parameters in the STP model depending on tasks and signal encoding may be a key ingredient toward a better understanding of synaptic and neuron functionality.

AUTHOR CONTRIBUTIONS
All authors provided materials and contributed to writing the article.