Spike Pattern Structure Influences Synaptic Efficacy Variability under STDP and Synaptic Homeostasis. I: Spike Generating Models on Converging Motifs

In neural systems, synaptic plasticity is usually driven by spike trains. Due to the inherent noises of neurons and synapses as well as the randomness of connection details, spike trains typically exhibit variability such as spatial randomness and temporal stochasticity, resulting in variability of synaptic changes under plasticity, which we call efficacy variability. How the variability of spike trains influences the efficacy variability of synapses remains unclear. In this paper, we try to understand this influence under pair-wise additive spike-timing dependent plasticity (STDP) when the mean strength of plastic synapses into a neuron is bounded (synaptic homeostasis). Specifically, we systematically study, analytically and numerically, how four aspects of statistical features, i.e., synchronous firing, burstiness/regularity, heterogeneity of rates and heterogeneity of cross-correlations, as well as their interactions influence the efficacy variability in converging motifs (simple networks in which one neuron receives from many other neurons). Neurons (including the post-synaptic neuron) in a converging motif generate spikes according to statistical models with tunable parameters. In this way, we can explicitly control the statistics of the spike patterns, and investigate their influence onto the efficacy variability, without worrying about the feedback from synaptic changes onto the dynamics of the post-synaptic neuron. We separate efficacy variability into two parts: the drift part (DriftV) induced by the heterogeneity of change rates of different synapses, and the diffusion part (DiffV) induced by weight diffusion caused by stochasticity of spike trains. Our main findings are: (1) synchronous firing and burstiness tend to increase DiffV, (2) heterogeneity of rates induces DriftV when potentiation and depression in STDP are not balanced, and (3) heterogeneity of cross-correlations induces DriftV together with heterogeneity of rates. We anticipate our work important for understanding functional processes of neuronal networks (such as memory) and neural development.

1 The Influence of Auto-correlation Structure onto DiffV

The time scale of auto-correlation of Gamma process
In Section 3.2 of the main text, we make the statement that for a Gamma process, the time scale of auto-correlation τ cross ∼ CV 2 r0 when CV > 1, and τ cross ∼ 1 4CV 2 ·r0 when CV < 1, with r 0 being the rate of the Gamma process. Here, we make a simple proof.
Gamma process is a renewal process in which the inter-spike intervals are independently identically distributed as Gamma distribution: We denote X i to be the ith inter-spike interval, starting from the 0th spike. By defining moment generating function of Gamma distribution (http://mathworld.wolfram.com/gammadistribution.html) it is easy to prove that X ≡ n i=1 X i (which is the distribution of the interval between the 0th spike and the nth spike) follows Γ(x|nα, β), i.e.
1) When α < 1 (i.e. CV = 1 √ α > 1), the auto-correlation of a Gamma process concentrates near zero, which describes the tendency that if the 0th spike is at t 0 , then the next few spikes are also near t 0 . From eq.2, we see that when n = 1 α , X ∼ Γ(x|1, β), which is an exponential distribution. This means that if we choose a spike every n spikes from the Gamma process, these chosen spikes will form a Poisson process, whose connected auto-correlation 1 is zero. This suggests that the time scale of the auto-correlation of the Gamma process should be no longer than the time scale of n inter-spike intervals, which means that τ cross ∼ n r 0 = 1 αr 0 = CV 2 r 0 with r 0 being the firing rate.
2) When α > 1 (i.e. CV = 1 √ α < 1), the connected auto-correlation of a Gamma process is oscillating decaying. This is because that if the 0th spike is at t 0 , then the next nth spike tend to be near t n = t 0 + n r0 , because of the regularity of the spike train. From eq.2, the standard deviation (s.d.) of the distribution of the time of the nth spike will be β √ nα. The oscillating behavior of the auto-correlation will be damped if the spike time distribution of adjacent spikes are overlapped, especially when the probability that a spike appears at tn−1+tn 2 is almost the same as the probability that a spike appears at t n−1 or t n . This happens when the s.d. of the distribution is equal to half of the inter-spike interval: which gives (using β = 1 αr0 ) n = α 4 .
Therefore, the time scale of the auto-correlation of the Gamma process should be no longer than the time scale of n inter-spike intervals, which means that τ cross ∼ n r 0 = α 4r 0 = 1 4CV 2 r 0 .

The derivation of eqs. 21 and 22 in the main text
This subsection will derive eqs. 21 and 22 in the main text.
Suppose a spike of the central neuron is at time t i , let's consider the potentiation value caused by it in the ath non-central neuron, which is j ∆w a,p (t i , t j|a,i,p ) = A p ∞ j=1 exp(− t i − τ delay − t j|a,i,p τ ST DP ).
Under strict regularity, t j|a,i,p = t 1|a,i,p − (j − 1)∆t, with ∆t being the inter-spike interval. If we define t 0 = t i − τ delay − t 1|a,i,p , then the equation above becomes .
Under strict regularity, t 0 is uniformly distributed within [−∆t, 0], and it is easy to show that if the size of the 2 converging motif is sufficiently large, Var a ( j ∆w a,p (t i , t j )) = A 2 After considering the depression process, we have which is eq.21 in the main text.
Under Poisson processes, the ocurrences of spikes in any two small time bins of length dt are independent, and the variance of the number of spikes within a time bin is r 0 dt, with r 0 being the firing rate. As ∆w a,p (t i , t j ) = Therefore, which is eq.22 in the main text.
From eq.3 and eq.4, we know that to prove d reg < d P oi is equivalent to prove This statement above is easy to prove after noting that f (0) = 1 and f (x) < 0 for x > 0.

The Interaction of Auto-correlation Structure and Heterogeneity of Rates
In this section, we will consider the case when the spike trains are stationary processes (so that the trial averaged firing rates do not change with time), and different non-central neurons have different firing rates. In this case, the trial averaged synaptic change rate is with r 0 being the firing rate of the central neuron, r a being the firing rate of the ath non-central neuron. Therefore, 2 Var a (r a ).
We see that auto-correlation structure does not enter the formula above, and therefore does not influence DriftV in this case. To intuitively understand this, we denote the interval between a central spike t i and a non-central spike t j to be ∆t = t i − t j ; and note that as we suppose that the central and non-central spike trains are independent, the distribution of ∆t should be always uniform within (0, ∞) in the long run, independent on the auto-correlation structure of their spike trains. This causes the same potentiation and depression under STDP even when the auto-correlation structure changes.
Next, we want to understand how the auto-correlation structure may change DiffV under heterogeneity of rates.
To do this, we generate spike trains as Gamma processes, with the firing rate of the central neuron being kept at r 0 = 20Hz, and the firing rates of the non-central neuron following lognormal distribution with mean r 0 (by combining Model Auto & Model Long Tail in Materials and Methods in the main text). The coefficient of variance (CV ) of the spike trains of the central and non-central neurons are the same. We set A p = A d , so that DrV = 0 and DiV ≈ E T (Var a (∆w a )) (see eq.5 in the main text). We then compared E T (Var a (∆w a )) in this model with that under homogeneity of rate introduced in Section 3.2 of the main text. We find that heterogeneity of rates hardly changes DiffV when spike trains are bursty, but effectively discounts the increase of DiffV caused by regularity (Supplementary Figure 1A).
To understand the reason of this phenomenon, suppose that the smallest firing rate of the non-central neurons is r min and the largest is r max , and we can divide the interval [r min , r max ] into many bins of length 2 , with being a small value. We denote the sth bin to be A s = (r s − , r s + ), with r s being the middle value of this bin. If the converging motif is very large, then there will be many non-central neurons whose firing rates lie within each bin.
After implementing the theorem of total variance (see eq.2 in the main text), we have that Var a (∆w a ) = E As (Var rt∈As (∆w t )) + Var As (E rt∈As (∆w t )).
As DriftV = 0 here, we have Var As (E rt∈As (∆w t )) = 0, so that which means that the value of E T (Var a (∆w a )) can be understood by investigating how E T (Var rt∈As (∆w t )) changes with r s . As A s is a small bin near r s , we can suppose that all the values within A s can be approximately by r s , so that E T (Var rt∈As (∆w t )) ≈ E T (Var rt=rs (∆w t )), in which E T (Var rt=rs (∆w t )) represents the efficacy variability when the firing rates of all non-central neurons are approximated to be r s . Using simulations, we found that E T (Var rt=rs (∆w t )) changes with r s in the following features (Supplementary Figure 1B): 1) E T (Var rt=rs (∆w t )) tends to increase with r s .
2) When CV 1 (i.e. the spike trains are regular), E T (Var rt=rs (∆w t )) tends to sharply peak at r s = r 0 , and may also peak at 2r 0 , 1 2 r 0 etc. To understand the first point above, note that if we regard the synaptic changes under STDP as random walks,  Figure 4 in the main text.
5 then the synaptic change caused by a spike of a non-central neuron can be regarded as a step of the random walk. If the firing rates of the non-central neurons are large, then the step number of the random walk on a synapse per unit time will be large, thereby increasing the diffusion strength. In our model, we found E T (Var rt=rs (∆w t ))∝r s (with ∝ representing "approximately proportional to") when CV > 1 (Supplementary Figure 1B, upper panels). From eq.6 and 7, this implies that E T (Var a (∆w a ))∝E s (r s ). This means that DiffV will not significantly change as long as the mean firing rate of the non-central neurons conserves, which explains why E T (Var a (∆w a )) does not change significantly when the firing rates of the non-central neurons become widely distributed (Supplementary Figure 1A).
The second point above can be understood using the mechanism transient cross-correlation introduced in Section 3.2.2 in the main text. When spike trains are strongly regular, if N 1 r s = N 2 r 0 (with N 1 and N 2 being two positive integers with no common divisor larger than 1), then E T (Var rt=rs (∆w t )) can be enlarged by the correlation between the synaptic updatings caused by adjacent central spikes. DiffV under strong regularity has a sharp peak at r s = r 0 (Supplementary Figure 1B lower panels). So if the firing rates of all the non-central neurons are r 0 , then DiffV will be large under strong regularity. However, if the firing rates of the non-central neurons are heterogeneous, then many r s will be at non-peak values in Supplementary Figure 1B lower panels. From eq.6 and 7, this implies that E T (Var a (∆w a )) will be decreased. Therefore, heterogeneity of rates decreases DiffV in regular spike trains (see Supplementary Figure 1A Because of τ cross τ ST DP , we neglect the contribution to the efficacy variance by the random displacements of spike times in a synchronous event, therefore the efficacy variance is only contributed by the difference of spike numbers of different neurons in a synchronous event. In this case, suppose an central spike at time t i , and a synchronous event S happening during (t 1 , t 2 ) with t i − τ delay < t 1 < t 2 , then we approximate the STDP updating caused by the central spike and any non-central spike in S using the time difference between t i and the middle time Thus, the efficacy variance caused by pairing the central spike and the synchronous event S is with N a being the spike number of the ath non-central neuron during S. This approximation will be used in our following calculations.
τ cross ≤ τ delay : Given an central spike at time t i , we denote the synchronous event that the central spike belongs to as S 0 . When τ cross ≤ τ delay , the interaction of S 0 and t i can only depress the synapses. After considering the axonal delay, the 6 duration of the interval between t i and the middle time of S 0 is uniformly distributed within (τ delay − τcross 2 , τ delay + τcross 2 ), therefore, its contribution to E T (Var( j ∆w a,d (t i , t j ))) is We again use the fact that τ cross τ ST DP in the approximation above. In our model, the occurrence of synchronous events is a Poisson process of rate r 0 /p. For simplicity, we set τ cross → 0 for the synchronous events other than S 0 . In this case, Similarly, Therefore, combining eqs.9-11, we have that for τ cross ≤ τ delay , which is eq.27 in the main text. When p is large, Var a (N a (p)) ≈ pCV 2 SpikeN um (Cox (1962) which is eq.28 in the main text.
τ cross > τ delay : Suppose that the synchronous event S 0 that the central spike t i belongs to lasts during (t 1 , t 2 ), then when τ cross > τ delay , there are two possibilities: 1) If t i < t 1 + τ delay , all the non-central spikes come at the central neuron after t i , thereby depressing the synapses.
2) If t i > t 1 + τ delay , then it is possible that some non-central spikes come at the central neuron earlier than t i , which potentiate the corresponding synapses, while the other non-central spikes come at the central neuron later than t i , which depress the corresponding synapses.
After considering these two possibilities, we can write the contribution of S 0 to E T (Var a ( j ∆w a,d (t i , t j ))) as And we again use the fact that τ cross τ ST DP in the approximation above. In this equation, Var a (N a , x) means that if M a of the N a spikes of the ath non-central neuron are chosen (which means that they lie within an interval of duration ∆τ with ∆τ /τ cross = x in a synchronous event), then Var a (N a , x) = Var a (M a ). In this model (Model Sync in the main text), the time of each of the N a spikes is independently and uniformly distributed within the synchronous event S 0 , therefore each of them has probability x to be chosen.
Similarly, the contribution of S 0 to E T (Var a ( j ∆w a,p (t i , t j ))) is And the same as the τ delay < τ cross case (eqs.10 and 11), we approximate the contribution of the synchronous events other than S 0 as Eqs.14-16 together give the value of E T (Var a ( j ∆w a,d (t i , t j ))) + E T (Var a ( j ∆w a,p (t i , t j ))).
Here M a follows binomial distribution, whose mean is N a x and variance is N a x(1 − x). Therefore, with q(N a ) being the distribution of N a , and we also use the fact that Var a (N a ) ≈ pCV 2 SpikeN um if p is large and dN a q(N a )N a = p. Therefore, we know that in eq.14, and in eq.15, Combining eqs.14-19, we have with which gives eq.29 in the main text.

The Influence of Synchronous Firing onto c II
In this section, we will try to understand the influence of synchronous firing onto c II (Supplementary In this section, we will discuss the influence of synchronous firing onto c II only for completeness.
Readers may skip this subsection when reading for the first time.
From Supplementary Figure 2 A4, B4 and C4, we can see that c II is usually smaller than 1, which, by definition (eq.15 in the main text), means that ρ P D is usually negative. The reason for this negative correlation is the same as that shown in Figure 4D inset in the main text: if the ath non-central neuron fires more (less) spikes, then both the potentiation and depression imposed on the ath synapses tend to be strong (weak). Therefore, the total potentiation and depression value tend to be negatively correlated through the heterogeneity of spike numbers of the non-central neurons. However, c II increases with the decrease of CV SpikeN um , and becomes larger than 1 when CV SpikeN um is small enough. By definition (eq.15 in the main text), c II > 1 means ρ P D > 0. We will try to understand this phenomenon in the following discussion.
Suppose a central spike t i and the synchronous event S 0 that t i belongs to, let us consider a synchronous event S − that occurs before S 0 , and a synchronous event S + that occurs after S 0 . Then under the case that the inter-event interval p/r 0 is large enough (which is realized when p is large) and the occurrence of synchronous events is not too bursty, it is likely that In A1-C4, the size of the converging motif, the parameters for STDP as well as the simulation time and trials are the same as in Figure 4 in the main text. 10 If τ cross τ ST DP , then the synaptic updating caused by pairing t i with non-central spikes in S + or S − will not have much difference between when τ cross ≤ τ delay and when τ cross > τ delay , since in both cases the synaptic updatings can be approximated by with t S− and t S+ respectively being the middle time of S − and S + . However, the interactions of t i with non-central spikes in S 0 is quite different when τ cross ≤ τ delay with those when τ cross > τ delay . When τ cross ≤ τ delay , all the non-central spikes in S 0 depress the synapses after pairing with t i (i.e. ∆w a (t i , t j∈S0 ) < 0), but when τ cross > τ delay , some non-central spikes may potentiate the synapses (i.e. ∆w a (t i , t j∈S0 ) > 0). From Supplementary Figure 2 A4, B4 and C4, c II decreases with CV SpikeN um when τ cross > τ delay , but does not significantly change when τ cross ≤ τ delay , which suggests that the possibility that ∆w(t i , t j∈S0 ) > 0 under τ cross > τ delay is a key point to understand the decrease of ρ P D with CV SpikeN um .
Intuitively, during a synchronous event (0, τ cross ) (with τ cross > τ delay ), the non-central spikes that are emitted during (0, t i − τ delay ) will potentiate the synapses, and the non-central spikes that are emitted during (t i − τ delay , τ cross ) will depress the synapses. If CV SpikeN um is very small, then the ath neuron can fire almost exact p spikes during a synchronous event. In this case, if these p spikes are within (0, t i − τ delay ), then all of them will potentiate the ath synapses, so that ∆w a,p is large and ∆w a,d = 0; but if they are all within (t i − τ delay , τ cross ), then ∆w a,p = 0 and ∆w a,d will be very negative. This seems to be a possible mechanism that positively correlate ∆w a,p and ∆w a,d when CV SpikeN um is small and τ cross > τ delay (Supplementary Figure 2 A4). Mathematically, it is complicated to analytically calculate ρ P D , but we can implement this idea by using the following simple model, thereby understand the increase of ρ P D with the decrease of CV SpikeN um under τ cross > τ delay .
Let's suppose S 0 happens during (0, τ cross ) with τ cross > τ delay , and the spike of the central neuron is at t i − τ delay = xτ cross with 0 < x < 1, so that x portion of the spikes of the non-central neuron potentiate the synapses, while 1 − x portion depress the synapses. If τ cross τ ST DP , then we can suppose that all the spikes within (0, xτ cross ) will potentiate the synapses by y p ≈ A p exp(− xτcross 2τ ST DP ), and all the spikes within (xτ cross , τ cross ) will depress the synapses by −y 2τ ST DP ). The correlation between the potentiation and depression values in this model will be calculated in Section 4.1 below (eq.22), the result is , with q(N a ) being the probability distribution of the spike number of the ath neuron in a synchronous event. We will explain the general meaning of Var(M a |N a , x) in Section 4.1 below (eq.21); but here (i.e. in Supplementary are chosen independently with probability x. In this case, M a follows binomial distribution, whose mean is N a x and variance is N a x(1 − x). Therefore (see also eq.17), From this equation, we can see that if we let a = CV 2 SpikeN um , then it is easy to show that As a > 0, 0 < x < 1, it is easy to show that ∂ρ P D ∂a < 0, which means that ρ P D increases with the decreasing of CV SpikeN um . This explains the decrease of ρ P D with CV SpikeN um and the positive ρ P D when CV SpikeN um is small under τ cross > τ delay .

The Interaction of Synchronous Firing and Auto-correlation Structure
As we mentioned in the main text, auto-correlation structure comes into spike patterns with synchronous firing in at least three ways: 1) The broadness of the distribution of spike number per neuron per synchronous event (AT SpikeNum ).
2) The burstiness/regularity of the pieces of spike train within synchronous events (AT WithinEvent ).
3) The burstiness/regularity of the occurrence of synchronous events (AT events ).
We have already discussed the influence of AT SpikeNum onto DiffV in the main text, in the following part of this section, we will consider AT WithinEvent and AT events . Overall, we find that broader distribution of spike number per neuron per synchronous event (for AT SpikeNum ), burstier spike trains within synchronous events (for AT WithinEvent ), and burstier occurrence of synchronous events tend to increase DiffV (for AT events ). These results can be concluded into a rule of thumb: the burstiness of spike trains tends to increase DiffV, while the regularity tends to decrease DiffV.

The Burstiness of the Piece of Spike Train within a Synchronous Event
We use the following model (Model Sync-Auto 1) to generate spike trains in which both AT SpikeNum and AT WithinEvent can be explicitly controlled.
Model Sync-Auto 1: In this model, the occurrence of synchronous events and the spike number that a neuron is to fire during a synchronous event are determined in the same way as Model Sync in the Materials and Methods of the main text (CV SpikeN um = 1 by default). If a neuron is to fire M (M > 0) spikes during a synchronous event of duration τ cross , then the piece of spike train during this interval will be generated as follows: We first define a Gamma process with rate p/τ cross (here p has the same meaning as it in Model Sync in the main text, which is the mean spike number of a neuron during a synchronous event) and coefficient of variance CV W ithinEvent , and then generate M + 2 spikes using this Gamma process. Suppose that the time of the ith spike of the Gamma process is t i (0 ≤ i ≤ M + 1 and t 0 = 0), and the synchronous event is within the interval [t event , t event + τ cross ]. Then the jth (with 1 ≤ j ≤ M ) spike of the neuron in the synchronous event will be at time with x being a random number uniformly chosen from the interval [0, τ cross ], and is fixed for all these M spikes.
The idea of this operation is that given a spike train of M + 2 spikes, we first cut the spike train at the middle time between the 1st and 2nd spikes, and also cut at the middle time between the last 1st and last 2nd spikes.
Then we rescale the left spike train to length τ cross , and translationally shift the spike train by a random interval, implementing periodic condition to deal with the spikes being moved out of the time boundary of [0, τ cross ] during the shift. By doing this, the probability density that a spike appears at any time during the interval [0, τ cross ] is the same for different trials.
In this model, the burstiness of the spike train within a synchronous event is controlled by CV W ithinEvent . If CV W ithinEvent is small, then the neurons will fire regularly in a synchronous event; if CV W ithinEvent is large, then the neurons will fire burstly in a synchronous event.
Our simulations suggest that the burstiness of the piece of spike train within a synchronous event does not significantly influence DiffV when τ cross ≤ τ delay , but it increases DiffV when τ cross > τ delay (Supplementary Figure 3A). We also find that d and c II are the main reasons for the increase of DiffV with CV W ithinEvent , c I does not has significant effect (Supplementary Figure 3B-D).
To understand the change of d with CV W ithinEvent , note that under the assumption τ cross τ ST DP , eq.13 still applies to the case τ cross ≤ τ delay , and eqs.14-16 still apply to the case τ cross > τ delay . When τ cross ≤ τ delay , the efficacy variance mainly comes from the difference of spike numbers of different non-central neurons in a synchronous event, therefore if the distribution of spike numbers per neuron per synchronous event is kept unchanged, then the burstiness of the piece of spike train within a synchronous event can hardly contribute to the efficacy variance: this is the reason why CV W ithinEvent hardly influences d when τ cross ≤ τ delay (Supplementary Figure 3B). When τ cross > τ delay , we can see from eqs.14 and 15 that the burstiness of the piece of spike train within S 0 (i.e. the synchronous event that the central spike t i lies in) may influence the efficacy variance through changing Var a (N a , x).
As mentioned before, this factor means that if M a of the N a spikes of the ath non-central neuron lie in an interval of duration ∆τ (with ∆τ /τ cross = x) in a synchronous event, then Var a (N a , x) = Var a (M a ). Therefore, , with ρ P D being the correlation coefficient between the total potentiation and depression values imposed on the same synapse, and f P D is the coupling factor. To understand the change of c II with CV W ithinEvent under τ cross > τ delay , we plot ρ P D and f P D with CV W ithinEvent , and find that ρ P D is the main reason for the increase of c II with CV W ithinEvent , the contribution of the coupling factor is not significant (Supplementary Figure 5AB).
ρ P D tends to be negative, which is because of the heterogeneity of spike numbers of the non-central neurons. If the ath non-central neuron fires more spikes, then both the potentiation and depression processes on the ath synapse will get stronger. This is the reason why Corr a (∆w a,p , N a ) > 0 (with N a being the spike number of the ath non-central neuron) and Corr a (∆w a,d , N a ) < 0 (Supplementary Figure 5C), which makes ρ P D = Corr a (∆w a,p , ∆w a,d ) < 0.  Figure 4 in the main text. Error bars represent s.e.m.
We find that CV W ithinEvents hardly influences ρ P D when τ cross ≤ τ delay , but increases ρ P D when τ cross > τ delay (Supplementary Figure 5A). Now we try to understand this phenomenon in the following discussion.
Suppose an central spike at time t i and the synchronous event S 0 that the spike belongs to, let us consider a synchronous event S − that occurs before S 0 , and a synchronous event S + that occurs after S 0 . Then under the case that the inter-event interval p/r 0 is large enough (which is realized when p is large) and the occurrence of synchronous events is not too bursty, it is likely that If τ cross τ ST DP , then the synaptic updating caused by pairing t i and non-central spikes in S − and S + can be 16 approximated by with t S− and t S+ respectively being the middle time of S − and S + . In the equations above, we have used similar approximations as in eq.8. We can see that the spike patterns within S − and S + hardly have influence onto the synaptic weights.
Similar arguments also apply to S 0 when τ cross ≤ τ delay , because in this case, all the spikes of the non-central neurons within S 0 depress the synapses with their interactions with t i , and so that the spike pattern within S 0 also hardly influences synaptic weights. This explains why CV W ithinEvent has little influence onto ρ P D when τ cross ≤ τ delay (Supplementary Figure 5A).
However, the situation is different when τ cross > τ delay , because the equation above is no longer valid in this case, and ∆w a (t i , t j∈S0 ) is larger than zero if t j + τ delay < t i . From Supplementary Figure 5A, ρ P D increases with CV W ithinEvent when τ cross > τ delay , but not when τ cross ≤ τ delay , which suggests that the possibility that ∆w a (t i , t j∈S0 ) > 0 under τ cross > τ delay is a key point to understand the increase of ρ P D with CV W ithinEvent . Intuitively, during a synchronous event (0, τ cross ) (with τ cross > τ delay ), the non-central spikes that are emitted during (0, t i − τ delay ) will potentiate the synapses, and the non-central spikes that are emitted during (t i − τ delay , τ cross ) will depress the synapses. If CV W ithinEvent is large, then the piece of spike train within a synchronous event will be bursty, so that the following situation is likely to happen: if the ath non-central neuron fires all its spikes during (0, t i − τ delay ), then all its spikes potentiate the ath synapse, so that ∆w a,p > 0 while ∆w a,d = 0; however, if the ath non-central neuron fires all its spikes during (t i − τ delay , τ cross ), then ∆w a,p = 0 and ∆w a,d will be very negative. This seems to be a possible mechanism that positively correlate ∆w a,p and ∆w a,d when CV W ithinEvent is large (Supplementary Figure 5A). Mathematically, it is complicated to analytically calculate ρ P D , but we can implement this idea using the following simple model, which focuses on a single synchronous event, thereby helps to understand the increase of ρ P D with CV W ithinEvent under τ cross > τ delay .
Let's suppose S 0 happens during (0, τ cross ) with τ cross > τ delay , and the spike of the central neuron is at t i − τ delay = xτ cross with 0 < x < 1, so that x portion of the spikes of the non-central neuron potentiate the synapses, while 1 − x portion depress the synapses. If τ cross τ ST DP , then we can suppose that all the spikes within (0, xτ cross ) will potentiate the synapses by y p ≈ A p exp(− xτcross 2τ ST DP ), and all the spikes within (xτ cross , τ cross ) will depress the synapses by −y 2τ ST DP ). With the same as eq.21, we denote N a as the spike number of the ath non-central neuron during S 0 , and denote M a as the spike number of the ath non-central neuron within (0, xτ cross ). Then, the variance of the potentiation value caused by the spikes within (0, xτ cross ) is where we use eq.21 in the last step. Similarly, the variance of the depression value caused by the spikes within (xτ cross , τ cross ) is And the variance of the total STDP updating value caused by the spikes within S 0 is Therefore, .
If we let then it is easy to show that   Figure 3C, we know that c I does not significantly influence DiffV when CV W ithinEvent changes, which means that it is not an important factor to understand the change of DiffV with CV W ithinEvent .
Here, we discuss the change of c I with CV W ithinEvent only for completeness. Readers may skip this subsection when reading for the first time.
Suppose a spike of the central neuron t i ∈ S 1 (with S 1 being a synchronous event), and a spike of the ath non-central neuron t a,j ∈ S 2 (with S 2 also being a synchronous event). If the inter-event interval p/r 0 is large and the occurrence of synchronous events is not too bursty, then under τ cross ≤ τ delay , it is likely that: 1) If S 2 occurs before S 1 , then t i − τ delay > t a,j , for all t a,j ∈ S 2 ; The same as A, but for l,k ρ l,k (see eq.30 in the main text). We found that ρ l,k safely decays to zero when l ≥ 50 in our parameter range, therefore we cut off l at l = 50 when calculating the summation. (C) The same as B, but for l,k f l,k (see eq.31 in the main text). In A-C, the size of the converging motif, the parameters for STDP, as well as the simulation time and trials are the same as in Figure 4 in the main text. Error bars represent s.e.m.
2) If S 2 occurs after S 1 or S 2 = S 1 , then Under τ cross ≤ τ delay , the first condition above implies that the synaptic updating caused by pairing t i and t a,j can be approximated by (with t S1 and t S2 being the middle time of S 1 and S 2 ); and the second condition above implies that These two equations suggest that when τ cross ≤ τ delay , the spike pattern within the piece of spike train within a synchronous event does not have significant effect on the synaptic changes ∆w a (t i , t a,j ), which is the reason why c I hardly changes with CV W ithinEvent when τ cross ≤ τ delay (Supplementary Figure 3C).
When τ cross > τ delay , some non-central spikes in S 1 may potentiate the synapses after pairing with t i , and some others may depress the synapses. In this case, it is difficult to understand the change of c I with CV W ithinEvent , and we resorted to simulations. Our simulations suggest that c I may slightly decrease with CV W ithinEvent in this case, and the decrease of correlations instead of coupling factors is the reason for this phenomenon (Supplementary Figure 6). But overall, the influence of CV W ithinEvent onto c I is not strong.

The Burstiness of the Occurrence of Synchronous Events
To investigate the influence of the burstiness of the occurrence of synchronous events onto DiffV, we use the following model to generate spike trains: Model Sync-Auto 2: In this model, the spike train of a neuron within a synchronous event is determined in the same way as Model Sync in the main text (we set CV SpikeN um = 1 by default). The occurrence of synchronous event is a Gamma process with rate p/τ cross and coefficient of variance CV events . If CV events is small, then synchronous events will occur regularly; if CV events is large, then synchronous events will occur burstly.
Our simulations suggest that the burstiness of the occurrence of synchronous events tends to increase DiffV, especially when CV events > 1 (Supplementary Figure 7A). It does this through d and c I (Supplementary Figure 7BC), while c II contributes negatively (Supplementary Figure 7D).

4.2.1
The influence of CV events to d CV events influences d mainly through the synchronous events other than S 0 , i.e. E T (Var a ( j / ∈S0 ∆w a,d (t i , t j ))) and E T (Var a ( j / ∈S0 ∆w a,p (t i , t j ))) (see eqs.10 and 11). To understand the underlying mechanism of this influence, we let τ delay = 0. In this case, eq.16 becomes with {τ l } (l = 0, 1, · · ·) being a Gamma process with rate r 0 /p and coefficient of variance CV events , starting from ] means that that more synchronous events can be gathered closer to S 0 with the increasing of CV events .

The influence of CV events to c I
Following similar procedure as in Figure 7 in the main text, we find that the increase of c I with CV events is mainly due to the increase of correlation coefficients ρ m,n;k (see eq.17 in the main text) (Supplementary Figure 9A), while the coupling factors f m,n;k (see eq.17 in the main text) don't have significant contributions (Supplementary Figure 9B). To understand the increase of m<n ρ m,n;k with CV events , let's for simplicity only consider the case τ cross < τ delay p r0 (with p/r 0 being the inter-event interval). This calculation will help us gain insight on the mechanisms how CV events increases ρ m,n;k .
In this case, the depression of the ath synapse caused by the mth spike (which belongs to the synchronous event S 0 ) of the central neuron can be approximated by with CV events (see eq.23). The results were from averaging over 10000 trials of Gamma processes with rate r 0 /p (here we set r 0 = 20Hz, p = 5).
with {τ l } (l = 0, 1, · · ·) being a Gamma process with rate r 0 /p and coefficient of variance CV events , starting from τ 0 = 0, and N a,l being the spike number of the ath non-central neuron in the lth synchronous event after S 0 . In the equation above, we omit the contribution to ∆w a,d (t m ) by the synchronous events that happens during the interval . This approximation is acceptable if τ delay is far smaller than the inter-event interval p/r 0 (which is particularly correct when p is large) and when the occurrence of synchronous events is not too bursty. From eq.24, we know that if t m and t n are two spikes of the central neuron that belong to two immediately adjacent synchronous events S 0 and S 1 , then Then ρ m,n;d = Corr a (∆w a,d (t m ), ∆w a,d (t n )) = Corr a (X a + bY a , Y a ) with , From eq.26, we can see that ∆w a,d (t m ) and ∆w a,d (t n ) are correlated together by sharing the term Y a , and their correlation can be increased if the variance of the correlated term bY a is increased, which can be realized by increasing either |b| or Var a (Y a ). We will see how CV events increases their correlation through b 2 and Var a (Y a ) in the following discussion. We found that ρ l,k safely decays to zero when l ≥ 50 in our parameter range, therefore we cut off l at l = 50 when calculating the summation. (B) The same as A, but for l,k f l,k (see eq.31 in the main text). Note that the percentage of the change of l,k f l,k is significantly smaller than that of l,k ρ l,k . (C) E T (ρ m,n;d ) as a function of CV events , with m and n represent two central spikes in immediately adjacent synchronous events. The dots with error bars give simulation results; the blue line represents results calculated from eq.30, using the distributions of b 2 and Var a (Y a ); the red line represents results calculated by replacing b 2 and Var a (Y a ) in eq.30 with E T (b 2 ) and E T (Var a (Y a )). In A-B, the size of the converging motif, the parameters for STDP, as well as the simulation time and trials are the same as in Figure  4 in the main text. In C, the statistics of b 2 was from 10000 trials of the Gamma process, and the statistics of Var a (Y a ) was from simulations of the same conditions as in A and B. Error bars represent s.e.m.
As N a,0 is indepedent of N a,l in eq.24, Corr a (X a , Y a ) = 0. Therefore eq.26 becomes We have to use the distribution of b 2 and Var a (Y a ) to calculate E T (ρ m,n;d ). Even if it is possible to do so, such calculation has little help for us to intuitively understand the physical mechanisms why E T (ρ m,n;d ) changes with CV events . For a good understanding of this mechanism, we will estimate how E T (b 2 ) and E T (Var a (Y a )) change with CV events (note that E T (Var a (X a )) does not change with CV events ), and then compare E T (ρ m,n;d ) with Vara(Xa)+E T (b 2 )E T (Vara(Ya)) in the following discussion. When the occurrence of synchronous events is a Gamma process of rate r 0 /p and coefficient of variance CV events , the distribution of τ 1 is To estimate how E T (b 2 ) changes with CV events , note that and it is easy to show that log(1 + ax)/x is a decreasing function of x if a > 0. Therefore, E T (b 2 ) is an increasing function of CV 2 events . What's more, Figure 8).
The discussions above shows that both E T (b 2 ) and E T (Var a (Y a )) increase with CV events . As ρ m,n;k is an increasing function of both b 2 and Var a (Y a ) (eq.30), our calculations above give an understanding on why E T (ρ m,n;d ) increases with CV events . From eq.26, we can see that ∆w a,d (t m ) and ∆w a,d (t n ) are correlated together by sharing the term Y a , and their correlation can be increased if the variance of the correlated term bY a is increased, which can be realized by increasing either |b| or Var a (Y a ). From the discussions above, we see that CV events increases the correlation coefficient through both of the two mechanisms.
In Supplementary Figure 6C, we compare E T (ρ m,n;d ) with , we see that although they do not coincide, they have the same tendency to increase with CV events . What's more, , which suggests that the broad distribution of b 2 and Var a (Y a ) tends to decrease E T (ρ m,n;d ). Figure 7D, we know that c II negatively contributes to the increase of DiffV with CV events especially when CV events > 0.7, which means that c II is not an important factor to understand the change of DiffV with CV events . Here, we discuss the influence of CV events onto c II only for completeness, readers may skip this subsection when reading for the first time. When CV events > 0.7, ρ P D is negative and decreasing, while the coupling factor f P D is positive and increasing (Supplementary Figure 10AB). This means that both ρ P D and f P D contribute to the decreasing of c II with CV events . We will try to understand the mechanisms underlying their contributions in this subsection.

From Supplementary
The potentiation and depression on the ath synapse can be written as ∆w a,p = ∆w a,p (same) + ∆w a,p (diff) (31) with ∆w a,p (same) (∆w a,d (same)) being the potentiation (depression) caused by pairing central and non-central spikes that belong to the same synchronous event, and ∆w a,p (diff) (∆w a,d (diff)) being the potentiation (depression) caused by pairing central and non-central spikes that belong to different synchronous events. ∆w a,p (same) and ∆w a,d (same) are determined by the statistical features of a single synchronous event, while CV events mainly changes ∆w a,p (diff) and ∆w a,d (diff). To understand the underlying mechanism why ρ P D decreases with CV events when CV events is large, here we consider the correlation between ∆w a,p (diff) and ∆w a,d (diff) ρ P D (diff) = Corr a (∆w a,p (diff), ∆w a,d (diff)) and set τ delay = 0 and τ cross τ ST DP for simplicity. By definition, s z s,d ).
In this equation, s is the index for synchronous events. N a,s is the spike number of the ath non-central neuron in the sth synchronous event, and z s,p (z s,d ) is the potentiation (depression) per spike caused by pairing these N a,s non-central spikes with the central spikes that not in the sth synchronous event.
Using the approximation similar to eq.8, it is easy to show that (note that τ delay = 0 in our calculation) with N 0,s being the spike number of the central neuron in the sth synchronous event, {τ l } (l = 0, 1, · · ·) being a Gamma process with rate r 0 /p and coefficient of variance CV events starting from τ 0 = 0.
Therefore, Therefore, if we let A p = A d = A, we will have and Z = Var a ( i N a,s z s,d ) Armed with these results, it is easy to show that Our analytic calculation well predict the numeric results (Supplementary Figure 10C). Now let's understand the change of ρ P D (diff) with CV events . By definition, B > C, so we can write the denominator of eq.33 as (p + a)B with 0 < a < 1. From Supplementary Figure 10D left, we can see that a ≈ 1 when CV events is small, and gradually decreases when CV events gets large. Therefore, if p is relatively large, then from eq.33 with∝ representing "approximately propotional to". What's more, Therefore, the change of ρ P D (diff) basically reflects the change of s.d. mean ratio of ∞ l=1 exp(− τ l τ ST DP ) (compare Supplementary Figure 10C with 10D right).
When CV events is small, A, B and C are all small, which makes Var a (∆w a,p (diff)) and Var a (∆w a,d (diff)) are Spike patterns are also generated according to Model Sync-Auto 2, but τ delay = 0ms. (D) Statistics of the series {exp(− τ l τ ST DP )} (l = 1, 2, 3, · · ·) as a function of CV events , with {τ l } (l = 0, 1, · · ·) being a Gamma process with rate r 0 /p (here we set r 0 = 20Hz, p = 5) and coefficient of variance CV events starting from τ 0 = 0. A, B and C in the left panel are defined in eqs.34-36, and D in the right panel is the s.d. mean ratio of ∞ l=1 exp(− τ l τ ST DP ). In A-C, the parameters for STDP, the size of converging motifs as well as simulation time and trials are the same as in Figure small; so ∆w a,p (diff) and ∆w a,d (diff) plays a small role in the value of ρ P D = Corr a (∆w a,p , ∆w a,d ) (see eqs.31-32 again for the meaning of ∆w a,p (diff) and ∆w a,d (diff) as well as ∆w a,p and ∆w a,d ). In this case, the interaction of central and non-central spikes that belong to the same synchronous event dominates. However, when CV events gets large, ∆w a,p (diff) and ∆w a,d (diff) get strong, and our calculation on ρ P D (diff) helps to gain insight on ρ P D in this case: the decrease of s.d. mean ratio of ∞ l=1 exp(− τ l τ ST DP ) seems to be the key reason for the uniform decrease of ρ P D when CV events > 0.7 (Supplementary Figure 10A).
The coupling factor f P D significantly increases with CV events , especially when τ cross ≤ τ delay (Supplementary Figure 10B). As c II = 1 + 2ρ P D f P D , and ρ P D < 0, the increase of f P D also contributes to the decrease of c II .
By definition, Var a (∆w a,p ) · Var a (∆w a,d ) Var a (∆w a,p ) + Var a (∆w a,d ) therefore the increase of f P D reflects the decrease of the difference of Var a (∆w a,p ) and Var a (∆w a,d ) relative to their total value. From the equation above, we know that the large difference between Var a (w a,p ) and Var a (w a,d ) is the reason why f P D is small when CV events is small (Supplementary Figure 10B). From our discussions above, we know that when CV events is small, both Var a (w a,p (diff)) and Var a (w a,d (diff)) are small, so Var a (∆w a,p ) and Var a (∆w a,d ) are largely determined by Var a (w a,p (same)) and Var a (w a,d (same)) in this case, which is the variance caused by the interaction of central and non-central spikes that belong to the same synchronous event. In this case, the large difference between Var a (w a,p ) and Var a (w a,d ) reflects the large difference between Var a (w a,p (same)) and Var a (w a,d (same)). This difference always exists as long as τ delay = 0, but is particularly strong when τ cross ≤ τ delay , because in this case Var a (w a,p (same))/N s ≈ 0 (with N s being the number of synchronous events in the spike pattern), and it is easy to show that Var a (w a,d (same))/N s ≈ p 2 (p + 1) exp(− 2τ delay τcross ). When CV events increases, both Var a (w a,p (diff)) and Var a (w a,d (diff)) increase, and as τ delay is small comparing to inter-event interval, Var a (w a,p (diff)) ≈ Var a (w a,d (diff)). This reduces the difference of Var a (∆w a,p ) and Var a (∆w a,d ) relative to their total value, which is the reason for the increase of f P D with CV events .

Classifying Auto-correlation Structure under Synchronous Firing Using Rescaled Time Transform
As mentioned in Section 3.4 of the main text, auto-correlation structure may come into spike patterns with synchronous firing in three ways: 1) The broadness of the distribution of spike number per neuron per synchronous event (AT SpikeNum ).
2) The burstiness/regularity of the pieces of spike trains within synchronous events (AT WithinEvent ).
3) The burstiness/regularity of the occurrence of synchronous events (AT events ).
As we mentioned in the main text, using rescaled time transform ( Figure 10 of the main text), Auto-correlation structure under synchronous firing can be classified into two classes: the factors that contributes to CV rescale that does not influence P-D imbalance, which thereby do not contribute to DriftV under heterogeneity of rates, and the factors that contributes to CV events that influences P-D imbalance, which thereby contribute to DriftV under heterogeneity of rates.
It is easy to think that 1) CV events is influenced by AT events .
2) CV rescale is influenced by AT SpikeNum and AT WithinEvent .
The first point above is apparent. In this section, we will give numeric evidences on the second point above, and on the influence of CV rescale and CV events onto P-D imbalance.

The factors that change CV rescale
Given a spike pattern, we calculate CV rescale like this: we first order all the spikes in the pattern, then the CV rescale of a neuron is defined as the CV value of the indexes of the spikes of the neuron, then the CV rescale of the neuronal population is defined as the averaged CV rescale over all the neurons which fired more than 3 spikes during the simulation time.
We generated spike trains using Model Sync-Auto 1 (see Section 4.1), so that we could explicitly control both AT SpikeNum and AT WithinEvent . We can see that both of these two factors tend to increase CV rescale , but they hardly change P-D imbalance (Supplementary Figure 11AB). On the contrary, CV events can hardly influence CV rescale , but may strongly change P-D imbalance especially when CV events > 1 (Supplementary Figure 11CD).

The influence of CV events on P-D imbalance
To understand how CV events influences P-D imbalance, we generated spike trains using Model Sync-Auto 2 (see Section 7), and study how E a,T (∆w a ) changes with CV events . We found that this influence may be a little complicated (Supplementary Figure 12): 1) Suppose during a synchronous event S 0 , the central neuron fires at time t 0 . Because of the axonal delay τ delay , there is usually a time interval between t 0 and when the spikes from non-central neurons arrive at the axonal terminal, and the typical length of this interval is τ delay . If synchronous events are not close to each other, so that no non-central spikes from synchronous events other than S 0 arrive at the central neuron during this interval (i.e. different synchronous events do not overlap with each other), then E a,T (∆w a ) will increases (or decreases) with CV events if A p exp(τ delay /τ ST DP ) > A d exp(−τ delay /τ ST DP ) (or A p exp(τ delay /τ ST DP ) < A d exp(−τ delay /τ ST DP )). In this paper, typically τ delay < τ ST DP , so these conditions become A p > A d or A p < A d .
2) If synchronous events are allowed to overlap with each other, then CV events will increase the chance of this overlapping when it is too large (typically when CV events > 1). In this case, E a,T (∆w a ) will decrease (or increase) with CV events if τ delay > 0 (or τ delay < 0).

The case when synchronous events do not overlap with each other
Consider a central spike at time t 0 that belongs to a synchronous event S 0 . CV events controlls the burstiness of the occurrence of synchronous events, which influences the efficacy variability as well as P-D imbalance by   The same as C, but for drift per spike. Note that drift per spike hardly changes with CV pattern , but changes significantly with CV event especially when CV event > 1. In A-D, the size of the converging motif, the parameters for STDP, the parameters for synaptic homeostasis and the simulation time and trials are the same as in Figure 4 in the main text. Error bars represent s.e.m.  Figure 4 in the main text. Error bars represent s.e.m.
). After considering all the non-central spikes in all the synchronous events other than S 0 , we have that the trial-averaged change of the ath synapse caused by the interaction of t 0 with the spikes of the ath non-central neuron in all the synchronous events other than S 0 is with p being the mean spike number per neuron per synchronous event, and τ l being the duration of the interval between S 0 and the lth synchronous event before or after it. In Model Sync-Auto 2 (see Section 7), we use Gamma process of rate r 0 /p (with r 0 being the firing rate of a neuron) and coefficient of variance CV events to model the occurrence of synchronous events. Therefore, {τ l } (l = 0, 1, · · ·) is a Gamma process with rate r 0 /p and coefficient of variance CV events , starting from τ 0 = 0. Although Gamma process does not avoid synchronousevents overlap, such overlap is rare when CV events < 1. Therefore, we can still gain understanding on this nooverlap case by investigating the case when CV events < 1. From Supplementary Figure 10D, we see that continuously increases with CV events . From the equation above, this suggests that Because CV events does not change the STDP interaction between t 0 and the non-central spikes in S 0 , the change ∈S0 ∆w a,k (t 0 , t j )] with CV events also reflect the total change of the ath synapse.

The case when different synchronous events are allowed to overlap with each other
We will focus on the case when τ delay > 0 in the following discussions, the case when τ delay < 0 can be similarly understood.
Consider a central spike at time t 0 , and suppose that this spike belongs to a synchronous event S 0 whose mean spike time is att 0 . Because of the axonal delay τ delay , the non-central spikes in S 0 typically arrives at the central neuron at aroundt 0 + τ delay . If another synchronous events S −1 happens immediately before S 0 , then it is possible that during the time interval between t 0 andt 0 + τ delay , the non-central spikes in S −1 also arrive at the central neuron. In this case, these non-central spikes in S −1 will depress the synapses through their interactions with t 0 .
In Model Sync-Auto 2 (see Section 7), we use Gamma process of coefficient of variance CV events to model the occurrence of synchronous events. In this model, such synchronous-events overlap often occurs when CV events is large (typically CV events > 1), thereby depressing the synapses. In the following discussions, we will perform analytic calculations on E T (∆w a ) when the neurons in converging motifs fire according to the spike patterns generated by this model, compare the results with simulations, thereby understanding the effect of the synchronous-events overlap.
It is easy to show that the contribution to synaptic changes by the interaction between t 0 and S 0 is: 2) when τ delay > τ cross , Now we consider the synchronous events other than S 0 . Suppose that the synchronous event immediately before S 0 is S −1 , and the synchronous event immediately after S 0 is S 1 , then under the approximation that τ cross → 0, with q(x) = β α Γ(α) x α−1 exp(−βx) being the inter-event interval distribution with α = 1/CV 2 events and β = r 0 /(CV 2 events p). If we suppose that the synchronous events other than S −1 , S 0 and S 1 are far away from S 0 , so that the interval between them and t 0 is far larger than τ cross and τ delay , then their interaction with t 0 can be approximated by with {τ l } (l = 0, 1, · · ·) being a Gamma process with rate r 0 /p and coefficient of variance CV events , starting from τ 0 = 0. Note that the summation over l in the equation above starts from l = 2, which represents S 2 for depression (k = d) and S −2 for potentiation (k = p). Eq.37-eq.41 together give the approximation of k=p,d E T [ j ∆w a,k (t 0 , t j )], which can be solved numerically.
We compare the results of the calculations above with simulation results. We can see that our analytic calculation is able to qualitatively capture the change of E T (∆w a ) with CV events (Supplementary Figure 12). From Supplementary Figure 12, we can see that 1) Under the case τ delay = 1ms, if A p > A d (or A p < A d ),E T (∆w a ) may increase (or decrease) with CV events when CV events is not large. But when CV events is large (typically CV events > 1), E T (∆w a ) always decreases with CV events , reflecting the synaptic depression caused by synchronous-events overlaps.
2) Under the case τ delay = 0ms, such synchronous-events overlap cannot happen. In this case, E T (∆w a ) These results suggest that if τ delay > 0, the synapses can be depressed by synchronous-events overlaps when CV events is large. When τ delay < 0, the dendritic delay for the post-synaptic spikes to arrive at the dendritic end is longer than the delay for the pre-synaptic spikes to arrive at the axonal terminal. In this case, the time when the central spike t 0 arrives at the dendritic end is typically later than the time when the non-central spikes in S 0 arrives at the axonal 33 terminal by |τ delay |. If the synchronous events S 0 and S −1 are close to each other, so that a non-central spike t j in S −1 arrives at the axonal terminal during this interval, then the synapse will be potentiated by the interaction between t 0 and t j . The analysis for τ delay < 0 is similar to the analysis above for τ delay > 0.

Rates and Heterogeneity of Cross-correlations
The main effect of heterogeneity of rates and heterogeneity of cross-correlations is to induce DriftV. They may also influence the diffusion of synapses, making different synapses have different diffusion strengths. As DriftV ∝ t 2 and DiffV ∝ t, DriftV will dominate in a long run as along as DriftV = 0. For completeness, we briefly discuss the heterogeneity of diffusion strengths under heterogeneity of rates and heterogeneity of cross-correlation, which may play an important role when DriftV ≈ 0.
Till now, our study is based on eq.1 in the main text: with DiffV representing the average diffusion strength of all the synapses in the network. But the topic of this section goes into more details than DiffV, which investigates the heterogeneity of diffusion strengths caused by the heterogeneity of spike train statistics of different neurons.
We have already encoutered a similar problem in Section 2, where we studied how the heterogeneity of rates of the non-central neuron influences DiffV. We did this by supposing that the firing rates of all the non-central neurons are uniformly r s and the firing rate of the central neuron is kept at r 0 , and then investigating how the diffusion strength of a converging motif changes with r s (Supplementary Figure 1B, also see eqs.6-7). This strategy is general for studying the problem of heterogeneity of diffusion strengths. If the spike trains of the non-central neurons or the cross-correlations between the non-central neurons and the central neuron can be quantified by a set of parameters {p 1 , p 2 , · · · , p n }, then we can first suppose that the statistics of the spike patterns of all the non-central neurons are uniform and quantified by a dot P s in the n-dimensional parameter space, and study how the diffusion strength changes with P s , while keeping the statistics of the central neuron unchanged.
We emphasize again that the heterogeneity of diffusion strengths is important for the learning process only when DrV ≈ 0.

The Heterogeneity of Diffusion Strengths Caused by Heterogeneity of Rates
As we discussed in Section 3.5 of the main text, heterogeneity of rates induces DriftV by making use of P-D imbalance. Therefore, its contribution to DriftV is approximately zero when potentiation and depression almost balance with each other. In this case, its influence onto heterogeneity of diffusion strengths may become important.
A similar situation has already been discussed in Section 2, in which spike trains are stationary processes with heterogeneity of rates, and by observing Supplementary Figure 1B we come to the conclusion that: 1) The diffusion strength tends to increase with the firing rate of the non-central neurons; 2) When CV 1 (i.e. the spike trains are regular), the diffusion strength tends to sharply peak when the firing rate of the non-central neurons r s and the firing rate of the central neuron r 0 are equal, and may also peak when r s = 2r 0 , 1 2 r 0 etc.

34
We already explain these phenomena in Section 2. The first factor is because that more non-central spikes induce more freedom to increase the synaptic variability during STDP. The second factor is due to transient crosscorrelation. When synchronous firing is added into the spike pattern, transient cross-correlation may be fragile: as two pieces of spike trains in different synchronous events are hardly correlated with each other, the synaptic changes caused by two central spikes t m and t n (i.e. j ∆w a (t m , t j ) and j ∆w a (t n , t j )) are hard to be correlated if t m and t n belong to different synchronous events, even if m and n are nearby by index. The first factor, however, seems to be a general principle: the diffusion strength should be positively correlated with the firing rate of the non-central neurons.

The Heterogeneity of Diffusion Strengths Caused by Heterogeneity of Crosscorrelations
From eq.38 in the main text, we know that the DriftV caused by heterogeneity of cross-correlation is proportional to r 2 0 Var a [r a ∞ −∞ dτ H(τ )C a (−τ delay − τ )], with r 0 and r a respectively being the firing rate of the central and ath non-central neuron, and C a (τ ) being the unit cross-correlation between them (with lim τ →∞ C a (τ ) = 1). Therefore, DriftV = 0 if ∞ −∞ dτ H(τ )C a (−τ delay − τ ) = 0 for every a, which may happen when C a (−τ delay − τ ) is strictly asymmetric around the STDP time window H(τ ) (Supplementary Figure 13). In this case, C a (τ ) effectively induces synchronous firing between the central neuron and the ath non-central neuron, thus how diffusion strength change with C a (τ ) can be understood with the help of our results on synchronous firing after setting τ delay = 0.
Two important concepts of C a (τ ) are its strength (i.e. 2) As we focused on the case that τ cross τ ST DP when discussing synchronous firing, we did not explore the influence of τ cross onto DiffV in details in this paper. However, it is not hard to understand the influence of the time scale of C a (τ ) onto diffusion strength when τ delay = 0. Suppose C 1 (τ ) and C 2 (τ ) have the same strength, but the time window of C 1 (τ ) is narrower than that of C 2 (τ ) (Supplementary Figure 13), then the single-step change of the 1st synapse will be larger than that of the 2nd synapse during STDP, therefore the 1st synapse tends to diffuse farther away from its initial value than the 2nd synapse.
Mathematically, r a C a (τ ) means the probability density to find a spike of the ath non-central neuron at τ , given an central spike at time 0. If we suppose that the spike train of the ath non-central neuron is Poisson, then d (see eq.13 in the main text) can be written as )r a C a (t)dt If we suppose that  Figure 13: Schematic on how heterogeneity of cross-correlations causes heterogeneity of diffusion strengths. The STDP window is represented by the black curve. Two cross-correlations C 1 (τ ) and C 2 (τ ), indicated by the blue and red curve respectively, are symmetric around H(τ ), but have different widths. Both of them cause zero drift velocity of synaptic efficacies, but the diffusion strength of the 1st synapse is stronger than that of the 2nd one.
we see that d increases with A c but deceases with τ c .