Signal Variability Reduction and Prior Expectation Generation through Wiring Plasticity

In the adult mammalian cortex, a small fraction of spines is created and eliminated every day, and the resultant synaptic connection structure is highly non-random, even in local circuits. However, it remains unknown whether a particular synaptic connection structure is functionally advantageous in local circuits, and why creation and elimination of synaptic connections are necessary in addition to rich synaptic weight plasticity. To answer these questions, we studied an inference task model through theoretical and numerical analyses. We show that a connection structure helps synaptic weight learning when it provides prior expectations. We further demonstrate that an adequate network structure naturally emerges from dual Hebbian learning for both synaptic weight plasticity and wiring plasticity. Especially in a sparsely connected network, wiring plasticity achieves reliable computation by enabling efficient information transmission. Correlations between spine dynamics and task performance generated by the proposed rule are consistent with experimental observations.

Author Summary

A virtue of the brain that is missing from artificial machines is its ability to reorganize and improve its circuit structure. Neural circuits should be adequately tuned to perform information processing such as decoding of sensory signals from noisy sensory inputs, or motor command generation from stochastic premotor inputs. Activity-dependent modification of synaptic efficacy through long-term potentiation is one well-studied mechanism of such tuning.

bioRxiv preprint, first posted online Aug. 12, 2015; doi: http://dx.doi.org/10.1101/024406. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.


Introduction
The amplitude of excitatory and inhibitory postsynaptic potentials (EPSPs and IPSPs), often referred to as synaptic weight, is considered a fundamental variable in neural computation [1] [2]. In the mammalian cortex, excitatory synapses often show large variations in EPSP amplitude [3] [4] [5], and the amplitude at a given synapse can be stable over trials [6] and over time [7], enabling a rich information capacity compared with binary synapses [8] [9]. In addition, synaptic weights show a wide variety of plasticity that depends primarily on the activity of presynaptic and postsynaptic neurons [10] [11]. Correspondingly, previous theoretical results suggest that, under appropriate synaptic plasticity, a randomly connected network is computationally sufficient for various tasks [12] [13].
On the other hand, it is also known that synaptic wiring plasticity and the resultant synaptic connection structure are crucial for computation in the brain [14] [15]. Elimination and creation of dendritic spines remain active even in the adult mammalian brain. In rodents, the spine turnover rate is up to 15% per day in sensory cortex [16] and 5% per day in motor cortex [17]. Recent studies further revealed that spine dynamics are tightly correlated with performance in motor-related tasks [18] [19].
Previous modeling studies suggested that wiring plasticity helps memory storage [20] [21] [22]. However, in those studies, EPSP amplitude was assumed to be a binary variable, and wiring plasticity was performed in a heuristic manner. Thus, it remains unknown what should be encoded by the synaptic connection structure when synaptic weights have a rich capacity for representation, and how such a connection structure can be achieved through a local mechanism of spine elimination and creation.
To answer these questions, we constructed a theoretical model of an inference task. We found that the computational benefit of a connection structure depends on the sparseness of connectivity. In particular, when connectivity is sparse, the connection structure improves performance compared with a randomly connected network by reducing signal variability. Based on these insights, we proposed a local unsupervised rule for wiring and synaptic weight plasticity. Under this rule, the connection structure and the synaptic weights learn different components of a dynamic environment, enabling robust computation.
The model also replicates various experimental results on spine dynamics.

Connection structure helps computation in sparsely connected networks
What should be represented by synaptic connections and their weights, and how are those representations acquired? To explore the answers to these questions, we studied a hidden variable estimation task (Fig 1A), which appears in various stages of neural information processing [23] [24] [25]. In the task, at every time t, one hidden state is sampled with equal probability from p external states s^t = {0, 1, ..., p − 1}. Neurons in the input layer show independent stochastic responses r^t_{X,j} ∼ N(θ_jµ, σ_x) due to various noise sources (Fig 1B, middle), where θ_jµ is the average firing rate of neuron j to stimulus µ, and σ_x is the constant noise amplitude. Although we used Gaussian noise for analytical purposes, the following argument is applicable to any stochastic response that follows a general exponential family, including Poisson firing (S1 Fig). Neurons in the output layer estimate the hidden variable from input neuron activity and represent the variable with population firing. This task is computationally difficult because most input neurons have mixed selectivity for several hidden states, and the responses of the input neurons are highly stochastic (Fig 1C). Let us assume that the dynamics of output neurons are written as

v^t_i = Σ_{j=1}^{M} c_ij w_ij r^t_{X,j} − h_w − I^t_inh,

where c_ij (= 0 or 1) represents the connectivity from input neuron j to output neuron i, w_ij is its synaptic weight (EPSP size), and h_w is the threshold. M and N are the population sizes of the input and output layers, respectively. In the model, all feedforward connections are excitatory, and the inhibitory input is provided as the global inhibition I^t_inh.
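To make these dynamics concrete, the following Python sketch implements the all-to-all Gaussian version of the model. The layer sizes, the tuning curves, and the replacement of global inhibition by a hard argmax readout are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; the paper uses p = 10 external states and larger layers.
p, M, N = 4, 40, 8
sigma_x = 0.5

# theta[j, mu]: mean response of input neuron j to state mu (assumed tuning).
theta = rng.uniform(0.2, 1.0, size=(M, p))

# All-to-all connectivity; output neuron i prefers state i % p, with the
# optimal weights w_ij = q_{j,mu} = theta_{j,mu} / sigma_x^2 from the text.
prefs = np.arange(N) % p
c = np.ones((N, M))
w = theta[:, prefs].T / sigma_x**2
# Per-neuron threshold absorbing the state-dependent normalization alpha(q).
h_w = (theta[:, prefs] ** 2).sum(axis=0) / (2 * sigma_x**2)

def decode(r_x):
    """v_i = sum_j c_ij w_ij r_{X,j} - h_w,i are unnormalized log-likelihoods."""
    v = (c * w) @ r_x - h_w
    return prefs[np.argmax(v)]

trials, correct = 500, 0
for _ in range(trials):
    s = rng.integers(p)
    r_x = theta[:, s] + sigma_x * rng.standard_normal(M)  # noisy input response
    correct += decode(r_x) == s
print(correct / trials)   # decoding accuracy, well above chance 1/p
```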
If the feedforward connections are all-to-all (i.e., c_ij = 1 for all i, j pairs), then by setting the weights as w_ij = q_jµ = θ_jµ/σ_x² for an output neuron i that represents external state µ, the network performs optimal inference from the given firing-rate vector r^t_X, where the value q_jµ represents how much evidence the firing rate of neuron j provides for a particular external state µ (for details, see Materials and methods). However, if the connectivity between the two layers is sparse, as it is in most regions of the brain, optimal inference is generally unattainable because each output neuron obtains only a limited set of information from the input layer. How should one choose the connection structure and synaptic weights in such a case? We first considered two extreme examples for illustration. One strategy is to use synaptic weights to approximate the optimal representation while keeping the connections random with a fixed connection probability (weight coding). In this case, c and w are given by Pr[c_ij = 1] = ρ and w_ij = w_µj = q_jµ/ρ, where the mean connectivity is ρ = γq̄, and q̄ is the average of the normalized mean responses q_jµ (i.e., q̄ = (1/(Mp)) Σ_j Σ_µ q_jµ). The parameter γ is introduced to control the sparseness of connections, and here output neuron i is assumed to represent the external state µ. The other strategy is to use the synaptic connectivity for the representation while fixing the synaptic weights (connectivity coding).
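The two coding schemes can be sketched as follows; the dimensions, the tuning matrix, and the use of one output neuron per state are toy assumptions. The check at the end confirms that, in expectation, both schemes carry the same representation, E[c_ij w_ij] = q_jµ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dimensions and tuning (not the paper's parameters).
p, M = 4, 60
sigma_x = 1.0
theta = rng.uniform(0.2, 1.0, size=(M, p))
q = theta / sigma_x**2            # normalized mean responses q_{j,mu}
q_bar = q.mean()                  # average response q-bar
gamma = 0.5                       # sparseness parameter
rho = gamma * q_bar               # mean connectivity

# Weight coding: random connections, weights approximate the representation.
c_weight = (rng.random((p, M)) < rho).astype(float)    # one output per state
w_weight = q.T / rho                                   # w_{mu,j} = q_{j,mu}/rho

# Connectivity coding: structured connection probability, fixed weight 1/gamma.
rho_conn = np.clip(gamma * q.T, 0.0, 1.0)              # rho_{mu,j} = gamma q_{j,mu}
c_conn = (rng.random((p, M)) < rho_conn).astype(float)
w_conn = np.full((p, M), 1.0 / gamma)

# Both schemes carry the same representation in expectation:
# E[c_ij w_ij] = q_{j,mu} in each case.
print(np.allclose(rho * w_weight, rho_conn * w_conn))
```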
If we sort the input neurons by their preferred external states, the diagonal components of the connection matrix show high synaptic weights in the weight-coding scheme, whereas they show dense connections in the connectivity-coding scheme (Fig 2A). Note that neither realization is strictly the optimal solution under its constraint. However, as we discuss later, both are obtainable through biologically plausible local Hebbian learning rules.
So which strategy gives the better representation? We evaluated the accuracy of external-state estimation using a bootstrap method (see Materials and methods). Under intermediate connectivity, both strategies showed reasonably good performance (as in Fig 1B, bottom). Intriguingly, in sparse networks, connectivity coding outperformed weight coding despite its binary representation (Fig 2B, cyan/orange lines). The analytical results confirmed this tendency (Fig 2B, red/blue lines) and indicated that the firing rates of the output neurons selective for the given external state show less variability in connectivity coding than in weight coding, enabling more reliable information transmission (Fig 2C). To further understand this phenomenon, we evaluated the maximum transfer entropy of the feedforward connections. Because of the limited connectivity, each output neuron obtained information only from the connected input neurons. Thus, the transfer entropy was typically lower under sparse than under dense connections in both strategies (Fig 2D). However, in the connectivity-coding scheme, each output neuron obtained information from the relevant input neurons, suppressing the reduction in transfer entropy (orange line in Fig 2D). Therefore, in the given inference model, the connection structure helps improve performance when it increases the transfer entropy of the connections.
In the brain, synaptic connectivity and weights often exhibit some redundancy. For example, the EPSP amplitude of a connection within a clustered network is typically larger than the average EPSP amplitude [6] [26]. This positive correlation between connectivity and weight indicates redundancy in the neural representation, and a similar property is expected to hold for interlayer connections [27]. We therefore next considered the function of this redundancy. To this end, we mixed weight coding and connectivity coding, with ρ_µj = min(γ[κ_c q_jµ + (1 − κ_c)q̄], 1), where κ_w and κ_c are the degrees of weight and connectivity coding, respectively (0 ≤ κ_w, κ_c ≤ 1), and the weights are mixed analogously by κ_w. Note that (κ_w, κ_c) = (1, 0) corresponds to weight coding, whereas (κ_w, κ_c) = (0, 1) corresponds to connectivity coding. In these representations, performance improved when the two schemes were combined (Fig 2E), even when the representation was redundant (i.e., κ_w + κ_c > 1.0). The log-likelihood ratio relative to the optimal estimation became higher under a redundant representation (i.e., κ_w = κ_c > 0.5) for both correct (s^t = µ) and incorrect (s^t ≠ µ) responses (Fig 2F; calculated for κ_w = κ_c = κ), because output neurons became overconfident in their decisions. Nevertheless, when lateral inhibition was sufficiently strong, the redundant representation was not harmful overall.

Connection structure enables rapid learning
In the last section, we showed that in a sparsely connected network, a non-random connection structure can be beneficial for computation. But is there any benefit to having a connection structure in a dense network? The results of the previous section indicated that when connectivity was sufficiently dense (ρ > 0.4 in the simulation), both the performance and the estimated transfer entropy saturated under an appropriate synaptic weight configuration, even if the connectivity was random. Thus, to consider the potential benefits of non-random connection structures, we next implemented synaptic weight learning in our model while fixing the connectivity. To represent the internal model, the synaptic weights should minimize the KL-divergence between the true and estimated input distributions [28] [29]. By stochastic gradient descent, the synaptic weight change ∆w_ij = w^{t+1}_ij − w^t_ij combines a Hebbian term, derived from the gradient, and a homeostatic term, heuristically added to constrain the average firing rates of output neurons [30] (see Materials and methods). We first performed this unsupervised synaptic weight learning on a randomly connected network. When the connectivity was moderately dense, the network successfully acquired a suitable representation (Fig 3A), and the model error (Materials and methods) eventually converged (Fig 3B). Under a sufficient level of homeostatic plasticity (Fig 3C), the average firing rate showed a narrow unimodal distribution (Fig 3D, top), and most output neurons acquired selectivity for one of the external states (Fig 3D, bottom). However, when part of the true model was given as the connection structure, with ρ_µj = min(γ[λq_jµ + (1 − λ)q̄], 1), a larger λ yielded higher initial performance and faster convergence (Fig 3E, F; λ = 0 corresponds to the model with random connectivity). Note that a low correlation between the external model and the connection structure (λ ∼ 0.4) was sufficient to observe this effect. This result suggests that an adequate connection structure can induce rapid learning if the structure is correlated with the external model.
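The full weight-plasticity rule is given in Materials and methods; as a rough illustration, the sketch below implements only its Hebbian term for the Gaussian case, using the approximation q̂ ≈ ρ_o w and clamping one output neuron to its preferred state in place of the homeostatic term. All constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hebbian term only, for the Gaussian case: dw ~ r_Y * (r_X - rho_o sigma^2 w),
# whose fixed point recovers the target weights w = q / rho_o, q = theta/sigma^2.
M, sigma_x, rho_o, eta = 30, 0.5, 0.4, 0.01
theta = rng.uniform(0.2, 1.0, size=M)   # tuning to the clamped state
w = rng.random(M)                       # initial weights

for _ in range(20000):
    r_x = theta + sigma_x * rng.standard_normal(M)   # noisy input response
    r_y = 1.0                                        # clamped output activity
    w += eta * r_y * (r_x - rho_o * sigma_x**2 * w)

q = theta / sigma_x**2
print(np.abs(w - q / rho_o).max())   # small residual: w converged near q/rho_o
```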

Dual Hebbian learning rule enables efficient information transmission
So far, we have shown that in both sparse and dense networks, non-random connection structures can be beneficial for computation, or at least for learning. However, in the previous sections a specific connection structure was given a priori, whereas structures in local neural circuits are expected to be acquired through wiring plasticity, that is, through the elimination and creation of spines. We therefore next investigated rewiring rules that can induce beneficial connection structures. To this end, for each pair (i, j) of presynaptic neuron j and postsynaptic neuron i, we introduced a variable ρ_ij representing the connection probability; its biological correspondence is discussed below. If we randomly create a synaptic connection between neurons (i, j) with probability ρ_ij/τ_c and eliminate it with probability (1 − ρ_ij)/τ_c, then, when the maximum number of synaptic connections is bounded by one, a connection exists between neurons (i, j) with probability ρ_ij on average. This provides a wiring plasticity rule for a given ρ_ij, but how should we choose ρ_ij? Because the synaptic connection structure should be correlated with the external model, stochastic gradient descent by ρ_ij on the KL-divergence between the true and estimated input firing-rate distributions yields the learning rule for ρ (equation (3) in Materials and methods). Remarkably, although this rule does not maximize the transfer entropy of the connections, the directions of the stochastic gradients of the two objective functions are on average close to one another; therefore, the rule does not, on average, reduce the transfer entropy of the connections (see Materials and methods). Fig 4A shows the typical behavior of ρ_ij and w_ij under the dual Hebbian rule defined by equations (2) and (3). When the connection probability is low, a connection between two neurons is rare, and even when a spine is created through probabilistic creation, it is rapidly eliminated. At moderate connection probability, spine creation is more frequent, and created spines survive longer. When the connection probability is high, a connection is nearly always present, and its synaptic weight is large because the synaptic weight dynamics also follow a similar Hebbian rule.
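The creation/elimination mechanism described above can be sketched directly; τ_c and the example values of ρ_ij are arbitrary. Over time, the fraction of steps during which a connection exists converges to ρ_ij.

```python
import numpy as np

rng = np.random.default_rng(3)

# Wiring plasticity: create with probability rho/tau_c, eliminate with
# probability (1 - rho)/tau_c, so the stationary Pr[c_ij = 1] equals rho_ij.
tau_c = 100
rho = np.array([0.1, 0.5, 0.9])   # example connection probabilities
c = np.zeros_like(rho)            # connection indicator, start unconnected
occupancy = np.zeros_like(rho)

steps = 200000
for _ in range(steps):
    u = rng.random(rho.shape)
    create = (c == 0) & (u < rho / tau_c)
    eliminate = (c == 1) & (u < (1 - rho) / tau_c)
    c = np.where(create, 1.0, np.where(eliminate, 0.0, c))
    occupancy += c

print(occupancy / steps)   # approaches rho for each synapse
```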
We implemented the dual Hebbian rule in our model and compared its performance with that of synaptic weight plasticity on a fixed random connection structure. Through dual Hebbian learning, a network can indeed acquire a connection structure that enables efficient information transmission between the two layers; as a result, performance increases when connectivity is moderately sparse (Fig 4G,H). Although the performance was slightly worse than that of a fully connected network, synaptic transmission consumes a large amount of energy [31] and synaptic connections are a major source of noise [32]; therefore, it is beneficial to achieve a similar level of performance with a network containing fewer connections.
Connection structure can acquire constant components of stimuli and enable rapid learning

We have shown that the dual Hebbian learning rule helps computation in a sparsely connected network. But what happens in densely connected networks? To consider this issue, we extended the previous static external model to a dynamic one, in which the response probabilities of the input neurons partly change at every interval T_2. If we define the constant component as θ^const and the variable component as θ^var, the total model is given by their normalized combination (see Materials and methods). Remarkably, some synapses developed connection probability ρ = 1, meaning that these synapses were almost permanently stable, because the elimination probability (1 − ρ)/τ_c became nearly zero.
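The dynamic external model can be sketched as follows: the tuning has a constant component θ^const and a variable component θ^var that is resampled every T_2 steps. The equal mixing weight and all sizes here are hypothetical stand-ins for the paper's normalized combination.

```python
import numpy as np

rng = np.random.default_rng(4)

# Dynamic external model sketch: theta_var is resampled every T2 steps while
# theta_const persists, so successive environments stay partially correlated.
p, M, T2 = 4, 20, 500
theta_const = rng.uniform(0.2, 1.0, size=(M, p))
theta_var = rng.uniform(0.2, 1.0, size=(M, p))

snapshots = []
for t in range(2000):
    if t > 0 and t % T2 == 0:
        theta_var = rng.uniform(0.2, 1.0, size=(M, p))   # environment change
    theta = 0.5 * (theta_const + theta_var)              # assumed equal mixing
    if t % T2 == 0:
        snapshots.append(theta.copy())

# Successive environments share the constant component, so snapshots are
# correlated but not identical.
diffs = [np.abs(a - b).mean() for a, b in zip(snapshots[:-1], snapshots[1:])]
print(len(snapshots), round(float(np.mean(diffs)), 3))
```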

Semi-dual Hebbian learning rule explains experimentally observed spine dynamics
The results to this point have revealed the functional advantages of dual Hebbian learning. However, we do not yet know whether the brain really uses such a dual learning rule. Although the dual Hebbian rule appears theoretically preferable, the effects of presynaptic and postsynaptic activity on spine creation and elimination remain unclear [15] [33]. We therefore modified the rule such that spine dynamics do not directly depend on neural activities, and demonstrated that the model still replicates experimentally observed spine dynamics and the resultant animal behavior. Under the dual Hebbian rule, both synaptic weight and connection probability follow similar Hebbian-type plasticity rules (equations (2) and (3)). Therefore, even if we assume that the change in connection probability is given as a function of synaptic weight, the rule should still give a good approximation. We thus defined the semi-dual Hebbian learning rule such that, if there is a connection between two neurons, the change in connection probability depends solely on the synaptic weight. Previous experimental results suggest that small spines are more likely to be eliminated [7] [33], and spine size often increases or decreases in response to LTP or LTD, respectively, with a certain delay [34] [35]. Thus, we can naturally assume that the connection probability ρ is proportional to spine size. In the absence of a synaptic connection (i.e., c_ij = 0), the update takes a different form (see Materials and methods).

We next examined the performance of the model in motor learning tasks. Appropriate motor commands are expected to be inferred in the motor cortex based on inputs from premotor regions [36] [37]. In addition, the connection from layer 2/3 to layer 5 is considered a major pathway in motor learning [38]. We therefore hypothesized that the input and output layers of our model roughly correspond to layers 2/3 and 5 of the motor cortex. We first studied the influence of training on spine dynamics. The 5-day survival rate of a spine was higher for spines created within a couple of days of the beginning of training than for control spines, whereas the survival rate converged to the control level after continuous training (Fig 7C). We next considered the relationship between spine dynamics and task performance [18]. For this purpose, we compared task performance at the beginning of the test phase with the preceding spine dynamics.
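The qualitative idea behind the semi-dual rule, that connection probability tracks a saturating function of spine size when a spine exists, can be sketched minimally; the update form and all constants here are hypothetical, not the paper's equations.

```python
import numpy as np

# Illustrative semi-dual update: when a connection exists, rho relaxes toward
# a saturating function of the synaptic weight w (spine size); when absent,
# rho decays slowly. eta_rho, w_o, and rho_min are hypothetical constants.
def update_rho(rho, w, connected, eta_rho=0.01, w_o=1.0, rho_min=0.01):
    target = np.clip(w / w_o, 0.0, 1.0)            # rho ~ spine size
    drho = np.where(connected,
                    eta_rho * (target - rho),       # track weight if connected
                    eta_rho * (rho_min - rho))      # slow decay if absent
    return np.clip(rho + drho, 0.0, 1.0)

rho = np.array([0.5, 0.5])
w = np.array([2.0, 0.2])          # one strong, one weak synapse
connected = np.array([True, True])
for _ in range(1000):
    rho = update_rho(rho, w, connected)
print(rho)   # approximately [1.0, 0.2]: strong spine stabilized, weak one fragile
```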

Discussion
The results of our study suggest the following answers to the questions posed in the Introduction.
When connections are sparsely organized, the synaptic connection structure should be organized such that the estimated transfer entropy becomes larger than that of a randomly connected network, thereby reducing signal variability.

(D,E) Relationships between creation and elimination of spines and task performance. Performance was calculated from the activity within 2,000-7,000 time steps after the beginning of the test phase. In the simulation, synaptic elimination was increased fivefold from day 1 to the end of training.

Model evaluation
Spine dynamics depend on the age of the animal [16], the brain region [17], and spine shape [39], and many molecules play crucial roles [33] [40], making it difficult for any theoretical model to fully capture this complexity. Nevertheless, our simple mathematical model replicated many key features [7] [18] [19] [33].
For instance, small spines often show enlargement, while large spines are more likely to show shrinkage.

Experimental prediction
In the developmental stage, both axon guidance [41] and dendritic extension [42] show Hebbian-type activity dependence, but in the adult cortex, both axons and dendrites seldom change their structures [15], although recent experimental results suggest some activity dependence for spine creation [43]. In addition, whether or not spine survival rates increase through training remains controversial [18] [19]. Our model implies that the stability of new spines depends strongly on the similarity between the new task and the control behavior (S2 Fig A). When the similarity is low, new spines would be expected to be more stable than in the control case, because the synaptic connection structure also needs to be reorganized. By contrast, when the similarity is high, the stability of new spines would be comparable to that of the control. Our model additionally replicates the effect of varying training duration on spine stability [18]: when training was rapidly terminated, newly formed spines became less stable than those formed under continued training.

Related studies
Several theoretical investigations have been conducted on the phenomenological characteristics of synaptogenesis [45] [46] [47]. Some studies further considered functional implications [20] [22] or optimality with regard to wiring cost [48], but those studies did not consider the functional significance of synaptic weight plasticity and the variability of EPSP size.
It was previously shown that learning with two variables on different timescales is beneficial in a dynamic environment [49]. In our model, both the fast and the slow variables played important roles, whereas in previous studies, usually only one variable was effective at a time, depending on the context. In addition, our model provides a biologically plausible interpretation of the two-variable learning process.

Model
Model dynamics We first define the model and the learning rule for a general exponential family and then derive the equations for two examples (Gaussian and Poisson). In the task, at every time t, one hidden state s^t is sampled from the prior distribution p(s). Neurons in the input layer show stochastic responses r^t_{X,j} that follow the probability distribution f(r_{X,j}|s^t). Neurons in the output layer estimate the hidden variable from input neuron activity. Here we assume maximum likelihood estimation for the decision-making unit, as the external state is a discrete variable. In this framework, to detect the hidden signal, the firing rate of output neuron i should be proportional to the posterior of the state σ_i, where σ_i is the index of the hidden variable preferred by output neuron i [23] [24]. By Bayes' rule, the estimation of s^t is given as

log p(s^t = µ | r^t_X) = Σ_j [q_µj g(r^t_{X,j}) − α(q_µj) + B(r^t_{X,j})] + log p(s^t = µ) − log p(r^t_X),

where q_jµ ≡ h(θ_µj) and α(q_µj) ≡ A(h^{-1}(q_jµ)). If we assume uniformity of the hidden states, log p(s^t = µ) = const, and (1/M) Σ^M_{j=1} α(q_µj) = α_o, the equation above becomes

log p(s^t = µ | r^t_X) = Σ_j [q_µj g(r^t_{X,j}) + B(r^t_{X,j})] − log p(r^t_X) + const.
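As a concrete instance of the posterior above, the sketch below uses the Gaussian case, where f(r|µ) = N(θ_jµ, σ_x) gives g(r) = r, q = θ/σ_x², and α(q) = θ²/(2σ_x²). Sizes and tuning are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Gaussian instance of the exponential-family posterior with a uniform prior.
p, M, sigma_x = 4, 50, 0.5
theta = rng.uniform(0.2, 1.0, size=(M, p))
q = theta / sigma_x**2
alpha = theta**2 / (2 * sigma_x**2)

def posterior(r_x):
    """p(s = mu | r_X) via a numerically stabilized softmax over states."""
    log_like = q.T @ r_x - alpha.sum(axis=0)   # sum_j [q g(r) - alpha(q)]
    log_like -= log_like.max()                 # stabilization
    post = np.exp(log_like)
    return post / post.sum()

s = 1
r_x = theta[:, s] + sigma_x * rng.standard_normal(M)
post = posterior(r_x)
print(post.round(3))   # posterior mass should concentrate on the true state
```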
Let us assume that, at every time t, the firing rates of output neurons follow the dynamics defined in the main text. If the connections are all-to-all, w_ij = q_jµ gives optimal inference. Note that h_w is not necessary to achieve optimal inference; however, under sparse connections, h_w is important for reducing the effect of connection variability. In this formalization, even in a non-all-to-all network, if the sparseness of connectivity stays within a reasonable range, near-optimal inference can be performed for arbitrary feedforward connectivity by adjusting the synaptic weights to w_ij = w_µj ≡ q_jµ/ρ_µj.

Synaptic weight learning To perform maximum likelihood estimation from output neuron activity, the synaptic weight matrix between the input and output neurons should provide a reverse model of input neuron activity. If the reverse model is faithful, the KL-divergence between the true and estimated input distributions, D_KL[p*(r^t_X) || p(r^t_X | C, W)], is minimized [28] [29]. Here, θ^{C,W}_{j,µ} denotes the average response estimated from the connectivity matrix C and the weight matrix W. If we approximate the estimated parameter q̂^{C,W}_{jµ} by ρ_o w_ij, using the average connectivity ρ_o, a synaptic weight plasticity rule is obtained by stochastic gradient descent. Because we consider a population representation, in which the total number of output neurons is larger than the total number of external states, the representation is redundant. To make use of most of the population, a homeostatic constraint is necessary; for homeostatic plasticity, we set a constraint on the output firing rate. Combining the two terms yields the synaptic weight plasticity rule. By changing the strength of homeostatic plasticity b_h, the network changes its behavior. The learning rate is divided by γ because the mean of w is proportional to 1/γ. Although this learning rule is unsupervised, each output neuron naturally selects an external state in a self-organizing manner.

Synaptic connection learning
The update rule of the connection probability is obtained by stochastic gradient descent. Here, we approximated w_ij by its average value w_o. In this implementation, if the synaptic weight is also plastic, convergence of D_KL is no longer guaranteed; in the simulations, however, the rule still behaved well.

Dual Hebbian rule and estimated transfer entropy The results in the main text suggest that a non-random synaptic connection structure can be beneficial either when it increases the estimated transfer entropy or when it is correlated with the structure of the external model. To derive the dual Hebbian rule, we used the latter property; yet in the simulations, the estimated transfer entropy also increased under the dual Hebbian rule. Here, we consider the relationship between the two objective functions. The estimation of the external state from the sampled inputs is approximated as

p(s^t = µ | r^t_X) ≈ exp( Σ_j ρ_ij [q_µj g(r^t_{X,j}) − α(q_µj) + B(r^t_{X,j})] ) / Σ_ν p(s^t = ν) exp( Σ_j c_ij [q_νj g(r^t_{X,j}) − α(q_νj) + B(r^t_{X,j})] ).

Therefore, by stochastic gradient descent, an update rule for ρ_ij is obtained. If we compare this equation with the dual Hebbian rule, both are monotonically increasing functions of r^t_{Y,i} and have the same dependence on g(r^t_{X,j}), although the normalization terms differ. Thus, under adequate normalization, the inner product of the change directions is on average positive. Therefore, although the dual Hebbian learning rule does not maximize the estimated transfer entropy, it rarely diminishes it.
Learning rules for the synaptic weight and the connection are given in the preceding sections. Note that the first term of the synaptic weight learning rule coincides with a previously proposed optimal learning rule for spiking neurons [29] [50]. The model error was calculated from the difference between the true parameters q*_jµ and the estimated parameters q̂_jµ, where q̂_jµ was obtained by normalizing the non-normalized estimator q̃_jµ as q̂_jµ = q̃_jµ / [Σ_j Σ_µ q̃_jµ / (pM)]. The non-normalized estimator q̃_jµ was calculated from the learned connectivity and synaptic weights.

Analytical evaluation
Performance In the Gaussian model, we can analytically evaluate the performance of the two coding schemes.
The dynamics of the output neurons follow the membrane potential variable u_i, defined as in the main text, which determines the firing rate of each neuron. Due to the normalization (1/M) Σ^M_{j=1} q²_jµ = (r^o_X)², the mean and variance of {θ_jµ} can be expressed in terms of µ_M and σ_M, the mean and variance of the original non-normalized truncated Gaussian distribution. Because both r_X,j and θ_jµ approximately follow Gaussian distributions, u_i is also expected to be Gaussian. Therefore, by evaluating its mean and variance, we can characterize the distribution of u_i for a given external state [51].
In weight coding In the weight-coding scheme, w_ij and c_ij are defined as above, and the mean and variance of u_i can be computed directly; if s^t = µ, w_ij and r_X,j are independent. In addition, owing to shared feedforward connections, output neurons show noise correlations: if output neuron i belongs to Ω_µ where s^t = µ, whereas l ∉ Ω_µ, the covariance between u_i and u_l can be evaluated, so (u_i, u_l) approximately follows a multivariate Gaussian distribution. In maximum likelihood estimation, the estimation fails if a non-selective output neuron shows a higher firing rate than the selective neuron; the probability of such an event (for two output neurons) follows from this Gaussian approximation.

In connectivity coding In the connectivity-coding scheme, w_ij and c_ij are given as above, and a similar calculation applies. On the other hand, in connectivity coding, the second term of the signal variance is negative and does not depend on the connectivity. As a result, in an adequately sparse regime, the firing-rate variability of the selective output neurons becomes smaller in connectivity coding, and the estimation accuracy is better. In the sparse limit, the first term of the variance becomes dominant and neither scheme works well; consequently, the advantage of connectivity coding disappears. The coefficient of variation calculated for the signal terms is indeed smaller in the connectivity-coding scheme (blue and red lines in Fig 2C).

If we ignore the fluctuation of ρ caused by stochastic firing, the life expectancy T of a spine with connection probability ρ follows a distribution with normalization factor Z(ρ). The spine age distribution follows accordingly, where T_o is the number of time steps corresponding to one day. As expected, the 5-day survival rate was higher for older spines in both the analytical calculation and the simulation (Fig 6E).
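The spine-survival analysis above can be checked with a direct simulation: a spine with connection probability ρ is eliminated with probability (1 − ρ)/τ_c per step, so its expected 5-day survival is (1 − (1 − ρ)/τ_c)^(5 T_o). The values of τ_c and T_o here are toy choices, not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(6)

tau_c, T_o, days = 1000, 100, 5
rhos = np.array([0.2, 0.8])

def survival_sim(rho, n_spines=20000):
    """Fraction of spines with probability rho still present after 5 days."""
    alive = np.ones(n_spines, dtype=bool)
    for _ in range(days * T_o):
        alive &= rng.random(n_spines) >= (1 - rho) / tau_c
    return alive.mean()

for rho in rhos:
    analytic = (1 - (1 - rho) / tau_c) ** (days * T_o)
    # High-rho (large, stable) spines survive longer, as in Fig 6E.
    print(rho, survival_sim(rho), round(analytic, 3))
```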

Details of simulation
Model settings In the simulation, the external variable s^t was chosen from p = 10 discrete states with equal probability (Pr[s^t = q] = 1/p for all q). The mean response probabilities θ_jµ were first given by parameters {θ̃_jµ} (j = 1, ..., M; µ = 0, ..., p − 1) randomly chosen from a truncated normal distribution.

Accuracy of estimation The accuracy was measured with a bootstrap method. Using the data from t − T_o ≤ t' < t, the selectivity of the output neurons was first decided: Ω_µ was defined as the set of output neurons that represent external state µ, with neuron i assigned to Ω_µ according to a selectivity criterion defined through the operator [X], which returns 1 if X is true and 0 otherwise. Using this selectivity, the accuracy was then estimated from the data in t ≤ t' < t + T_o. In the simulation, T_o = 10³ was used because this window is short relative to the timescale of weight changes yet long enough to suppress variability.
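A toy version of this two-window bootstrap procedure is sketched below, with a hypothetical stand-in for the network's output activity; the paper's exact selectivity criterion (an indicator-operator expression) is replaced here by a best-mean-response assignment.

```python
import numpy as np

rng = np.random.default_rng(7)

p, N, T_o = 4, 12, 1000

def fake_activity(states, noise=0.3):
    """Hypothetical output activity: neuron i fires for state i % p, plus noise."""
    act = (np.arange(N)[None, :] % p == states[:, None]).astype(float)
    return act + noise * rng.standard_normal((len(states), N))

# Window 1: decide selectivity Omega_mu from each neuron's best mean response.
states1 = rng.integers(p, size=T_o)
act1 = fake_activity(states1)
mean_resp = np.array([act1[states1 == mu].mean(axis=0) for mu in range(p)])
selectivity = mean_resp.argmax(axis=0)        # preferred state per neuron

# Window 2: accuracy = fraction of steps where the most active neuron's
# preferred state matches the true state.
states2 = rng.integers(p, size=T_o)
act2 = fake_activity(states2)
winners = act2.argmax(axis=1)
accuracy = (selectivity[winners] == states2).mean()
print(accuracy)
```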
Model error Using the same procedure, the model error was estimated from the estimated parameters θ̂_jq. In Fig 5E, the estimation of the internal model from the connectivity, and similarly from the synaptic weights, was calculated in the same way.

Figure 1. Description of the model. (A) Schematic diagram of the model. (B) An example of model behavior calculated at ρ = 0.16, when the synaptic connection is organized using the weight-coding scheme. The top panel represents the external variable, which takes an integer value from 0 to 9 in the simulation. The middle panel is the response of the input neurons, and the bottom panel shows the activity of the output neurons. In the simulation, each external state was presented randomly, but here the trials are sorted in ascending order. (C) Examples of neural activity in a simulation. Graphs in the top row represent the average firing rates of five randomly sampled input neurons for given external states (black lines) and their standard deviations (gray shadows). The bottom graphs are subthreshold responses of output neurons that represent the external state s = 1. Because the boundary condition for the membrane parameter v_i ≡ Σ_j c_ij w_ij r^t_{X,j} − h_w is introduced as v_i > max_l{v_l − v_d}, v_i is typically bounded at −v_d. Note that v_i is the unnormalized log-likelihood, and the units on the y-axis are arbitrary.

Figure 2. Connection structure helps computation in sparsely connected networks. (A) Examples of synaptic weight matrices in the weight-coding (W-coding) and connectivity-coding (C-coding) schemes. X-neurons were sorted by their selectivity for external states. (B) Comparison of the performance between the connectivity-coding and weight-coding schemes at various levels of connection sparseness. Orange and cyan lines are simulation results. Error bars represent the standard deviation over 10 independent simulations; in the following panels, error bars are trial variability over 10 simulations. Red and blue lines are analytical results. (C) Analytically evaluated coefficient of variation (CV) of the output firing rate and corresponding simulation results. For the simulation results, the variance was evaluated over all output neurons from their firing rates for their selective external states. (D) Estimated maximum transfer entropy for the two coding strategies. The black horizontal line is the maximal information log_e p. (E) Relationship between the performance and the degrees of weight coding (κ_w) and connectivity coding (κ_c). The upper left corner represents the performance of the connectivity-coding scheme (κ_c = 1, κ_w = 0), and the lower right corner corresponds to that of the weight-coding scheme (κ_c = 0, κ_w = 1). (F) Estimated log-likelihood ratio between the likelihood calculated in the redundant representation and the likelihood derived from optimal inference, log[p(s^t = µ | {c_ij, w_ij, r^t_{X,j}}) / p*(s^t = µ | r^t_X)] ≈ ⟨Σ_j (c_ij w_ij − q_jµ) r^t_{X,j}⟩_{i∈Ω_µ}, calculated for combined representations of weight coding and connectivity coding (i.e., κ = κ_w = κ_c).
understand submitted to bioRxiv 5/32 representation (Fig 3A), and the model error (Materials and methods) eventually converged (Fig 3B).Especially under a sufficient level of homeostatic plasticity (Fig 3C), the average firing rate showed a narrow unimodal distribution (Fig 3D top), and most of the output neurons acquired selectivity for one of external states (Fig 3D bottom).However, when a part of the true model was given as the connection structure with ρ µj = min (γ [λq jµ + (1 − λ)q] , 1) , at larger λ, the initial performance became higher and the convergence was faster (Fig 3E, Fig 3F; λ = 0 corresponds to the model with random connectivity).Note that the low correlation between the external model and the connection structure (λ ∼ 0.4) was sufficient to observe this effect.This result suggests that an adequate connection structure can induce fast learning if the structure is correlated with the external model.
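The connection-structure prior above can be sketched in a few lines. The following is a minimal illustration (the function name `structured_connectivity` and the toy `q_model` values are ours, not from the paper) of sampling a binary connection matrix from ρ_{µj} = min(γ[λ q_{jµ} + (1 − λ)q], 1), where λ interpolates between model-correlated and uniform random wiring:

```python
import numpy as np

rng = np.random.default_rng(0)

def structured_connectivity(q_model, lam, gamma=1.0):
    # rho_{mu j} = min(gamma * (lam * q_model + (1 - lam) * q_bar), 1),
    # where q_bar is the mean of the model-derived probabilities
    q_bar = q_model.mean()
    rho = np.minimum(gamma * (lam * q_model + (1.0 - lam) * q_bar), 1.0)
    # sample a binary connection matrix from these probabilities
    conn = (rng.random(q_model.shape) < rho).astype(int)
    return conn, rho

# toy model-derived probabilities q_{j mu} (hypothetical values)
q_model = rng.random((10, 50))
conn_rand, rho_rand = structured_connectivity(q_model, lam=0.0)  # random wiring
conn_str, rho_str = structured_connectivity(q_model, lam=0.4)    # partially structured
```

At λ = 0 every connection probability collapses to the uniform value γq, recovering the random-connectivity baseline, while any λ > 0 tilts the wiring toward the external model.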

Figure 3. Synaptic weight learning on random or non-random connection structures. (A) An example of output neuron activity before (top) and after (bottom) synaptic weight learning at connectivity ρ = 0.4. (B) Model error decreases with synaptic weight learning regardless of connectivity. (C) Selectivity and accuracy of estimation at various strengths of homeostatic plasticity at ρ = 0.4. (D) Histogram of the average firing rates of output neurons (top), and selectivity of each neuron (bottom). Selectivity was defined as in the simulation depicted in A. (E) Relationship between the learning curve and the connection structure at connectivity ρ = 0.4 and homeostatic plasticity strength b_h = 1.0. The parameter λ represents the similarity between the connection structure and the external model. (F) Model error calculated from synaptic weights for the simulation depicted in E.

Figure 4. Dual Hebbian learning for synaptic weights and connections. (A) Examples of spine creation and elimination. In all three panels, green lines show synaptic weights, and blue lines show connection probabilities. When there is no synaptic connection between two neurons, the synaptic weight is zero, but the connection probability can take a non-zero value. The simulation was performed at ρ = 0.48, η_ρ = 0.001, and τ_c = 10^5. (B) Change in connectivity due to synaptic elimination and creation. The numbers of spines eliminated (red) and created (green) per unit time were balanced (top). As a result, connectivity did not appreciably change due to rewiring (bottom). Black lines in the bottom graph are the mean connectivity at γ = 0.1 and γ = 0.101 in the model without rewiring. (C,D) Accuracy of estimation (C) and estimated maximum transfer entropy (D) for the models with and without wiring plasticity. For the dual Hebbian model, the sparseness parameter was set to γ = 0.1, whereas γ = 0.101 was used for the weight plasticity model to enable comparison at the same connectivity (see B). (E) Synaptic weight matrices before (left) and after (right) learning. Both X-neurons (input neurons) and Y-neurons (output neurons) were sorted by their preferred external states. (F) Accuracy of estimation at various rewiring timescales τ_c. Note that the simulation was performed for only 5 × 10^6 time steps, and the performance did not converge for the models with longer timescales. (G,H) Comparison of the performance (G) and the estimated maximum transfer entropy (H) between the dual Hebbian model and the model with synaptic weight plasticity only, at various degrees of connectivity. The horizontal line in H represents the total information log_e p.
5A). In this case, when learning was performed only on synaptic weights over fixed random connections, the performance improved rapidly, but every time a part of the model changed, the performance dropped dramatically and only gradually returned to a higher level (cyan line in Fig 5B). By contrast, under the dual Hebbian learning rule, the performance immediately after a model shift (i.e., the performance at the trough of the oscillation) gradually increased, and convergence became faster (Fig 5B,C), although the total connectivity stayed nearly the same (Fig 5D). After learning, the synaptic connection structure showed a higher correlation with the constant component than with the variable component (Fig 5E; see Materials and methods). By contrast, at every session, the synaptic weight structure learned the variable component better than the constant component (Fig 5F). The timescale of synaptic rewiring needed to be long enough to be comparable with the timescale of the external variability in order to capture the constant component. Otherwise, connectivity was also strongly modulated by the variable component of the external model (Fig 5G) and was unable to provide the expectation. After sufficient learning, the synaptic weight w and the corresponding connection probability ρ roughly followed a linear relationship (Fig 5H).
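The timescale argument above can be illustrated with a toy calculation that is far simpler than the actual model: a slow exponential average (standing in for wiring plasticity, timescale τ_c) extracts the constant component of a signal whose variable part switches every T steps, while a fast average (standing in for weight plasticity) tracks each switch. All names and parameter values here are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)

T = 1_000          # dwell time of the variable component (model switches)
tau_slow = 20_000  # slow "wiring-like" timescale, much larger than T
tau_fast = 200     # fast "weight-like" timescale, much smaller than T
const = 1.0        # constant component of the signal

slow, fast = 0.0, 0.0
slow_trace, fast_trace = [], []
var = 0.0
for t in range(50_000):
    if t % T == 0:
        var = rng.normal(0.0, 1.0)   # variable component jumps to a new value
    x = const + var
    slow += (x - slow) / tau_slow    # slow estimator averages over many jumps
    fast += (x - fast) / tau_fast    # fast estimator tracks each jump
    slow_trace.append(slow)
    fast_trace.append(fast)
```

Because the slow estimator averages over many switches of the variable component, it settles near the constant component, whereas the fast estimator follows every switch; shortening `tau_slow` toward T would make the slow estimator track the variable component too, mirroring the τ_c dependence in Fig 5G.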

Figure 5. Dual learning under a dynamic environment. (A) Examples of input neuron responses. Blue lines represent the constant components θ_const, green lines show the variable components θ_var, and magenta lines are the total external models θ calculated from the normalized sum. (B) Learning curves for the models with and without wiring plasticity, when the variable components change every 10^5 time steps. (C) Accuracy of estimation for various ratios of the constant component. Early-phase performance was calculated from the activity within 10,000 steps after a variable-component shift, and late-phase performance from the activity within 10,000 steps before a shift. As in B, orange lines represent the dual Hebbian model, and cyan lines the model with weight plasticity only. (D) Trajectories of connectivity change. Connectivity tends to increase slightly during learning. Dotted lines are the mean connectivity at (κ_m, γ) = (0.0, 0.595), (0.2, 0.625), (0.4, 0.64), (0.5, 0.64), (0.6, 0.635), and (0.8, 0.620). In C, these parameters were used for the synaptic plasticity only model, whereas γ was fixed at γ = 0.6 for the dual Hebbian model. (E,F) Model error calculated from connectivity (E) and synaptic weights (F). Note that the timescale in E is the duration over which the variable component is constant, not the entire simulation. (G) Model error calculated from connectivity for various rewiring timescales τ_c. For a large τ_c, the learning process does not converge during the simulation. (H) Relationship between synaptic weight w and connection probability ρ at the end of learning. When the external model is stable, w and ρ have a more linear relationship than in the variable case.

Figure 6. Spine dynamics of the semi-dual Hebbian model. (A) Comparison of performance among the model without wiring plasticity (cyan), the approximated model (purple), and the dual Hebbian model (orange). (B) Relative change of connection probability within 10^5 time steps. If the original connection probability is low, the relative change after 10^5 time steps tends to be positive, whereas spines with a high connection probability are more likely to show a negative change. The black line at the bottom represents eliminated spines (i.e., relative change = −1). (C) Synaptic weight distribution (top), connection probability distribution (middle), and non-bounded connection probability distribution (bottom). Histograms were scaled by 1/(7 × 10^5) for normalization. In the bottom panel, for connections with ρ > 1, non-bounded values were defined by ρ_est = wγ^2. See Materials and methods for details of the analytical evaluation. (D,E) Relationships between spine age and the mean connection probability (D) and the 5-day survival rate (E). As expected from the experimental results, the survival rate is positively correlated with spine age.

survival [19] (Fig 7A). Below, to compare with experimental results, we defined 10^5 time steps as one day, and the training and control conditions were defined as two independent external models θ_ctrl and θ_train. In both the training and control cases, newly created spines were less stable than pre-existing spines (solid vs. dotted lines in Fig 7B), because older spines tended to have a larger connection probability (Fig 6D). With continuous training, pre-existing spines became less stable than in the control case, while new spines became more stable than in the control case (red vs. lime lines in Fig 7B).
of the test period among simulations with various training lengths (Fig 7D). Here, we assumed that spine elimination was enhanced during continuous training, as observed in experiments [18] [19]. The performance was positively correlated with both the survival rate at day 7 of new spines formed during the first 2 days and the elimination rate of existing spines (left and right panels of Fig 7E). By contrast, the performance was independent of the total ratio of newly formed spines from day 0 to day 6 (middle panel of Fig 7E). Without the assumption of enhanced elimination, the total number of new spines was also positively correlated with the performance (S2 Fig B). These results demonstrate that complex spine dynamics are well described by the semi-dual Hebbian rule, suggesting that the brain uses a dual learning mechanism.

Figure 7. Influence of training on spine dynamics. (A) Schematic diagrams of the simulation protocols for B and C, and examples of spine dynamics for pre-existing and new spines. (B) Spine survival rates in the control and training simulations. Dotted lines represent survival rates of pre-existing spines (spines created before day 0 and existing on day 2), and solid lines those of new spines created between day 0 and day 2. (C) The 5-day survival rate of spines created at different stages of learning. (D,E) Relationships between the creation and elimination of spines and task performance. Performance was calculated from the activity within 2,000-7,000 time steps after the beginning of the test phase. In the simulation, synaptic elimination was increased fivefold from day 1 to the end of training.

(Fig 6B). Older spines tend to have a large connection probability, which is proportional to spine size (Fig 6D), and they are more stable (Fig 6E). In addition, training enhances the stability of newly created spines, whereas it degrades the stability of older spines (Fig 7B).
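The correlation between spine age and stability can already arise from a pure selection effect, as the following toy sketch shows (this is our simplification, not the full semi-dual Hebbian model; parameter values are illustrative): if each existing spine is eliminated with probability (1 − ρ)/τ_c per step, spines with large ρ survive longer, so the surviving (older) population is biased toward large connection probability:

```python
import numpy as np

rng = np.random.default_rng(2)

tau_c = 100.0
n = 20_000
rho = rng.random(n)                # heterogeneous connection probabilities
alive = np.ones(n, dtype=bool)

# eliminate each existing spine with probability (1 - rho) / tau_c per step
for _ in range(200):
    survive = rng.random(n) >= (1.0 - rho) / tau_c
    alive &= survive

# survivors ("old" spines) are biased toward large connection probability
mean_rho_alive = rho[alive].mean()
mean_rho_gone = rho[~alive].mean()
```

The survival probability over t steps is roughly exp(−t(1 − ρ)/τ_c), so the age distribution of spines at any ρ follows directly from the elimination rule, consistent with the qualitative trends in Fig 6D,E.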
[44], it is still unclear to what extent spine creation depends on the activity of presynaptic and postsynaptic neurons. Our model indicates that, in terms of performance, spine creation should fully depend on both presynaptic and postsynaptic activity (Fig 6A). However, it is possible to replicate a wide range of experimental results on spine dynamics without assuming activity dependence of spine creation (Figs 6, 7).
Wiring plasticity of synaptic connections can be given in a similar manner. As shown in Fig 3E, if the synaptic connection structure of the network is correlated with the external model, learning performance improves. Therefore, considering Fig 2E, the redundant representation yields better performance, so this approximation is reasonable. To keep the detailed balance of the connection probability, the creation probability c_p(ρ) and the elimination probability e_p(ρ) need to satisfy

(1 − ρ) c_p(ρ) = ρ e_p(ρ).

The simplest functions that satisfy this equation are c_p(ρ) ≡ ρ/τ_c and e_p(ρ) ≡ (1 − ρ)/τ_c. In the simulation, we implemented this rule by changing c_ij from 1 to 0 with probability (1 − ρ)/τ_c for every existing connection (c_ij = 1), and from 0 to 1 with probability ρ/τ_c for every non-existing connection (c_ij = 0), at every time step.
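This rewiring rule is simple enough to simulate directly. The sketch below (parameter values are illustrative) applies c_p(ρ) = ρ/τ_c to absent connections and e_p(ρ) = (1 − ρ)/τ_c to existing ones; because the two probabilities satisfy the detailed-balance condition above, overall connectivity stays near its target despite continual spine turnover:

```python
import numpy as np

rng = np.random.default_rng(3)

tau_c = 1_000.0
n = 10_000
rho = np.full(n, 0.3)                 # target connection probabilities
c = rng.random(n) < rho               # start from a matching random wiring

total_created = 0
for _ in range(5_000):
    u = rng.random(n)
    create = ~c & (u < rho / tau_c)             # c_p(rho) = rho / tau_c
    eliminate = c & (u < (1.0 - rho) / tau_c)   # e_p(rho) = (1 - rho) / tau_c
    c = (c | create) & ~eliminate
    total_created += int(create.sum())

connectivity = c.mean()   # stays close to rho despite constant turnover
```

With ρ = 0.3 and τ_c = 1000, roughly n·ρ(1 − ρ)/τ_c ≈ 2 spines are created and 2 eliminated per step, yet the mean connectivity is stationary, matching the balanced creation/elimination in Fig 4B.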

In the simulation, there are p − 1 distractors per selective output neuron. Thus, the accuracy of estimation was approximately evaluated as (1 − w)^(p−1). In Fig 2B, we numerically calculated this value for the analytical estimation.
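As a quick sanity check on this product form (a sketch; the symbol `eps` for the per-distractor error probability is our notation, not the paper's), a Monte Carlo estimate of the probability that none of the p − 1 independent distractors wins agrees with (1 − eps)^(p−1):

```python
import numpy as np

rng = np.random.default_rng(4)

def mc_accuracy(eps, p, trials=200_000):
    # each of the p - 1 distractors independently "wins" with probability eps;
    # estimation succeeds only when none of them wins
    wins = rng.random((trials, p - 1)) < eps
    return (~wins.any(axis=1)).mean()

eps, p = 0.05, 10
mc = mc_accuracy(eps, p)
closed_form = (1.0 - eps) ** (p - 1)
```

Even a small per-distractor error probability compounds across the p − 1 distractors, which is why accuracy degrades as the number of external states p grows.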
Fig 2C), and the same tendency is observed in the simulation (cyan and orange lines in Fig 2C).

Spine dynamics

In the Gaussian model, because the response probability of the input neurons approximately follows a Gaussian distribution, at the equilibrium state the connection probabilities should follow Eq (34).
Fig 6D shows the mean connection probability for various spine ages. As seen in previous experimental studies, older spines tend to have a larger connection probability. In the evaluation of the analytical results, we used an approximation

Figs 1-4, where q = (1/(Mp)) Σ_j Σ_µ θ_{jµ}/σ_x^2, and as h_w = r_X^o/γ in Figs 5-7, as the mean of θ depends on κ_m. The average connectivity ρ was calculated from the initial connection matrix of each simulation. In the calculation of the dynamics, for the membrane parameter v_i ≡ Σ_j c_ij w_ij r_{X,j}^t − h_w, a boundary condition v_i > max_l{v_l − v_d} was introduced for numerical convenience, where v_d = −60. In addition, the synaptic weight w was bounded to non-negative values (w ≥ 0), and the connection probability was