Incorporating structural plasticity into self-organization recurrent networks for sequence learning

Introduction: Spiking neural networks (SNNs), inspired by biological neural networks, have received a surge of interest due to their temporal encoding. Biological neural networks are driven by multiple plasticities, including spike timing-dependent plasticity (STDP), structural plasticity, and homeostatic plasticity, which cause network connection patterns and weights to change continuously over the lifecycle. However, it remains unclear how these plasticities interact to shape neural networks and affect neural signal processing. Method: Here, we propose a reward-modulated self-organization recurrent network with structural plasticity (RSRN-SP) to investigate this issue. Specifically, RSRN-SP uses spikes to encode information and incorporates multiple plasticities, including reward-modulated spike timing-dependent plasticity (R-STDP), homeostatic plasticity, and structural plasticity. On the one hand, R-STDP, combined with homeostatic plasticity, guides the updating of synaptic weights. On the other hand, structural plasticity simulates the growth and pruning of synaptic connections. Results and discussion: Extensive experiments on sequential learning tasks, including a counting task, motion prediction, and motion generation, demonstrate the representational ability of the RSRN-SP. Furthermore, the simulations also indicate that the characteristics arising from the RSRN-SP are consistent with biological observations.


Introduction
Spiking neural networks, inspired by biological neural networks, are deemed to possess strong information processing abilities due to temporal encoding (as shown in Figure 1) and variable connection patterns (Zhang et al., 2018b, 2021; Bellec et al., 2020), which are driven by multiple neural plasticities, such as STDP (Frémaux and Gerstner, 2016; Brzosko et al., 2019), structural plasticity (Caroni et al., 2012; Milano et al., 2020), and homeostatic plasticity (Delvendahl and Müller, 2019; Haşegan et al., 2022). STDP enables the network to modulate its connection weights based on spike timing, while homeostatic plasticity regulates the excitability of neurons within an appropriate range. Structural plasticity endows the network with robust adaptability by fine-tuning its mesoscopic connection pattern during the lifecycle. However, it is nontrivial to achieve a stable training procedure for spiking neural networks that incorporate multiple neural plasticity mechanisms. The temporal encoding of biological neural networks is a sophisticated information encoding method, which needs to cooperate with
various neural plasticities to exert strong information processing ability. Although many sophisticated spiking neuron models have been designed, research on neural plasticities remains scarce. Many current spiking neural networks are simple abstractions of biological neural networks, which makes training them a nontrivial task. For example, changes in input may occasionally cause a sharp increase or decrease in the firing rate of neurons, resulting in probabilistic non-convergence (Pfeiffer and Pfeil, 2018; Xing et al., 2019) and a lack of adaptation to the input (Wang et al., 2020). To address these problems, it is crucial to understand how these plasticities interact to shape neural networks and affect neural signal processing. However, it is difficult and expensive to directly observe a biological neural network at the mesoscopic level, since it consists of a large number of neurons that are dynamically connected through synapses.
To handle the problem mentioned above, a promising approach is to establish a biologically reasonable spiking neural network model that incorporates multiple neural plasticity mechanisms. For example, Lazar et al. proposed a self-organization recurrent network (SORN) driven by multiple neural plasticities (Lazar et al., 2009), which consists of only a recurrent layer and an output layer. The recurrent layer is mainly adapted by STDP and homeostatic plasticity. STDP adjusts synaptic weights based on pre- and post-synaptic spike activity: a synaptic weight is strengthened when pre-synaptic spike activity is followed by post-synaptic spike activity, while the reverse pattern weakens it. Homeostatic plasticity induces competition among synaptic connections and maintains spike firing. The simulation results of SORN show that STDP and homeostatic plasticity lead to some nontrivial statistical characteristics of spiking neural networks, such as a lognormal-like distribution of synaptic weights, long-term persistence of strong synaptic connections, and a power-law distribution of synaptic lifetimes. Inspired by SORN, Aswolinskiy et al. proposed a reward-modulated self-organization recurrent network (RM-SORN) (Aswolinskiy and Pipa, 2015; Dora et al., 2016), in which synaptic weights are adjusted by R-STDP and homeostatic plasticity. In R-STDP, the outcome of STDP, induced by pre-synaptic and post-synaptic spike activity, is gated by an external reward, so the resulting learning rules are no longer unsupervised (Izhikevich, 2007; Anwar et al., 2022).
Although the SORN and RM-SORN models are self-organization networks, their connection patterns do not change continually during the training phase. This means that these spiking neural network models do not truly incorporate structural plasticity, which remains under-explored for existing SNNs. Therefore, in this work, we propose a novel reward-modulated self-organization recurrent network with structural plasticity, in which the connection pattern is continuously adjusted over the lifecycle. In detail, R-STDP is utilized to generate effective representations of inputs in the recurrent layer, which also helps to achieve an efficient mapping in the output layer. Besides, homeostatic plasticity is used to stabilize the excitability of neurons. In particular, structural plasticity is further introduced to simulate the growth and pruning of connections in the recurrent layer, which allows us to explore the role of structural plasticity in training SNNs. The representational ability of the RSRN-SP is evaluated on three sequence learning tasks: a counting task, a motion prediction task, and a motion generation task.
In summary, our contributions are as follows: (1) We propose a novel reward-modulated self-organization recurrent network with structural plasticity (RSRN-SP), in which structural plasticity is introduced from neurophysiology to enhance the variability of connection patterns; (2) We experimentally find that structural plasticity can improve the adaptability of the network and reduce the training difficulty; (3) We empirically reveal that some characteristics arising from the RSRN-SP are consistent with biological observations, i.e., a lognormal-like distribution of connection weights, a power-law distribution of connection lifecycles, and a stable tendency for stronger connections; (4) Experiments on three sequence learning tasks show that our method achieves better representation ability than spiking neural networks of the same type, such as SORN and RM-SORN. Further analyses demonstrate the effectiveness of structural plasticity.

Related works
There has been much research on spiking neuron models and learning rules (Yu et al., 2014; Zhang et al., 2018a; Ju et al., 2020; Xu et al., 2021), where neuron models, learning rules, and network architectures are the three essential factors for designing spiking neural networks.

Neuron models
The human brain contains billions of neurons, which form structurally complex and computationally efficient networks through dynamic synaptic connections (Bassett and Sporns, 2017). There are various spiking neuron models for simulating the temporal coding of neurons in the brain, e.g., the Hodgkin-Huxley (HH) model (Izhikevich, 2004), the leaky integrate-and-fire (LIF) model (Yu et al., 2014), and the binary neuron model (Dayan and Abbott, 2001). The HH model focuses on the microscopic mechanism of spikes, while the LIF model trades biophysical detail for lower computational complexity. The binary neuron model, developed from the LIF model, has an even lower computational complexity and is suitable for building large-scale networks. Therefore, in this work, the binary neuron model is used to construct the spiking neural network.

Learning rules
Inspired by biological observations, there are mainly two brain-inspired learning rules suitable for SNNs (Caporale and Dan, 2008; Frémaux and Gerstner, 2016): the Hebb learning rule and the STDP learning rule. The former suggests that neurons activated at the same time should have a closer relationship. The latter indicates that synaptic weights are adjusted based on the spike timing of pre-synaptic and post-synaptic neurons. The STDP learning rule can be described as follows:

FIGURE 1
The principle of temporal encoding. Given a simple neural network in the dashed box, n pre-synaptic neurons are connected to one post-synaptic neuron. The pre-synaptic neuron pre_i generates a spike at time t_i, which causes a signal u_i to be continuously sent to the post-synaptic neuron. Once the signal received by the post-synaptic neuron exceeds the threshold, a spike is generated, and the corresponding spiking time is marked as t_j. According to neuroscience, information is thought to be encoded in the spiking time sequence, such as t_1, t_2, ..., t_n, t_j.
$$\Delta w_{ij} = \begin{cases} A_{+}\, e^{-\Delta t/\tau_{+}}, & \Delta t \geq 0 \\ -A_{-}\, e^{\Delta t/\tau_{-}}, & \Delta t < 0 \end{cases}$$

where $\Delta w_{ij}$ represents the weight change of the connection from pre-synaptic neuron j to post-synaptic neuron i. $A_+$, $A_-$, $\tau_+$, and $\tau_-$ are dimensionless constants, which are obtained by fitting neurophysiological data. $\Delta t = t_i^f - t_j^f$ represents the difference between the last spike timing of post-synaptic neuron i and the last spike timing of pre-synaptic neuron j.
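As a concrete illustration, the exponential STDP window described above can be written as a small function. This is a minimal sketch: the constant values are illustrative placeholders, not the fitted neurophysiological values used in the paper.

```python
import math

def stdp_delta(dt, a_plus=0.1, a_minus=0.12, tau_plus=20.0, tau_minus=20.0):
    """STDP weight change for spike-time difference dt = t_post - t_pre.

    Pre-before-post (dt >= 0) potentiates the synapse; post-before-pre
    (dt < 0) depresses it. The magnitude decays exponentially with |dt|.
    Constants here are illustrative, not fitted values.
    """
    if dt >= 0:
        return a_plus * math.exp(-dt / tau_plus)
    return -a_minus * math.exp(dt / tau_minus)
```

Calling `stdp_delta(5.0)` returns a positive (potentiating) change, while `stdp_delta(-5.0)` returns a negative (depressing) one.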

Spiking neural networks
Spiking neural networks are known as the third generation of neural networks. Due to their brain-like temporal encoding and multiple neural mechanisms, they are considered to have a strong information processing capability. However, this also makes training spiking neural networks difficult (Wang et al., 2020), and the widely used gradient descent algorithm is hard to apply to them (Taherkhani et al., 2020). Therefore, many studies combine various mathematical optimization techniques with spiking neural networks, trying to propose a new learning paradigm suitable for SNNs. For example, Xing et al. adopted the ANN-to-SNN strategy to migrate the parameters of trained ANNs to SNNs of the same architecture (Xing et al., 2019; Gao et al., 2023). Anwar et al. applied reinforcement learning to spiking neural networks to perform specific tasks, such as playing the Pong and Cartpole games (Bellec et al., 2020; Anwar et al., 2022; Haşegan et al., 2022). These spiking neural networks combine biologically reasonable neural mechanisms with reinforcement learning and mathematical optimization to complete sophisticated tasks. Some studies also employ spiking neural networks containing biologically reasonable neural mechanisms to explore the principles of information processing in biological neural networks. For example, the SORN and RM-SORN models were proposed to explore the coordination of neural mechanisms such as R-STDP, structural plasticity, and homeostatic plasticity (Lazar et al., 2009; Aswolinskiy and Pipa, 2015; Dora et al., 2016).

Preliminary
Consider a binary neuron model, in which the neuron state (0 or 1) changes with the inputs. The binary neuron state $s_t$ at discrete time t is updated as follows:

$$s_t = \Theta(\psi - \theta) = \begin{cases} 1, & \psi \geq \theta \\ 0, & \text{otherwise} \end{cases}$$

where $\theta$ is the threshold and $\psi$ is the sum of inputs. Once $\psi$ reaches the threshold, the neuron state is activated as 1; otherwise $s_t = 0$. Besides, $\psi$ is calculated as follows:

$$\psi = \sum_{j} w_{ij}\, s_j + u_i$$

where $w_{ij}$ is the weight between neuron i and neuron j, and $u_i$ is the external signal received by neuron i. The notations used in this work are explained in Table 1.
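The binary neuron update above can be sketched in a few lines of NumPy. This is a minimal illustration, not the released implementation; the weight and threshold values are arbitrary.

```python
import numpy as np

def binary_neuron_step(weights, states, u, theta):
    """One update of a population of binary neurons.

    weights : (N, N) array, weights[i, j] = connection from neuron j to neuron i
    states  : (N,) 0/1 vector of current neuron states
    u       : (N,) external input signals
    theta   : (N,) firing thresholds

    A neuron fires (state 1) iff its summed input psi reaches its threshold.
    """
    psi = weights @ states + u          # summed recurrent + external input
    return (psi >= theta).astype(int)   # Heaviside step on psi - theta

# Tiny example with three neurons and arbitrary weights.
states = np.array([1, 0, 1])
W = np.array([[0.0, 0.5, 0.5],
              [0.4, 0.0, 0.1],
              [0.2, 0.3, 0.0]])
u = np.zeros(3)
theta = np.full(3, 0.45)
next_states = binary_neuron_step(W, states, u, theta)  # -> [1, 1, 0]
```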

Counting task
The task objective is to predict the subsequent element by modeling the structure of a recurrent sequence. Consider an m-times recurrent sequence as follows:

a b ... b c,  a b ... b c,  ...

where each subsequence contains a start flag, an end flag, and a fixed number n of a repeated element (i.e., a, c, and b, respectively). Taking an element as the input, the model aims to accurately predict the next element.
For example, given a as the input, the ground truth is b. This task is designed to test the memory property of the model.
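A counting-task input sequence of this kind can be generated in one line. This helper is our own illustration, not part of the released code; it assumes the subsequence structure "start flag a, n copies of b, end flag c", repeated m times.

```python
def counting_sequence(n, m):
    """Build an m-times repeated counting-task sequence.

    Each subsequence is a start flag 'a', n copies of 'b', and an end
    flag 'c', so the full sequence has m * (n + 2) symbols.
    """
    return list("a" + "b" * n + "c") * m
```

For instance, `counting_sequence(3, 2)` yields the ten symbols of "abbbc" repeated twice.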

Motion prediction task
The task objective is to predict the subsequent element by modeling the structure of a recurrent sequence in which all elements are associated with different spatial positions. Consider an m-times recurrent sequence as follows:

1 2 ... n,  1 2 ... n,  ...     (5)

where each subsequence contains the n integers from 1 to n. Taking an element as the input, the model aims to accurately predict the next element. Because the elements are associated with different spatial positions, this task can be interpreted as the left-to-right motion of an object along an axis.

Motion generation task
The task objective is to generate the m-times recurrent sequence in (5), with the output serving as the input and no external teaching signals. For example, if the current output of the model is equal to 1 and serves as the next input, the next output should be 2.

Overall architecture
The overall architecture of RSRN-SP is shown in Figure 2; it is composed of two general modules, i.e., a recurrent layer and an output layer.

Recurrent layer
The recurrent layer extracts features of the inputs and stores them in its variable connection patterns and weights. The neurons in this layer are divided into two groups, excitatory and inhibitory neurons, whose numbers are denoted by N_E and N_I, with N_I = 0.2 × N_E. They are connected through weighted connections, denoted as a weight matrix W; the element w_ij in the weight matrix is the connection weight from neuron j to neuron i. The connections among excitatory neurons are sparse, while full connections exist between excitatory and inhibitory neurons. The initial connection density among excitatory neurons is controlled by the average connection fraction p_c of neurons. Note that there are no self-connections and no connections among inhibitory neurons.

Output layer
The output layer maps the features stored in the recurrent layer into interpretable and specific-task outputs. It only contains excitatory neurons without connections among each other. There is a feedback connection from the output layer to the recurrent layer.

Evolution
The input of RSRN-SP is a time sequence that contains different symbols (letters or digits). Each symbol corresponds to a subset of neurons in the recurrent layer, and only those neurons can receive the corresponding input symbol. When a certain symbol in the sequence is input to the model, only the neurons in the corresponding subset are activated, while the other neurons remain silent. The state updates for the different types of neurons are as follows:

$$x_i(t+1) = \Theta\left(\sum_{j} W^{EE}_{ij}\, x_j(t) - \sum_{k} W^{EI}_{ik}\, y_k(t) + u_i(t) - T^{E}_{i}\right)$$

$$y_k(t+1) = \Theta\left(\sum_{j} W^{IE}_{kj}\, x_j(t) - T^{I}_{k}\right)$$

where x and y are the states of the excitatory and inhibitory neurons, $T^E$ and $T^I$ are their thresholds, $\Theta$ is the Heaviside step function, and $u_i(t)$ is the external signal of excitatory neuron i at time t. The weights are uniformly drawn from [0, 1] and initially normalized as $\sum_j w_{ij} = 1$.
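One step of the recurrent layer can be sketched as follows. The matrix names W_EE, W_EI, and W_IE mirror the notation in the text (excitatory-to-excitatory, inhibitory-to-excitatory, and excitatory-to-inhibitory weights); the concrete implementation details are our own assumptions.

```python
import numpy as np

def heaviside(v):
    """Heaviside step: 1 where the argument reaches zero, else 0."""
    return (v >= 0).astype(int)

def recurrent_step(x, y, W_EE, W_EI, W_IE, T_E, T_I, u):
    """One state update of the recurrent layer (sketch).

    x, y   : 0/1 state vectors of excitatory / inhibitory neurons
    W_EE   : sparse excitatory-to-excitatory weights
    W_EI   : inhibitory-to-excitatory weights (subtracted, i.e. inhibition)
    W_IE   : excitatory-to-inhibitory weights
    T_E, T_I : thresholds; u : external input to excitatory neurons
    """
    x_new = heaviside(W_EE @ x - W_EI @ y + u - T_E)
    y_new = heaviside(W_IE @ x - T_I)
    return x_new, y_new
```

Excitatory neurons fire when their net drive (recurrent excitation minus inhibition plus input) reaches threshold; inhibitory neurons are driven by the excitatory population alone, with no inhibitory-to-inhibitory connections, matching the architecture described above.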

Learning rule of R-STDP
According to the learning rule of R-STDP (Frémaux and Gerstner, 2016; Yuan et al., 2018), the change of a connection weight is not only controlled by spikes but also regulated by a reward signal, as illustrated in Figure 2B, which can be described as follows:

$$\Delta w_{ij} = \eta \cdot M \cdot e_{ij} \quad (9)$$

where $\eta$ is the learning rate of the weight, and $e_{ij}$ denotes a synaptic eligibility trace that temporarily stores the outcome of STDP, so that it is still available for a delayed reward signal. The eligibility trace is computed as:

$$e_{ij}(t) = e_{ij}(t-1)\, e^{-1/\tau_e} + f \left[ s_i(t)\, s_j(t-1) - s_i(t-1)\, s_j(t) \right] \quad (10)$$

where $s_i$ and $s_j$ are the activations of the post- and pre-synaptic neurons, respectively, $\tau_e$ is a time constant, and f is a dimensionless parameter (f = 1 for $W^{EE}$ and f = 0.01 for $W^{OE}$). According to Equation (9), when the neuromodulation factor M = R − b is not equal to 0, i.e., the reward R deviates from the baseline b, the connection weight is updated. In our model, for the counting task and the motion prediction task, the reward R is set to 1 for a correct output and to either 0 or −1 for an incorrect output. The baseline b is set to the moving average of R. For the generation task, when the target sequence is correctly generated, a reward R proportional to the length of the correctly generated sequence is given.
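A minimal sketch of the R-STDP rule described above, assuming the eligibility trace decays by a factor exp(−1/τ_e) per step and accumulates the STDP outcome of the current step. The function names, matrix layout (e[i, j] for the connection j → i), and default constants are our own assumptions.

```python
import numpy as np

def update_trace(e, s_pre_prev, s_pre, s_post_prev, s_post, tau_e=5.0, f=1.0):
    """Decay the eligibility trace and add the current STDP outcome.

    e[i, j] stores the pending change for the connection j -> i.
    Pre-then-post firing (pre at t-1, post at t) potentiates;
    post-then-pre firing depresses.
    """
    stdp = np.outer(s_post, s_pre_prev) - np.outer(s_post_prev, s_pre)
    return e * np.exp(-1.0 / tau_e) + f * stdp

def rstdp_update(w, e, reward, baseline, eta=0.01):
    """Gate the stored STDP outcome by the neuromodulation factor M = R - b."""
    m = reward - baseline
    return w + eta * m * e
```

When reward equals baseline (M = 0), the weights are untouched regardless of the stored trace, which is exactly what distinguishes R-STDP from plain STDP.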

Structural plasticity
Structural plasticity is a fundamental neural mechanism of the biological neural network in the brain and has been demonstrated to play a critical role in regulating circuit connections during learning (Caroni et al., 2012). Structural plasticity refers to the pruning of old synaptic connections and the formation of new ones during the self-organization of neural networks (Lamprecht and LeDoux, 2004). In our model, we apply structural plasticity to the connections among excitatory neurons. New connections are added between two unconnected neurons with a probability p_sp ∈ (0, 1), and their weights are initialized as 0.001. p_sp is fine-tuned as a hyper-parameter to stabilize the recurrent layer. Old
connections are pruned if their weights fall below a near-zero threshold w_th ∈ (0, 1). Structural plasticity can be formulated as follows:

$$c_{ij}(t+1) = \begin{cases} 1, & c_{ij}(t) = 0 \ \text{and} \ rnd < p_{sp} \\ 0, & c_{ij}(t) = 1 \ \text{and} \ w_{ij} < w_{th} \\ c_{ij}(t), & \text{otherwise} \end{cases}$$

where $c_{ij}$ ∈ {0, 1} indicates whether a connection from neuron j to neuron i exists ($c_{ij}$ = 1) or not ($c_{ij}$ = 0), rnd ∈ (0, 1) is a uniformly distributed random number, and $w_{ij}$ denotes the weight of the connection from neuron j to neuron i.
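The growth-and-pruning step can be sketched with boolean masks over the connectivity and weight matrices. This is an illustrative sketch, not the released code; the default threshold value and the in-place update style are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def structural_plasticity(W, C, p_sp=0.01, w_th=1e-4, w_init=0.001):
    """Grow and prune excitatory-excitatory connections.

    C : 0/1 connectivity mask, C[i, j] = 1 iff connection j -> i exists.
    W : weight matrix with the same layout.
    Unconnected pairs gain a connection with probability p_sp, initialized
    to w_init; existing connections weaker than w_th are pruned.
    """
    grow = (C == 0) & (rng.random(C.shape) < p_sp)
    np.fill_diagonal(grow, False)      # no self-connections
    C[grow] = 1
    W[grow] = w_init
    prune = (C == 1) & (W < w_th)
    C[prune] = 0
    W[prune] = 0.0
    return W, C
```

Because newly grown connections start at a tiny weight, they must be potentiated by R-STDP quickly, otherwise they are pruned again, which mirrors the transient nature of most new synapses reported later in the paper.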

Homeostatic plasticity
Homeostatic plasticity is critical for alleviating the instability of neural networks. Two common homeostatic mechanisms are utilized in our method: synaptic normalization and intrinsic plasticity. Synaptic normalization is formulated as:

$$w_{ij} \leftarrow \frac{w_{ij}}{\sum_{j} w_{ij}}$$

where the weights of all afferent connections to a neuron are proportionally scaled so that their sum equals 1. Synaptic normalization promotes healthy competition among the connections that converge on the same neuron.
Intrinsic plasticity adjusts the thresholds of excitatory neurons toward an average firing rate $\mu_{ip}$, which can be formulated as follows:

$$\theta_i(t+1) = \theta_i(t) + \eta_{ip} \left( s_i(t) - \mu_{ip} \right)$$

where $\eta_{ip}$ is the learning rate of the threshold. For excitatory neurons in the recurrent layer, $\mu_{ip}$ is fine-tuned in the range [0.05, 0.25]. In the output layer, $\mu_{ip}$ is set individually for each neuron, corresponding to the expected occurrence probability of the symbol represented by that neuron. Due to intrinsic plasticity, the threshold of a neuron in our model increases if the neuron is too active; otherwise, the threshold decreases.
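Both homeostatic mechanisms are a few lines each; the sketch below is illustrative (the guard for neurons with no afferent weights is our own addition, not stated in the paper).

```python
import numpy as np

def synaptic_normalization(W):
    """Scale each neuron's afferent weights so that each row sums to 1.

    W[i, j] is the weight of the connection j -> i, so row i holds all
    afferent weights of neuron i. Rows that sum to zero are left as-is.
    """
    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0   # avoid division by zero for isolated neurons
    return W / row_sums

def intrinsic_plasticity(theta, s, mu_ip=0.1, eta_ip=0.001):
    """Nudge thresholds so average activity approaches the target rate mu_ip.

    An active neuron (s = 1 > mu_ip) gets a higher threshold; a silent
    neuron (s = 0 < mu_ip) gets a lower one.
    """
    return theta + eta_ip * (s - mu_ip)
```

Together, normalization keeps the total afferent drive of each neuron fixed (so synapses compete for a constant budget), while intrinsic plasticity keeps every neuron's firing rate near its target.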

Two-stage training
A two-stage training scheme is proposed for RSRN-SP. In the first stage, the model is trained with R-STDP, homeostatic plasticity, and structural plasticity. In the second stage, the connection pattern and weights of the recurrent layer are fixed, and the connection weights of the output layer are fine-tuned for the specific task. The two stages alternate: in each alternation, the model takes about 100 steps in the first stage and then about 20,000 steps in the second stage. During training, the alternation is repeated about 200 times. During inference, the model is evaluated on testing data. For the motion generation task, the output of the model is fed back as its input during training, and the model is used to generate the desired output during inference. The algorithm for the first stage is listed here. The second-stage algorithm is similar, except that it applies R-STDP and homeostatic plasticity only at the output layer.
Require: A sequence for the counting task, motion prediction task, or motion generation task.
Ensure: Letter or integer predicted or generated based on the sequence. In each step, the eligibility trace e_ij is computed based on Equation (10).

Experiments

Experimental settings
Our model is evaluated on three tasks: the counting task (Lazar et al., 2009), motion prediction (Aswolinskiy and Pipa, 2015), and motion generation (Aswolinskiy and Pipa, 2015). The comparison approaches are SORN (Lazar et al., 2009) and RM-SORN (Aswolinskiy and Pipa, 2015). Notably, in SORN and RM-SORN, the weights to the output layer are trained with linear regression, and no structural plasticity is applied in the recurrent layer. The model is implemented in Python on a Windows 10 computer with an NVIDIA GTX 1080 Ti. Source code and parameters are released on GitHub.
Experiments for counting task

Evaluation protocol

Two evaluation protocols are used: (1) overall performance evaluates the matching of all letters in a sequence; (2) counting performance evaluates the prediction accuracy for all subsequences in a sequence. The bold values indicate that the method outperforms other models under the same conditions.

Results
The results for the counting task are shown in Figure 3 and Table 2. It can be observed that as the number n of repetitions of a letter in a subsequence increases, the overall performance fluctuates around 90% within a relatively narrow range, while the counting performance declines. This is because the value of n is proportional to the number of input patterns that the recurrent layer needs to learn. A larger n increases the difficulty of predicting the last letter of a subsequence but reduces the difficulty of predicting the other letters of that subsequence. The counting performance gap among our model, SORN, and RM-SORN can be explained by the difference between reward-modulated learning and offline linear regression. SORN and RM-SORN try to learn all separable input patterns and minimize the global mapping error, whereas the reward of our model is computed as a moving average within a time window, in which each individual input pattern is learned more effectively.
Experiments for motion prediction task

Results

As shown in Figure 4A and Table 3, when n is small, our model, SORN, and RM-SORN all achieve high prediction accuracy. As n increases, the accuracy of the three models gradually decreases, and the accuracy of our model becomes higher than that of RM-SORN and lower than that of SORN. It is worth noting that in our model, increasing the number of neurons can greatly improve performance. Considering that our model is much easier to train than SORN, it can achieve an accuracy similar to SORN by increasing the number of neurons when n is very large.

Experiments for motion generation task

Evaluation protocol

The performance is calculated as the fraction of generated symbols that belong to the target sequence. For example, assume that the desired sequence is 1234. If the desired sequence is generated, the model receives the full reward of 1. Otherwise, it receives a reward of 3/4 for the sequence x123, 2/4 for the sequence xx12, and 1/4 for the sequence xxx1.
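One way to compute the reward described above is to score the longest suffix of the generated output that matches a prefix of the target. This is our own reading of the worked example (x123 → 3/4, xxx1 → 1/4); the released implementation may differ.

```python
def generation_reward(output, target):
    """Fraction of the target covered by the longest suffix of `output`
    that matches a prefix of `target` (0.0 if nothing matches)."""
    n = len(target)
    for k in range(n, 0, -1):          # try the longest match first
        if output[-k:] == target[:k]:
            return k / n
    return 0.0
```

With target "1234", the output "1234" scores 1.0, "x123" scores 0.75, and "xxx1" scores 0.25, matching the example in the protocol.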

Results
As shown in Figure 4B and Table 4, without external teaching signals, our model can still generate the desired sequence
accurately. Success in this task shows that the model can generate an arbitrary sequence with the same symbol distribution as the motion sequence. As a result of the self-organization driven by multiple plasticities, the recurrent layer can create an effective representation of the inputs. The model containing 400 neurons outperforms the one containing 200 neurons, suggesting that the memory capacity of the model is closely related to the number of neurons.

Influence of structural plasticity
To study structural plasticity, models with different N and p_c are constructed for the counting task. Figure 5 and Table 5 suggest that the recurrent layer with structural plasticity outperforms the one without it. The performance improvement is larger when the recurrent layer is initialized with sparse connectivity: the advantage of structural plasticity is most prominent when the connection pattern is not reasonably initialized, e.g., in the case of an initial p_c = 0.002. Performance is denoted as "(mean ± std)".

Synaptic connection characteristics of RSRN-SP
The connection pattern of the cortex exhibits some fundamental characteristics (Zheng et al., 2013), e.g., a lognormal-like distribution of synaptic weights, a power-law distribution of synaptic lifecycles, and a tendency for stronger connections to be more stable. To study whether these characteristics exist in RSRN-SP, we simulated models of 200 and 400 neurons on the counting task with n = 8. As shown in Figures 6A, B, the synaptic weights exhibit a lognormal-like distribution, which is consistent with biological observations (Song et al., 2005; Loewenstein et al., 2011). Figures 6C, D demonstrate that the distribution of lifecycles of newly formed connections can be roughly fitted by a power law. Most newly formed connections tend to disappear, and only a few of them persist and become strong.

Discussion and conclusion
To understand how multiple plasticities interact to shape biological neural networks and affect neural signal processing, we proposed a novel spiking neural network incorporating multiple neural plasticities from neurophysiology, i.e., reward-modulated spike timing-dependent plasticity, homeostatic plasticity, and structural plasticity. In particular, homeostatic plasticity and reward-modulated spike timing-dependent plasticity are used to promote the consistency between network updating and brain learning, which helps to guide the updating of connection weights during the training of SNNs. Specially, structural plasticity is introduced to simulate the growth and pruning of connections in the network, which helps to guarantee the consistency between the network structure and brain structure.
Here, our work attempts to combine R-STDP with other plasticity mechanisms to achieve better training results. The simulations demonstrate that (1) reward-modulated spike timing-dependent plasticity, structural plasticity, and homeostatic plasticity can work in coordination to empower neural networks to learn; (2) structural plasticity weakens the stability of network connections but enhances the network's ability to adapt to the input; (3) RSRN-SP can effectively learn representations of the input and achieves better performance on sequence learning tasks than spiking neural networks of the same type, including SORN and RM-SORN. Furthermore, the simulations also indicate that the characteristics arising from RSRN-SP are consistent with biological observations.

Compared to the widely used artificial neural networks, our spiking neural network is not easy to train due to its complex temporal encoding, variable connection pattern, and diverse plasticity mechanisms. One challenge stems from the temporal encoding, which allows information to be processed in the form of spikes in SNNs. However, spikes are not mathematically differentiable, making it difficult to apply traditional gradient-based optimization algorithms. The generation of new connections and the disappearance of old ones also increase the difficulty of network training. To address these challenges, some studies have explored R-STDP (Frémaux and Gerstner, 2016), which is considered a biologically plausible learning algorithm suitable for SNNs. Nevertheless, how efficient learning of SNNs can be achieved by R-STDP while maintaining sustained, balanced network activity remains an open question.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.