Explaining the cocktail party effect and McGurk effect with a spiking neural network improved by Motif topology

Network architectures and learning principles have been critical in developing complex cognitive capabilities in artificial neural networks (ANNs). Spiking neural networks (SNNs) are a subset of ANNs that incorporate additional biological features, such as dynamic spiking neurons, biologically specified architectures, and efficient and useful paradigms. Here we focus on network architectures in SNNs, such as the meta operator called the 3-node network motif, which is borrowed from biological networks. We propose a Motif-topology improved SNN (M-SNN), which we verify to be efficient in explaining key cognitive phenomena such as the cocktail party effect (a typical noise-robust speech-recognition task) and the McGurk effect (a typical multi-sensory integration task). In the M-SNN, the Motif topology is obtained by integrating spatial and temporal motifs. These spatial and temporal motifs are first generated from the pre-training of spatial (e.g., MNIST) and temporal (e.g., TIDigits) datasets, respectively, and then applied to the two cognitive effect tasks introduced above. The experimental results show lower computational cost, higher accuracy, and a better explanation of key phenomena of these two effects, such as new concept generation and robustness to background noise. This mesoscale network Motif topology leaves much room for future exploration.


1. Introduction
Spiking neural networks (SNNs) are considered the third generation of artificial neural networks (ANNs) (Maass, 1997). The biologically plausible network architectures, learning principles, and neuronal or synaptic types of SNNs make them more complex and powerful than ANNs (Hassabis et al., 2017). It has been reported that even a single cortical neuron with dendritic branches needs at least a 5-to-8-layer deep neural network for finer simulation (Beniaguev et al., 2021), whereby non-differential spikes and multiply-dispersed synapses make SNNs powerful tools for spatially-temporal information processing. In this field, there have been significant amounts of research into SNNs for auditory signal recognition (Shrestha and Orchard, 2018; Sun et al., 2022) and visual pattern recognition (Zhang M. et al., 2021). This paper highlights two fundamental elements of SNNs and the main differences between SNNs and ANNs: specialized network design and learning principles. SNNs encode spatial information using firing rates and temporal information using spike timing, providing hints and inspiration that SNNs can integrate visual and audio sensory data.
For the network architecture, specific cognitive topologies developed via evolution are highly sparse but efficient in SNNs (Luo, 2021), whereas equivalent ANNs are densely recurrent. Many researchers have tried to understand the biological nature of efficient multi-sensory integration by focusing on the visual and auditory pathways in biological brains (Rideaux et al., 2021). These structures are adapted for specific cognitive functions, e.g., efficient actions. For example, an impressive sparse network filtered from the C. elegans connectome can outperform other dense networks during reinforcement learning on the Swimmer task. Biological discoveries can further promote the development of structure-based artificial operators, including but not limited to lateral neural interaction (Cheng et al., 2020), the lottery ticket hypothesis (Frankle and Carbin, 2018), and the meta structure of network motifs (Hu et al., 2022; Jia et al., 2022). ANNs using these structural operators can then be applied to different spatial or temporal information processing tasks, such as image recognition (Frankle et al., 2019; Chen et al., 2020), auditory recognition, and heterogeneous graph recognition (Hu et al., 2022). Furthermore, when focusing on learning connections rather than weights, the weight agnostic neural network (Gaier and Ha, 2019; Aladago and Torresani, 2021) is representative of methods that train only the connections instead of the weights.
For the learning principles, SNNs are shaped more by biologically plausible plasticity principles, such as spike-timing dependent plasticity (STDP) (Zhang et al., 2018a), short-term plasticity (STP) (Zhang et al., 2018b), and reward-based plasticity (Abraham and Bear, 1996), than by the pure multi-step backpropagation (BP) (Rumelhart et al., 1986) of errors used in ANNs. Neurons in SNNs are activated only once their membrane potentials reach their thresholds, which makes them energy efficient. SNNs have been successfully applied to visual pattern recognition (Zeng et al., 2017; Zhang et al., 2018a,b, 2021a), auditory signal recognition (Wang et al., 2023), probabilistic inference (Soltani and Wang, 2010), and reinforcement learning (Rueckert et al., 2016; Zhang D. et al., 2021).
Regarding the two classic cognitive phenomena: the cocktail party effect describes the phenomenon that, in a high-noise environment (e.g., noise from the environment or other speakers), the listener learns to filter out the background noise (including music and the voices of other speakers) and concentrate only on the target speaker, as shown in Figure 1A. The McGurk effect describes how a voice may be misclassified when the auditory stimulus conflicts with visual cues. A classic example of the McGurk effect is that the new concept [da] can be generated by the integration of the auditory input [ba] and the visual cue [ga], as shown in Figure 1B.
This work focuses on the key characteristics of SNNs in information integration, categorization, and cognitive phenomenon simulation. We analyzed Motifs (Milo et al., 2002) in SNNs to reveal the essential functions of key meta-circuits in SNNs and biological networks, and then used Motif structures to build loop modules in SNNs. Furthermore, a Motif-topology improved SNN (M-SNN) is proposed for simulating the cocktail party effect and the McGurk effect. To the best of our knowledge, we are the first to solve this problem using combinations of highly abstract Motif units. The primary contributions of this paper are as follows:
• Networks with specific spatial or temporal types of Motifs can improve the accuracy of spatial or temporal classification tasks compared with networks without Motifs, making multi-sensory integration easier by integrating the two types of Motifs.
The remaining parts are organized as follows: Section 2 reviews the research on the architecture, learning paradigms, and two classic cognitive phenomena. Section 3 describes the pattern of Motifs, the SNN model with neuronal plasticity, and the learning principles. Section 4 verifies the convergence, the advantage of M-SNN in simulating cognitive phenomena, and the computational cost. Finally, a short conclusion is given in Section 5.

2. Related works
For the architecture, the lateral interaction of neural networks, the lottery ticket hypothesis, and network motif circuits are novel operators in structure research. In the research on lateral interaction, most studies have taken the synapse as the basic unit, including the lateral interaction in convolutional neural networks (Cheng et al., 2020) and in fully connected networks. However, these methods take synaptic connections as the basic unit and only consider learning effective structures, without considering meta-structure composition.
Network motifs (Milo et al., 2002; Prill et al., 2005) use primary n-node circuit operators to represent complex network structures. The features of a network (e.g., visual or auditory pathways) can be reflected by the number of different Motif topologies, which is called the Motif distribution. To calculate the Motif distribution, the first Motif tool was mfinder, which implements a full-enumeration algorithm (randomly picking edges from the graph and counting the probability of n-node subgraphs). Then FANMOD (Wernicke and Rasche, 2006) was introduced as a more efficient tool for finding reliable network motifs. For learning paradigms, many methods have been proposed, such as ANN-to-SNN conversion (i.e., directly training ANNs and then equivalently converting them to SNNs), proxy gradient learning (i.e., replacing the non-differential membrane potential at the firing threshold with an infinite gradient value; Lee et al., 2016), and biological-mechanism inspired algorithms [e.g., SBP (Zhang et al., 2021a), which was inspired by the synaptic plasticity rules in the hippocampus, BRP (Zhang et al., 2021b), which was inspired by the reward learning mechanism, and GRAPES (Dellaferrera et al., 2022), which was inspired by synaptic scaling]. Compared to other learning algorithms, biologically inspired algorithms are more similar to the process by which the human brain learns.
For the cocktail party effect, many effective end-to-end neural network models have been proposed (Ephrat et al., 2018; Chao et al., 2019; Hao et al., 2021; Wang et al., 2021). However, analyzing why these networks work is very difficult, since the functional structures in these black-box models are very dense, without clear functional diversity. As a comparison, a network motif constraint in neural networks might resolve this problem to some extent; however, as far as we know, this has not yet been well introduced. For the McGurk effect, only a limited number of research papers have discussed its artificial simulation, partly because of the simulation challenge, especially the conflicting fusion of visual and auditory inputs (McGurk and MacDonald, 1976; Hirst et al., 2018), e.g., self-organized mapping (Gustafsson et al., 2014).

3.1. Spiking dynamics
The leaky integrate-and-fire (LIF) neuron model is biologically plausible and one of the simplest models for simulating spiking dynamics. It includes a non-differential membrane potential and a refractory period, as shown in Figure 1D. The LIF neuron model simulates the neuronal dynamics in the following steps.
First, the dendritic synapses of the postsynaptic LIF neuron receive presynaptic spikes and convert them to a postsynaptic current (I syn ). Second, the postsynaptic membrane potential leaks or integrates, depending on its history. The classic LIF neuron model is shown in the following Equation (1).
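The display equation did not survive extraction in this copy; a standard conductance-based LIF form consistent with the variable definitions that follow would read (our reconstruction, not the verbatim original):

```latex
\tau_m \frac{dV_t}{dt} = g_L\,(V_L - V_t) + g_E\,(V_E - V_t) + I_{syn} \tag{1}
```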
where V t represents the dynamic variable of the membrane potential at time t, dt is the minimal simulation time slot (set as 0.01 ms), τ m is the integrative period, g L is the leaky conductance, g E is the excitatory conductance, V L is the leaky potential, V E is the reversal potential for excitatory neurons, and I syn is the input current received from the synapses in the previous layer. We set the conductance values (g E , g L ) to 1 in the following experiments for simplicity, as shown in Equation (3).
Third, the postsynaptic neuron generates a spike once its membrane potential V t reaches the firing threshold V th . At the same time, the membrane potential V is reset to the reset potential V reset , as shown in the following Equation (2).
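Equation (2) was also lost in extraction; a standard threshold-and-reset form consistent with the surrounding text would be (our reconstruction):

```latex
\text{if } V_t \ge V_{th}:\quad V_t \leftarrow V_{reset},\qquad T_{ref} \leftarrow T_0 \tag{2}
```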
where the refractory time T ref will be extended to a larger predefined T 0 after firing.
In our experiments, the three steps for simulating the LIF neurons were integrated into Equation (3).
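Equation (3) is likewise missing from this copy; a discrete integrated form consistent with the variable definitions below would be (our reconstruction):

```latex
C\,\frac{dV_{i,t}}{dt} = -g\,(V_{i,t} - V_{rest}) + \sum_j W_{i,j}\, S_{j,t},
\qquad S_{i,t} = 1,\ V_{i,t} \leftarrow V_{reset}\ \text{if}\ V_{i,t} \ge V_{th} \tag{3}
```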
where C is the capacitance parameter, S i,t is the firing flag of neuron i at timing t, V i,t is the membrane potential of neuron i at timing t, V rest is the resting potential, and W i,j represents the synaptic weight between the neuron i and j.
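The three simulation steps can be sketched as a minimal discrete-time LIF update. This is a simplified illustration rather than the paper's exact implementation; the parameter values are illustrative assumptions:

```python
def lif_step(v, i_syn, refrac, v_rest=0.0, v_reset=0.0, v_th=0.5,
             g=0.2, c=1.0, dt=0.01, t_ref=1.0):
    """One discrete-time update of a leaky integrate-and-fire neuron.

    v      : membrane potential (mV)
    i_syn  : synaptic input current from the previous layer
    refrac : remaining refractory time (ms); the neuron is silent while > 0
    Returns (new potential, spike flag, remaining refractory time).
    """
    if refrac > 0:                       # refractory: hold at reset potential
        return v_reset, 0, refrac - dt
    # leaky integration: C dV/dt = -g (V - V_rest) + I_syn
    v = v + dt / c * (-g * (v - v_rest) + i_syn)
    if v >= v_th:                        # threshold crossing -> emit a spike
        return v_reset, 1, t_ref         # reset and enter refractory period
    return v, 0, 0.0

# Drive one neuron with a constant current and count spikes over 20 ms.
v, refrac, spikes = 0.0, 0.0, 0
for _ in range(2000):                    # 2000 steps of dt = 0.01 ms
    v, s, refrac = lif_step(v, i_syn=1.0, refrac=refrac)
    spikes += s
print(spikes)                            # regular firing under constant drive
```

The reset-and-refractory branch corresponds to the third step above; the leaky-integration line corresponds to the second.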

3.2. Motif topology
The n-node (n ≥ 2) meta Motifs have been proposed in past research. Here, we use the typical 3-node Motifs to analyze the networks, which have been widely used in biological and other systems (Milo et al., 2002; Shen et al., 2012; Zhang et al., 2017). Figure 1F displays all 13 varieties of 3-node Motifs. In previous studies, network topology has been transformed into parameter embeddings in the network (Liu et al., 2018). In our SNNs, the Motifs are represented by Motif masks and then applied to the recurrent connections at the hidden layer. A typical Motif mask is a matrix padded with 1s and 0s, where 1 and 0 represent connected and non-connected pathways, respectively. We introduce the Motif circuits into the hidden layer, and the Motif mask in the r-dimensional hidden layer l at time t is represented as M r,l t , as shown in Equation (4), where f (·) is the indicator function: once the variable in f (·) satisfies the condition, the function value is set to one; otherwise, zero. m i,j (i, j = 1, · · · , r) are elements of the synaptic weight W r,l t . The network motif distribution is calculated by counting the occurrence frequency of the network motif types. We enumerate every 3-node assembly (including Motifs and other non-Motif types) and only count the 13 types of 3-node connected subgraphs of Motifs with the help of FANMOD (Wernicke and Rasche, 2006). In order to integrate the Motifs learned from different visual and auditory datasets, we propose a multi-sensory integration algorithm that integrates Motif masks of different types learned from visual or auditory classification tasks. Hence, the integrated Motif connections have both visual and auditory network patterns, as shown in Figure 2. Equation (5) shows the integration with visual and auditory Motif masks.
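The mask construction and integration can be sketched as follows. This is an assumption-laden illustration: the binarization threshold and the element-wise union used for integration are our reading of the mask definitions, not the paper's exact Equations (4) and (5):

```python
import numpy as np

def motif_mask(w, eps=1e-3):
    """Binary Motif mask from a recurrent weight matrix W:
    entry (i, j) is 1 when the synaptic pathway i -> j is present."""
    return (np.abs(w) > eps).astype(int)

def integrate_masks(m_visual, m_auditory):
    """Multi-sensory integration, read here as an element-wise union:
    a pathway is kept if it appears in either sensory Motif mask."""
    return np.maximum(m_visual, m_auditory)

rng = np.random.default_rng(0)
w_vis = rng.normal(size=(5, 5)) * (rng.random((5, 5)) < 0.3)  # sparser "visual" weights
w_aud = rng.normal(size=(5, 5)) * (rng.random((5, 5)) < 0.6)  # denser "auditory" weights

m_vis, m_aud = motif_mask(w_vis), motif_mask(w_aud)
m_int = integrate_masks(m_vis, m_aud)
print(m_vis.sum(), m_aud.sum(), m_int.sum())  # integrated mask is at least as dense
```

The union reading matches the statement that the integrated connections carry both visual and auditory network patterns.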
The membrane potentials in the hidden multi-sensory integration layer are updated by both the feed-forward potential and the recurrent potential, as shown in Equation (6), where C is the capacitance, S i,t is the firing flag of neuron i at time t, S f i,t and S r i,t are the firing flags of neuron i in the feed-forward and recurrent processes, respectively, V i,t denotes the membrane potential of neuron i at time t, which includes the feed-forward V f i,t and recurrent V r i,t , V rest is the resting potential, W f i,j is the feed-forward synaptic weight from neuron i to neuron j, and W r i,j is the recurrent synaptic weight from neuron i to neuron j. M r,l t is the mask incorporating the Motif topology to further alter feed-forward propagation. The historical information is saved in the form of the recurrent membrane potential V r i,t , where spikes are created after the potential reaches the firing threshold, as illustrated in Equation (7), where S f i,t and S r i,t are introduced in the previous Equation (6), V reset is the reset membrane potential, τ ref is the refractory period, t s f is the previous feed-forward spike timing, t s r is the previous recurrent spike timing, and T 1 and T 2 are time windows.

3.3. Neuronal plasticity and learning principle
We use three key mechanisms during network learning: neuronal plasticity, local plasticity, and global plasticity.
Neuronal plasticity emphasizes spatially-temporal information processing by considering the inner neuronal dynamic characteristics, different from traditional synaptic plasticities such as STP and STDP. Neuronal plasticity brings SNNs closer to biological networks and improves the learning power of the network. Rather than being a constant value, the firing threshold is set by an ordinary differential equation, where S f t is the input spikes from the feed-forward channel, S r t is the input spikes from the recurrent channel, and a i,t is the dynamic threshold, which has an equilibrium point of zero without input spikes or −β/(α−1) with input spikes S f + S r from the feed-forward and recurrent channels. Therefore, the membrane potential of adaptive LIF neurons is updated accordingly, where the dynamic threshold a i,t is accumulated during the period from resetting to membrane-potential firing and finally attains a relatively stable value a * i,t = β/(1−α) (S f t + S r t ). Because of the −γ a i,t term, the maximum firing threshold could reach up to V th + γ a i,t .
We set α = 0.9 to guarantee that the coefficient of a i,t is −0.1, β = 0.1 to ensure that a spike has the same weight as a i,t , and γ to the common value of 1. Accordingly, the stable a * t = 0 for no input spikes, a * t = 1 for one input spike, and a * t = 2 for input spikes from two channels. When a i,t < (S f t + S r t ), the threshold a i,t increases; otherwise, it decreases. The threshold thus changes as the neuron fires: as the firing frequency of the neuron increases, the threshold also rises, and vice versa.
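Under our reading of the threshold dynamics (a discrete update a ← α a + β (S f + S r), an assumption consistent with the stated fixed point a * = β/(1−α)(S f + S r)), the adaptation can be sketched as:

```python
# Adaptive-threshold sketch: with alpha = 0.9 and beta = 0.1 the fixed point
# is a* = beta / (1 - alpha) * (S_f + S_r), i.e. 1 per active input channel.
alpha, beta = 0.9, 0.1

def threshold_trace(s_f, s_r, steps=200):
    """Iterate the assumed discrete-time threshold update to convergence."""
    a = 0.0
    for _ in range(steps):
        a = alpha * a + beta * (s_f + s_r)
    return a

print(round(threshold_trace(0, 0), 3))  # prints 0.0 : no input spikes
print(round(threshold_trace(1, 0), 3))  # prints 1.0 : one active channel
print(round(threshold_trace(1, 1), 3))  # prints 2.0 : both channels active
```

The three printed values reproduce the stable thresholds a * = 0, 1, 2 stated above.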
For local plasticity, the membrane potential at the firing time is a non-differential spike, so a local gradient approximation (pseudo-BP) (Zhang et al., 2021b) is usually used to make the membrane potential differentiable by replacing the non-differential part with a predefined number, where Grad local is the local gradient of the membrane potential at the hidden layer, S i,t is the spike flag of neuron i at time t, V i,t is the membrane potential of neuron i at time t, and V th is the firing threshold. V win is the parameter range for generating the pseudo-gradient. This approximation makes the membrane potential V i,t differentiable at the spiking time between an upper bound of V th + V win and a lower bound of V th − V win . For global plasticity, we used reward propagation, which was proposed in our previous work (Zhang et al., 2021b). As shown in Figure 1C, the gradient of the hidden layer in training is generated from the input-type-based expectation value and the output-error-based expectation value through the corresponding transformation matrices (the input-type-based expectation matrix and the output-error-based expectation matrix), respectively; the gradient signal is then given directly to all hidden neurons without layer-to-layer backpropagation, where h f,l is the current state of layer l, R t is the predefined input-type-based expectation value, a predefined random matrix B f,l rand is designed to generate the reward gradient Grad R l , Grad R L is the gradient of the last layer, B f,L is the predefined identity matrix, and e f,L is the output error. W f,l t represents the synaptic weight at layer l in the feed-forward phase, W r,l t is the recurrent-type synaptic modification at layer l, which is defined by both Grad R l from reward learning and Grad t+1 from iterative membrane-potential learning, and Grad t+1 is the gradient obtained at time t + 1 (Werbos, 1990).
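The rectangular pseudo-gradient described for local plasticity can be sketched as follows (a minimal illustration; the constant gradient value of 1 inside the window is our assumption):

```python
import numpy as np

def pseudo_grad(v, v_th=0.5, v_win=0.5):
    """Local surrogate gradient for the spike nonlinearity:
    1 inside the window [v_th - v_win, v_th + v_win], 0 outside."""
    return (np.abs(v - v_th) <= v_win).astype(float)

v = np.array([-0.3, 0.2, 0.5, 0.9, 1.2])   # sample membrane potentials
print(pseudo_grad(v))                       # -> [0. 1. 1. 1. 0.]
```

Only potentials within V win of the threshold receive a nonzero gradient, which is what makes the spiking nonlinearity trainable.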
M r,l t is the mask incorporating the Motif topology to further influence the propagated gradients.

3.4. The learning procedure of M-SNN
The overall learning procedure of the M-SNN is shown in Algorithm 1, including the raw signal encoding, Motif structure integration, and cognitive effect simulation.

4. Experiments

4.1. Visual and auditory datasets
The MNIST dataset (LeCun, 1998) was selected as the visual sensory dataset. It contains 60,000 28×28 one-channel grayscale images of handwritten digits from zero to nine for training, and 10,000 images of the same type for testing. The TIDigits dataset (Leonard and Doddington, 1993) was selected as the auditory sensory dataset, containing 4,144 spoken-digit recordings from zero to nine. Each recording was sampled at 20 kHz for around one second and then transformed to the frequency domain with 28 frames and 28 bands by the Mel Frequency Cepstral Coefficient (MFCC) method (Sahidullah and Saha, 2012). Some examples are shown in Figure 1E.

4.2. Experimental configurations
The SNNs were built in PyTorch, and the network architectures for MNIST and TIDigits were the same, containing one input encoding layer, one convolutional layer (with a kernel size of 5×5 and two input channels constructed by the convolutional layer), one fully connected integration layer (with 200 LIF neurons), and one output layer (with ten output neurons). In the network, the capacitance C was 1 µF/cm 2 , the conductivity g was 0.2 nS, the time constant τ ref was 1 ms, and the resting potential V rest was equal to the reset potential V reset at 0 mV. The learning rate was 1e-4, the firing threshold V th was 0.5 mV, the simulation time T was set as 20 ms, and the gradient approximation range V win was 0.5 mV.

Algorithm 1. The learning procedure of M-SNN:
1. Initialize the network by resetting weights and all related parameters, e.g., initial membrane potential V i , simulation time T, learning rates η = η f = η r .
2. Encode raw numbers of datasets to spike trains.
3. Learn the synaptic weights w ij and Motif masks M r,l t by BP (Rumelhart et al., 1986) in two single-sensory tasks to get the spatial mask M r,l t (s) and temporal mask M r,l t (t).
As shown in Figure 1E, for the visual dataset, the raw data were encoded to spike trains before being given to the input layer, by comparing each pixel value with a random number generated from Bernoulli sampling at each time slot of the time window T. For the auditory dataset, the input data were first transformed to the frequency spectrum by MFCC (Sahidullah and Saha, 2012). The spectrum was then split according to the time windows. Finally, each sub-spectrum was converted into normalized values and randomly sampled into spike trains with Bernoulli sampling.
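The Bernoulli rate coding described above can be sketched as follows (a simplified illustration; T = 20 follows the configuration given earlier, and the toy image is an assumption):

```python
import numpy as np

def bernoulli_encode(image, T=20, seed=0):
    """Encode a [0, 1]-normalized image into T binary spike frames:
    at each time slot a pixel fires with probability equal to its intensity."""
    rng = np.random.default_rng(seed)
    return (rng.random((T,) + image.shape) < image).astype(np.uint8)

img = np.array([[0.0, 0.5], [1.0, 0.1]])    # toy 2x2 "image"
spikes = bernoulli_encode(img)
print(spikes.shape)                          # -> (20, 2, 2)
print(spikes.mean(axis=0))                   # firing rates approximate pixel values
```

Averaging the spike frames over time recovers an estimate of the original intensities, which is the sense in which the code is a rate code.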
Two SNNs were included in our experiment, as follows:
• M-SNN. The Motif mask is generated randomly and then updated during the learning of synaptic weights in a Standard-SNN.
After training, the generated visual and temporal Motif masks are shown in Figures 3A, B, where a black dot in the visualization of the Motif mask indicates that there is a connection between the two neurons shown on the X-axis and Y-axis, and a white dot indicates that there is not. This result shows that the visual Motif mask connections were sparse, with only about half of the neurons being connected. Specifically, the connection ratio in the Motif mask is 64.39% for the auditory TIDigits dataset and 28.24% for the visual MNIST dataset. For the temporal TIDigits dataset, the generated temporal Motif mask after training is shown in Figure 3B, where the learned Motif mask is denser than that on the visual MNIST in Figure 3A. This is consistent with the biological finding that temporal Motifs are denser than visual ones (Vinje and Gallant, 2000; Hromádka et al., 2008). These differences between the spatial and temporal Motif masks indicate that the network needs a more complex connection structure to deal with sequential information. In addition, the connection points in the spatial and temporal Motif masks in Figures 3A, B seem to be divided into several square regions, similar to brain regions, which, to some extent, shows the similarity between artificial and biological neural networks at the brain-region scale.
The information presented by the Motif masks alone is relatively limited. For further analysis of the Motif structures via the Motif distribution, we used the "Plausible Frequency" instead of the standard frequency to identify the significant Motifs after comparison with random networks. The "Plausible Frequency" is defined by multiplying the occurrence frequency by 1 − P, where P is the P-value of a selected Motif after comparing it to 2,000 repeated control tasks with random connections. The "repeated control tasks" means generating many matrices (e.g., 2,000) in which each element is sampled from a uniform random distribution. The P-value indicates the statistical significance of the result, with a lower P-value indicating a more plausible result.
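The "Plausible Frequency" statistic can be sketched as follows (assumptions: the empirical P-value is the fraction of random control networks whose count of a Motif type meets or exceeds the observed count, and the Poisson control counts are purely illustrative):

```python
import numpy as np

def plausible_frequency(observed_freq, observed_count, control_counts):
    """Plausible Frequency = occurrence frequency * (1 - P), where P is
    estimated from the motif counts of repeated random control networks."""
    control_counts = np.asarray(control_counts)
    p_value = (control_counts >= observed_count).mean()  # empirical P-value
    return observed_freq * (1.0 - p_value)

# Toy example: a motif seen 40 times (frequency 0.2) vs. 2,000 random controls.
rng = np.random.default_rng(1)
controls = rng.poisson(lam=10, size=2000)   # hypothetical random-network counts
pf = plausible_frequency(0.2, 40, controls)
print(round(pf, 4))
```

A motif that random networks rarely match keeps nearly its full frequency, while one that random networks reproduce often is discounted toward zero.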
The Motif distributions corresponding to the Motif masks are shown in Figures 3C, D, where the spatial and temporal Motifs are distributed differently. The 3rd, 6th, 7th, and 10th units were prominent in the spatial Motifs, while the 13th Motif was the most prominent in the temporal Motifs. The abundant 3rd, 6th, 7th, and 10th Motifs in the SNN reveal the balance of feed-forward and recurrent connections for spatial tasks. The Motif distribution reveals the difference in the abundance of micro-loops in different networks, indicating that temporal tasks require more complex network connections than spatial tasks. To some extent, the Motif distribution here can mitigate the "black box" problem of ANNs by clearly showing loop-level network differences, while the plausible frequency eliminates the interference from random connections. Figures 3E, F show that M-SNN networks using Motif topologies converge, with the accuracy of M-SNN significantly higher than that of Standard-SNN after a few training epochs.
4.3. M-SNN contributes to solving the cocktail party effect

The cocktail party effect consists of two conditions. The first condition involves focusing on one person's conversation and excluding other conversations or noise in the background. Second, it refers to the response of our hearing organs to a certain stimulus. The human attention mechanism has much to do with how the cocktail party effect happens. In our SNN, we simulated the first condition of the cocktail party effect. We used the MNIST dataset to represent the visual input and the TIDigits dataset for the phonetic input. We modeled two scenes to simulate a simplified cocktail party effect. In the first scene, both the visual and auditory inputs were corrupted by random noise. In the second scene, the visual and auditory inputs were simultaneously disrupted by a real image and voice.
4.3.1. Visual and auditory inputs interfered with by stochastic noise

In this experiment, we trained the network with pure image and voice inputs and tested it with inputs disturbed by stochastic noise. In the simulation process, we superimposed random numbers in [0, 1] onto the image or speech input to simulate the interference effect of noise. With different values of the added random numbers, different interference effects were formed, with the noise ratio ranging from 0 to 90% and the influence gradually increasing. As shown in Figure 4A, when the influence of noise was relatively low, adding Motifs to the network had little effect on the experimental results (99.00 ± 0.00% for the network with Motifs, 98.50 ± 0.22% for Standard-SNN, and 99.14 ± 0.03% for LISNN; Cheng et al., 2020). As shown in Figure 4A, as the noise ratio increased, the network's ability to recognize the input target signal decreased gradually. When the proportion of noise was increased to 60%, the accuracy of the M-SNN was 95.64 ± 0.29%, which was markedly higher than the accuracy of Standard-SNN (57.84 ± 0.68%) and comparable with LISNN (93.88 ± 0.46%). The higher accuracy indicated that the Motifs in M-SNN had a positive effect on solving the cocktail party effect compared with Standard-SNN. Furthermore, LISNN, with lateral interaction in the convolution layer, could achieve an effect comparable with M-SNN.

4.3.2. Visual and auditory inputs interfered with by a real image and voice
We used the MNIST and TIDigits datasets without noise when training the network. In the simulation process, we used the handwritten digit "8" and a human voice as interference instead of stochastic noise. As shown in Figure 4B, in the case of few interfering sounds, the effect of M-SNN on maintaining accuracy was insignificant. However, as the proportion of interfering sounds increased, the impact of M-SNN on maintaining the recognition ability of the network became increasingly significant. When the noise ratio reached 50%, the recognition accuracy of M-SNN was 77.77 ± 3.94%, while the Standard-SNN could only reach an accuracy of 56.75 ± 0.67%, and the accuracy achieved by LISNN was 67.83 ± 1.58%. In these situations, the maximal increase in accuracy was 7.5% when the proportion of "8" was 50%.

4.4. M-SNN for an explainable McGurk effect
The McGurk effect describes the psychological phenomenon that occurs when human speech input and image input are inconsistent, whereby most people judge the input as neither the speech label nor the visual label but as a novel concept. It has been shown that, for adults, the error rate in judging inconsistent audio-visual input as novel concepts is more than 90% (McGurk and MacDonald, 1976). First, consistent audio-visual inputs were used to train the network weights. After training, inconsistent audio-visual information was fed into the network. In the integrated layer, we used TSNE (Maaten and Hinton, 2008) to reduce the dimension of the high-dimensional features. We conducted four experiments to verify the influence of learning rules and structures on the McGurk effect simulation: networks trained with reward learning with Motifs (Figure 5A), networks trained with reward learning without Motifs (Figure 5B), networks trained with BP learning with Motifs (Figure 5C), and networks trained with BP without Motifs (Figure 5D). As shown in Figure 5, the histogram shows the distribution of samples with different labels in the integration layer. The x-axis represents the distance between the feature point and the reference point on the 2D plane (using TSNE for clustering). To compare the simulation of the McGurk effect, we compared additional algorithms, as shown in Table 1. To our knowledge, the SOM approach (Gustafsson et al., 2014) is the only unsupervised learning method that replicates the McGurk effect; in contrast, our M-SNN is the only supervised learning method.

4.5. Lower computational cost for M-SNN during training
We referred to the method in Zhang et al. (2021a) to calculate the computational cost of the network during training for algorithm i (i = 1, 2), where the average training cost of the network was represented by the average convergence epoch multiplied by the number of parameters of the network. A schematic for the mean epoch is shown in Figure 6A, where Argmin i (·) is the argument at which (·) attains its minimum. The results are shown in Figure 6B, indicating that an increased noise ratio brought a higher computational cost to the network. In addition, the results showed that the Motifs in M-SNN could save computational cost during network training (the training-cost convergence curves of M-SNN were always below the convergence curves of Standard-SNN). When the noise ratio was 10%, M-SNN achieved the maximum cost-saving ratio of 72.6%. M-SNN achieved the most significant absolute cost savings (4.1 × 10 7 ) when the noise ratio reached 30%.
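The training-cost measure (mean convergence epoch multiplied by parameter count) can be sketched as follows (a simplified reading of the cost definition; the accuracy-threshold convergence criterion and the toy numbers are assumptions):

```python
def mean_convergence_epoch(acc_curves, target):
    """First epoch at which each run reaches the target accuracy, averaged
    over runs; runs that never converge count their full length."""
    epochs = []
    for curve in acc_curves:
        hits = [e for e, acc in enumerate(curve, start=1) if acc >= target]
        epochs.append(hits[0] if hits else len(curve))
    return sum(epochs) / len(epochs)

def training_cost(acc_curves, n_params, target=0.95):
    """Average training cost = mean convergence epoch x parameter count."""
    return mean_convergence_epoch(acc_curves, target) * n_params

runs = [[0.6, 0.9, 0.96, 0.97], [0.5, 0.8, 0.94, 0.96]]  # toy accuracy curves
print(training_cost(runs, n_params=1_000))               # -> 3500.0
```

A network that converges in fewer epochs, or uses fewer parameters, scores a proportionally lower training cost, which is the sense in which M-SNN saves computation.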

5. Conclusion
In this paper, we proposed the Motif-topology improved SNN (M-SNN) for explaining the cocktail party effect and the McGurk effect. A more profound analysis of the Motifs helps us understand more about the critical functions of structures in SNNs. The inspiration from Motifs describes the sparse connections in cell assemblies and reveals the importance of micro-scale structures. Motif topologies are patterns for describing the topologies of a system (e.g., biological cognitive pathways), including the n-node meta graphs that uncover the underlying features of networks. We find that biological Motifs are beneficial for improving the accuracy of networks in visual and auditory data classification. Significantly, the 3-node Motifs are typical and concise, which could assist in analyzing the functions of different network modules. Research on the variability of Motifs will give us more ideas and inspiration for building better networks. The simulation of different cognitive functions by SNNs with biologically plausible Motifs has much to offer in the future.