Edited by: Giacomo Indiveri, Universität Zürich, Switzerland
Reviewed by: Sadique Sheik, University of California, San Diego, United States; Xianghong Lin, Northwest Normal University, China
This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience
†These authors have contributed equally to this work.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Spiking neural networks (SNNs) are promising for realizing brain-like behaviors, since spikes are capable of encoding spatio-temporal information. Recent schemes, e.g., pre-training from artificial neural networks (ANNs) or direct training based on backpropagation (BP), make high-performance supervised training of SNNs possible. However, these methods focus primarily on spatial-domain information, while the dynamics in the temporal domain receive less attention. This can lead to a performance bottleneck and requires many additional training techniques. Another underlying problem is that spike activity is naturally non-differentiable, which raises further difficulties in the supervised training of SNNs. In this paper, we propose a spatio-temporal backpropagation (STBP) algorithm for training high-performance SNNs. To solve the non-differentiability of SNNs, we propose an approximated derivative for spike activity that is appropriate for gradient descent training. The STBP algorithm combines the layer-by-layer spatial domain (SD) and the timing-dependent temporal domain (TD), and does not require any additional complicated techniques. We evaluate this method with both fully connected and convolutional architectures on the static MNIST dataset, a custom object detection dataset, and the dynamic N-MNIST dataset. The results show that our approach achieves the best accuracy compared with existing state-of-the-art algorithms on spiking networks. This work provides a new perspective for investigating high-performance SNNs for future brain-like computing paradigms with rich spatio-temporal dynamics.
Spiking neural networks encode information through spike signals and hold promise for realizing more complicated cognitive functions in a way that closely approaches the processing paradigm of the brain cortex (Allen et al.,
Yet training SNNs remains challenging because of their complicated dynamics and the non-differentiable nature of spike activity. In a nutshell, existing training methods for SNNs fall into three types: (1) unsupervised learning; (2) indirect supervised learning; (3) direct supervised learning. The first originates from the weight modification of biological synapses, e.g., spike-timing-dependent plasticity (STDP) (Querlioz et al.,
In this paper, a direct supervised learning method is proposed for SNNs, combining both the spatial domain (SD) and the temporal domain (TD) in the training phase. First and foremost, an iterative LIF model with SNN dynamics is established that is appropriate for gradient descent training. On that basis, both the spatial and temporal dimensions are considered during error backpropagation (BP), which evidently improves the network accuracy. An approximated derivative is introduced to address the non-differentiability of spike activity. We test our SNN model with both fully connected and convolutional architectures on the static MNIST dataset and a custom object detection dataset, as well as the dynamic N-MNIST dataset. Our method makes full use of spatio-temporal-domain (STD) information that captures the nature of SNNs, thus avoiding any complicated training techniques. Experimental results indicate that the proposed method achieves the best accuracy on both static and dynamic datasets compared with existing state-of-the-art algorithms. The influence of the TD dynamics and of different methods for the derivative approximation is analyzed systematically. This work enables the exploration of high-performance SNNs for future brain-like computing paradigms with rich STD dynamics.
We focus on how to efficiently train SNNs by taking full advantage of their spatio-temporal dynamics. In this section, we propose a learning algorithm that enables us to apply spatio-temporal BP for training spiking neural networks. To this end, subsection 2.1 firstly introduces an iterative leaky integrate-and-fire (LIF) model that is suitable for the error BP algorithm; subsection 2.2 gives the details of the proposed STBP algorithm; subsection 2.3 proposes the derivative approximation to address the non-differentiability issue.
It is known that the leaky integrate-and-fire (LIF) model is at present the most commonly used model to describe the neuronal dynamics in SNNs, and it can be simply governed by

τ du(t)/dt = −u(t) + I(t),     (1)

where u(t) is the neuronal membrane potential, τ is the time constant, and I(t) denotes the pre-synaptic input.
Illustration of the spatio-temporal characteristics of SNNs. In addition to the layer-by-layer spatial dataflow found in ANNs, SNNs are known for their rich temporal dynamics. Existing training algorithms focus primarily on one side: either the spatial domain, such as the supervised methods via backpropagation, or the temporal domain, such as the unsupervised methods via timing-based plasticity. This causes a performance bottleneck. Therefore, how to build a framework for training high-performance SNNs by making full use of the STD information forms the major motivation of this work.
However, the LIF model in (1) is expressed in continuous time, which makes it inconvenient to train SNNs via BP over discrete dataflow, because the whole network presents complex dynamics in the continuous TD. To address this issue, we first solve the linear differential Equation (1) with the initial condition
As we know, the efficiency of error BP for training DNNs greatly benefits from the iterative representation of gradient descent, which yields the chain rule for layer-by-layer error propagation in the SD backward pass. This motivates us to propose an iterative LIF-based SNN in which the iterations occur in both the SD and the TD as follows:
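Under such iterative update rules, one forward step of a layer can be sketched as follows. This is a simplified reading, not the paper's exact notation: the membrane potential decays by a factor τ and is reset by the multiplicative (1 − o) term; the variable names and default constants (taken from the parameter table) are illustrative assumptions.

```python
import numpy as np

def lif_forward(spikes_prev, o_last, u_last, W, b, tau=0.1, v_th=1.5):
    """One iterative LIF update for a layer at time step t+1.

    spikes_prev : binary spikes from the previous layer at t+1
    o_last, u_last : this layer's spikes and membrane potentials at t
    tau, v_th : decay factor and firing threshold (assumed values)
    """
    # Leaky integration; the (1 - o_last) factor resets neurons that fired at t
    u = u_last * tau * (1.0 - o_last) + W @ spikes_prev + b
    # Emit a spike wherever the membrane potential reaches the threshold
    o = (u >= v_th).astype(np.float32)
    return o, u
```

Iterating this update over the time window and stacking layers gives a forward pass that unfolds in both the SD and the TD, which is what makes chain-rule error propagation applicable in both domains.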
In the above formulas, the upper index
Actually, formulas (4)–(5) are also inspired by the LSTM model (Hochreiter and Schmidhuber,
In order to present the STBP training framework, we define the following loss function
By combining Equations (3)–(9), it can be seen that
Error propagation in the STD.
Now, we discuss how to obtain the complete gradient descent in the following four cases. Firstly, we denote that:
In this case, the derivative
The derivation with respect to
In this case, the derivative
Similarly, the derivative
In this case, the derivative
In this case, the derivative
Based on the four cases, the error propagation procedure (depending on the above derivatives) is shown in Figure
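The resulting accumulation over both domains can be sketched as follows. This is a simplified illustration, not the paper's exact derivation: it assumes an iterative LIF forward pass of the form u^{t+1} = τ u^t (1 − o^t) + x^{t+1}, so the error at time t combines a spatial term (routed through the spike output and its approximated derivative) and a temporal term (through the membrane potential carried to t + 1); the pathway through the reset term is omitted for brevity, and all names and constants are illustrative.

```python
import numpy as np

def stbp_backward(delta_o, o, u, tau=0.1, v_th=1.5, a=1.0):
    """Accumulate dL/du backward through time for one layer.

    delta_o[t] : dL/do at time t coming from the spatial pathway
                 (e.g., backpropagated from layer n+1 at the same step)
    o, u       : recorded spikes and membrane potentials from the forward pass
    """
    h = lambda u_t: (np.abs(u_t - v_th) < a / 2) / a  # rectangular surrogate
    T = len(o)
    delta_u = [None] * T
    for t in reversed(range(T)):
        # spatial contribution, routed through the approximated derivative
        d = delta_o[t] * h(u[t])
        if t + 1 < T:
            # temporal contribution: u[t] leaks into u[t+1] with factor
            # tau * (1 - o[t]) under the assumed iterative LIF update
            d = d + delta_u[t + 1] * tau * (1.0 - o[t])
        delta_u[t] = d
    return delta_u
```

The four cases correspond to whether the temporal term (t = T vs. t < T) and the output-layer loss term (output vs. hidden layer) are present in this accumulation.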
In the previous sections, we presented how to obtain the gradient information based on STBP, but the issue of non-differentiable points at each spiking time is yet to be addressed. Actually, the derivative of the output gate
Derivative approximation of the nondifferentiable spike activity.
Thus,
In section 3.3, we will analyze the influence on SNN performance of different curves and different values of
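As an illustration of such approximation curves, the following sketch implements a rectangular window together with a Gaussian-shaped smooth alternative. The exact curve family and parameterization used in the paper are not reproduced here; the threshold value 1.5 follows the MNIST setting in the parameter table, and the names are illustrative.

```python
import numpy as np

V_TH = 1.5  # firing threshold (assumed MNIST setting)

def h_rect(u, a=1.0):
    """Rectangular approximation: constant 1/a inside a window of width a
    centered on the threshold, zero elsewhere."""
    return (np.abs(u - V_TH) < a / 2) / a

def h_gauss(u, a=1.0):
    """A Gaussian-shaped smooth alternative of comparable width."""
    return np.exp(-((u - V_TH) ** 2) / (2 * a ** 2)) / (np.sqrt(2 * np.pi) * a)

# Both curves integrate to 1 over u, so as the width a shrinks they approach
# the Dirac delta, i.e., the "true" derivative of the spike step function.
```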
The initialization of parameters, such as the weights, thresholds and other parameters, is crucial for stabilizing the firing activities of the whole network. We should ensure a timely response to pre-synaptic stimuli while avoiding too many spikes, which would reduce neuronal selectivity. As is known, the multiply-accumulate operations over the pre-spikes and weights, together with the threshold comparison, are the two key computation steps in the forward pass. This indicates that the relative magnitude of the weights and thresholds determines the effectiveness of parameter initialization. In this paper, we fix the threshold to a constant in each neuron for simplification, and only adjust the weights to control the activity balance. Firstly, we initialize all the weight parameters by sampling from the standard uniform distribution:
Then, we normalize these parameters by:
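A minimal sketch of this two-step initialization, assuming a per-neuron L2 normalization of each fan-in vector (the paper's exact normalization formula is not reproduced here; the intent, keeping the accumulated input comparable to the fixed threshold regardless of fan-in, is what the sketch illustrates):

```python
import numpy as np

def init_weights(n_in, n_out, seed=0):
    """Sample from the standard uniform distribution, then rescale each
    neuron's fan-in vector so the accumulated input stays on a comparable
    scale to the fixed threshold regardless of fan-in size."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.0, 1.0, size=(n_out, n_in))   # standard uniform draw
    W /= np.linalg.norm(W, axis=1, keepdims=True)   # per-neuron normalization
    return W
```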
The settings of the other parameters are presented in Table
Parameters set in our experiments.
Parameter | Value
Time window | 30 ms
Threshold (MNIST / object detection dataset / N-MNIST) | 1.5, 2.0, 0.2
τ, decay factor (MNIST / object detection dataset / N-MNIST) | 0.1, 0.15, 0.2 ms
Derivative approximation parameters (Figure ) | 1.0
Simulation time step | 1 ms
Learning rate (SGD) | 0.5
β_{1}, β_{2}, λ, Adam parameters | 0.9, 0.999, 1 × 10^{−8}
We test the STBP training framework on various datasets, including the static MNIST dataset, a custom object detection dataset, as well as the dynamic N-MNIST dataset.
The MNIST dataset of handwritten digits (Lecun et al.,
Static dataset experiments.
Table
Comparison with the stateoftheart spiking networks with similar architecture on MNIST.
Model | Network structure | Training skills | Accuracy (%)
Spiking RBM (STDP) (Neftci et al., | 784-500-40 | None | 93.16
Spiking RBM (pre-training) | 784-500-500-10 | None | 97.48
Spiking MLP (pre-training) | 784-1200-1200-10 | Weight normalization | 98.64
Spiking MLP (pre-training) | 784-500-200-10 | None | 98.37
Spiking MLP (BP) (O'Connor and Welling, | 784-200-200-10 | None | 97.66
Spiking MLP (STDP) (Diehl and Cook, | 784-6400 | None | 95.00
Spiking MLP (BP) (Lee et al., | 784-800-10 | Error normalization / parameter regularization | 98.71
Spiking MLP (STBP) | 784-800-10 | None |
Comparison with the typical MLP over object detection dataset.
Model | Network structure | Mean accuracy | [Min, Max]
Non-spiking MLP (BP) | 784-400-10 | 98.31% | [97.62%, 98.57%]
Spiking MLP (STBP) | 784-400-10 | |
Compared with static datasets, dynamic datasets such as N-MNIST (Orchard et al.,
Dynamic dataset of N-MNIST.
Table
Comparison with stateoftheart networks over NMNIST.
Model | Network structure | Training skills | Accuracy (%)
Non-spiking CNN (BP) (Neil et al., | – | None | 95.30
Non-spiking CNN (BP) (Neil and Liu, | – | None | 98.30
Non-spiking MLP (BP) (Lee et al., | 34 × 34 × 2-800-10 | None | 97.80
LSTM (BPTT) (Neil et al., | – | Batch normalization | 97.05
Phased-LSTM (BPTT) (Neil et al., | – | None | 97.38
Spiking CNN (pre-training) | – | None | 95.72
Spiking MLP (BP) (Lee et al., | 34 × 34 × 2-800-10 | Error normalization / parameter regularization | 98.74
Spiking MLP (BP) (Cohen et al., | 34 × 34 × 2-10000-10 | None | 92.87
Spiking MLP (STBP) | 34 × 34 × 2-800-10 | None | 98.78
In contrast, SNNs can naturally handle event-stream patterns, and by making better use of spatio-temporal features, our proposed STBP method achieves the best accuracy of 98.78% compared with all the reported ANN and SNN methods. The greatest advantage of our method is that it does not use any complex training techniques, which is beneficial for future hardware implementation.
Extending our framework to convolutional neural network structures allows the network to go deeper and endows it with more powerful SD information. Here we use our framework to establish a spatio-temporal convolutional neural network. Compared with our spatio-temporal fully connected network, the main difference is the processing of the input image, where we use convolution in place of the weighted summation. Specifically, in the convolution layer, each convolutional neuron receives the convolved results as input and updates its state according to the LIF model. In the pooling layer, because the binary coding of SNNs is inappropriate for standard max pooling, we use average pooling instead.
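A minimal sketch of one time step of such a convolution-pooling stage. The convolution itself is omitted and its result is taken as input; the reset form, names, and default constants are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def avg_pool2(x, k=2):
    """k x k average pooling; used instead of max pooling because the
    binary spike maps make max pooling largely uninformative."""
    h, w = (x.shape[0] // k) * k, (x.shape[1] // k) * k
    return x[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def conv_lif_step(x_conv, o_last, u_last, tau=0.1, v_th=1.5):
    """Update one convolutional LIF feature map for a single time step.

    x_conv : result of convolving the input spikes with one kernel
    o_last, u_last : the map's spikes and membrane potentials at the
                     previous time step
    """
    u = u_last * tau * (1.0 - o_last) + x_conv   # leaky integration + reset
    o = (u >= v_th).astype(np.float32)           # threshold firing
    return avg_pool2(o), o, u
```

Note that average pooling of binary spikes yields graded values in [0, 1], i.e., an instantaneous firing-rate estimate, which is what the next layer integrates.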
Our spiking CNN model is tested on the MNIST dataset as well as the object detection dataset. On MNIST, our network contains two convolution layers with kernel size 5 × 5 and two average pooling layers arranged alternately, followed by one fully connected layer. As in traditional CNNs, we use elastic distortion (Simard et al.,
Comparison with other spiking CNN over MNIST.
Model | Network structure | Accuracy
Spiking CNN (pre-training) | 28 × 28 × 1-12C5-P2-64C5-P2-10 | 99.12%
Spiking CNN (BP) (Lee et al., | 28 × 28 × 1-20C5-P2-50C5-P2-200-10 | 99.31%
Spiking CNN (STBP) | 28 × 28 × 1-15C5-P2-40C5-P2-300-10 |
Comparison with the typical CNN over object detection dataset.
Model | Network structure | Mean accuracy | [Min, Max]
Non-spiking CNN (BP) | 28 × 28 × 1-6C3-300-10 | 98.57% | [98.57%, 98.57%]
Spiking CNN (STBP) | 28 × 28 × 1-6C3-300-10 | |
In subsection 2.3, we introduced different curves to approximate the ideal derivative of the spike activity. Here we analyze the influence of different approximation curves on the testing accuracy. The experiments are conducted on the MNIST dataset, and the network structure is 784−400−10. The testing accuracy is reported after training for 200 epochs. Firstly, we compare the impact of different curve shapes on model performance. In our simulation we use the mentioned
Comparisons of different derivation approximation curves.
Furthermore, we use the rectangular approximation as an example to explore the impact of curve steepness (or peak width) on the experimental results. We set
A major contribution of this work is the introduction of the temporal domain into the existing spatial-domain-based BP training method, which makes full use of the spatio-temporal dynamics of SNNs and enables high-performance training. Now we quantitatively analyze the impact of the TD term. The experimental configuration is the same as in the previous section (784 − 400 − 10), and we also report the testing results after training for 200 epochs. Here the existing BP in the SD is termed SDBP.
Table
Comparison for the SDBP model and the STBP model on different datasets.
Model | Dataset | Network structure | Training skills | Mean accuracy | [Min, Max]
Spiking MLP (SDBP) | Object detection | 784-400-10 | None | 97.11% | [96.04%, 97.78%]
Spiking MLP (SDBP) | MNIST | 784-400-10 | None | 98.29% | [98.23%, 98.39%]
Spiking MLP (STBP) | Object detection | 784-400-10 | None | |
Spiking MLP (STBP) | MNIST | 784-400-10 | None | |
In this work, we propose a spatio-temporal backpropagation (STBP) algorithm that enables effective supervised learning for SNNs. Although existing supervised learning methods have considered either the SD feature or the TD feature (Gütig and Sompolinsky,
Furthermore, we introduce an approximated derivative to address the non-differentiability of the spike activity. Previous works regard the non-differentiable points as noise (Vincent et al.,
Since N-MNIST converts the static MNIST into a dynamic event-driven version through the relative movement of a DVS, in essence this generation method cannot provide sufficient temporal information or additional data features beyond the original database. Hence it is important to further apply our model to more convincing problems with temporal characteristics, such as TIMIT (Garofolo et al.,
We also evaluate our model on the CIFAR-10 dataset. Here we do not resort to any data augmentation methods or training techniques (e.g., batch normalization, weight decay). Considering the training speed, we adopt a small-scale structure with two convolution layers (20 channels with 5 × 5 kernels, then 30 channels with 5 × 5 kernels), 2 × 2 average-pooling layers after each convolution layer, followed by two fully connected layers (256 and 10 neurons, respectively). Testing accuracy is reported after 100 training epochs. The spiking CNN achieves 50.7% accuracy and the ANN with the same structure achieves 52.9% accuracy. This suggests that SNNs can obtain comparable performance on larger datasets. To the best of our knowledge, few works currently report results on CIFAR-10 for the direct training of SNNs (not including pre-trained ANN models). The difficulty mainly involves two aspects. Firstly, it is challenging to implement the BP algorithm to train SNNs directly at this stage because of the complex dynamics and non-differentiable spike activity. Secondly, although it is energy efficient to realize SNNs on specialized neuromorphic chips, it is very difficult and time-consuming to simulate the complex kinetic behaviors of SNNs in software (about ten or even a hundred times the runtime of an ANN with the same structure). Therefore, accelerating the supervised training of large-scale SNNs on CPUs/GPUs or neuromorphic substrates is also worth studying in the future.
YW and LD proposed the idea, designed and did the experiments. YW, LD, GL, and JZ conducted the modeling work. YW, LD, and GL wrote the manuscript, then JZ and LS revised it. LS directed the projects and provided overall guidance.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The work was partially supported by National Natural Science Foundation of China (61603209), the Study of BrainInspired Computing System of Tsinghua University program (20151080467), Beijing Natural Science Foundation (4164086), Independent Research Plan of Tsinghua University (20151080467), and by the Science and Technology Plan of Beijing, China (Grant No. Z151100000915071).