Edited by: Hesham Mostafa, University of California, San Diego, United States
Reviewed by: Bodo Rückauer, ETH Zürich, Switzerland; Eric Hunsberger, University of Waterloo, Canada; David Kappel, Dresden University of Technology, Germany
This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience
†These authors have contributed equally to this work
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Spiking Neural Networks (SNNs) have recently emerged as a prominent neural computing paradigm. However, the typical shallow SNN architectures have limited capacity for expressing complex representations, while training deep SNNs using input spikes has not been successful so far. Diverse methods have been proposed to get around this issue, such as converting off-the-shelf trained deep Artificial Neural Networks (ANNs) to SNNs. However, the ANN-SNN conversion scheme fails to capture the temporal dynamics of a spiking system. On the other hand, it is still a difficult problem to directly train deep SNNs using input spike events due to the discontinuous, non-differentiable nature of the spike generation function. To overcome this problem, we propose an approximate derivative method that accounts for the leaky behavior of LIF neurons. This method enables training deep convolutional SNNs directly (with input spike events) using spike-based backpropagation. Our experiments show the effectiveness of the proposed spike-based learning on deep networks (VGG and Residual architectures) by achieving the best classification accuracies on the MNIST, SVHN, and CIFAR-10 datasets compared to other SNNs trained with spike-based learning. Moreover, we analyze sparse event-based computations to demonstrate the efficacy of the proposed SNN training method for inference operation in the spiking domain.
Over the last few years, deep learning has made tremendous progress and has become a prevalent tool for performing various cognitive tasks such as object detection, speech recognition, and reasoning. Various deep learning techniques (LeCun et al.,
Spiking Neural Networks (SNNs) are one of the leading candidates for overcoming the constraints of neural computing and for efficiently harnessing machine learning algorithms in real-life (or mobile) applications. The concepts of SNNs, which are often regarded as the 3rd generation of neural networks (Maass,
We can divide SNNs into two broad classes: (a) converted SNNs and (b) SNNs obtained by direct spike-based training. The former are SNNs converted from trained ANNs for efficient event-based inference (ANN-SNN conversion) (Cao et al.,
The main contributions of our work are as follows. First, we develop a spike-based supervised gradient descent BP algorithm that employs an approximate (pseudo) derivative for LIF neuronal function. In addition, we leverage the key idea of the successful deep ANN models such as LeNet5 (LeCun et al.,
The rest of the paper is organized as follows. In section 2.1, we provide the background on fundamental components and architectures of deep convolutional SNNs. In section 2.2.1, we detail the spike-based gradient descent BP learning algorithm. In section 2.2.2, we describe our spiking version of the dropout technique. In section 3.1–3.2, we describe the experiments and report the simulation results, which validate the efficacy of spike-based BP training for MNIST, SVHN, CIFAR-10, and N-MNIST datasets. In section 4.1, we discuss the proposed algorithm in comparison to relevant works. In section 4.2–4.4, we analyze the spike activity, inference speedup and complexity reduction of directly trained SNNs and ANN-SNN converted networks. Finally, we summarize and conclude the paper in section 5.
Leaky-Integrate-and-Fire (LIF) neurons (Dayan and Abbott,
The membrane potential dynamics of an LIF neuron are governed by

τ_m dV_mem(t)/dt = −V_mem(t) + I(t),     (1)

where τ_m is the time constant of the membrane potential decay, V_mem(t) is the membrane potential, and I(t) is the input current at time step t. The input current is the weighted sum of the incoming pre-spike events,

I(t) = Σ_i w_i θ_i(t),     (2)

where w_i is the synaptic weight of the i-th pre-synaptic connection and θ_i(t) indicates a spike event from the i-th pre-neuron at time step t. Whenever V_mem(t) crosses the neuronal firing threshold V_th, the post-neuron generates a post-spike and the membrane potential is reset.
The illustration of Leaky Integrate and Fire (LIF) neuron dynamics. The pre-spikes are modulated by the synaptic weight to be integrated as the current influx in the membrane potential that decays exponentially. Whenever the membrane potential crosses the firing threshold, the post-neuron fires a post-spike and resets the membrane potential.
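For concreteness, the following is a minimal sketch of one discrete-time LIF update in PyTorch (the framework used for the simulations later in this paper); the decay constant, threshold value, and tensor shapes are illustrative assumptions rather than the exact settings of our simulator.

```python
import torch

def lif_step(pre_spikes, weight, v_mem, v_th=1.0, decay=0.99):
    """One discrete time-step of LIF dynamics (illustrative sketch).

    pre_spikes : binary pre-spike tensor, shape (batch, n_in)
    weight     : synaptic weight matrix, shape (n_out, n_in)
    v_mem      : membrane potential from the previous step, shape (batch, n_out)
    """
    # Pre-spikes modulated by the synaptic weights act as input current
    v_mem = v_mem + pre_spikes @ weight.t()
    # Fire a post-spike wherever the membrane potential crosses the threshold
    post_spikes = (v_mem > v_th).float()
    # Reset the membrane potential of neurons that fired, ...
    v_mem = v_mem * (1.0 - post_spikes)
    # ... and let the remaining potential decay exponentially (leak)
    v_mem = v_mem * decay
    return post_spikes, v_mem
```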
List of notations.
Symbol | Description
θ | Spike event
– | Sum of pre-spike events over time
– | Synaptic weight
– | Membrane potential
– | Neuronal firing threshold
– | Input current at each time step
– | Total current influx over time
– | Activation of spiking neuron
– | Loss function
δ | Error gradient
In this work, we develop a training methodology for convolutional SNN models that consist of an input layer followed by intermediate hidden layers and a final classification layer. In the input layer, the pixel images are encoded as Poisson-distributed spike trains where the probability of spike generation is proportional to the pixel intensity. The hidden layers consist of multiple convolutional (C) and spatial-pooling (P) layers, which are often arranged in an alternating manner. These convolutional (C) and spatial-pooling (P) layers represent the intermediate stages of the feature extractor. The spikes from the feature extractor are combined to generate a one-dimensional vector input for the fully-connected (FC) layers to produce the final classification. The convolutional and fully-connected layers contain trainable parameters (i.e., synaptic weights) while the spatial-pooling layers are fixed.
Illustration of the simplified operational example of
The two major operations used for pooling are max and average. Both have been used for SNNs, e.g., max-pooling (Rueckauer et al.,
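As a small illustration of the average-pooling option, the snippet below applies a fixed 2 × 2 non-overlapping window (the setting listed in our parameter table) to a binary spike map; treating the pooled value directly as a graded input to the next layer is one possible convention, not necessarily the exact one used in our simulator.

```python
import torch
import torch.nn.functional as F

# Hypothetical binary spike map at one time-step: (batch, channels, height, width)
spike_map = (torch.rand(1, 16, 32, 32) < 0.1).float()

# Non-overlapping 2x2 average pooling yields values in {0, 0.25, 0.5, 0.75, 1},
# which are passed on as input to the following (convolutional) spiking layer.
pooled = F.avg_pool2d(spike_map, kernel_size=2, stride=2)
```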
Deep networks are essential for recognizing intricate input patterns so that they can effectively learn hierarchical representations. To that effect, we investigate popular deep neural network architectures such as VGG (Simonyan and Zisserman,
The basic building blocks of the described convolutional SNN architectures.
The spike-based BP algorithm in SNN is adapted from standard BP (Rumelhart et al.,
In forward propagation, spike trains representing input patterns are presented to the network for estimating the network outputs. To generate the spike inputs, the input pixel values are converted to Poisson-distributed spike trains and delivered to the network. The input spikes are multiplied with synaptic weights to produce an input current that accumulates in the membrane potential of post neurons as in Equation (1). Whenever its membrane potential exceeds a neuronal firing threshold, the post-neuron generates an output spike and resets. Otherwise, the membrane potential decays exponentially over time. The neurons of every layer (excluding output layer) carry out this process successively based on the weighted spikes received from the preceding layer. Over time, the total weighted summation of the pre-spike trains (i.e.,
where
Clearly, the sum of post-spike train (
Next, we describe the backward propagation for the proposed spike-based backpropagation algorithm. After the forward propagation, the loss function is measured as a difference between target labels and outputs predicted by the network. Then, the gradients of the loss function are estimated at the final layer. The gradients are propagated backward all the way down to the input layer through the hidden layers using recursive chain rule, as formulated in Equation (7). The following Equations (7–27) and
Illustration of the forward and backward propagation phase of the proposed spike-based BP algorithm in a multi-layer SNN comprised of LIF neurons. In the forward phase, the LIF neurons (in all layers) accumulate the weighted sum of the pre-spikes in the membrane potential, which decays exponentially over time. In addition, the LIF neurons in hidden layers generate post-spikes if the membrane potential exceeds a threshold and reset the membrane potential. However, the LIF neurons in the final layer, do not generate any spike, but rather accumulate the weighted sum of pre-spikes till the last time step to quantify the final outputs. Then, the final errors are evaluated by comparing the final outputs to the label data. In the backward phase, the final errors are propagated backward through the hidden layers using the chain rule to obtain the partial derivatives of final error with respect to weights. Finally, the synaptic weights are modified in a direction to reduce the final errors.
The prediction error of each output neuron is evaluated by comparing the output distribution (
In SNN, the “activation function” indicates the relationship between the weighted summation of pre-spike inputs and post-neuronal outputs over time. In forward propagation, we have different types of neuronal activation for the final layer and hidden layers. Hence, the estimation of neuronal activations and their derivatives are different for the final layer and hidden layers. For the final layer, the value of
During the back-propagating phase, we consider the leak statistics of the membrane potential in the final layer neurons as noise. This allows us to approximate the accumulated membrane potential value for a given neuron as equivalent to the total input current (i.e.,
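Under this approximation, the final-layer computation reduces to accumulating the weighted input current and comparing a normalized output against the label. The sketch below is one hedged reading of that stage, assuming the output is the accumulated current divided by the number of time-steps and a squared-error loss; the exact normalization and loss definitions are given by the corresponding equations of this section and are only approximated here.

```python
import torch

def final_layer_output_and_loss(acc_current, target_onehot, num_steps):
    """Final-layer output and loss (illustrative sketch).

    acc_current   : weighted input current accumulated over all time-steps,
                    shape (batch, num_classes); no spikes are generated here
    target_onehot : one-hot label tensor, shape (batch, num_classes)
    """
    # Output distribution: accumulated current normalized by the #time-steps
    output = acc_current / num_steps
    # Prediction error of each output neuron and a squared-error loss over it
    error = output - target_onehot
    loss = 0.5 * (error ** 2).sum(dim=1).mean()
    return output, loss
```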
For the hidden layers, we have post-spike trains as the neuronal outputs. The spike generation function is non-differentiable since it creates a discontinuity (because of step jump) at the time instance of firing. Hence, we introduce a pseudo derivative method for LIF neuronal activation (
The spike generation function of IF neuron is a hard threshold function that generates the output signal as either +1 or 0. The IF neuron fires a post-spike whenever the input currents accumulated in membrane potential exceed the firing threshold (note, in case of IF neuron, there is no leak in the membrane potential). Hence, the membrane potential of a post-neuron at time instant
where
The spike generation function of both the IF and LIF neuron models is the same, namely the hard threshold function. However, the effective neuronal thresholds are considered to be different for the two cases, as shown in
To compute ϵ, the ratio (β) between the total membrane potential (
where
Hence, in IF neuron case, the evolution of membrane potential over time
By solving the Equations (17, 18, 20), the inverse ratio (
where the first term (unity) indicates the effect of average input currents (that is observed from the approximate derivative of IF neuron activation, namely the straight-through estimation) and the second term
In summary, the approximations applied to implement a spike-based BP algorithm in SNN are as follows:
During the back-propagating phase, we consider the leaks in the membrane potential of final layer neurons as noise, so that the accumulated membrane potential is approximated as equivalent to the total input current (
For hidden layers, we first approximate the activation of an IF neuron as a linear function (i.e., straight-through estimation). Hence, we can estimate the derivative of the IF neuron's activation (Bengio et al.,
To capture the leaky effect of a LIF neuron (in hidden layers), we estimate the scaled time derivative of the low-pass filtered output spikes that leak over time, using the function
We obtain an approximate derivative for LIF neuronal activation (in hidden layers) as a combination of two derivatives. The first one is the straight-through estimation (i.e., approximate derivative of IF neuron activation). The second one is the leak correctional term that compensates the leaky effect in the membrane potential of LIF neurons. The combination of straight-through estimation and the leak correctional term is expected to be less than 1.
Based on these approximations, we can train SNNs with direct spike inputs using a spike-based BP algorithm.
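In a PyTorch implementation, these approximations can be packaged into a custom autograd function: the forward pass is the hard-threshold spike generation, and the backward pass substitutes the straight-through estimate scaled by a leak correction. The sketch below is schematic; `leak_correction` is a placeholder for the term of Equation (22), whose exact value depends on the measured ratio between the IF and LIF membrane potentials.

```python
import torch

class PseudoSpike(torch.autograd.Function):
    """Hard-threshold spike generation with an approximate (pseudo) derivative."""

    @staticmethod
    def forward(ctx, v_mem, v_th, leak_correction):
        ctx.save_for_backward(v_mem)
        ctx.v_th = v_th
        ctx.leak_correction = leak_correction
        # Forward: emit a binary spike wherever the membrane potential exceeds threshold
        return (v_mem > v_th).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v_mem,) = ctx.saved_tensors
        # Straight-through estimate (approximate IF derivative): pass the gradient
        # through for neurons that received net positive input current
        straight_through = (v_mem > 0).float()
        # Scale by the leak correctional term (placeholder for Equation 22), so the
        # combined pseudo derivative of the LIF activation stays below 1
        grad_input = grad_output * straight_through * ctx.leak_correction
        return grad_input, None, None

# Example use inside a hidden layer's forward pass (hypothetical values):
# out_spikes = PseudoSpike.apply(v_mem, 1.0, 0.9)
```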
At the final layer, the error gradient, δ
The derivative of
Dropout (Srivastava et al.,
Forward propagation with dropout at each iteration in SNN
1: Input: mini-batch of input spike trains, dropout ratio p, firing threshold V_th, total #time-steps T
2: // Define the random subset of units (with a probability 1 − p) to be retained for this iteration
3: for each hidden layer l, sample a dropout mask that keeps every unit with probability 1 − p
4: (the mask is kept fixed for the entire duration of the spike train)
5: for t = 1 to T do
6:    // Set input of first layer equal to spike train of a mini-batch data
7:    o_0(t) ← input spikes at time-step t
8:    for each layer l do
9:       // Integrate weighted sum of input spikes to membrane potential
10:      V_mem,l(t) ← V_mem,l(t−1) + w_l · (o_{l−1}(t) gated by the dropout mask)
11:      // If V_mem exceeds V_th, the neuron fires a post-spike
12:      if V_mem,l(t) > V_th then
13:         o_l(t) ← 1
14:         // Membrane potential resets if the corresponding neuron fires a spike
15:         V_mem,l(t) ← 0
16:      else
17:         o_l(t) ← 0
18:         // Else, membrane potential decays over time
19:         V_mem,l(t) ← V_mem,l(t) · exp(−Δt/τ_m)
20:      end if
21:   end for
22: end for
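A PyTorch-style rendering of this procedure, for a single fully-connected spiking layer and under the assumption that one dropout mask is sampled per mini-batch and reused at every time-step, might look as follows; the shapes, decay constant, and mask scaling are illustrative.

```python
import torch

def forward_with_dropout(spike_train, weight, v_th=1.0, decay=0.99, p=0.2):
    """Forward propagation of one spiking layer with a fixed dropout mask (sketch).

    spike_train : binary input spikes, shape (T, batch, n_in)
    weight      : synaptic weights, shape (n_out, n_in)
    """
    T, batch, _ = spike_train.shape
    n_out = weight.shape[0]

    # Sample one random subset of units per iteration; keep it for the whole spike train
    mask = (torch.rand(n_out) > p).float() / (1.0 - p)

    v_mem = torch.zeros(batch, n_out)
    out_spikes = []
    for t in range(T):
        # Integrate the weighted sum of input spikes into the membrane potential
        v_mem = v_mem + spike_train[t] @ weight.t()
        fired = (v_mem > v_th).float()
        # Dropped units do not pass spikes to the next layer
        out_spikes.append(fired * mask)
        # Reset fired neurons; the membrane potential of the others decays over time
        v_mem = v_mem * (1.0 - fired) * decay
    return torch.stack(out_spikes)
```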
The primary goal of our experiments is to demonstrate the effectiveness of the proposed spike-based BP training methodology in a variety of deep network architectures. We first describe our experimental setup and baselines. For the experiments, we developed a custom simulation framework using PyTorch (Paszke et al.,
Parameters used in the experiments.
Parameter | Value
Time Constant of Membrane Potential (τ_m) | 100 time-steps
BP Training Time Duration | 50–100 time-steps
Inference Time Duration | Same as training
Mini-batch Size | 16–32
Spatial-pooling Non-overlapping Region/Stride | 2 × 2, 2
Neuronal Firing Threshold | 1 (hidden layer), ∞ (final layer)
Weight Initialization Constant (κ) | 2 (non-residual network), 1 (residual network)
Learning Rate (η) | 0.002–0.003
Dropout Ratio | 0.2–0.25
We demonstrate the efficacy of our proposed training methodology for deep convolutional SNNs on three standard vision datasets and one neuromorphic vision dataset, namely the MNIST (LeCun et al.,
Benchmark datasets.
Dataset | Input format | #Training samples | #Test samples | #Classes
MNIST | 28 × 28, gray | 60,000 | 10,000 | 10 |
SVHN | 32 × 32, color | 73,000 | 26,000 | 10 |
CIFAR-10 | 32 × 32, color | 50,000 | 10,000 | 10 |
N-MNIST | 34 × 34 × 2 ON and OFF spikes | 60,000 | 10,000 | 10 |
We use various SNN architectures depending on the complexity of the benchmark datasets. For MNIST and N-MNIST datasets, we used a network consisting of two sets of alternating convolutional and spatial-pooling layers followed by two fully-connected layers. This network architecture is derived from LeNet5 model (LeCun et al.,
The deep convolutional spiking neural network architectures for the MNIST, N-MNIST, and SVHN datasets: the LeNet-style network used for MNIST/N-MNIST (left), the VGG7 network for SVHN (middle), and the ResNet7 network for SVHN (right). Each entry lists the layer type, kernel size (input channels × height × width), number of output feature maps, and stride.
Convolution | 1 × 5 × 5 | 20 | 1 | Convolution | 3 × 3 × 3 | 64 | 1 | Convolution | 3 × 3 × 3 | 64 | 1 |
Average-pooling | 2 × 2 | 2 | Convolution | 64 × 3 × 3 | 64 | 2 | Average-pooling | 2 × 2 | 2 | ||
Average-pooling | 2 × 2 | 2 | |||||||||
Convolution | 20 × 5 × 5 | 50 | 1 | Convolution | 64 × 3 × 3 | 128 | 1 | Convolution | 64 × 3 × 3 | 128 | 1 |
Average-pooling | 2 × 2 | 2 | Convolution | 128 × 3 × 3 | 128 | 2 | Convolution | 128 × 3 × 3 | 128 | 2 | |
Convolution | 128 × 3 × 3 | 128 | 2 | Skip convolution | 64 × 1 × 1 | 128 | 2 | ||||
Average-pooling | 2 × 2 | 2 | |||||||||
Convolution | 128 × 3 × 3 | 256 | 1 | ||||||||
Convolution | 256 × 3 × 3 | 256 | 2 | ||||||||
Skip convolution | 128 × 1 × 1 | 256 | 2 | ||||||||
Fully-connected | 200 | Fully-connected | 1024 | Fully-connected | 1024 | ||||||
Output | 10 | Output | 10 | Output | 10 |
The deep convolutional spiking neural network architectures for the CIFAR-10 dataset: VGG9 (left), ResNet9 (middle), and ResNet11 (right). Each entry lists the layer type, kernel size (input channels × height × width), number of output feature maps, and stride.
Convolution | 3 × 3 × 3 | 64 | 1 | Convolution | 3 × 3 × 3 | 64 | 1 | Convolution | 3 × 3 × 3 | 64 | 1 |
Convolution | 64 × 3 × 3 | 64 | 1 | Average-pooling | 2 × 2 | 2 | Average-pooling | 2 × 2 | 2 | ||
Average-pooling | 2 × 2 | 2 | |||||||||
Convolution | 64 × 3 × 3 | 128 | 1 | Convolution | 64 × 3 × 3 | 128 | 1 | Convolution | 64 × 3 × 3 | 128 | 1 |
Convolution | 128 × 3 × 3 | 128 | 1 | Convolution | 128 × 3 × 3 | 128 | 1 | Convolution | 128 × 3 × 3 | 128 | 1 |
Average-pooling | 2 × 2 | 2 | Skip convolution | 64 × 1 × 1 | 128 | 1 | Skip convolution | 64 × 1 × 1 | 128 | 1 | |
Convolution | 128 × 3 × 3 | 256 | 1 | Convolution | 128 × 3 × 3 | 256 | 1 | Convolution | 128 × 3 × 3 | 256 | 1 |
Convolution | 256 × 3 × 3 | 256 | 1 | Convolution | 256 × 3 × 3 | 256 | 2 | Convolution | 256 × 3 × 3 | 256 | 2 |
Convolution | 256 × 3 × 3 | 256 | 1 | Skip connection | 128 × 1 × 1 | 256 | 2 | Skip convolution | 128 × 1 × 1 | 256 | 2 |
Average-pooling | 2 × 2 | 2 | |||||||||
Convolution | 256 × 3 × 3 | 512 | 1 | Convolution | 256 × 3 × 3 | 512 | 1 | ||||
Convolution | 512 × 3 × 3 | 512 | 2 | Convolution | 512 × 3 × 3 | 512 | 1 | ||||
Skip convolution | 256 × 1 × 1 | 512 | 2 | Skip convolution | 512 × 1 × 1 | 512 | 1 | ||||
Convolution | 512 × 3 × 3 | 512 | 1 | ||||||||
Convolution | 512 × 3 × 3 | 512 | 2 | ||||||||
Skip convolution | 512 × 1 × 1 | 512 | 2 | ||||||||
Fully-connected | 1024 | Fully-connected | 1024 | Fully-connected | 1024 | ||||||
Output | 10 | Output | 10 | Output | 10 |
As mentioned previously, off-the-shelf trained ANNs can be successfully converted to SNNs by replacing ANN (ReLU) neurons with Integrate and Fire (IF) spiking neurons and adjusting the neuronal thresholds with respect to synaptic weights. In the literature, several methods have been proposed (Cao et al.,
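One common flavor of such threshold adjustment is data-based normalization: the peak ReLU activation of each layer is recorded on a small calibration set and reused as that layer's IF firing threshold (or, equivalently, folded into the weights). The sketch below illustrates that generic idea only; it is not the exact balancing procedure of the conversion schemes compared in this paper.

```python
import torch

@torch.no_grad()
def estimate_if_thresholds(ann, calib_loader, device="cpu"):
    """Record the peak ReLU activation per layer on a calibration set (sketch)."""
    peak_acts = {}

    def make_hook(name):
        def hook(module, inputs, output):
            peak_acts[name] = max(peak_acts.get(name, 0.0), output.max().item())
        return hook

    # Attach hooks to every ReLU in the trained ANN
    handles = [m.register_forward_hook(make_hook(n))
               for n, m in ann.named_modules() if isinstance(m, torch.nn.ReLU)]
    for images, _ in calib_loader:
        ann(images.to(device))          # forward passes only; no gradients needed
    for h in handles:
        h.remove()
    # Each recorded peak can serve as the firing threshold of the matching IF layer
    return peak_acts
```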
For the static vision datasets (MNIST, SVHN, and CIFAR-10), each input pixel intensity is converted to a stream of Poisson-distributed spike events that have equivalent firing rates. Specifically, at each time step, the pixel intensity is compared with a uniformly distributed random number (in the range between 0 and 1). If pixel intensity is greater than the random number at the corresponding time step, a spike is generated. This rate-based spike encoding is used to feed the input spikes to the network for a given period of time during both training and inference. For color image datasets, we use the pre-processing technique of horizontal flip before generating input spikes. These input pixels are normalized to represent zero mean and unit standard deviation. Thereafter, we scale the pixel intensities to bound them in the range [–1,1] to represent the whole spectrum of input pixel representations. The normalized pixel intensities are converted to Poisson-distributed spike events such that the generated input signals are bipolar spikes. For the neuromorphic version of the dataset (N-MNIST), we use the original (unfiltered and uncentered) version of spike streams to directly train and test the network in the time domain.
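A hedged sketch of this encoding step is given below: at every time-step, a uniform random number is compared against the magnitude of the normalized pixel, and the resulting spike carries the pixel's sign so that bipolar inputs are produced. The normalization itself (zero mean, unit standard deviation, rescaling to [−1, 1]) is assumed to have been applied beforehand.

```python
import torch

def poisson_encode(images, num_steps):
    """Convert normalized pixels in [-1, 1] to bipolar Poisson-distributed spike trains.

    images : tensor of shape (batch, C, H, W)
    return : spikes of shape (num_steps, batch, C, H, W) with values in {-1, 0, +1}
    """
    spikes = []
    for _ in range(num_steps):
        # Spike whenever a uniform random number falls below the pixel magnitude
        fired = (torch.rand_like(images) < images.abs()).float()
        # Bipolar spikes: the spike sign follows the sign of the input pixel
        spikes.append(fired * images.sign())
    return torch.stack(spikes)
```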
As mentioned in section 3.1.4, we generate a stochastic Poisson-distributed spike train for each input pixel intensity for event-based operation. The duration of the spike train is very important for SNNs. We measure the length of the spike train (spike time window) in time-steps. For example, a 100 time-step spike train will contain approximately 50 random spikes if the corresponding pixel intensity is 0.5 in a range of [0,1]. If the number of time-steps (spike time window) is too small, the SNN will not receive enough information for training or inference. On the other hand, a very large number of time-steps suppresses the stochasticity, noise, and imprecision of SNNs at the cost of high latency and power consumption, leaving the network with little energy-efficiency advantage over ANN implementations. For these reasons, we experimented with different numbers of time-steps to empirically obtain the optimal number required for both training and inference. The experimental process and results are explained in the following subsections.
A spike event can only represent 0 or 1 at each time step, so its bit precision is usually considered to be 1. However, the spike train provides temporal data, which is an additional source of information. Therefore, the spike train length (number of time-steps) in an SNN can be considered the effective precision of its neuronal activations. To obtain the optimal #time-steps required for our proposed training method, we trained VGG9 networks on the CIFAR-10 dataset using different numbers of time-steps ranging from 10 to 120 (shown in
Inference performance variation due to
To obtain the optimal #time-steps required for inferring an image utilizing a network trained with our proposed method, we conducted similar experiments as described in section 3.1.5. We first trained a VGG9 network for CIFAR-10 dataset using 100 time-steps (optimal according to experiments in section 3.1.5). Then, we tested the network performances with different time-steps ranging from 10 to 4,000 (shown in
In this section, we analyze the classification performance and efficiency achieved by the proposed spike-based training methodology for deep convolutional SNNs compared to the performance of the transformed SNN using ANN-SNN conversion scheme.
Most of the classification performances available in the literature for SNNs are for MNIST and CIFAR-10 datasets. The popular methods for SNN training are “Spike Time Dependent Plasticity (STDP)” based unsupervised learning (Brader et al.,
Comparison of the SNNs classification accuracies on MNIST, N-MNIST, and CIFAR-10 datasets.
Model | Learning method | MNIST | N-MNIST | CIFAR-10
Hunsberger and Eliasmith ( | Offline learning, conversion | 98.37% | – | 82.95%
Esser et al. ( | Offline learning, conversion | – | – | 89.32%
Diehl et al. ( | Offline learning, conversion | 99.10% | – | –
Rueckauer et al. ( | Offline learning, conversion | 99.44% | – | 88.82%
Sengupta et al. ( | Offline learning, conversion | – | – | 91.55%
Kheradpisheh et al. ( | Layerwise STDP + offline SVM classifier | 98.40% | – | –
Panda and Roy ( | Spike-based autoencoder | 99.08% | – | 70.16%
Lee et al. ( | Spike-based BP | 99.31% | 98.74% | –
Wu et al. ( | Spike-based BP | 99.42% | 98.78% | 50.70%
Lee et al. ( | STDP-based pretraining + spike-based BP | 99.28% | – | –
Jin et al. ( | Spike-based BP | 99.49% | 98.88% | –
Wu et al. ( | Spike-based BP | – | 99.53% | 90.53%
This work | Spike-based BP | 99.59% | 99.09% | 90.95%
For a more extensive comparison, we compare the inference performances of trained networks using our proposed methodology with the ANNs and ANN-SNN conversion scheme for same network configuration (depth and structure) side by side in
Comparison of classification performance.
Dataset | Model | ANN | ANN-SNN (Diehl et al., | ANN-SNN (Sengupta et al., | SNN (other works) | SNN (this work)
MNIST | LeNet | 99.57% | 99.55% | 99.59% | 99.49% (Jin et al., | 99.59%
N-MNIST | LeNet | – | – | – | 99.53% (Wu et al., | 99.09%
SVHN | VGG7 | 96.36% | 96.33% | 96.30% | – | 96.06%
SVHN | ResNet7 | 96.43% | 96.33% | 96.40% | – | 96.21%
CIFAR-10 | VGG9 | 91.98% | 91.89% | 92.01% | 90.53% (Wu et al., | 90.45%
CIFAR-10 | ResNet9 | 91.85% | 90.78% | 91.59% | – | 90.35%
CIFAR-10 | ResNet11 | 91.87% | 90.98% | 91.65% | – | 90.95%
After initializing the weights, we train the SNNs using the spike-based BP algorithm in an end-to-end manner with Poisson-distributed spike train inputs. Our evaluation on the MNIST dataset yields a classification accuracy of 99.59%, the best among SNN training schemes and on par with ANN-SNN conversion schemes. We achieve ~96% inference accuracy on the SVHN dataset for both the trained non-residual and residual SNNs. Inference performance for SNNs trained on the SVHN dataset has not been reported previously in the literature. We implemented three different networks, as shown in
In order to analyze the effect of network depth for SNNs, we experimented with networks of different depths while training for SVHN and CIFAR-10 datasets. For SVHN dataset, we started with a shallow network derived from LeNet5 model (LeCun et al.,
Accuracy improvement with network depth for
In this section, we compare our proposed supervised learning algorithm with other recent spike-based BP algorithms. Spike-based learning rules primarily focus on directly training and testing SNNs with spike trains, so that no conversion is necessary for deployment in real-world spiking scenarios. In recent years, there has been an increasing number of supervised gradient descent methods for spike-based learning. Panda and Roy (
There are several points that distinguish our work from others. First, we use a pseudo derivative method that accounts for the leaky effect in the membrane potential of LIF neurons. We approximately estimate the leaky effect by comparing the total membrane potential of IF and LIF neurons and obtaining the ratio between them. During the back-propagating phase, the pseudo derivative of the LIF neuronal function is estimated by combining the straight-through estimation and the leak correctional term, as described in Equation (22). Next, we construct our networks by leveraging frequently used architectures such as VGG (Simonyan and Zisserman,
The most important advantage of event-based operation of neural networks is that the events are very sparse in nature. To verify this claim, we analyzed the spiking activities of the direct-spike trained SNNs and ANN-SNN converted networks in the following subsections.
The layer-wise spike activities of both SNN trained using our proposed methodology, and ANN-SNN converted network (using scheme 1) for VGG9 and ResNet9 are shown in
Layer-wise spike activity in direct-spike trained SNN and ANN-SNN converted network for CIFAR-10 dataset:
We can observe from
From
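Spike counts like those reported in the table below can be accumulated directly from the spike tensors produced during inference; the helper below assumes such per-layer recordings are available and is only a book-keeping sketch.

```python
def spikes_per_image(layer_spike_records, num_images):
    """Average #spikes per image for each layer (sketch).

    layer_spike_records : dict mapping layer name -> binary spike tensor of shape
                          (T, num_images, ...) recorded over the inference window
    """
    return {name: spikes.sum().item() / num_images
            for name, spikes in layer_spike_records.items()}
```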
#Spikes/Image inference and spike efficiency comparison between SNN and ANN-SNN converted networks for benchmark datasets trained on different network models.
Dataset | Model | SNN (this work) | ANN-SNN (Diehl et al., | ANN-SNN (Sengupta et al., | Spike efficiency vs. (Diehl et al., | Spike efficiency vs. (Sengupta et al.,
MNIST | LeNet | 5.52E+04 | 3.4E+04 | 2.9E+04 | 0.62x | 0.53x |
7.3E+04 | 1.32x | |||||
SVHN | VGG7 | 5.56E+06 | 3.7E+06 | 1.0E+07 | 0.67x | 1.84x |
1.9E+07 | 1.7E+07 | 3.40x | 2.99x | |||
ResNet7 | 4.66E+06 | 3.9E+06 | 3.1E+06 | 0.85x | 0.67x | |
2.4E+07 | 2.0E+07 | 5.19x | 4.30x | |||
CIFAR-10 | VGG9 | 1.24E+06 | 1.6E+06 | 2.2E+06 | 1.32x | 1.80x |
8.3E+06 | 9.6E+06 | 6.68x | 7.78x | |||
ResNet9 | 4.32E+06 | 2.7E+06 | 1.5E+06 | 0.63x | 0.35x | |
1.0E+07 | 7.8E+06 | 2.39x | 1.80x | |||
ResNet11 | 1.53E+06 | 9.7E+06 | 1.8E+06 | 6.33x | 1.17x | |
9.2E+06 | 5.99x |
The
The comparison of "accuracy vs. latency vs. #spikes/inference" for the ResNet11 architecture. In this figure, the solid lines represent inference accuracy while the dashed lines represent #spikes/inference. The slope of the #spikes/inference curve of the proposed SNN is larger than that of the ANN-SNN converted networks. However, since the proposed SNN requires far fewer time-steps for inference, the number of spikes required for one image inference is significantly lower compared to ANN-SNN. The required #time-steps and corresponding #spikes/inference are shown using highlighted points connected by arrows. A log scale is used for the x-axis for easier viewing of the accuracy changes at lower numbers of time-steps.
The time required for inference is linearly proportional to the #time-steps (
Inference #time-steps and corresponding speedup comparison between SNN and ANN-SNN converted networks for benchmark datasets trained on different network models.
Dataset | Model | SNN (this work) | ANN-SNN (Diehl et al., | ANN-SNN (Sengupta et al., | Speedup vs. (Diehl et al., | Speedup vs. (Sengupta et al.,
MNIST | LeNet | 50 | 180 | 200 | 3.6x | 4x |
500 | 10x | |||||
SVHN | VGG7 | 100 | 500 | 1,600 | 5x | 16x |
2,500 | 2,600 | 25x | 26x | |||
ResNet7 | 100 | 500 | 400 | 5x | 4x | |
3,000 | 2,500 | 30x | 25x | |||
CIFAR-10 | VGG9 | 100 | 500 | 800 | 5x | 8x |
2,500 | 3,600 | 25x | 36x | |||
ResNet9 | 100 | 800 | 600 | 8x | 6x | |
3,000 | 3,000 | 30x | 30x | |||
ResNet11 | 100 | 3500 | 600 | 35x | 6x | |
3,000 | 30x |
Deep ANNs impose extraordinary computational requirements. SNNs can mitigate this burden by enabling efficient event-based computations. To compare the computational complexity of the two cases, we first need to understand the operation principle of both. An ANN operation for inferring the category of a particular input requires a single feed-forward pass per image. For the same task, a spiking network must be evaluated over a number of time-steps. If conventional hardware is used for both ANN and SNN, then the SNN clearly has a computational cost that is hundreds or thousands of times higher than the ANN. However, there is specialized hardware that accounts for event-based neural operation and "computes only when required" during inference. SNNs can potentially exploit such alternative mechanisms of network operation and carry out inference in the spiking domain much more efficiently than an ANN. Also, for deep SNNs, we have observed an increase in sparsity as the network depth increases. Hence, the benefits from event-based neuromorphic hardware are expected to grow with network depth.
An estimate of the actual energy consumption of SNNs and a comparison with ANNs is outside the scope of this work. However, we can gain some insight by quantifying the energy consumed per synaptic operation and comparing the number of synaptic operations performed in the ANN vs. the SNN trained with our proposed algorithm and the ANN-SNN converted network. We can estimate the number of synaptic operations per layer of a neural network from the structure of the convolutional and linear layers. In an ANN, a multiply-accumulate (MAC) computation is performed per synaptic operation. In contrast, specialized SNN hardware performs only an accumulate computation (AC) per synaptic operation, and only when an incoming spike is received. Hence, the total number of AC operations in an SNN can be estimated as the layer-wise sum of the products of the average neural spike count in a particular layer and the corresponding number of synaptic connections. We also have to multiply the #AC operations by the #time-steps to obtain the total #AC operations for one inference. For example, assume that there are
However, a MAC operation usually consumes an order of magnitude more energy than an AC operation. For instance, according to Han et al. (
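As a back-of-the-envelope illustration of this comparison, the helper below estimates both operation counts and the resulting energy ratio. The layer description and the per-operation energies are placeholders to be filled with measured values (e.g., the MAC/AC numbers reported by Han et al.); it is not a hardware model.

```python
def ann_vs_snn_energy_ratio(layers, num_steps, e_mac=4.6e-12, e_ac=0.9e-12):
    """Rough ANN (MAC) vs. SNN (AC) energy estimate per inference (sketch).

    layers : list of dicts, each with
             'synops'         : #synaptic connections evaluated in a dense forward pass
             'spike_fraction' : average fraction of those connections that actually
                                receive a spike at each time-step
    e_mac, e_ac : assumed energy per MAC / AC operation (placeholder values)
    """
    # ANN: one MAC per synaptic connection, a single feed-forward pass per image
    ann_ops = sum(layer['synops'] for layer in layers)
    # SNN: one AC per synaptic connection, only on incoming spikes, over all time-steps
    snn_ops = sum(layer['synops'] * layer['spike_fraction'] for layer in layers) * num_steps
    return (ann_ops * e_mac) / (snn_ops * e_ac)
```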
Inference computation complexity comparison between ANN, ANN-SNN conversion and SNN trained with spike-based backpropagation. ANN computational complexity is considered as a baseline for normalization.
It is worth noting that the sparsity of the spike signals increases with network depth in SNNs. Hence, the energy-efficiency is expected to increase almost exponentially in both the ANN-SNN converted network (Sengupta et al.,
In section 4.3, we observe that SNNs trained with the proposed method achieve significant speed-up in both the max-accuracy and iso-accuracy conditions. However, in section 4.2.2, we found that the proposed method is, in some cases (in an iso-accuracy condition), not more efficient than ANN-SNN conversion in terms of #spikes/inference. The reason is that an iso-accuracy condition may not be optimal for the SNNs trained with the proposed method. In the iso-accuracy case, we used the max-accuracy latency (50 time-steps for MNIST and 100 time-steps for the other networks) for the direct-spike trained SNN, whereas most of the conversion networks used much lower latency than in their max-accuracy condition. In view of this, there is a need to determine the circumstances under which our proposed method performs as well as or better than the ANN-SNN conversion methods on spike count, time-steps, and accuracy. Consequently, in this section we analyze another interesting comparison.
In this analysis, we compare our proposed method and ANN-SNN conversion methods (Diehl et al.,
Iso-spike comparison for optimal condition.
Dataset | Model | #Time-steps, SNN (this work) | #Time-steps (Diehl et al., | #Time-steps (Sengupta et al., | Accuracy (%), SNN (this work) | Accuracy (%) (Diehl et al., | Accuracy (%) (Sengupta et al.,
MNIST | LeNet | 20 | 62 | 75 | 99.36 | 99.19 | 88.62 |
SVHN | VGG7 | 30 | 235 | 235 | 95.00 | 95.34 | 88.13 |
ResNet7 | 30 | 200 | 200 | 95.06 | 95.63 | 95.48 | |
CIFAR-10 | VGG9 | 50 | 228 | 260 | 89.33 | 69.53 | 61.08 |
ResNet9 | 50 | 390 | 490 | 89.52 | 89.51 | 90.06 | |
ResNet11 | 50 | 307 | 280 | 90.24 | 82.75 | 73.82 |
For comparatively shallower networks such as LeNet, VGG7 (VGG type) and ResNet7, ResNet9 (Residual type), the ANN-SNN conversion networks achieve as good as or slightly better accuracy at iso-spike condition compared to the SNNs trained with our proposed method. However, these ANN-SNN conversion networks require 3x-10x higher latency for inference. On the other hand, for deeper networks such as VGG9 and ResNet11, the ANN-SNN conversion networks achieve significantly lower accuracy compared to SNNs trained with our proposed method even with much higher latency. This trend indicates that for deeper networks, SNNs trained with our proposed method will be more energy-efficient than the conversion networks under an iso-spike condition.
In this work, we propose a spike-based backpropagation training methodology for popular deep SNN architectures. This methodology enables deep SNNs to achieve classification accuracies comparable to ANNs on standard image recognition tasks. Our experiments show the effectiveness of the proposed learning strategy on deeper SNNs (7–11 layer VGG and ResNet architectures) by achieving the best classification accuracies on the MNIST, SVHN, and CIFAR-10 datasets among networks trained with spike-based learning to date. The performance gap between ANNs and SNNs is substantially reduced by the proposed methodology. Moreover, significant computational energy savings are expected when deep SNNs (trained with the proposed method) are employed on suitable neuromorphic hardware for inference.
Publicly available datasets were analyzed in this study. This data can be found here: MNIST, N-MNIST, SVHN, CIFAR-10 datasets. The source code is publicly released at “
CL and SS implemented the algorithm and conducted the experiments. CL, SS, PP, GS, and KR discussed about the results and analysis, and wrote the manuscript.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This manuscript has been released as a Pre-Print at Lee et al. (