Edited by: Malu Zhang, National University of Singapore, Singapore
Reviewed by: Timothée Masquelier, Centre National de la Recherche Scientifique (CNRS), France; Wenrui Zhang, University of California, Santa Barbara, United States
This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Spiking neural networks with temporal coding schemes process information based on the relative timing of neuronal spikes. In supervised learning tasks, temporal coding allows learning through backpropagation with exact derivatives, and achieves accuracies on par with conventional artificial neural networks. Here we introduce spiking autoencoders with temporal coding and pulses, trained using backpropagation to store and reconstruct images with high fidelity from compact representations. We show that spiking autoencoders with a single layer are able to effectively represent and reconstruct images from the neuromorphically-encoded MNIST and FMNIST datasets. We explore the effect of different spike time target latencies, data noise levels and embedding sizes, as well as the classification performance from the embeddings. The spiking autoencoders achieve results similar to or better than conventional non-spiking autoencoders. We find that inhibition is essential in the functioning of the spiking autoencoders, particularly when the input needs to be memorised for a longer time before the expected output spike times. To reconstruct images with a high target latency, the network learns to accumulate negative evidence and to use the pulses as excitatory triggers for producing the output spikes at the required times. Our results highlight the potential of spiking autoencoders as building blocks for more complex biologically-inspired architectures. We also provide open-source code for the model.
Spiking neural networks (SNNs), hailed as the “third generation of neural networks” (Maass, 1997), process information in the form of discrete spikes rather than continuous activation values.
SNNs are of particular interest for the fields of neuromorphic hardware and computational neuroscience (Zenke et al.,
One way of encoding data into spiking neural networks employs temporal coding, which posits that information is encoded in the relative timing of individual spikes.
Autoencoders are a form of representation learning, first introduced in the context of restricted Boltzmann machines for dimensionality reduction (Hinton and Salakhutdinov, 2006).
Here we show that SNNs with temporal coding can learn to behave as autoencoders using standard backpropagation techniques. We characterise one-layer spiking autoencoders that learn to reconstruct images from the MNIST dataset of handwritten digits (LeCun et al., 1998) and from the FMNIST dataset of fashion items.
We use a neuronal model previously described by Comşa et al. (
τ is a real-valued decay rate constant, fixed across the network, that scales the function in intensity and time.
This synaptic transfer function is inspired by recordings in biological neurons (Rall,
Illustration of membrane potential dynamics for a neuron with θ = 0.5 and τ = 1. The neuron receives input spikes at times
Consider a neuron receiving a sequence of inputs
On a regular computer architecture, the spike times can be simulated in an event-based manner, without the need for discrete time steps. The spikes are processed one by one in chronological order, and the membrane potential of any affected neuron is updated as required. One notable aspect of the simulation is finding the correct set of inputs that cause a neuron to spike; importantly, even if a set of input spikes pushes the membrane potential over the threshold and hence predicts a future spike, this predicted spike may not occur, or may occur at a different time, if a new input spike arrives between the last input spike and the predicted spike time.
Given a set of input spikes
where
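To make the event-based simulation concrete, the following sketch computes the spike time of a single neuron numerically. It assumes the alpha-shaped synaptic kernel implied by the description above (each input contributes w·(t − t_in)·exp(−τ(t − t_in)) to the membrane potential); the closed-form spike-time solution is not reproduced here, so the threshold crossing is found with a simple grid search. The constants and function names are illustrative, not the implementation used for the experiments.

```python
import numpy as np

TAU = 1.0     # decay rate constant, fixed across the network
THETA = 0.5   # firing threshold

def potential(t, in_times, in_weights):
    """Membrane potential at time t, assuming each input spike contributes an
    alpha-shaped kernel w * (t - t_in) * exp(-TAU * (t - t_in)) for t > t_in."""
    dt = t - in_times
    active = dt > 0
    return np.sum(in_weights[active] * dt[active] * np.exp(-TAU * dt[active]))

def spike_time(in_times, in_weights, horizon=10.0, step=1e-3):
    """Event-based search for the output spike time. Inputs are processed in
    chronological order; after each input, we check whether the inputs seen so
    far push the potential over the threshold *before* the next input arrives,
    because only then can the predicted spike no longer be invalidated."""
    order = np.argsort(in_times)
    times = np.asarray(in_times, dtype=float)[order]
    weights = np.asarray(in_weights, dtype=float)[order]
    for k in range(len(times)):
        t_next = times[k + 1] if k + 1 < len(times) else horizon
        causal_t, causal_w = times[:k + 1], weights[:k + 1]
        for t in np.arange(times[k], t_next, step):
            if potential(t, causal_t, causal_w) >= THETA:
                return t          # first crossing that cannot be undone
    return None                   # the neuron never spikes

print(spike_time([0.0, 0.1, 0.3], [1.0, 1.5, -0.5]))
```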
The architecture of the spiking autoencoder is shown in the figure below.
Architecture of the spiking autoencoder. The weights and the pulses are trainable.
The autoencoder is composed of three layers: an input layer, a hidden layer that acts as an encoder and is composed of fewer neurons compared to the input layer, and an output layer that acts as a decoder and has the same size as the input layer. The purpose of the autoencoder is to learn to reproduce the input image in the output layer. In other words, the hidden layer must learn to convert the input image into a compressed representation, from which the output image can be reconstructed as closely as possible to the original.
For the MNIST problem, we use hidden layer sizes of 8, 16, and 32 neurons.
In addition to the regular neurons of the SNN, we also connect a variable number of “synchronisation pulses” to each neuron of each non-input layer. The role of the pulses, which can be thought of as non-input neurons, is to provide a temporal bias and encourage regular neurons to spike, in order for gradients to keep flowing during the training process. Just like regular neurons, the pulses connect using learnable weights, but their spike times are also learnable under the same learning scheme as the rest of the network (as described in section 2.5). The set of pulses connected to each layer is initialised with spike times evenly distributed over the same interval as the inputs. In section 3, we elaborate on the role that the pulses learn to play in the image decoding process.
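As a minimal illustration, the pulses can be represented as one extra vector of trainable spike times per non-input layer, initialised evenly over the input interval. The interval [0, 1], the number of pulses, and the TensorFlow variable below are illustrative assumptions rather than the exact configuration used in the experiments.

```python
import numpy as np
import tensorflow as tf

def make_pulses(n_pulses=10, t_min=0.0, t_max=1.0):
    # Pulses act like extra presynaptic neurons whose spike times are trainable;
    # they are initialised evenly over the same interval as the input spikes.
    init_times = np.linspace(t_min, t_max, n_pulses).astype(np.float32)
    return tf.Variable(init_times, name="pulse_times")

pulses_hidden = make_pulses()   # pulses connected to the hidden (encoder) layer
pulses_output = make_pulses()   # pulses connected to the output (decoder) layer
print(pulses_hidden.numpy())
```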
The temporal coding scheme posits that more salient information is encoded as earlier spike times. Given an image, we encode each of its individual pixels in the spike time of an individual neuron. The spike time is proportional to the brightness of the pixel: for example, a fully dark pixel produces a spike at the start of the input interval, whereas a fully bright pixel produces a spike at the end of it.
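A minimal sketch of this encoding, assuming pixel brightness values in [0, 1] and an input spike interval of the same length (the exact interval is an assumption):

```python
import numpy as np

def encode_image(image, t_max=1.0):
    """Temporal coding: each pixel becomes the spike time of one input neuron,
    proportional to its brightness, so darker pixels spike earlier."""
    pixels = np.asarray(image, dtype=np.float32).reshape(-1)  # flatten 28x28 -> 784
    return t_max * pixels

def invert_and_encode(image, t_max=1.0):
    """MNIST digits are bright on a dark background, so the image is inverted
    first; the digit strokes then produce the earliest spikes."""
    return encode_image(1.0 - np.asarray(image, dtype=np.float32), t_max)
```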
In the original MNIST dataset (LeCun et al., 1998), the handwritten digits appear as bright pixels on a dark background.
The idea of encoding more salient information as earlier spikes also appears in time-to-first-spike (TTFS) encoding schemes, which are often used in classification paradigms (Mostafa,
The aim of the decoder is to reproduce the input image from the compressed representation provided by the encoder. As the image is encoded in the spike times, we set a target latency relative to which the output spike times encode the reconstructed pixel values.
There exist alternative ways of choosing the temporal reference for decoding the output image. One possible alternative is to reconstruct with reference to the earliest spike in the output layer, which would give the SNN the freedom to self-regulate its spike times. Another possibility is to add an extra neuron to the output layer, acting explicitly as a temporal reference. Here we opt for a fixed latency, which best allows us to study how the spike dynamics change as the model is required to wait for different amounts of time before producing the image reconstruction.
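For illustration, decoding can then be sketched as reading each reconstructed pixel from the corresponding output spike time relative to the fixed target latency; the exact decoding formula is not reproduced in the text, so the linear mapping below is an assumption.

```python
import numpy as np

def decode_output(out_spike_times, target_latency, t_max=1.0, shape=(28, 28)):
    """Read the reconstructed image out of the output spike times, assuming each
    output neuron encodes its pixel value relative to the fixed target latency
    (earlier spike = darker pixel, mirroring the input encoding)."""
    pixels = (np.asarray(out_spike_times) - target_latency) / t_max
    return np.clip(pixels, 0.0, 1.0).reshape(shape)
```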
The aim of training the spiking autoencoder is to obtain a faithful reconstruction of the input image at the output layer, with a given target latency
where
As in the case of conventional backpropagation training for neural networks, we use the chain rule to compute the update rules for each neuronal spike time and weight in the network by expanding the expression across the layers of the network.
By differentiating Equation (2), we can plug in the derivative of the loss function
where
We then plug these derivatives into Equation (3) to obtain the update quantities for each individual neuron and weight. Equation (4) can also be used for adjusting the spike times of the pulses. This is the same backpropagation procedure that is conventionally used in non-spiking ANNs.
If a neuron does not spike, then we add a small positive-valued penalty to each of the input weight derivatives, in order to encourage spiking. If an input neuron spikes after the output neuron, we do not compute derivatives corresponding to that input neuron or its weight.
As the derivative of each neuron can approach infinity when the membrane potential is close to the threshold θ, we clip the derivatives (Equations 4 and 5) during the training process using a fixed clipping value.
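Since the loss and derivative expressions referenced above (Equations 2 to 5) are not reproduced here, the following sketch illustrates the training signals described in this section under the assumption of a squared-error loss between actual and target output spike times; the constants for the non-spiking penalty and for the derivative clipping are illustrative.

```python
import numpy as np

CLIP_VALUE = 100.0        # fixed clipping value for derivatives (illustrative)
NO_SPIKE_PENALTY = 1e-3   # small positive value added to the input weight
                          # derivatives of neurons that never spike (illustrative)

def reconstruction_loss(out_times, target_times):
    """Assumed squared-error loss between actual and target output spike times,
    where the targets are the input spike times shifted by the target latency."""
    return np.sum((out_times - target_times) ** 2)

def loss_derivative(out_times, target_times):
    """Derivative of the assumed loss with respect to each output spike time."""
    return 2.0 * (out_times - target_times)

def clip_derivative(d):
    """Derivatives approach infinity when the membrane potential barely crosses
    the threshold, so they are clipped to a fixed value during training."""
    return np.clip(d, -CLIP_VALUE, CLIP_VALUE)
```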
The training process consists of minimising the loss function, using an Adam optimiser, for 100 epochs. We use a modified form of Glorot initialisation (Glorot and Bengio, 2010) for the weights.
We train spiking autoencoders to reconstruct images under noisy conditions. We add normally distributed noise to each pixel in the following form, where η is the noise factor and
The mean of the noise variable
We study spiking autoencoders trained on datasets with noise factors η ∈ {0, 0.2, 0.4, 0.6, 0.8}. The noisy images are used as training and test examples, while the training targets are the original (clean) images.
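The exact noise equation is not reproduced above; a plausible sketch, assuming zero-mean, unit-variance Gaussian noise scaled by η and pixel values clipped back to [0, 1], is:

```python
import numpy as np

def add_noise(images, eta, seed=0):
    """Add normally distributed noise scaled by the noise factor eta. The zero
    mean, unit variance, and the clipping back to [0, 1] are assumptions; the
    training targets remain the original (clean) images."""
    rng = np.random.default_rng(seed)
    noisy = images + eta * rng.normal(0.0, 1.0, size=np.shape(images))
    return np.clip(noisy, 0.0, 1.0)

# Example: build the noisy training sets used in the experiments.
# noisy_sets = {eta: add_noise(train_images, eta) for eta in (0, 0.2, 0.4, 0.6, 0.8)}
```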
We have three variables controlling the setup for the spiking autoencoders: target latency, noise factor η, and embedding size.
For each model analysed below, we report results obtained with the best hyperparameter combination for its setup. However, we can find sets of parameters with good performance on multiple setups at each target latency. These parameters are shown in the table below.
Hyperparameters that achieve an error within 0.01 of the best error on all the training configurations with the given latency.
Search range | Value (latency configuration 1) | Value (latency configuration 2)
[0.1, 2] | 0.3139 | 0.2878
[0.1, 1.5] | 0.8012 | 0.9063
[0, 10] | 10 | 8
[−10, 10] | −9.534 | −6.972
[−10, 10] | −8.081 | 9.978
[1, 1000] | 3 | 27
[1, 1000] | 247.4 | 373.4
[0, 100] | 33.83 | 39.56
[10⁻⁵, 1.0] | 0.001676 | 0.0003852
[10⁻⁵, 1.0] | 0.001441 | 0.1330
We compare spiking autoencoders with conventional ANN (non-spiking) autoencoders of similar architecture. Specifically, a single hidden layer of size 8, 16, or 32 acts as the encoder. We use a ReLU activation function in the encoder and a sigmoid activation function in the decoder (but see the Results for a brief exploration of other activation functions).
The ANNs are implemented in TensorFlow (Abadi et al., 2016).
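A minimal Keras sketch of this baseline follows; the optimiser, loss, and training details shown here are illustrative rather than the exact configuration used for the reported results.

```python
import tensorflow as tf

def build_ann_autoencoder(input_dim=784, embedding_size=32):
    """Non-spiking baseline: a single ReLU encoder layer and a sigmoid decoder
    layer of the same size as the input."""
    inputs = tf.keras.Input(shape=(input_dim,))
    embedding = tf.keras.layers.Dense(embedding_size, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(input_dim, activation="sigmoid")(embedding)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")   # optimiser and loss are illustrative
    return model

ann = build_ann_autoencoder(embedding_size=16)
ann.summary()
```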
The input MNIST images are inverted before being fed into the spiking autoencoders: given that more salient information causes earlier spikes, the more central pixels should cause spikes closer to the beginning of the input interval.
The best reconstruction loss values obtained with the spiking autoencoders and the ANN autoencoders are similar (see the figure below).
Reconstruction errors for spiking (“snn”) and non-spiking (“ann”) autoencoders at different levels of noise, for embedding sizes 8, 16, and 32, on the MNIST and FMNIST datasets.
A digit from the MNIST test set reconstructed by a spiking autoencoder with embedding size 32 and target latency
We visualise the embeddings produced by the trained spiking autoencoders in the figure below.
Visualisation of MNIST embeddings produced by a spiking autoencoder with target latency
A practical use of embeddings comes from collapsing a high-dimensional input space, from which the training distribution is sparsely drawn, into a smaller space where basic operations like addition are meaningful.
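As an illustration of such operations, the interpolation shown in the figure below can be sketched as a linear blend between two embeddings; the `encoder` and `decoder` callables are assumed to be the two halves of a trained autoencoder and are hypothetical names.

```python
import numpy as np

def interpolate_embeddings(z_a, z_b, steps=8):
    """Linearly interpolate between two embeddings; decoding each intermediate
    point yields a gradual morphing between the two reconstructed images."""
    alphas = np.linspace(0.0, 1.0, steps)
    return [(1.0 - a) * z_a + a * z_b for a in alphas]

# Hypothetical usage, assuming `encoder` and `decoder` from a trained autoencoder:
# z_a, z_b = encoder(image_a), encoder(image_b)
# morphs = [decoder(z) for z in interpolate_embeddings(z_a, z_b)]
```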
Interpolating between four items from the MNIST and FMNIST test sets in embedding space. The embeddings are generated by a spiking autoencoder with hidden layer size 32, target latency
Finally, we use support-vector machines (SVMs) with a Gaussian kernel to classify digits using either the original pixel space or the embeddings as input features, at different levels of data noise. The results are shown in the figure below.
Accuracy of an SVM classifying embeddings produced by spiking (“snn”) and non-spiking (“ann”) autoencoders at different levels of noise, for embedding sizes 8, 16, and 32, on the MNIST dataset. The baseline is the classification accuracy on the original set.
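A minimal scikit-learn sketch of this evaluation, using the RBF kernel as the Gaussian kernel and default hyperparameters (an assumption):

```python
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

def svm_accuracy(train_x, train_y, test_x, test_y):
    """Train an SVM with a Gaussian (RBF) kernel on either the raw pixels or the
    autoencoder embeddings, and report test-set accuracy."""
    clf = SVC(kernel="rbf")
    clf.fit(train_x, train_y)
    return accuracy_score(test_y, clf.predict(test_x))

# Hypothetical usage, assuming `embed` maps images to their hidden-layer spike times:
# baseline = svm_accuracy(train_images, train_labels, test_images, test_labels)
# embedded = svm_accuracy(embed(train_images), train_labels, embed(test_images), test_labels)
```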
Having established that spiking autoencoders with temporal coding perform on par with their ANN counterparts qualitatively and quantitatively, we proceed to a more in-depth analysis of the trained SNN models. These analyses are performed on the MNIST dataset.
We investigate the models with different spike latencies trained to reconstruct original images (no noise). Intriguingly, we find that the distribution of the embedding (hidden layer) spikes does not shift away from the input distribution and toward the output distribution with higher target latency, but rather remains relatively early, as shown in the figure below.
Spike distributions on the full test set in trained spiking autoencoders with embedding size 32, noise level η = 0, target latencies
The role of inhibition at the higher latency can be observed more directly in the figure below.
Output potentials during the reconstruction of a test example by spiking autoencoders with embedding size 32, noise level η = 0, target latencies
Weight distributions in spiking autoencoders, for regular neurons and pulses. All models have embedding size
As mentioned in section 2, the default input to the temporally-coded SNN is the inverted version of the images, in which the digit pixels are dark and therefore spike early.
We found that ANN autoencoders do not perform as well at reconstructing the inverted version of the images. We explored multiple combinations of activation functions, including ReLU, sigmoid, ELU, tanh, and Gaussian-shaped functions in either the encoder or the decoder. The best-performing ANN autoencoders had Gaussian-Gaussian and ReLU-sigmoid activation functions in the encoder and decoder layers, respectively. As shown in the figure below, the spiking autoencoders outperform the ANN autoencoders on the inverted images.
Reconstruction loss on the inverted-brightness MNIST dataset for spiking (“snn”) and non-spiking (“ann”) autoencoders. The embedding size is always
We have shown that spiking autoencoders that represent information in the timing of neuronal spikes can learn to reconstruct images through backpropagation learning. They perform on par with conventional artificial neural networks, and exceed their performance when the inputs are encoded such that the smaller values correspond to more salient information. We have illustrated the capabilities of spiking autoencoders through multiple examples, and we have underlined the important role of inhibition especially when the SNN is required to keep the information in memory (i.e., in the membrane potential of the spiking neurons) for a longer time.
We discuss here the choice of coding scheme in a spiking network in relation to the biology of the brain, as well as some considerations on backpropagation.
There are multiple ways of encoding information in the form of spikes. Very often, information is encoded in the neuronal spike rates. In such coding schemes, a more salient stimulus is encoded as a higher spike rate of a particular neuron. ANNs can, in fact, be thought of as operating with neuronal spike rates averaged in time or in space. While rate coding has practical value for comparisons with currently spatially-constrained methods of neural recording (Yamins and DiCarlo,
On the other hand, the idea of temporal coding is supported by multiple pieces of biological evidence, in particular at sensory-level such as in the retina (Gollisch and Meister,
In this work, we used backpropagation to teach SNNs to reconstruct and remove noise from images of handwritten digits. The idea of backpropagation learning occurring in the biological brain is often questioned. However, it has been shown that random feedback connections (Lillicrap et al., 2016) can deliver useful error signals, lending plausibility to backpropagation-like learning in the brain.
A finding of particular interest that emerges from this work is the interplay between inhibitory and excitatory connections in producing spikes with the required timing. We allowed each connection to learn its own weight, without fixing its polarity from the beginning, but we allowed the hyperparameter search to influence the initial distribution of weights in the pulses and in the regular neurons. The networks learned to use inhibition as a main mechanism in the regular neurons, whereas the pulses were used as excitatory triggers to elicit output spikes at the target latencies. We remark that inhibition was used in both the hidden (encoder) layer and in the output (decoder) layer, which suggests a double accumulation of inhibition; in other words, the encoder accumulated information about which inhibitory elements should be inhibited in the decoder. This feature was not hard-coded in the models. As a consequence, the output timing of the network can be adjusted by simply changing the timing of the pulses connected to the output layer.
Although our model greatly simplifies the many variables observed in real neurons, it is still relevant to note that the balance of inhibition and excitation is essential in producing the spike patterns routinely observed in biological networks. For example, the theta rhythm in the hippocampus, which modulates memory, is thought to be caused by an intricate interplay between excitatory and inhibitory sources (Buzsáki, 2002).
A significant challenge in scaling the current model and learning scheme is the computational demand during both the feedforward pass and the error backpropagation. Our model uses a synaptic function that has the advantage of being biologically faithful at the expense of requiring the computation of multiple exponentials. Moreover, a general drawback of spiking neural networks is that the event-based nature of the computation does not allow for full parallelisation in non-specialised hardware. In practice, our models can take a couple of minutes per epoch to train on regular hardware. Nevertheless, we have verified that deeper spiking autoencoders with up to five total layers can successfully learn the same datasets, although we chose to present here only single-layer experiments, which already achieved acceptable reconstructions. Convolutional variations are also possible, but they pose the same computational challenges on regular hardware.
Our work accrues evidence for the potential of spiking neural networks with biologically-inspired characteristics as building blocks for neuromorphic machine learning. The novelty of our work consists in showing for the first time that spiking autoencoders with temporal coding can achieve results on par with conventional autoencoders, as well as providing insights into their dynamics, including the important role of inhibition in memorising information over longer periods of time. The inhibition across the network is complemented by the excitatory role that pulses learn to play in order to trigger the network output at the required time. Beyond single-layer architectures, autoencoders can be stacked (Vincent et al., 2010) to form deeper architectures.
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.
IMC designed and performed the experiments, wrote the code, and wrote the first draft of the manuscript. LV contributed to data analysis, performance optimisation and code reviews, and led the open-sourcing. TF contributed to the conception of the study, literature survey, and code reviews. JA contributed to the conception, design, and resource availability for the study. All authors contributed to manuscript revision, read, and approved the submitted version.
All authors were employed by Google Research, Switzerland. Parts of the ideas presented here are covered by pending PCT Patent Application No. PCT/US2019/055848 (Temporal Coding in Leaky Spiking Neural Networks), filed by Google in 2019.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
We thank our colleagues from the Neural Multimedia Compression team at Google Research for useful feedback on this work.