Conduction Delay Learning Model for Unsupervised and Supervised Classification of Spatio-Temporal Spike Patterns

Precise spike timing is considered to play a fundamental role in communications and signal processing in biological neural networks. Understanding the mechanism of spike timing adjustment would deepen our understanding of biological systems and enable advanced engineering applications such as efficient computational architectures. However, the biological mechanisms that adjust and maintain spike timing remain unclear. Existing algorithms adopt a supervised approach, which adjusts the axonal conduction delay and synaptic efficacy until the spike timings approximate the desired timings. This study proposes a spike timing-dependent learning model that adjusts the axonal conduction delay and synaptic efficacy in both unsupervised and supervised manners. The proposed learning algorithm approximates the Expectation-Maximization algorithm, and classifies the input data encoded into spatio-temporal spike patterns. Even in the supervised classification, the algorithm requires no external spikes indicating the desired spike timings unlike existing algorithms. Furthermore, because the algorithm is consistent with biological models and hypotheses found in existing biological studies, it could capture the mechanism underlying biological delay learning.


INTRODUCTION
As confirmed in biological studies, the precise timing of a neuronal spike plays a fundamental role in information processing in the central nervous system (Carr and Konishi, 1990;Middlebrooks et al., 1994;Seidl et al., 2010). In information coding called temporal coding, the timing of at least one generated spike represents an output. In contrast, in rate coding, spiking neural network (SNN) generates spikes repeatedly over a certain period and the number of generated spikes represents an output (Brader et al., 2007;Nessler et al., 2009;Beyeler et al., 2013;O'Connor et al., 2013;Diehl and Cook, 2015;Zambrano and Bohte, 2016). SNNs in rate coding are increasingly being investigated for efficient computational architectures in engineering applications. The SNN-based architectures consume less energy and require smaller hardware area than traditional artificial neural network architectures (Querlioz et al., 2013;Neftci et al., 2014;Cao et al., 2015). However, repeated spike generation increases the computational time (VanRullen and Thorpe, 2001). To improve the efficiency of SNN-based architectures, the SNN can be implemented in temporal coding rather than rate coding (Matsubara andTorikai, 2013, 2016).
When multiple pre-synaptic spikes simultaneously arrive at a post-synaptic neuron, they evoke a large excitatory postsynaptic potential (EPSP). In response, the post-synaptic neuron elicits a spike and thereby delivers the signal to the latter part of the SNN with a high probability (see Figure 1). In other words, a neuron behaves as a coincidence detector. As the membrane potential of the post-synaptic neuron increases rapidly with increasing efficacy of the projecting synapse, synaptic modification influences the post-synaptic spike timing. Many algorithms adjust the timing of post-synaptic spikes in a supervised manner by potentiating synapses that potentially evoke EPSPs at the desired timing, while depressing other synapses. Examples are the ReSuMe algorithm (Ponulak, 2005;Sporea and Grüning, 2013;Matsubara and Torikai, 2016), the Tempotron algorithm (Gütig and Sompolinsky, 2006;Yu et al., 2014), and algorithms proposed in Pfister et al. (2006) and Paugam-Moisy et al. (2008). A pre-synaptic spike arrives at a post-synaptic neuron through the axon of the pre-synaptic neuron. The delay incurred by traveling through the axon is called the axonal conduction delay (Waxman and Swadlow, 1976). Thus, temporal coding must consider the pre-synaptic spike timing plus the conduction delay (see left panel of Figure 1B). Izhikevich et al. (2004) and Izhikevich (2006b) demonstrated that even when the conduction delays are constant, an SNN with synaptic modification called spike timing-dependent plasticity (STDP) (Markram et al., 1997;Bi and Poo, 1998) selforganizes its characteristic responses to spatio-temporal spike patterns. In Gerstner et al. (1996), multiple pre-synaptic neurons driven by a single signal source deliver the signal to a postsynaptic neuron after various delays. Synaptic modification maintains the synapses that simultaneously evoke an EPSP by potentiating them and prunes other synapses by depressing them. SpikeProp (Bohte et al., 2002) models a similar process in a supervised manner. However, under the assumption of multiple connections, the SNN requires numerous unused paths for future development. Alternatively, the SNN accepts limited spatio-temporal spike patterns depending on the initial network connections and conduction delays. Fixing the delay time limits the flexibility and efficiency of the SNN (see right panel of Figure 1B).
As mentioned above, information processing in a neural circuit requires the synchronous arrival of spikes elicited by multiple pre-synaptic neurons, and thus optimal conduction delay in the axons is critical. Axons are surrounded by myelin, which works as an electrical insulator and reduces the conduction delay (Rushton, 1951). Myelination and demyelination of axons appear to depend on neuronal activity (Fields, 2005;Bakkum et al., 2008;Jamann et al., 2017). Fields (2015) and Baraban et al. (2016) hypothesized that synaptic modification occurs only after the arrival timings of the pre-synaptic spikes have been adjusted by activity-dependent myelination.
The adaptation of the conduction delay is called delay learning. Supervised delay learning algorithms such as the DL-ReSuMe algorithm (Taherkhani et al., 2015) directly adjust the conduction delay to suit the given spatio-temporal patterns of the pre-and post-synaptic spikes. After the adjustment, the SNN generates post-synaptic spikes at the desired timings (see Figure 1C). These algorithms answer the purpose to reproduce the spatio-temporal patterns, but are not suited for classification of the spatio-temporal patterns because the desired timings are generally unknown and should be manually adjusted with great care in the classification tasks. Hence, this approach reduces the flexibility in the context of machine learning and poorly represents biological systems that selfadapt to changing environments. In unsupervised delay learning, the SNN instead self-organizes in response to a given spike pattern (Hüning et al., 1998;Eurich et al., 1999Eurich et al., , 2000. However, none of the unsupervised delay learning algorithms tackled practical tasks such as classification and reproduction of given spike patterns (see Table 1 for comparison). Thus, a delay learning algorithm that classifies given spike patterns in both unsupervised and supervised manners is greatly desired.
The present study proposes an unsupervised learning algorithm that adjusts the conduction delays and synaptic weights of an SNN. In a theoretical analysis, the proposed learning algorithm is confirmed to approximate the Expectation-Maximization (EM) algorithm. Optimization algorithms in probabilistic models including EM algorithm have been analyzed SNNs with no or fixed delay (e.g., Nessler et al., 2009;Kappel et al., 2014Kappel et al., , 2015Rezende and Gerstner, 2014). The proposed learning algorithm can be considered to extend the earlier methods to the conduction delay optimization. When evaluated on several practical classification tasks, the proposed algorithm successfully discriminated the given spatio-temporal spike patterns in an unsupervised manner. Furthermore, the algorithm was adaptable to supervised learning for improved classification accuracy. Remarkably, even in the supervised classification, the algorithm does give no external spikes indicating the timings at which the SNN should generate spikes. The adjustment of the synaptic weight in the proposed algorithm mimics that of STDP and occurs after the conduction delay was adjusted, supporting the hypothesis of Fields (2015) and Baraban et al. (2016). Therefore, the proposed learning algorithm presents as a good hypothetical model of biological delay learning, and might contribute to further investigations of SNNs in temporal coding. Preliminary and limited results of this study were presented in our conference paper (Matsubara, 2017).

Spiking Neural Network (SNN)
An SNN consists of multiple pre-synaptic neurons x 0 , . . . , x N−1 connected to a post-synaptic neuron z (see Figure 1A). Let T be an experimental time period and δ be the time step. The discrete times are denoted by t = 0, δ, 2δ, . . . (t < T). The binary variable x i,s is set to 1 when a pre-synaptic neuron x i generates a spike at time s and 0 otherwise. Owing to the discrete time, a spike has a width equal to the time step δ. Hence, each set x = {x is } represents a spatio-temporal spike pattern (see Figure 1B). The pre-synaptic neuron x i is connected to the postsynaptic neuron z via a synapse with a synaptic weight W i ≥ 0 and a conduction delay τ i ≥ 0. A pre-synaptic spike x is = 1 arrives at the post-synaptic neuron z after a conduction delay τ i . After τ i , the spike evokes an excitatory post-synaptic potential FIGURE 1 | (A) Schematic of a spiking neural network (SNN) consisting of multiple pre-synaptic neurons x i and a post-synaptic neuron z. W 0 denotes the weight of the synapse projecting from the pre-synaptic neuron x 0 to the post-synaptic neuron z, and τ 0 denotes the conduction delay of the axon corresponding to the synapse. (B,C) Post-synaptic spikes in response to spatio-temporal spike patterns. Each vertical line denotes a spike elicited by a neuron. Dotted lines show the transmission paths of the pre-synaptic spikes to the post-synaptic neuron z. (B) When the conduction delays are fixed, the post-synaptic neuron responds to the spatio-temporal pattern of pre-synaptic spikes and elicits a spike (left panel), but is unresponsive to a second pattern (right panel). (C) When the conduction delays are plastic, the optimized SNN generates post-synaptic spikes at times depending on the given spatio-temporal patterns of the pre-synaptic spikes.  * Brader et al., 2007;Nessler et al., 2009;Beyeler et al., 2013;O'Connor et al., 2013;Querlioz et al., 2013;Neftci et al., 2014;Diehl and Cook, 2015;Zambrano andBohte, 2016. ** Bohte et al., 2002;Ponulak, 2005;Gütig and Sompolinsky, 2006;Pfister et al., 2006;Paugam-Moisy et al., 2008;Matsubara and Torikai, 2016. *** Hüning et al., 1998;Eurich et al., 1999Eurich et al., , 2000 (EPSP) proportional to the synaptic weight W i . The time course of the EPSP is expressed by a temporal relation function g( t) that depends on the temporal difference s − t and the conduction delay τ i (namely, t = s + τ i − t). For multiple pre-synaptic spikes, the EPSP is the linear sum of the EPSPs evoked by the pre-synaptic spikes. Then, the post-synaptic membrane potential v t at time t is then given by The binary variable z t is set to 1 when the post-synaptic neuron z generates a spike at time t and 0 otherwise. For the parameters τ = {τ i } and W = {W i } and a given spatio-temporal spike pattern x = {x is }, the post-synaptic neuron z is assumed to generate a single spike within the experimental time period 0 ≤ t < T, i.e., t z t = 1. Such a constraint is important and is commonly applied in studies of temporal coding (e.g., Bohte et al., 2002) where the output of the SNN in temporal coding is defined as the timing of the post-synaptic spike. The present study initially follows the previous studies, but later proposed an alternative model without the constraint (see section 2.5). This study also assumes that for a spatio-temporal spike pattern x = {x is }, the probability of generating a post-synaptic spike z t = 1 at time t is exponentially proportional to the EPSP normalized by a constant Z = t ′ exp(v t ′ ) as follows: Frontiers in Computational Neuroscience | www.frontiersin.org Note that p(z t = 1|x) > 0 at any time t. In other words, the post-synaptic neuron z can elicit a spike even before the first pre-synaptic spike arrives. In biological neural networks, such spikes are induced by physiological noise. Note also that the normalizing constant Z includes a term v t ′ denoting the membrane potential at a future time t ′ > t. Therefore, sampling the post-synaptic spike z t = 1 requires anti-causal information.
The alternative model proposed in section 2.5 overcomes this limitation.

Multinoulli-Bernoulli Mixture Model
This subsection introduces the mixture probabilistic model. Let x be a set of visible binary random variables {x is }, each following a Bernoulli distribution, and z be a set of latent binary random variables {z t } following a Multinoulli distribution, i.e., t z t = 1. Also let ω = {ω t } denote the mixture weights, and π = {π ist } denote the posterior probability of x is = 1 given z t = 1. The generative Multinoulli-Bernoulli mixture model p(z, x; ω, π ) is then expressed as Here, the posterior probability π ist is substituted by sigm(V ist ) using the sigmoid function sigm(u) = (1 + exp(−u)) −1 . In addition, V ist is modeled by the function W i g(s + τ i − t) − v where v is a bias parameter and the mixture weight w t is assumed to have a constant value e b . The log-probability is then rewritten as = log p(z, x; W, τ ).
When the experimental time period T is sufficiently long and the time t is sufficiently distant from the temporal boundaries (0 and T),b is independent of the post-synaptic spike timing t; that is, . The posterior probability of z t = 1 given x = {x is } is then equivalent to Equation (2);

Learning Algorithm Based on the EM Algorithm
The SNN model Equation (2) can be optimized through optimization of the Multinoulli-Bernoulli mixture model Equation (4) under the certain constraints . Let θ be the parameter set and X be a dataset of spatio-temporal spike patterns x. In general, training a generative model involves maximizing the model evidence E x∼X [log p(x; θ )] (Murphy, 2012). For an arbitrary distribution q(z), the evidence is expressed as and D KL (·) is the Kullback-Leibler divergence. The model evidence E x∼X log p(x; θ ) can be maximized by the Expectation-Maximization (EM) algorithm. In the E-step, q(z) is substituted by the posterior probability p(z|x; θ old ) of the mixture model with the currently estimated parameter set θ old . The M-step updates the estimated parameter set θ to maximize the evidence lower bound L(θ ). The present study employs a stochastic version of the EM algorithm (Sato, 1999;Nessler et al., 2009). Given a spatio-temporal spike patternx, the stochastic EM algorithm first generates a post-synaptic spikeẑ by Equation (2). This spike generation corresponds to the E-step of the EM algorithm. Next, the parameters θ = {W, τ } are updated to maximize the evidence lower bound L(θ ) using a gradient ascent algorithm. This parameter update corresponds to the M-step of the EM algorithm. From an SNN perspective, the parameter update corresponds to synaptic plasticity with delay learning. Note that, unlike the original EM algorithm, the M-step of the stochastic EM algorithm is not guaranteed to maximize the model evidence E x∼X log p(x; θ ) but at least maximizes the evidence lower bound L(θ ).
Based on Equation (4), the gradients of the evidence lower bound L(θ ) w.r.t. the conduction delay τ i and synaptic weight W i are respectively given by; As Equation (4) and the first term in Equation (5) are independent of the time step δ, the second term in Equation (5) (which does depend on δ) is multiplied by δ: This normalization makes Equation (5) robust to the size of the time step δ. The parameters θ are then updated as follows: where η is a learning rate. The conditions are similar in a supervised manner, except that the post-synaptic spikeẑ is sourced externally.

Classification
A spatio-temporal spike pattern x is associated with one of the timings t = 0δ, 1δ, . . . depending on the timing t of the generated post-synaptic spike z t = 1. However, a given dataset X of spatiotemporal spike patterns x should be classifiable into smaller groups. In this study, the decision boundaries between N groups are defined as the n-th N-quantiles (n = 1, 2, . . . , N − 1) of the generated post-synaptic spike timings t for which z t = 1. For example, suppose that a dataset X contains 100 spatio-temporal spike patterns x falling into estimated two groups. The spatiotemporal spike patterns x are then sorted by the timings t of the post-synaptic spikes z t = 1. After sorting, patterns 1-50 are classified into group 1 and the remainder are classified into group 2.

Alternative Spiking Neural Network
In the SNN introduced above, the Multinoulli distribution p(z|x; τ , W) must be sampled over time t to generate a postsynaptic spike z t = 1. In other words, whether a post-synaptic spike z t = 1 is generated at time t anti-causally depends on the EPSP v t ′ and post-synaptic spike z t ′ at a future time t ′ > t.
To remove this unrealistic assumption, this subsection slightly modifies the SNN formulation. The new formulation replaces the Multinoulli distribution with multiple independent Bernoulli distributions and introduces the following Bernoulli-Bernoulli mixture model with posterior probability In this case, given a spatio-temporal spike pattern x, the posterior probabilities p(z t |x; ω, π, ρ) of the post-synaptic spikes z t = 1 at times t are completely independent. As described above, the prior probability ω t of z t = 1 is replaced with a time-independent term e b and the posterior probability π ist of x is = 1 given z t = 1 is modeled as In addition, the posterior probability ρ ist is substituted by a constant sigm(v). The log-probability function becomes . The prior probability ω t is replaced by an independent variableb and a variableĉ that depends onb and v.
The posterior probability of z t = 1 given x is expressed as The variablê b is replaced by a variableb, which is treated as an independent parameter hereafter. The independent parameter b in the Bernoulli-Bernoulli mixture model is additional to the parameters in the Multinoulli-Bernoulli mixture model.
The evidence lower bound L(θ ) is given by . The gradients of L(θ ) w.r.t. the parameters τ i and W i are, respectively given by (8) Equations (7) and (8) are equivalent to Equations (4) and (5) respectively, but are summed over time t (they sum the postsynaptic spikes z t = 1). Therefore, given a spatio-temporal spike pattern x and a single post-synaptic spike z t = 1, the parameters are updated as in the Multinoulli-Bernoulli mixture model.
Unlike the SNN based on the Multinoulli-Bernoulli mixture model, the above SNN is unconstrained by the number of postsynaptic spikes (i.e., t z t ∈ Z). To prevent an excessive number or the complete absence of post-synaptic spikes, this subsection introduces homeostatic plasticity (Turrigiano et al., 1999;Turrigiano and Nelson, 2000), a biological mechanism that presumably maintains the excitability of a neuron within a regular range (Van Rossum et al., 2000;Abraham, 2008;Watt and Desai, 2010;Matsubara and Uehara, 2016). The additional parameterb in Equation (6) embodies the intrinsic excitability of the Bernoulli neuron z. The intrinsic excitabilityb is adjusted after every receipt of the spatio-temporal spike pattern x. When post-synaptic neuron z generates at least one spike (i.e., t z t > 0), its intrinsic excitabilityb reduces byb − ; conversely if no spikes are generated (i.e., t z t = 0), its intrinsic excitabilityb increases byb + . Algorithmically, this conditional expression is given bỹ The spatio-temporal spike patterns x are classified as described in section 2.4, with an important difference; each post-synaptic spike z t = 1 is weighted by the inverse number of generated postsynaptic spikes, i.e., 1/ t z t . In other words, a spatio-temporal spike pattern x is associated with a group by majority voting of multiple post-synaptic spikes z t = 1.

Windows of Plasticity
In this study, the temporal relation function g( t) was expressed as the following Gaussian function: Here, the parameters µ and σ denote the mean and standard deviation of the distribution, respectively. Figure 2A plots the temporal relation function g( t) when µ = 1.5 ms and σ = 1 ms. The EPSP is commonly modeled by the double-exponential function (e.g., Shouval et al., 2010), which has zero gradient for all t < 0. As the proposed SNN is trained according to the gradient of the temporal relation function g( t), the doubleexponential function is unsuitable for the purpose. In this study, the causality was preserved by clamping the temporal relation function g(s + τ i − t) to 0 for all s − t < 0. The experimental time period T was 50 ms, the time step δ was set to 0.05 ms. The remaining parameters were set to µ = 1.5 ms, σ = 1.0 ms, v = 10, and η = 0.001, unless otherwise stated. The gradients of the evidence lower bound L(θ ) w.r.t. the conduction delay τ i and synaptic weight W i were considered to be changes by the delay learning and synaptic plasticity, respectively. When plotted against the temporal difference t = s + τ i − t, the gradients represent the temporal windows of the delay learning and synaptic plasticity. As shown in Figure 2B, the conduction delay τ i increases (decreases) when the temporal difference t is smaller (larger) than 1.5 ms. The EPSP peaks at 1.5 ms, indicating that a post-synaptic spike x t = 1 attracts the nearby pre-synaptic spikes x is = 1. The temporal window of the synaptic weight W i is similar to that of the STDP ( Figure 2C). Specifically, the synapse is potentiated (depressed) when the temporal difference t is positive (negative), but is always depressed when the temporal difference t is large and positive, as noted in previous electrophysiological studies (Markram et al., 1997;Bi and Poo, 1998) and theoretical studies (Shouval et al., 2002;Wittenberg and Wang, 2006;Shouval et al., 2010). Figure 2D plots the amount of synaptic modification W i versus the current synaptic weight W i . The decreasing trend is again consistent with previous electrophysiological (Bi and Poo, 1998) and theoretical (Van Rossum et al., 2000;Gütig et al., 2003;Shouval et al., 2010;Matsubara and Uehara, 2016) studies.

Results for Toy Spike Patterns
First, the SNN and the proposed learning algorithm based on the Multinoulli-Bernoulli mixture model were evaluated on toy spike patterns generated by three pre-synaptic neurons. Figure 3A shows typical spatio-temporal spike patterns x. The pre-synaptic neurons x 0 , x 1 , and x 2 elicited spikes at 1+ξ 0 , 5+ξ 1 , and 13+ξ 2 ms respectively in spike pattern A, and at 13+ξ 0 , 9+ξ 1 , and 1+ξ 2 ms, respectively in spike pattern B. Here, ξ 0 , ξ 1 , and ξ 2 are the noise terms following a uniform distribution U(−1, 1). Fifty samples from each of spike patterns A and B were extracted as the training dataset; other fifty samples were reserved as the test dataset. All synaptic weights W i were uniformly initialized to 1, and the conduction delays τ i (ms) were sampled from the uniform distribution U(5, 15). The post-synaptic neuron z was repeatedly fed with the spatiotemporal spike patterns x, and generated spikes z t = 1. The gray shaded areas in Figure 3A depicts the posterior probability p(z t = 1|x; W, τ ) of spiking each millisecond. Figure 3B shows the probability distribution of post-synaptic spike timings z t = 1 for the test dataset before learning. The unlearned model cannot discriminate the spike patterns A and B using the post-synaptic spike timings z t = 1. Note that the proposed SNN is intrinsically probabilistic, meaning that a post-synaptic neuron can elicit a spike at an arbitrary time t. During the learning procedure, the synaptic weights W i and the conduction delays τ i were gradually changed by the proposed learning algorithm (Figure 3E). The conduction delays τ i (ms) were clamped to the range 0-20 ms but never reached the limits. Figure 3F shows random samples of the postsynaptic spike timings z t = 1, separated into spike patterns A and B by the decision boundary (the solid black line). After sufficiently many samples, the spike patterns converged into limited temporal ranges and the two groups were clearly demarcated. The post-synaptic spike timings z t = 1 and their distributions after the learning procedure are exemplified in Figures 3C,D, respectively. In this trial, the classification accuracy of both the training and test datasets converged to 100 %. The average classification accuracy over 100 trials was 99.6 ± 2.6% for the training dataset and 99.6 ± 2.5% for the test dataset ( Figure 3G). The results are summarized in Table 2.

Results for the Iris Flower Dataset
Next, the proposed learning algorithm was evaluated on the iris flower dataset (Fisher, 1936), which consists of the data of three species (Setosa, Versicolor, and Virginica). The dataset of each species comprises 50 data points, and each data point consists of four features (the lengths and widths of the sepals and petals). Accordingly, the number N of pre-synaptic neurons was set to 4. The spike timing of each feature was normalized to the range 0-10 ms. As the sepal length k 0 ranges from 4.3 to 7.9 cm, the pre-synaptic neuron x 0 generated a single spike at s 0 = 10 × k 0 −4.3 7.9−4.3 ms. The generated post-synaptic spikes z t = 1 were then classified into three groups corresponding to the three species. Fifteen randomly chosen data points were assigned as the test set; the remainder were used as the training dataset. The other conditions were those described in section 3.2. The results are summarized in Figure 4. The average classification accuracy over 100 trials was 89.5 ± 5.3% for the training dataset and 89.5 ± 7.5% for the test dataset.
For comparison, this subsection introduces the results of the multilayer ReSuMe algorithm (Sporea and Grüning, 2013), a supervised learning algorithm for multilayer SNNs in rate coding. The multilayer ReSuMe algorithm was also evaluated on the iris flower dataset. In this evaluation, a trial was deemed successful if the classification accuracy exceeded 95% on the training dataset. After weeding out the unsuccessful trials, the proposed learning algorithm and multilayer ReSuMe algorithm achieved a classification accuracy of 94.7 ± 6.5 and 94.0 ± 0.79% respectively, on the test dataset (see Table 3). Despite its singlelayer architecture, the SNN trained by the proposed learning algorithm classified the iris flower dataset at least as accurately as the multilayer SNN trained by the multilayer ReSuMe algorithm.

Results for the MNIST Dataset
In this subsection, the proposed learning algorithm was evaluated on the MNIST dataset (LeCun et al., 1998), which contains 28 × 28 grayscale images of 70,000 handwritten digits. In standard rate coding, the pixel intensities are represented by 28 × 28 = 784 pre-synaptic neurons (Nessler et al., 2009;Beyeler et al., 2013;O'Connor et al., 2013;Querlioz et al., 2013;Neftci et al., 2014;Diehl and Cook, 2015;Zambrano and Bohte, 2016). To emphasize the temporal coding, this study instead represented the rows and columns of the images by 28 pre-synaptic neurons and their 28 corresponding spike timings, respectively. When a pre-synaptic neuron x i generated a spike at time s, the intensity of the pixel in row i and column s of the image was considered to exceed 0.5 (see Figure 5A). For simplicity, only the digits 0 and 8 were included in the analysis. After removing the other digits, 13,728 images were available. This experiment employed a 10-fold cross-validation: 10 % of images were randomly chosen for the test set. The results are summarized in Figure 5. This experiment yielded much more pre-synaptic spikes x is = 1 than the previous two experiments. During transmission, the pre-synaptic spikes x is = 1 that largely contributed to the postsynaptic spike z t = 1 were emphasized. More specifically, if µ−1 < t = s+τ i −t < µ+1 for the post-synaptic spike z t = 1, the pre-synaptic spikes x is = 1 was emphasized with a thick color line, whereas other pre-synaptic spikes x is = 1 were attenuated (see Figures 5A,C). According to Figure 5C, the C-shaped edges are heightened, indicating that the SNN selectively detects and responds to these edges. In general, the C-shaped edge appears on the left side of digit 0 and in the center of digit 8, so the postsynaptic spikes z t = 1 responded earlier to "0" than to "8". In some trials, the SNN responded to reversed C-shaped edges (or to edges in digits such as "7"), and the post-synaptic spikes z t = 1 responded earlier to "8" than to "0" (see Figure 6). The average classification accuracy over 100 trials was 88.7 ± 3.4% for the training dataset and 88.7 ± 3.7% for the test dataset.

Results without Delay Learning
The proposed learning algorithm was also evaluated without delay learning. More specifically, the synaptic weights W i were updated by Equation (5), and the conduction delays τ i were clamped to their initial values (i.e., were not updated by Equation  4). The other conditions were those described in previous subsections. All of the accuracies were drastically reduced (see Table 4).

Results with Supervised Learning
The proposed learning algorithm is adaptable to supervised learning, in which synaptic plasticity and delay learning are not self-driven by generated post-synaptic spikes, but are driven by external spikesz t = 1 generated at the desired timings (Bohte et al., 2002;Ponulak, 2005;Gütig and Sompolinsky, 2006;Pfister et al., 2006;Paugam-Moisy et al., 2008;Taherkhani et al., 2015;Matsubara and Torikai, 2016). However, the desired timingsz t = 1 are generally unknown. In this study, the spatio-temporal spike patterns x are classified by whether the post-synaptic spike timing z t = 1 arrives earlier or later than the boundary decision. The group with the latest average post-synaptic spike timing should generate the most delayed post-synaptic spike. Therefore, the "late" group were given an external post-synaptic spikez t+d = 1 (where d > 0) after the generated post-synaptic spike z t = 1. Meanwhile, the "early" group were given a post-synaptic spikẽ z t−d = 1 before the generated post-synaptic spike z t = 1. These post-synaptic spikes z t+d = 1 and z t−d = 1 tend to delay or advance the generated post-synaptic spike z t = 1, respectively. In this study, the time parameter d was set to δ = 0.05 ms. The results of supervised learning are summarized in Tables 2, 3. In almost every case, the classification accuracy was improved, confirming that the proposed learning algorithm can be adapted to supervised learning without manually adjusting the spike timings.

Results with Multiple Post-synaptic Spikes
This subsection evaluates the SNN based on the Bernoulli-Bernoulli mixture model described section 2.5. The homeostatic plasticity parameters were set to b + = 0.01 and b − = 0.0001, indicating that, on average, one in every hundred spatio-temporal spike patterns induces no post-synaptic spike z t = 1. To determine the decision boundaries, the absence of the postsynaptic spike z t = 1 was considered to indicate an exceptionally late post-synaptic spike (e.g., z T = 1). As shown in Tables 3, 5, the classification accuracy was improved in the iris flower dataset but degraded in the MNIST dataset. As an example, Figure 7 shows the results of applying multiple post-synaptic spikes in the MNIST dataset.

DISCUSSION
Rate-coding SNNs have already succeeded in various unsupervised and supervised learning tasks (Brader et al., 2007;Nessler et al., 2009;Beyeler et al., 2013;O'Connor et al., 2013;Diehl and Cook, 2015;Zambrano and Bohte, 2016), and have provided the opportunity for efficient computational architectures (Querlioz et al., 2013;Neftci et al., 2014;Cao et al., 2015). However, rate coding requires repeated sampling of the generated spikes, which increases the computational time (VanRullen and Thorpe, 2001). The alternative approach is temporal coding, which encodes the information into a spatio-temporal spike pattern. Temporal coding requires the timing of at least one spike (Bohte et al., 2002;Ponulak, 2005;Gütig and Sompolinsky, 2006;Pfister et al., 2006;Paugam-Moisy et al., 2008;Yu et al., 2014;Taherkhani et al., 2015), potentially enabling more efficient computational architectures than those based on rate coding (Matsubara andTorikai, 2013, 2016). These studies computed gradients of spike timing or excitatory post-synaptic potential with respect to synaptic weight and/or conduction delay, and adjusted to them to elicit post-synaptic spikes at the desired timings. However, they focused exclusively on supervised learning algorithms, which always require teacher spikes at the desired timings. In general, the desired timings are unknown and require careful manual adjustment. Therefore, this approach limits the flexibility in the context of machine learning and poorly represents biological systems that self-adapt to changing environments. Unsupervised delay learning algorithms have received far less attention than supervised learning. The few studies published in this area have not approached practical tasks such as classification and reproduction of given spike patterns (Hüning et al., 1998;Eurich et al., 1999Eurich et al., , 2000. In contrast, the proposed unsupervised learning algorithm approximates the EM algorithm (see section 2), and can therefore classify the given spatio-temporal spike patterns by inferring their hidden causes. Under the proposed learning algorithm, the SNN also detected frequent patterns in the given spatio-temporal spike patterns. In addition, the proposed learning algorithm can be adapted to supervised learning without the application of externally determined spike timings. The proposed learning algorithm relates the synaptic weight modification W to the temporal difference t and the current synaptic weight W (see section 3.1). When the temporal difference t is positive, the synapse is potentiated; when t is negative or largely positive, it is depressed. This relationship is consistent with previous electrophysiological studies (Markram et al., 1997;Bi and Poo, 1998) and has been discussed in theoretical studies (Shouval et al., 2002;Wittenberg and Wang, 2006;Shouval et al., 2010). The amount of synaptic modification W i decreases with increasing current synaptic weight W i . This relationship is also consistent with previous electrophysiological study (Bi and Poo, 1998) and is considered to maintain neuronal activity by suppressing excessive potentiation (Van Rossum et al., 2000;Gütig et al., 2003;Shouval et al., 2010;Matsubara and Uehara, 2016). The Results of the iris flower dataset (summarized in Figure 4) demonstrate that during the early learning phase, the pre-synaptic spikes are not simultaneously delivered to the postsynaptic neuron, and the synaptic weights and conduction delays change only gradually. After 50,000 samples, the conduction delays rapidly changed and almost converged to certain values before 60,000 samples. At that time, the pre-synaptic spikes arrived at the post-synaptic neuron simultaneously. Following the changing conduction delays, the synaptic weights increase, post-synaptic spikes become clustered, and the classification accuracy becomes higher (see Figures 4E-G). Similar time courses were observed for the toy spike patterns (see Figure 3). These results support Fields (2015) and Baraban et al. (2016), who hypothesized that conduction delay is adjusted for the synchronous arrival of multiple pre-synaptic spikes before the synaptic modification dominates. They also demonstrated that clustered post-synaptic spikes and accurate classification require the combined optimization of conduction delay and synaptic weight. Before the synaptic weights increase, the classification accuracy is poor. Therefore, delay learning and synaptic plasticity are inherently linked as mentioned by Jamann et al. (2017). Although, the detailed of biological delay learning (i.e., activitydependent myelination) remains unclear, this study provides a good hypothetical explanation of the mechanism underlying this process.
With the MNIST dataset, the test accuracies of the proposed algorithm based on the Multinoulli-Bernoulli mixture model were almost equal to the corresponding train accuracies as summarized in Tables 2, 4. This could be because the MNIST has a limited number of local minima insensitively to the separation of the test dataset. Since the experiment employed a 10-fold crossvalidation, the average test accuracy becomes equal to the average train accuracy if the proposed algorithm always converges to the same local minimum. Actually, in all the 100 trials, the proposed algorithm apparently converged to one of two local minima shown in Figures 5, 6.
When the conduction delays were fixed, the classification accuracy of the proposed learning algorithm drastically reduced (see Table 4). This occurred because the classification depends on the timing t of the post-synaptic spike z t = 1, which remained almost static the delay modification. This result demonstrates one advantage of the proposed learning algorithm over other temporal coding learning algorithm such as the ReSuMe algorithm (Ponulak, 2005;Sporea and Grüning, 2013) and the Tempotron algorithm (Gütig and Sompolinsky, 2006;Yu et al., 2014), which learn by synaptic modification only. Gerstner et al. (1996) and Bohte et al. (2002) assumed multiple paths from a single source to a post-synaptic neuron with various delays. In such a situation, synaptic modification can adjust the post-synaptic spike timing by pruning the inappropriate paths leaving the appropriate ones only. However, an SNN based on this approach either requires numerous unused paths for future development, or its flexibility is limited by the initial network connections and conduction delays. SNNs optimized by delay learning algorithms such as the proposed learning algorithm are more efficient and flexible. A performance comparison between the two learning approaches is outside the scope of this paper, but is a worthwhile future task.
Despite the single-layer architecture, the classification accuracy of the SNN trained by the proposed learning algorithm is comparable to (even slightly superior to) that of multilayer ReSuMe, a supervised learning algorithm designed for multilayer SNNs (Sporea and Grüning, 2013). Supervised learning algorithms such as variants of ReSuMe (Ponulak, 2005;Sporea and Grüning, 2013;Taherkhani et al., 2015) and Tempotron (Gütig and Sompolinsky, 2006;Yu et al., 2014) require the desired timings of post-synaptic spikes, which must be determined before the learning procedure. As the desired timings are generally unknown, they should be adjusted manually and carefully in classification tasks. On the other hand, the proposed learning algorithm automatically determines the desired timing from the given dataset and the initial parameter values, even during a supervised learning. The removal of the need for external timing is another advantage of the proposed learning algorithm.
In temporal coding, the proposed learning algorithm requires fewer parameters and fewer spikes than existing learning algorithms in rate coding. Single-layer SNN in rate coding required 4 pre-synaptic and 3 post-synaptic neurons with 15 parameters (4 weight parameters and a bias parameter for each group) for the iris flower dataset, and more than 784 neurons and parameters for 2 digits in the MNIST dataset (e.g., Nessler et al., 2009;Neftci et al., 2014). In these algorithms, each presynaptic neuron corresponds to a feature or a pixel, and each post-synaptic neuron corresponds to a group. As a single neuron represents multiple values by its spike timing in temporal coding, the proposed neural network requires 4 pre-synaptic neurons and , and the post-synaptic spikes z t = 1 before learning. Thick colored lines indicate the pre-synaptic spikes x is = 1, elicited within µ − 1 < t = s + τ i − t < µ + 1 for z t = 1; other pre-synaptic spikes x is are depicted as thin gray lines (the left vertical axes). Gray shaded areas denote the probability of eliciting a post-synaptic spikes z t = 1 per 1 ms (the right vertical axes). (B) Test distributions of the post-synaptic spike timings z t = 1 before learning. (C) Post-synaptic spike timings z t = 1 after learning, given the spatio-temporal spike patterns x in (A). (D) Test distributions of the post-synaptic spike timing z t = 1 after learning. (E) Trajectories of the synaptic weights W i and conduction delays τ i during the learning procedure. The darkest lines denote the values of the parameters W 0 and τ 0 , and the lighter lines denote W i and τ i for i = 1, 2, . . . . (F) Random samples of the post-synaptic spike timings z t = 1. The black solid line marks the decision boundary between spike patterns the hand-written digits 0 and 8. (G) Classification accuracy in the training and test datasets, each averaged over 100 trials.
1 post-synaptic neuron with 10 parameters (4 synaptic weights W i , 4 conduction delays τ i , and 2 decision boundaries) for the iris flower dataset, and only 29 neurons and 57 parameters for 2 digits in the MNIST dataset. In addition, each feature is the iris flower dataset was encoded into a single spike timing with in the range 0-10 ms with a time step of δ = 0.05 ms. Therefore, the resolution of each feature was 200 stages. To achieve the same resolution in rate coding, the pre-synaptic spikes must be generated up to 200 times. The proposed learning algorithm requires far fewer pre-synaptic spikes. In future works, rate coding in the proposed and existing learning algorithms should be compared with similar numbers of parameters and spikes. Note that, as the proposed SNNs have only one post-synaptic neuron, they can classify two or three groups at most. To classify FIGURE 6 | Other results on the hand-written digits 0 and 8 in the MNIST dataset (LeCun et al., 1998). (A) Representative spatio-temporal spike patterns of digit 0 (top panel) and digit 8 (bottom panel), and the post-synaptic spikes z t = 1 after the learning procedure as per Figure 5C. Thick colored lines indicate the pre-synaptic spikes x is = 1, elicited within µ − 1 < t = s + τ i − t < µ + 1 for z t = 1; other pre-synaptic spikes x is are depicted as thin gray lines (the left vertical axes). Gray shaded areas denote the probability of eliciting a post-synaptic spikes z t = 1 per 1 ms (the right vertical axes). (B) Test distributions of the post-synaptic spike timing z t = 1 after learning. The SNN selectively responds to the reversed C-shaped edges, so the post-synaptic spikes z t = 1 responding to "8" are earlier than those responding to "0" (in contrast to Figure 5). more groups, the proposed SNN must be generalized to multiple interacting post-synaptic neurons.
This study proposed two types of SNNs; one based on the Multinoulli-Bernoulli mixture model, the other on the Bernoulli-Bernoulli mixture model. The former takes a single post-synaptic spike from the temporal distribution, whereas the latter can independently elicit a post-synaptic spike at every time step and exhibits a burst-like behavior. The SNN based on the Bernoulli-Bernoulli mixture model classified the iris flower dataset more accurately, and the MNIST dataset less accurately, than the SNN based on the Multinoulli-Bernoulli mixture model. Owing to the multiple generations of post-synaptic spikes, there was little variation among trials in the iris flower dataset, and the learning proceeded more robustly than in the SNN based on the Multinoulli-Bernoulli mixture model. Consequently, the accuracy was improved in this dataset. Conversely, the number of pre-synaptic spikes representing a single image varied widely in the MNIST dataset, generating a widely varying number of post-synaptic spikes. The SNN based on the Bernoulli-Bernoulli mixture model sometimes detected too many edges similar to the template edge in various sub-regions, and sometimes detected no edge. Meanwhile, the SNN based on the Multinoulli-Bernoulli mixture model always detected the single edge that best fitted the template edge. Therefore, the learning of the SNN based on the Bernoulli-Bernoulli mixture model proceeded less robustly, lowering the classification accuracy of the MNIST dataset. The SNN proposed in section 2.1 can also be implemented in continuous time. However, the number of pre-synaptic spikes x is = 1 is unlimited in this mode, and the joint probability p(x, z) of the Multinoulli-Bernoulli and Bernoulli-Bernoulli mixture models is difficult to define. For this reason, the SNN was implemented in discrete time. The EPSP of the proposed SNN is the linear sum of the EPSPs induced by multiple presynaptic spikes x is = 1, and the proposed learning algorithms in sections 2.3 and 2.5 were normalized by the time step δ. Hence, the proposed SNN is robust to the time step δ and can be approximated to continuous time as the time step δ approaches 0.
Note that Equation (4) and the first term in Equation (5) are independent of the time step δ but the second term in Equation (5) does depend on δ. When adjusting the time step δ.
In contrast, the behavior of dynamical spiking neurons described by an ordinary differential equation is sensitive to the  Figure 5C. Thick colored lines indicate the pre-synaptic spikes x is = 1, elicited within µ − 1 < t = s + τ i − t < µ + 1 for z t = 1; other pre-synaptic spikes x is are depicted as thin gray lines (the left vertical axes). Gray shaded areas denote the probability of eliciting a post-synaptic spikes z t = 1 per 1 ms (the right vertical axes). (B) Test distribution of mean timings of the post-synaptic spikes z t = 1 after learning. Like the Multinoulli-Bernoulli mixture model (see Figure 5), the SNN selectively responds to the C-shaped edges.
time step and the numerical simulation algorithm (Hansel et al., 1998). A continuous-time SNN and its corresponding generative model will be explored in future work.
The Bernoulli-Bernoulli mixture model independently draws a post-synaptic spike z t = 1 at each time step. However, biological studies have confirmed that the intervals between the generated spikes do not follow a Poisson distribution, implying that the spikes in biological neural networks are not independently generated (Burns and Webb, 1976;Levine, 1991). When a dynamical neuron generates a spike, it enters the relative refractory period. During this time, the neuron becomes less sensitive to stimuli and is less likely to generate a spike. After the relative refractory period, the neuron returns to its resting state and resumes its usual chance of generating a spike (Izhikevich, 2006a, Chapter 8). Incorporated with these dynamics, the SNN based on the Bernoulli-Bernoulli mixture model would be purged of its burst-like behavior in the MNIST dataset (which degraded the classification accuracy), and would be rendered more computationally efficient and more biologically realistic. A dynamic version of the proposed SNN is a further opportunity for future work.

CONCLUSION
This study proposed an unsupervised learning algorithm that adjusts the conduction delays and synaptic weights in an SNN. The proposed learning algorithm approximates the Expectation-Maximization (EM) algorithm and was confirmed to classify the spatio-temporal spike patterns in several practical problems. In addition, the proposed learning algorithm is adjustable to supervised learning, which improves its classification accuracy. The formulation of the proposed learning algorithm is partially consistent with the synaptic plasticity demonstrated in previous biological and theoretical studies. Therefore, the proposed learning algorithm is a strong candidate model of biological delay learning and will contribute to further investigations of SNNs in temporal coding.

AUTHOR CONTRIBUTIONS
The author confirms being the sole contributor of this work and approved it for publication.

FUNDING
This study was partially supported by the JSPS KAKENHI (16K12487), Kayamori Foundation of Information Science Advancement, The Nakajima Foundation, and SEI Group CSR Foundation.