ORIGINAL RESEARCH article
Unsupervised Learning by Spike Timing Dependent Plasticity in Phase Change Memory (PCM) Synapses
- 1Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano and IU.NET, Milano, Italy
- 2Research and Development Process, Micron Semiconductor Italia, Vimercate, Italy
We present a novel one-transistor/one-resistor (1T1R) synapse for neuromorphic networks, based on phase change memory (PCM) technology. The synapse is capable of spike-timing dependent plasticity (STDP), where gradual potentiation relies on set transition, namely crystallization, in the PCM, while depression is achieved via reset or amorphization of a chalcogenide active volume. STDP characteristics are demonstrated by experiments under variable initial conditions and number of pulses. Finally, we support the applicability of the 1T1R synapse for learning and recognition of visual patterns by simulations of fully connected neuromorphic networks with 2 or 3 layers with high recognition efficiency. The proposed scheme provides a feasible low-power solution for on-line unsupervised machine learning in smart reconfigurable sensors.
Neuromorphic engineering represents one of the most promising fields for developing new computing paradigms complementing or even replacing the current von Neumann architecture (Indiveri and Liu, 2015). Tasks such as learning and recognition of visual and auditory patterns are naturally achieved in the human brain, whereas they require a comparably long time and excessive power consumption in a digital central processing unit (CPU). To address the learning task, one approach is to manipulate the synaptic weights in a multilayer neuron architecture called a perceptron, where neurons consist of CMOS analog circuits to perform spike integration and firing, while synapses serve as interneuron connections with reconfigurable weights (Suri et al., 2011; Kuzum et al., 2012; Indiveri et al., 2013; Wang et al., 2015). Recent advances in nanotechnology have provided neuromorphic engineers with new devices which allow for synaptic plasticity, such as resistive switching memory (RRAM; Waser and Aono, 2007; Jo et al., 2010; Ohno et al., 2011; Ambrogio et al., 2013; Prezioso et al., 2015), spin-transfer-torque memory (STT-RAM; Locatelli et al., 2014; Thomas et al., 2015; Vincent et al., 2015), or phase change memory (PCM; Suri et al., 2011; Bichler et al., 2012; Burr et al., 2014; Eryilmaz et al., 2014). In particular, recent works have shown the ability to train real networks for pattern learning, adopting backpropagation (Burr et al., 2014) and recurrently connected networks (Eryilmaz et al., 2014). The advantage of these devices over CMOS is the small area, enabling the high synaptic density which is required to achieve the large connectivity (i.e., ratio between synapses and neurons) and highly parallelized architecture of the human brain. In addition, nanoelectronic synapses allow for low-voltage operation in hybrid CMOS-memristive circuits, and for augmented functionality with respect to CMOS technology, thanks to the peculiar phenomena taking place in the memristive element. 
For instance, the CMOS-memristive synapse showed the ability to perform spike-timing dependent plasticity (STDP; Yu et al., 2011; Ambrogio et al., 2013), the transition from short-term to long-term learning (Ohno et al., 2011), a multilevel cell operation allowing for gradual weight update (Wang et al., 2015) and a stochastic operation suitable to redundant neuromorphic networks (Suri et al., 2012; Yu et al., 2013; Garbin et al., 2015; Querlioz et al., 2015).
In this context, PCM technology is an attractive solution for nanoelectronic synapses in high-density neuromorphic systems. PCM is currently under consideration for stand-alone (Servalli, 2009) and embedded memories (Annunziata et al., 2009; Zuliani et al., 2013). Generally, the device has a one-transistor/one-resistor (1T1R) architecture, which allows for strong immunity to voltage variations as well as a relatively compact structure. Either metal-oxide-semiconductor (MOS) or bipolar junction transistors (BJT) have been used in the 1T1R architecture. In some cases, the one-diode/one-resistor (1D1R) structure has been demonstrated, capable of extremely small area and high density using the crosspoint architecture (Kau et al., 2009). The PCM technology platform has been used for computing applications such as Boolean logic functions (Cassinerio et al., 2013) and arithmetic computation (Wright et al., 2011), including numerical addition, subtraction and factorization (Hosseini et al., 2015). Neuromorphic synapses have also been studied: Kuzum et al. first demonstrated STDP in PCM by use of an ad-hoc train of pulses at either terminal of the device (Kuzum et al., 2012). Suri et al. presented a 2-PCM synapse, where the 2 PCM devices provide complementary potentiation and depression via gradual crystallization (Suri et al., 2011; Bichler et al., 2012). Supervised training and learning using back-propagation schemes were recently shown using PCM arrays (Burr et al., 2014; Eryilmaz et al., 2014). Despite the wealth of novel demonstrations of PCM technology, no STDP-based unsupervised learning and recognition with PCM synapse circuits has been presented so far.
Here we present a novel 1T1R synapse based on PCM capable of STDP. Potentiation of the synapse is achieved via partial crystallization enabling a gradual increase of synapse conductance, while synapse depression occurs by amorphization in the reset transition. STDP characteristics are demonstrated by experiments as a function of the initial resistance state and of the number of potentiating pulses. We demonstrate the ability to learn and recognize patterns in a fully-connected neuromorphic network and we propose, for the first time, input noise as a means to depress background synapses, thus enabling on-line pattern learning, forgetting and updating. Training of the PCM synapse network with alternating and multiple visual patterns from the MNIST database is shown. Pattern recognition with multiple layers is finally addressed for improved learning efficiency.
Materials and Methods
Figure 1 shows the PCM device used in this work (A) and its characteristics. The PCM was fabricated with 45 nm technology and consists of an active Ge2Sb2Te5 (GST) layer between a confined bottom electrode (or heater) and a top electrode (Servalli, 2009). The PCM top electrode was made of a Cu/W/TiN multilayer connecting all cells along a row in the array, while the bottom electrode consisted of a tungsten plug and a sub-lithographic TiN heater connected to the GST layer. The active material GST is a well-known phase change material, which remains stable in 2 phases, namely the crystalline phase and the amorphous phase (Wong et al., 2010). The 2 phases differ by their respective resistance, as displayed by the I-V characteristics in Figure 1B: while the crystalline (set) state shows a relatively low resistance, the amorphous (reset) state shows high resistance and a typical threshold switching behavior at a characteristic threshold voltage VT (Ielmini and Zhang, 2007). To change the PCM state, positive voltage pulses are applied between the top electrode and the heater. Figure 1C shows the resistance R measured after the application of a rectangular write pulse as a function of the pulse amplitude V. Before any write pulse was applied, the PCM device was prepared in the set state with R = 10 kΩ by application of a pulse with amplitude 1.2 V for 250 ns. Data show that R remains constant until the applied voltage exceeds the melting voltage Vm of the GST, around 1.2 V. Above Vm, the applied pulse is able to induce melting, which leaves the GST volume in an amorphous phase as the voltage pulse is completed. The amorphous volume increases with V, thus leading to the increase of R with V in the characteristic of Figure 1C. To recover the initial crystalline phase, a rectangular pulse with voltage below Vm is applied. 
A voltage Vreset = 1.75 V is sufficient to induce a resistance change to about 20 MΩ, corresponding to a full reset state. Figure 1D shows the resistance R measured after a set pulse with voltage Vset = 1.05 V as a function of the pulse-width tP and for increasing initial R from 15 kΩ to 10 MΩ of the PCM (different colors in Figure 1D). In general, R decreases with increase in tP as a result of the increased crystalline fraction (Cassinerio et al., 2013). A pulse width of about 250 ns is generally sufficient to complete crystallization within the GST layer irrespective of the initial value of R, thus supporting the good quality of PCM in terms of fast memory, low write voltage and low power consumption.
Figure 1. Cross sectional view of a PCM obtained by transmission electron microscopy (TEM) (A), measured quasi-stationary I-V curves for the PCM device in the crystalline and amorphous phase (B), reset characteristic of R as a function of the write voltage for pulse-width 40 ns (C) and set characteristics of R as a function of the set pulse-width tP and voltage Vset = 1.05 V for variable initial PCM state (D). The PCM device shows fast switching at low voltage, thus supporting PCM technology for low-voltage, low-power synapses in neuromorphic systems.
Figure 2 schematically shows a neuron/synapse/neuron block of the neuromorphic network. Here, the synapse consists of a 1T1R structure where the PCM cell is connected in series with a MOS transistor. The transistor width and length must be suitable to drive a current around 300 μA, which is needed for the set and reset transitions in the PCM with 45 nm technology (Servalli, 2009). As a reference, an embedded PCM device with 1T1R structure has an area (almost equal to the transistor area) of 36F², where F is the minimum feature size of the technology, for F = 90 nm and a write current of 400 μA (Annunziata et al., 2009). The 1T1R synapse has 3 terminals, namely the gate electrode of the transistor, the top electrode (TE) of the PCM and the bottom electrode consisting of the transistor channel contact not connected to the PCM. The synapse gate voltage VG is driven by the pre-synaptic neuron (PRE), which applies a sequence of rectangular spikes. The positive gate voltage activates a current spike in the synapse which is fed into the post-synaptic neuron (POST). Each neuron in the neuromorphic network consists of a leaky integrate and fire (LIF) circuit, where the input current spike is integrated by the first stage, thus raising the internal (or membrane) potential Vint. The TE voltage VTE is controlled by the POST, and is normally equal to a negative constant value, e.g., −30 mV. Thanks to the negative VTE, a negative current spike is generated in the 1T1R in response to the PRE spike, hence causing a positive increase of Vint in the inverting integrator of Figure 2. The relatively low VTE ensures that the resistance state of the PCM is not changed, thus avoiding unwanted synaptic plasticity during the communication mode. The POST also controls the gate voltage of the synapse in the connection to the neuron in the next layer (not shown in Figure 2). 
Therefore, the scheme in Figure 2 represents the building block to be replicated to achieve a generic multilayer neuromorphic array. Note finally that the 1T1R synapse in Figure 2 can be considered a simplified version of the 2-transistor/1-resistor (2T1R) synapse presented by Wang et al. where communication and plasticity were achieved by 2 separate transistors (Wang et al., 2015), instead of only one transistor in the present solution.
Figure 2. Schematic illustration of the neuromorphic network with a 1T1R synapse. The PRE drives the MOS transistor gate voltage VG, thus activating a current spike due to the low negative TE voltage (VTE = −30 mV) set by the POST. The current spikes are fed into the POST, which eventually delivers a VTE spike back to the synapse as the internal voltage Vint exceeds a threshold Vth. The VTE spike includes a set and reset pulse to induce potentiation/depression according to the STDP protocol.
As Vint exceeds a given threshold Vth of a comparator, the fire stage delivers a pulse back to the TE to update the weight of the synapse. The TE spike contains 2 rectangular pulses, the second pulse having a higher amplitude than the first one. The specific shape of the VTE spike results in a change in the PCM resistance depending on the relative time delay between the PRE and POST spikes, in agreement with the STDP protocol. STDP in the PCM synapse is illustrated in Figure 3, showing the applied pulses from the PRE and the POST. The PRE spike is rectangular, with a 10 ms pulse-width and amplitude VG = 0.87 V, followed by a 10 ms after-pulse at zero voltage. The POST spike lasts 20 ms overall, and includes two pulses of width tP at the beginning of the first and the second halves of the total pulse. The amplitudes of the first and second pulses are Vset = 1.05 V and Vreset = 1.75 V, respectively, and each pulse is followed by a wait time at zero voltage. Amplitudes Vset and Vreset are tuned to induce set transition (crystallization) and reset transition (amorphization), respectively, according to the PCM characteristics in Figure 1. These values should be suitably adjusted according to the specific memory technology integrated in the synapse.
Figure 3. Scheme of the applied pulses from the PRE and POST neurons to the 1T1R synapse. In the case of small positive delay Δt (A), when the PRE spike is applied just before the POST spike, the PCM receives a potentiating pulse with voltage Vset inducing set transition. On the other hand, for small negative delay Δt (B), when the PRE spike is applied just after the POST spike, the PCM receives a depressing pulse with voltage Vreset inducing reset transition. For positive/negative delays larger than 10 ms, there is no overlap between PRE and POST spikes, thus no potentiation/depression can take place.
We define the relative time delay Δt as:
$$\Delta t = t_{\mathrm{post}} - t_{\mathrm{pre}}$$
where tpost is the initial time of the POST spike and tpre is the initial time of the PRE spike, as shown in Figure 3. If the PRE spike appears before the POST spike (a), the relative delay Δt is positive and the PRE spike overlaps with the POST spike during the set pulse of voltage Vset, thus inducing set transition in the PCM with a consequent decrease of resistance. This corresponds to the so-called long-term potentiation (LTP) in the STDP protocol. If the PRE spike appears after the POST spike (b), the relative delay Δt is negative and the PRE spike overlaps with the POST spike during the reset pulse of voltage Vreset, thus inducing reset transition in the PCM with a consequent increase of resistance. This corresponds to the so-called long-term depression (LTD) in the STDP protocol.
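The timing rule described above can be sketched in a few lines of Python. This is only an illustration of the overlap logic in Figure 3, not the authors' code: the function name `stdp_event` and the helper `inside` are hypothetical, and the sketch assumes that a set or reset pulse reaches the PCM only if the transistor gate is on for the whole 40 ns pulse.

```python
# Sketch of the timing rule in Figure 3; all times are in milliseconds.
PRE_WIDTH_MS = 10.0   # gate pulse width of the PRE spike (from the text)
POST_HALF_MS = 10.0   # each half of the 20 ms POST spike (from the text)
T_P_MS = 40e-6        # 40 ns set/reset pulse width, expressed in ms


def stdp_event(t_pre_ms, t_post_ms):
    """Return 'LTP', 'LTD' or 'none' for one PRE/POST spike pair."""
    gate_on = (t_pre_ms, t_pre_ms + PRE_WIDTH_MS)
    # Set pulse opens the first half of the POST spike, reset pulse the second.
    set_pulse = (t_post_ms, t_post_ms + T_P_MS)
    reset_start = t_post_ms + POST_HALF_MS
    reset_pulse = (reset_start, reset_start + T_P_MS)

    def inside(pulse, window):
        # Assumption: the pulse acts on the PCM only while the gate is on.
        return window[0] <= pulse[0] and pulse[1] <= window[1]

    if inside(set_pulse, gate_on):
        return "LTP"   # PRE leads POST (positive delay): crystallization
    if inside(reset_pulse, gate_on):
        return "LTD"   # PRE follows POST (negative delay): amorphization
    return "none"      # no overlap, resistance unchanged
```

With these definitions, a PRE spike 5 ms before the POST spike is classified as potentiation, a PRE spike 5 ms after it as depression, and delays beyond the 10 ms window produce no update.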
We characterized STDP characteristics in a 1T1R synapse, obtained by wire-bonding a MOS transistor and a PCM device on 2 separate chips. The transistor size was L = 1 μm and W = 10 μm and the device was able to deliver sufficient current to switch the PCM device during set and reset. To demonstrate STDP operation, voltage pulses as in Figure 3 were applied to the transistor gate and to the TE terminal with variable delay Δt and variable initial resistance R0 of the PCM device. We used a pulse-width tP = 40 ns of set/reset pulses in the POST spike, i.e., the same as in Figures 1C,D. Figure 4 shows the measured change of conductance R0/R, where R0 and R were measured before and after the applied gate/TE pulses, for the 3 initial states of the PCM shown in Figure 1D, namely state A close to the full set state (R0 = 15 kΩ), state B which is intermediate between set and reset states (R0 = 500 kΩ), and state C close to the full reset state (R0 = 10 MΩ). R was measured after one spike event in all cases except for state C, where 1, 3, and 5 spikes were used in the experiments. State A (Figure 4A) displays strong depression for Δt < 0, indicating a resistance increase by about 3 orders of magnitude corresponding to the full resistance window of the PCM device between set and reset states in Figure 1C. On the other hand, state A does not show any potentiation, since the phase is already almost completely crystallized in this state. State B (Figure 4B) shows both depression (Δt < 0) and potentiation (Δt > 0), since both set and reset transition are possible for this intermediate state. Finally, state C (Figure 4C) shows no depression, since this state is already fully amorphized. In the case of one spike, the PCM also shows no potentiation, since a 40-ns pulse is not able to induce significant crystallization in the fully-amorphized state according to the set characteristics in Figure 1D. 
Potentiation, however, arises after an increasing number of spikes, reaching about a factor 10³× in the case of 5 repeated spikes with the same delay. These characteristics demonstrate STDP with abrupt depression and gradual potentiation due to cumulative crystallization in the PCM device (Cassinerio et al., 2013). Note that tP = 40 ns was chosen to be long enough to allow for full reset of the PCM device, while providing a partial and additive crystallization according to Figure 1D. A longer tP would result in slightly different STDP characteristics, due to the larger crystallization, similar to the enhanced potentiation with a larger number of spikes in Figure 4C. On the other hand, depression would not be affected by increasing tP, since the reset transition only depends on the quenching time.
Figure 4. STDP characteristics, namely measured change of conductance R0/R as a function of delay Δt, for various PCM states, namely state A (R0 = 15 kΩ), state B (R0 = 500 kΩ), and state C (R0 = 10 MΩ), also reported in Figure 1D. Depression and/or potentiation are shown depending on delay and initial state, providing a confirmation of the STDP capability in our 1T1R synapse.
We also verified that continuous spiking with random relative delay Δt leads to random potentiation and depression of a single PCM synapse. Figure 5 shows the results of a random Δt spiking experiment over 1000 epochs (i.e., spike events), reporting the Δt (A), the synapse resistance R as a function of the number of epochs (B), and a correlation between R0/R and Δt (C), where R0 and R were measured before and after each spike in the sequence. Due to the uniform distribution of Δt adopted in our experiment, R in Figure 5B remains close to the full reset state for most of the experiment. Only a few obvious resistance drops were obtained, since at least 3 pulses with Δt > 0 are needed in Figure 4C to achieve potentiation from the full reset state. The correlation between Δt and R0/R over 10⁴ spikes in Figure 5C agrees well with the STDP characteristics in Figure 4, thus further supporting the STDP capability in our PCM-based synapse.
Figure 5. Result of a random spiking experiment, showing the random delay Δt as a function of the epoch (A), corresponding synapse resistance as a function of the epoch (B), and correlation between Δt and R0/R (C). The correlation between delay and conductance change is consistent with the STDP characteristics at variable resistance in Figure 4.
Note that potentiation/depression in Figures 4, 5 only take place during the set/reset pulses of pulse-width 40 ns, which is a negligible fraction of the spike timescale of 10 ms. This ensures that the energy consumption is negligible for synaptic plasticity as required by low power applications of the neuromorphic system.
Due to the simplicity of the POST spike shape including a set pulse and a reset pulse, the STDP characteristics in Figures 4, 5 show constant depression and potentiation for Δt < 0 and Δt > 0, respectively, in contrast to the exponential-like decay which was revealed by previous in-vivo experiments (Bi and Poo, 1998). In addition, STDP characteristics in Figures 4, 5 are affected by a large window which can reach 1000× in one single spike, as opposed to the gradual change of only a few percent in biological synapses (Bi and Poo, 1998). To demonstrate that the simplified features of our STDP do not prevent a proper learning capability in our synapse, we performed simulations of pattern learning in a fully-connected perceptron with 2 neuron layers and 1T1R PCM-based synapses. Figure 6 schematically illustrates the adopted architecture (A) and shows a practical circuit implementation with 1T1R synapses (B). The input pattern stimulates the first layer of neurons, consisting of a 28 × 28 retina in our simulations. Each of these 1st-layer (PRE) neurons is connected to every 2nd-layer (POST) neuron via a synapse. We varied the number of POSTs in the 2nd layer and the intra-layer synaptic interaction depending on the purpose of the simulation. The 2-layer neuromorphic network can be arranged in the array-type synaptic architecture in Figure 6B, where a synapse in row i and column j, with i = 1, 2, 3, …, N and j = 1, 2, 3, …, M, represents the connection between the i-th PRE and the j-th POST. Therefore, the generic i-th PRE drives the gate terminals of all 1T1R synapses within the corresponding row, while the generic j-th POST receives the total current generated in the j-th column of synapses and drives the TE terminals of all synapses in the j-th column, according to the scheme in Figure 2.
Figure 6. Neuromorphic network adopted in our simulations: schematic illustration (A) and corresponding circuit (B). A first neuron layer with N = 28 × 28 neurons is fully connected to a second neuron layer with M neurons through 1T1R PCM-based synapses. The first layer delivers spikes in response to presentation of one or more visual patterns. During training, STDP within the synapses leads to LTP/LTD update of the synapse weights eventually resulting in the specialization of the output neurons in recognizing the submitted patterns.
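As a rough illustration of the array in Figure 6B during communication, the sketch below sums the current collected by each POST column. It is not the authors' simulation code: the function name `column_currents` is hypothetical, each 1T1R cell is idealized as the PCM resistance in series with a fixed transistor resistance (the 2.4 kΩ value quoted for the simulations), and only current magnitudes are computed.

```python
V_TE = 30e-3    # magnitude of the constant TE voltage during communication (V)
R_MOS = 2.4e3   # on-state transistor resistance used in the simulations (ohm)


def column_currents(resistance, pre_active):
    """Current magnitude collected by each POST column.

    resistance[i][j] is the PCM resistance (ohm) of the synapse between
    PRE i and POST j; pre_active[i] is True when PRE i is spiking.
    """
    n_post = len(resistance[0])
    currents = [0.0] * n_post
    for i, row in enumerate(resistance):
        if not pre_active[i]:
            continue  # gate is off, so this row contributes no current
        for j, r_pcm in enumerate(row):
            # Idealized cell: PCM in series with the on-state transistor.
            currents[j] += V_TE / (r_pcm + R_MOS)
    return currents
```

For example, `column_currents([[10e3, 10e6], [10e3, 10e6]], [True, False])` returns the two column currents produced by the single active row; the column with the low-resistance synapse collects by far the larger current.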
Simulation of Learning of a Single Pattern
Figure 7 shows the simulation results for the case of a 28 × 28 PRE retina array (N = 784) with a single POST (M = 1). Simulations were performed in MATLAB, and the model for PCM crystallization dynamics was obtained by interpolating data in Figure 1D. CMOS neuron circuitry was modeled with ideal integrators, comparators and arbitrary waveform generators, while the transistor in the 1T1R was modeled as a series resistance of 2.4 kΩ during communication and fire. The input pattern in Figure 7A consists of a handwritten “1” chosen within the MNIST database (LeCun et al., 1998). The pattern was randomly alternated with random noise (Figure 7B), in which PRE neurons were randomly activated to induce spikes that uniformly depress all background synapses not belonging to the pattern. Pattern and noise were presented with probability 50% each with clock time tck = 10 ms. Noise consists of the excitation of an average of 51 neurons randomly selected within the 784 PREs, corresponding to a fraction of 6.5% of neurons. During each noise epoch we extracted a different instance of white 1/0 noise. PRE spikes led to the excitation of synaptic currents that were integrated by the single POST in the 2nd layer, causing fire events every time the internal voltage exceeded Vth.
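The input protocol can be sketched as follows. This is a minimal illustration and not the MATLAB code used for the simulations: the function names `noise_frame` and `next_input` are hypothetical, and the sketch assumes that each PRE is activated independently with probability 51/784, which reproduces the stated average of 51 active neurons per noise epoch.

```python
import random

N_PRE = 784                 # 28 x 28 retina (from the text)
P_PATTERN = 0.5             # pattern and noise are equally likely (from the text)
NOISE_DENSITY = 51 / 784    # about 6.5% of PREs active per noise epoch


def noise_frame(rng):
    """One fresh instance of uncorrelated 1/0 noise over all PREs.

    Assumption: every PRE is switched on independently of the others.
    """
    return [1 if rng.random() < NOISE_DENSITY else 0 for _ in range(N_PRE)]


def next_input(pattern, rng):
    """PRE activations (list of 0/1) for one 10 ms epoch."""
    if rng.random() < P_PATTERN:
        return list(pattern)    # present the stored pattern
    return noise_frame(rng)     # otherwise present a new noise frame
```

Passing a seeded generator such as `random.Random(0)` makes a run reproducible, which is convenient when comparing different noise densities.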
Figure 7. Simulation results for pattern learning. The input pattern “1” (A) is presented at the input together with noise (B). Synaptic weights are random at t = 0 s (C), then they specialize at progressive times 3.5 s (D) and 7 s (E). The corresponding complete evolution of synapse weights for increasing time is shown in (F), with positions A, B, and C related to (C–E). Red lines represent synapses for pattern, cyan lines are the background synapses, while the black and blue lines are the mean pattern and background synapses, showing progressive learning and specialization.
The evolution of the synaptic weights is shown by the color maps of conductance 1/R at t = 0 s (Figure 7C), t = 3.5 s (Figure 7D) and t = 7 s (Figure 7E), the latter being the total simulated time. We assumed that the initial distribution of weights is random between set and reset states, which can be obtained, for instance, by initially resetting all cells, then applying a relatively short set pulse with voltage close to the PCM threshold voltage VT. A random-set operation was shown to generate random bits in RRAM, thus enabling true random number generation (Balatti et al., 2015). Figure 7F shows the detailed time evolution of the synaptic weights, including 25 representative synapses out of a total of 76 within the pattern and another 236 out of a total of 708 from the background, together with the corresponding average weights. Starting from the initial random distribution, the pattern weights (in red in Figure 7F) start to potentiate after approximately 0.3 s, reaching a value of 10⁻⁴ Ω⁻¹ at about 0.4 s. This is the result of cumulative crystallization in the PCM as a result of multiple STDP events with Δt > 0, corresponding, e.g., to the presentation of a pattern which induces a fire in the POST. Background synapses (in cyan in Figure 7F) are instead depressed over a longer scale of about 3.5 s, where they reach a conductance of about 10⁻⁷ Ω⁻¹ corresponding to the full reset state. The depression mechanism takes advantage of the random noise appearing at the PRE neuron layer. Since noise is uncorrelated, it only causes synapse depression when the noise PRE spike comes soon after a previous fire (thus with Δt < 0) most probably induced by pattern spikes. Therefore, noise plays a key role in depression, although it should be kept to a moderate frequency and moderate density (6.5% in Figure 7) during training to avoid interference with stable pattern learning. 
Note the fast pattern learning relative to the slow background depression, as also evidenced by the synapse weights in Figure 7D at 3.5 s, where depression is still not uniformly achieved in the background. The rate of background depression might be enhanced by increasing the noise density, however at the expense of a disturbed potentiation of pattern synapses. In fact, a high noise density might lead to an increased probability of noise-induced fire, which, if followed by pattern presentation, may result in the depression of pattern synapses according to STDP. Therefore, the ideal noise density should be dictated by the tradeoff between fast background depression and efficient pattern learning. The real-time evolution of synapse weights during a representative simulation is reported in the movie M1 in the Supplementary Material. We did not implement device-to-device variability for simplicity. However, the impact should be negligible, since the network relies on the bistable device behavior rather than on the analog weight update of the synapse (Suri et al., 2013).
Energy and Power Consumption
To assess the power consumption of our synaptic network, we calculated the average dissipated energy Esyn and power Psyn = Esyn/tck per synapse, which is shown in Figure 8A as a function of time during learning. The most significant contribution to energy dissipation is due to the PRE spike (communication), which induces a current spike of duration tck = 10 ms due to the constant VTE = −30 mV. The dissipated energy Esyn,c due to communication (not including fire) in a synapse is given by:
$$E_{\mathrm{syn,c}} = \frac{1}{NM} \sum_{i} \frac{V_{TE}^{2}}{R_{i} + R_{MOS}}\, t_{ck}$$
where Ri is the resistance of the i-th synapse, RMOS is the resistance of the MOS transistor in the on state, N and M are the numbers of PRE (N = 784 in our simulation) and POST (M = 1 in our simulation), respectively, and the summation is extended over all synapses that were activated by a PRE spike. In our calculations, we used a constant resistance RMOS = 2.4 kΩ for simplicity. The red filled points in Figure 8A show the calculated Esyn, c due to the communication mode, reaching a peak of about 80 pJ as the pattern is presented to potentiated synapses after stable learning in the neuromorphic network. The corresponding dissipated power Psyn, c = Esyn, c/tck is in the range of 8 nW. The dissipated energy is lower in the initial stages when the pattern is not yet learned, given the relatively low conductance of the pattern synapses.
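A quick order-of-magnitude check of the peak communication energy can be made with the values quoted in the text. The sketch below is an estimate under stated assumptions, not the authors' calculation: it assumes that each activated synapse dissipates VTE² tck/(Ri + RMOS) and that the sum is averaged over all N × M synapses, that the 76 pattern synapses are the only ones activated by a pattern spike, and that each sits at about 10 kΩ (the 10⁻⁴ Ω⁻¹ conductance reached after learning). The function name `comm_energy_per_synapse` is hypothetical.

```python
V_TE = 30e-3        # magnitude of the TE voltage during communication (V)
T_CK = 10e-3        # clock time, i.e. duration of the current spike (s)
R_MOS = 2.4e3       # on-state transistor resistance (ohm)
N_PRE, M_POST = 784, 1


def comm_energy_per_synapse(active_resistances):
    """Average communication energy per synapse (J) for one epoch.

    active_resistances lists the PCM resistance of every synapse
    whose PRE spiked during the epoch.
    """
    total = sum(V_TE**2 * T_CK / (r + R_MOS) for r in active_resistances)
    return total / (N_PRE * M_POST)


# Assumed case: the learned pattern is presented to 76 potentiated synapses.
pattern_synapses = [10e3] * 76
energy = comm_energy_per_synapse(pattern_synapses)
power = energy / T_CK
print(f"E_syn,c = {energy * 1e12:.1f} pJ, P_syn,c = {power * 1e9:.1f} nW")
```

Under these assumptions the estimate comes out at roughly 70 pJ and 7 nW, the same order as the peak of about 80 pJ and 8 nW reported for Figure 8A.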
Figure 8. Energy Esyn and mean power Psyn per synapse as a function of time during the learning process of Figure 7 (A) and corresponding histogram distribution of energy consumption Esyn, c due to communication from 4.2 s to 7 s, namely after completing potentiation/depression (B). Consumption due to communication (in red) is directly induced by PRE spikes, while fire energy (in blue) corresponds to set/reset events induced by POST spikes. The energy histogram reveals 3 energy levels: Group I around 80 pJ reflects communication of pattern spikes at potentiated synapses. Group II around 5 pJ represents communication of noise spikes at potentiated pattern synapses, while group III just below 100 fJ corresponds to noise spikes at depressed background synapses.
Figure 8B shows the distribution of Esyn,c due to spiking communication after consolidation of weights between t = 4.2 s and 7 s in Figure 8A. Note that there are 3 sub-distributions of Esyn,c, consisting of a high energy range (group I) due to pattern spiking and a low energy range, which includes a medium-low sub-distribution (group II) and an extremely low sub-distribution (group III). Group II can be attributed to noise spikes exciting potentiated pattern synapses, which have large weights but of which only a few are activated by the noise spikes. On the other hand, group III can be attributed to noise spikes exciting the depressed background synapses, which contribute little energy because of their small weights.
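The two low-energy groups can be estimated in the same spirit. The sketch below is a back-of-the-envelope check under labeled assumptions rather than a reproduction of the simulated histogram: it takes the average noise density of 51/784, assumes 10 kΩ for potentiated pattern synapses and 10 MΩ for depressed background synapses, uses expected (fractional) numbers of activated synapses, and averages over all 784 synapses. The function name `mean_energy` is hypothetical.

```python
V_TE, T_CK, R_MOS = 30e-3, 10e-3, 2.4e3   # V, s, ohm (from the text)
N_PRE = 784
N_PATTERN = 76                 # pattern synapses quoted for Figure 7F
NOISE_DENSITY = 51 / 784       # average fraction of PREs active in a noise epoch
R_SET, R_RESET = 10e3, 10e6    # assumed potentiated / depressed resistances (ohm)


def mean_energy(n_active, r_pcm):
    """Average energy per synapse (J) when n_active synapses of
    resistance r_pcm conduct for one clock period."""
    return n_active * V_TE**2 * T_CK / (r_pcm + R_MOS) / N_PRE


# Group II: noise spikes that happen to hit potentiated pattern synapses.
group_2 = mean_energy(N_PATTERN * NOISE_DENSITY, R_SET)
# Group III: noise spikes that hit depressed background synapses.
group_3 = mean_energy((N_PRE - N_PATTERN) * NOISE_DENSITY, R_RESET)
print(f"group II ~ {group_2 * 1e12:.1f} pJ, group III ~ {group_3 * 1e15:.0f} fJ")
```

These estimates land in the picojoule range for group II and at a few tens of femtojoules for group III, consistent with the levels of about 5 pJ and just below 100 fJ described for Figure 8.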
Figure 8A also shows the calculated Esyn, f corresponding to the fire event, when a POST spike overlaps with the PRE spike, thus giving rise to LTP or LTD. These events generally involve a much larger VTE and a larger corresponding current compared to the communication spike, since updating the PCM resistance requires set and reset transitions with significant Joule heating. On the other hand, due to the short pulse-width tP = 40 ns, the energy dissipation is around 1 pJ, hence negligible compared to the communication energy.
Multiple Pattern Learning in Sequence or in Parallel
For on-line unsupervised pattern learning, it is important to demonstrate not only learning of a specific pattern, but also the capability to forget a previous pattern and learn a new one. The ability to reconfigure synaptic weights by learning a new pattern is in fact a key feature to rapidly interact with stimuli from a continuously-changing environment as in the real world. To verify the reconfiguration function in our neuromorphic network, we presented an input pattern to the PRE neurons for 7 s, then we presented a different pattern, where both the first and second patterns were chosen from the MNIST database. Figure 9 shows the simulation results, including the first pattern (a), the second pattern (b), the color maps of the synaptic weights for t = 7 s (c), t = 7.5 s (d), and t = 14 s (e), and the synaptic conductance 1/R as a function of time (f). During the initial 7 s, pattern “1” and noise were provided with equal probabilities of 50%: the average synaptic weights show a potentiation of pattern synapse weights at 0.5 s, which is in line with Figure 7. At the same time, the background synapses are gradually depressed and the pattern is completely learnt after 1 s, as also shown by the weights at 7 s in Figure 9C. After 7 s, the input pattern is suddenly changed from “1” to “2,” which causes depression of weights within pattern “1” and potentiation of weights in pattern “2.” No conductance change is seen for synapses remaining in the background or pattern area. Pattern “2” is fully learned around 9 s, with depression taking slightly longer time. Sequential learning of 2 patterns is further described by movie M2 in the Supplementary Material.
Figure 9. Simulation results for pattern learning and updating. Pattern “1” and noise (A) were presented for the first 7 s, followed by pattern “2” (B) and noise for the last 7 s. After the first 7 s, in A, pattern “1” was learnt (C). After starting with “2,” synapses showed a mixed specialization at 7.5 s in B (D), where “1” was being forgotten and “2” was being learned. Finally, at 14 s in C (E), “2” was learnt. (F) shows the temporal evolution of synapses, with initial learning of “1,” followed by updating with “2.”
We also verified the capability to learn multiple patterns in parallel, rather than in sequence as in Figure 9. Since a neuron can only specialize to one pattern at a time (see Figure 9), we extended the simulation to a network with multiple (M) neurons in the POST layer. Figure 10A shows a fully connected network including N PRE neurons and 3 POST neurons in the 2nd layer, where 3 different patterns were presented alternately as shown in Figure 10B. The purpose is that each of the 3 neurons eventually specializes to a separate pattern, thus emulating the capability of our brain to recognize different patterns, such as letters, numbers, or words. To avoid co-specialization to the same pattern, the 3 neurons were connected by inhibitory synapses, where a successful fire in any neuron leads to a partial discharge of the internal potential in all other neurons, to inhibit firing in response to the same pattern and encourage specialization to other patterns. The inhibitory synapses have fixed weights, hence they can be implemented by simple resistors. The 3 input patterns in Figure 10B were presented with 5% probability each, with the remaining 85% consisting of noise with an average number of PRE spikes of 4 per epoch, or 0.5% of all PREs. Such a low percentage of noise activity over PREs is balanced by a relatively large frequency of noise equal to 85%. After a simulated total time of 300 s, the 3 different patterns were learnt each in a different neuron, as shown by the final synaptic weights in Figure 10C. Decreasing the pattern presentation rate below 5% in Figure 10 would result in a lower learning rate, while increasing the rate would cause learning instabilities. We have observed, in fact, that high pattern presentation rates cause the network to learn superposed patterns (e.g., a “1” plus a “2”) or difference patterns (e.g., a “1” with the pixels of “2” excluded). This results from the interaction of distinct patterns in the STDP. A low pattern rate helps reduce the probability of interaction between different patterns.
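The input statistics described above (three patterns at 5% each, noise in the remaining 85% of epochs with about 4 PRE spikes on average) can be illustrated with a minimal Python sketch. The function name `sample_epoch`, the dictionary layout of the patterns, and the choice of independent per-PRE noise spikes are assumptions made for illustration only; the text does not specify how the noise spikes were drawn.

```python
import random

N_PRE = 28 * 28          # 784 PRE neurons, as in the text
P_PATTERN = 0.05         # presentation probability of each pattern (from the text)
NOISE_SPIKES_MEAN = 4    # average number of noisy PRE spikes per epoch (from the text)


def sample_epoch(patterns, rng):
    """Return (label, set of spiking PRE indices) for one epoch.

    `patterns` maps a label to the set of PRE indices forming that pattern.
    Hypothetical helper: the sampling scheme is an assumption, not the authors' code.
    """
    u = rng.random()
    for k, (label, pixels) in enumerate(patterns.items()):
        # Each pattern occupies a 5% slice of the unit interval.
        if u < (k + 1) * P_PATTERN:
            return label, set(pixels)
    # Noise epoch (remaining probability): each PRE spikes independently,
    # so that the expected number of spikes is NOISE_SPIKES_MEAN (assumption).
    p_spike = NOISE_SPIKES_MEAN / N_PRE
    return "noise", {i for i in range(N_PRE) if rng.random() < p_spike}


# Toy patterns made of a few PRE indices, for illustration only.
toy_patterns = {"1": {10, 11, 12}, "2": {200, 201}, "3": {500}}
label, spikes = sample_epoch(toy_patterns, random.Random(0))
print(label, len(spikes))
```

With three patterns, the noise branch is reached with probability 1 − 3 × 0.05 = 0.85, matching the noise frequency quoted in the text.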
Figure 10. Simulation results for multiple pattern learning. A first layer with 28 × 28 = 784 neurons is fully connected to three second layer neurons, each of them connected with three inhibitory synapses (A). We provided three patterns “1,” “2,” and “3” (B) to the input. The three neurons specialize on different patterns (C). (D) shows the evolution of the synapses connected to one of the post neurons, in particular the mean weight for synapses of pattern “1,” “2,” “3” and background. While the background gradually decreases, the learnt pattern (the highest mean conductivity) changes during time due to interference between patterns.
Figure 10D shows the synaptic weights as a function of time, including the pattern weights and background weights (only synapses belonging to the background in all 3 patterns are shown). Learning takes place in a relatively short time at the beginning of the simulation, while depression of background weights requires about 200 s due to the low activity of noise. Note also the significant oscillations of the pattern weights, which reflect their instability in the presence of noise. In particular, the neuron specializes on one single pattern at a time, corresponding to the highest conductance of 10⁻⁴ Ω⁻¹. However, the network is unable to stabilize on a single pattern due to interference between different patterns. Nonetheless, the network is able to recognize distinct patterns in distinct POST neurons, although sometimes different POSTs learn the same input pattern. This is an unwanted effect due to the weak inhibition used in the simulations, where we discharged the capacitance of a neuron by only 20% during the inhibitory action. Increasing the inhibitory factor would improve the selectivity to input patterns, although it would also block some POST neurons because of repeated firing of another, successful POST neuron. In summary, a careful trade-off must be sought to minimize blockade events, maximize the learning efficiency and minimize the learning time. Parallel learning of 3 patterns is further described by movie M3 in the Supplementary Material.
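The lateral inhibition discussed here, where a firing POST neuron partially discharges the others, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `step_post_layer`, the reset of the firing neuron to zero, and the rule that the neuron with the highest potential wins are hypothetical choices; only the 20% discharge factor comes from the text.

```python
INHIBITION_FRACTION = 0.2   # fraction of internal potential removed (20% in the text)


def step_post_layer(potentials, currents, dt, capacitance, threshold):
    """Integrate the input currents for one time step and apply lateral inhibition.

    Returns (new_potentials, index_of_firing_neuron_or_None).
    Hypothetical sketch: winner selection and reset-to-zero are assumptions.
    """
    # Integrate-and-fire: dV = I * dt / C for each POST neuron.
    updated = [v + i * dt / capacitance for v, i in zip(potentials, currents)]
    # The neuron with the highest potential is the candidate to fire.
    winner = max(range(len(updated)), key=lambda k: updated[k])
    if updated[winner] < threshold:
        return updated, None
    # The winner is reset, all other neurons lose a fixed fraction of potential.
    inhibited = [
        0.0 if k == winner else v * (1.0 - INHIBITION_FRACTION)
        for k, v in enumerate(updated)
    ]
    return inhibited, winner


new_v, fired = step_post_layer([0.4, 0.3, 0.1], [1e-4, 0.0, 0.0],
                               dt=1e-2, capacitance=3e-6, threshold=0.5)
print(fired, new_v)
```

Raising `INHIBITION_FRACTION` toward 1 corresponds to the stronger inhibition mentioned in the text, which improves selectivity at the cost of blocking the neurons that did not fire.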
Reducing Power Consumption via Spiking Communication
Our results support PCM devices as highly-functional synapses with learning capability and low power consumption required for the synaptic plasticity. A key limitation of the proposed scheme is however the relatively large power consumed during communication (Figure 8). Assuming a synapse density of 10¹¹ cm⁻² as in the human cortex, a power per synapse of 8 nW would translate into a power density of almost 1 kW cm⁻², which is comparable to a multicore CPU in conventional Von Neumann computing. The large power consumption is due to the relatively long current spike lasting 10 ms in response to the PRE spike applied to the transistor gate, where the relatively long pulse width is dictated by the STDP dynamics in the 10–100 ms time scale for real time learning and interaction (Bi and Poo, 1998). However, a spiking VTE can be adopted to reduce the dissipated energy during the spike. For instance, Figure 11 shows a spiking waveform of VTE, consisting of pulses of tspike = 1 μs width and spiking period Tspike = 1 ms, corresponding to a spiking frequency of 1 kHz and a duty cycle of 10⁻³. The reduced duty cycle results in a reduction of power consumption by a factor of 10³, clearly bringing our neuromorphic solution into the territory of low power chips.
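The power estimate above can be reproduced with a few lines of Python. The numerical values are those quoted in the text; the function name `power_density` and the simple proportional scaling with duty cycle are illustrative assumptions.

```python
SYNAPSE_DENSITY = 1e11      # synapses per cm^2 (human cortex, value from the text)
POWER_PER_SYNAPSE = 8e-9    # W per synapse during communication (from the text)
T_SPIKE = 1e-6              # s, pulse width of the spiking VTE (from the text)
T_PERIOD = 1e-3             # s, spiking period (from the text)


def power_density(duty_cycle=1.0):
    """Communication power density in W/cm^2, scaled linearly by the duty cycle."""
    return SYNAPSE_DENSITY * POWER_PER_SYNAPSE * duty_cycle


duty = T_SPIKE / T_PERIOD
print(f"constant VTE: {power_density():.0f} W/cm^2")
print(f"spiking VTE:  {power_density(duty):.2f} W/cm^2 (duty cycle {duty:.0e})")
```

The constant-bias case gives about 800 W cm⁻², that is, almost 1 kW cm⁻², while the 10⁻³ duty cycle brings it below 1 W cm⁻².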
Figure 11. Scheme for implementing low energy consumption communication. Instead of applying a constant VTE = −30 mV, sequences of spikes lasting tspike can allow for efficient communication (A), while reducing energy and power consumption by a factor tspike/Tspike, where Tspike is the time between adjacent pulses (B).
An additional advantage of adopting a spiking VTE with low duty cycle is the ability to reduce the capacitance in the neuron integrator stage. In fact, the capacitance can be estimated by:
C = ΔQ/Vth
where ΔQ is the integrated charge contributed by the current, equal to ΔQ = IΔt in the case of a constant VTE as in Figure 2, and Vth is the comparator threshold voltage. Assuming an array of 784 PRE neurons with 10% potentiated synapses after learning, a VTE of −30 mV, a resistance of potentiated synapse of 15 kΩ, and a comparator threshold voltage Vth = 0.5 V, we obtain a capacitance of about 3 μF, which is clearly unfeasible in an integrated circuit. A duty cycle of 10⁻³ would result in a reduction of the capacitance by a factor of 10³, hence in the range of a few nF. Further reduction of the power consumption and of the integrator capacitance can be obtained by reducing the duty cycle, the value of VTE, and the conductivity of the PCM in the potentiated state, e.g., by adopting suitable low-conductivity phase change materials or by reducing the size of the heater controlling the cross section of the PCM device. Separating the communication and fire paths by a 2T1R architecture of the synapse would further reduce the current consumption and capacitor area by adopting sub-threshold bias and short pulse width of the communication gate (Kim et al., 2015; Wang et al., 2015). Finally, adopting accelerated, non-biological dynamics in the range of tens of ns instead of 10 ms could allow for smaller values of integrated capacitances in the range of hundreds of fF.
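As a worked check of the capacitance estimate, the sketch below assumes the relation C = ΔQ/Vth with ΔQ = IΔt, which is consistent with the quoted result of about 3 μF, and takes Δt = 10 ms as the duration of the communication spike. The function name `integrator_capacitance` and its parameter names are hypothetical.

```python
def integrator_capacitance(n_pre=784, potentiated_fraction=0.10,
                           v_te=30e-3, r_potentiated=15e3,
                           dt=10e-3, v_th=0.5, duty_cycle=1.0):
    """Estimate the integrator capacitance in farads from C = dQ / Vth.

    Assumes all potentiated synapses conduct |VTE| / R simultaneously for a
    time dt, reduced by the duty cycle in the spiking scheme (assumption).
    """
    i_total = n_pre * potentiated_fraction * v_te / r_potentiated  # total current, A
    dq = i_total * dt * duty_cycle                                 # integrated charge, C
    return dq / v_th


print(f"constant VTE: {integrator_capacitance() * 1e6:.2f} uF")
print(f"duty 1e-3:    {integrator_capacitance(duty_cycle=1e-3) * 1e9:.2f} nF")
```

With these values each potentiated synapse carries 2 μA, the 78.4 potentiated synapses deliver about 157 μA in total, and the integrated charge over 10 ms is about 1.57 μC, giving roughly 3.1 μF at constant bias and roughly 3.1 nF at a duty cycle of 10⁻³.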
Another issue is the energy required to charge the wire capacitance, which is higher in the pulsing scheme. Synapses are arranged in a relatively large array, hence the wires introduce a high parasitic capacitance, which increases the capacitive energy dissipation in the pulsing scheme. One way to mitigate the issue is to arrange the synapses in multiple smaller arrays with shorter interconnects. This approach would reduce the fan-in/fan-out of the neurons; however, with a proper design of the neuromorphic network, this limitation could be acceptable, while preserving the reduction in the energy dissipation due to synapses. The capacitive energy would also be reduced by suitable voltage scaling via PCM engineering.
Multi-Layer Neuromorphic Network
To assess the learning efficiency of the neuromorphic network with PCM synapses, we performed 100 simulations of pattern learning with a total time of 2 s per simulation. We evaluated the recognition probability Plearn as the number np, f of fire events in the POST neuron in response to the presentation of pattern “1,” divided by the total number np of appearances of the same pattern, Plearn = np, f/np (see Figure 12A). Similarly, we evaluated the error probability Perr as the number nn, f of POST fire events taking place in response to the presentation of noise in the input (false recognitions) divided by the total number nn of input noise appearances, Perr = nn, f/nn. Note that np + nn = n, where n is the total number of PRE spikes within the 2 s interval of simulation. With a 2-layer network with 28 × 28 PREs and 1 POST neuron, Plearn was equal to 33% and Perr was around 6%, thus quite unsatisfactory for the purpose of on-line learning and recognition. We found that unsuccessful learning was due most of the time to depression of pattern synapses, which occurs when noise causes a POST fire that is followed by the presentation of the pattern in the input. In fact, PCM is particularly prone to complete depression for Δt < 0, since the reset pulse results in a large resistance increase in just one shot. After this depression event, potentiation of pattern synapses is quite difficult, since the current flowing in the depressed pattern synapse is extremely low, making a POST fire event in response to the pattern presentation quite unlikely.
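The two figures of merit defined above, Plearn = np, f/np and Perr = nn, f/nn, can be computed from a log of presentation events as in the following minimal sketch. The event format (pairs of booleans) and the function name `recognition_rates` are assumptions for illustration.

```python
def recognition_rates(events):
    """Compute (P_learn, P_err) from an iterable of (is_pattern, post_fired) pairs.

    P_learn = n_pf / n_p : fraction of pattern presentations causing a POST fire.
    P_err   = n_nf / n_n : fraction of noise presentations causing a POST fire.
    Returns NaN for a rate whose denominator is zero.
    """
    n_p = n_pf = n_n = n_nf = 0
    for is_pattern, post_fired in events:
        if is_pattern:
            n_p += 1
            n_pf += int(post_fired)
        else:
            n_n += 1
            n_nf += int(post_fired)
    p_learn = n_pf / n_p if n_p else float("nan")
    p_err = n_nf / n_n if n_n else float("nan")
    return p_learn, p_err


# Toy log: 3 pattern presentations (2 recognized), 4 noise presentations (1 false fire).
log = [(True, True), (True, False), (True, True),
       (False, False), (False, True), (False, False), (False, False)]
print(recognition_rates(log))
```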
Figure 12. Multi-layer simulation results. The number n of PRE spikes is composed by np pattern and nn noise inputs. np is composed by np, f (pattern leading to output spike) and np, 0 (missing recognition). nn is composed by nn, f (false recognition) and nn, 0 (absence of spike for input noise) (A). After an input layer with 28 × 28 neurons, a second layer with variable M neurons and a third layer with one output neuron are implemented (B). The recognition rate Plearn = np, f/np increases with respect to the two layers network and it increases for increasing number M of second layer neurons (C), while the error rate Perr = nn, f/nn decreases (D). Plearn further increases for optimized conditions (lower noise), reaching a 95.5% recognition, while Perr drops to 0.35%.
To solve this issue and improve the recognition probability, we implemented a 3-layer network, as sketched in Figure 12B. This was done by inserting an intermediate layer with M neurons between a 28 × 28 input retina and an output layer consisting of a single neuron. Every first-layer neuron was connected to every second-layer neuron, and all second-layer neurons were connected to the output neuron, making the network a fully-connected architecture. The number M of neurons in the second layer was varied to study the recognition efficiency and error rates with the same pattern and noise conditions as in the calculations in Figure 7. Figure 12 shows the calculated recognition probability (C) and the error probability (D) as a function of M. The recognition probability increases with M from almost 36% up to 76%, while the error rate decreases from 6 to 3%, as shown by the blue lines. The improvement is due to the compensation of synapse blockade by the additional layer, thanks to the increased number of parallel channels.
To further improve the network efficiency, we reduced the input noise from 6.5 to 5.5%. The optimized results are shown by the red curves in Figures 12C,D. The noise reduction leads to a slight increase in the time needed for depression of background synapses. On the other hand, the recognition efficiency increases up to 95.5% for 256 neurons in the second layer, while the error probability decreases to 0.35% in a 2 s simulation time. These results strongly support PCM-based neuromorphic chips for on-line unsupervised learning and recognition.
Impact of Noise Density on Learning Efficiency
Presenting noise in alternation with the pattern allows for proper background depression and on-line unsupervised pattern updating. The randomness and lack of correlation of the noise allow for a general background depression and, more broadly, provide a forgetting mechanism. Figure 13 explores in more depth the impact of noise on learning efficiency. We performed pattern learning simulations as in Figure 7, varying the input noise density, namely the average percentage of PREs delivering a noise spike. Plearn decreases for increasing noise density, which is explained by the competition between pattern learning caused by the appearance of the pattern at the input and the increasing pattern forgetting induced by noise. At the same time, Perr increases with noise due to the increasing contribution of the noise current. Note, however, that zero noise, which might seem to be the best condition, is not applicable, since background depression and pattern updating as in Figure 9 would not be possible. Therefore, a careful trade-off between noise density and learning performance must be considered.
Figure 13. Probability of recognizing an input pattern Plearn, solid line, and probability of spurious fires Perr, dashed line, as a function of input noise.
In conclusion, our work demonstrates PCM-based electronic synapses with a 1T1R architecture. The synapses are capable of STDP thanks to the time-dependent overlap between PRE and POST spikes in the 1T1R circuit. On-line pattern learning, recognition, forgetting and updating are demonstrated by simulations assuming the alternation of pattern and noise spikes from the PRE layer. Reduction of energy consumption and improvement of recognition efficiency are discussed with the help of simulation results. These results support PCM as a promising element for electronic synapses in future neuromorphic hardware.
SA provided simulations of neuromorphic circuits for learning and recognition, while NC and ML contributed experimental data. All authors discussed the results and contributed to manuscript preparation. DI supervised the research.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The authors are grateful to S. Balatti and Z.-Q. Wang for several discussions. This work was supported in part by the ERC Consolidator Grant No. 648635 “Resistive-switch computing Beyond CMOS.”
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/article/10.3389/fnins.2016.00056
Ambrogio, S., Balatti, S., Nardi, F., Facchinetti, S., and Ielmini, D. (2013). Spike-timing dependent plasticity in a transistor-selected resistive switching memory. Nanotechnology 24:384012. doi: 10.1088/0957-4484/24/38/384012
Annunziata, R., Zuliani, P., Borghi, M., De Sandre, G., Scotti, L., Prelini, C., et al. (2009). Phase change memory technology for embedded non volatile memory applications for 90nm and beyond. IEDM Tech. Dig. 97–100. doi: 10.1109/iedm.2009.5424413
Balatti, S., Ambrogio, S., Wang, Z. Q., and Ielmini, D. (2015). True Random Number Generation by variability of resistive switching in oxide-based devices. IEEE J. Emerg. Select. Topics Circ. Sys. 5, 214–221. doi: 10.1109/JETCAS.2015.2426492
Bichler, O., Suri, M., Querlioz, D., Vuillaume, D., DeSalvo, B., and Gamrat, C. (2012). Visual pattern extraction using energy-efficient 2-PCM synapse neuromorphic architecture. IEEE Trans. Electr. Dev. 59, 2206–2214. doi: 10.1109/TED.2012.2197951
Burr, G. W., Shelby, R. M., di Nolfo, C., Jang, J. W., Shenoy, R. S., Narayanan, P., et al. (2014). “Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses), using phase-change memory as the synaptic weight element,” in Electron Devices Meeting (IEDM), 2014 IEEE International (San Francisco, CA: IEEE), 29.5.1–29.5.4. doi: 10.1109/iedm.2014.7047135
Eryilmaz, S. B., Kuzum, D., Jeyasingh, R., Kim, S., BrightSky, M., Lam, C., et al. (2014). Brain-like associative learning using a nanoscale non-volatile phase change synaptic device array. Front. Neurosci. 8:205. doi: 10.3389/fnins.2014.00205
Garbin, D., Vianello, E., Bichler, O., Rafhay, Q., Gamrat, C., Ghibaudo, G., et al. (2015). HfO2-Based OxRAM devices as synapses for convolutional neural networks. IEEE Trans. Electr. Dev. 62, 2494–2501. doi: 10.1109/TED.2015.2440102
Hosseini, P., Sebastian, A., Papandreou, N., Wright, C. D., and Bhaskaran, H. (2015). Accumulation-based computing using phase-change memories with FET access devices. IEEE Electr. Dev. Lett. 36, 975–977. doi: 10.1109/LED.2015.2457243
Indiveri, G., Linares-Barranco, B., Legenstein, R., Deligeorgis, G., and Prodromakis, T. (2013). Integration of nanoscale memristor synapses in neuromorphic computing architectures. Nanotechnology 24:384010. doi: 10.1088/0957-4484/24/38/384010
Kau, D. C., Tang, S., Karpov, I. V., Dodge, R., Klehn, B., Kalb, J. A., et al. (2009). “A stackable cross point Phase Change Memory,” in Electron Devices Meeting (IEDM), 2009 IEEE International (Baltimore, MD: IEEE), 617–620. doi: 10.1109/IEDM.2009.5424263
Kim, S., Ishii, M., Lewis, S., Perri, T., BrightSky, M., Kim, W., et al. (2015). NVM neuromorphic core with 64k-cell (256-by-256) phase change memory synaptic array with On-chip neuron circuits for continuous in-situ learning. IEDM Tech. Dig. 443.
Kuzum, D., Jeyasingh, R. G. D., Lee, B., and Wong, H.-S. P. (2012). Nanoelectronic programmable synapses based on phase change materials for brain-inspired computing. Nano Lett. 12, 2179. doi: 10.1021/nl201040y
Ohno, T., Hasegawa, T., Tsuruoka, T., Terabe, K., Gimzewski, J. K., and Aono, M. (2011). Short-term plasticity and long-term potentiation mimicked in single inorganic synapses. Nat. Mater. 10, 591–595. doi: 10.1038/nmat3054
Prezioso, M., Merrikh-Bayat, F., Hoskins, B. D., Adam, G. C., Likharev, K. K., and Strukov, D. B. (2015). Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature 521, 61–64. doi: 10.1038/nature14441
Querlioz, D., Bichler, O., Vincent, A. F., and Gamrat, C. (2015). Bioinspired programming of memory devices for implementing an inference engine. Proc. IEEE 103, 1398–1416. doi: 10.1109/JPROC.2015.2437616
Suri, M., Bichler, O., Querlioz, D., Cueto, O., Perniola, L., Sousa, V., et al. (2011). Phase change memory as synapse for ultra-dense neuromorphic systems: application to complex visual pattern extraction. IEDM Tech. Dig. 79–82. doi: 10.1109/iedm.2011.6131488
Suri, M., Bichler, O., Querlioz, D., Palma, G., Vianello, E., Vuillaume, D., et al. (2012). CBRAM devices as binary synapses for low-power stochastic neuromorphic systems: auditory (Cochlea) and visual (Retina) cognitive processing applications. IEDM Tech. Dig. 235–238. doi: 10.1109/IEDM.2012.6479017
Suri, M., Querlioz, D., Bichler, O., Palma, G., Vianello, E., Vuillaume, D., et al. (2013). Bio-inspired stochastic computing using binary CBRAM synapses. IEEE Trans. Electron Devices 60, 2402. doi: 10.1109/TED.2013.2263000
Thomas, A., Niehöerster, S., Fabretti, S., Shepheard, N., Kushel, O., Kuepper, K., et al. (2015). Tunnel junction based memristors as artificial synapses. Front. Neurosci. 9:241. doi: 10.3389/fnins.2015.00241
Vincent, A. F., Larroque, J., Locatelli, N., Ben Romdhane, N., Bichler, O., Gamrat, C., et al. (2015). Spin-transfer torque magnetic memory as a stochastic memristive synapse for neuromorphic systems. IEEE Trans. Biomed. Circ. Syst. 9, 166–174. doi: 10.1109/TBCAS.2015.2414423
Yu, S., Gao, B., Fang, Z., Yu, H., Kang, J., and Wong, H.-S. P. (2013). A low energy oxide-based electronic synaptic device for neuromorphic visual systems with tolerance to device variation. Adv. Mater. 25, 1774. doi: 10.1002/adma.201203680
Yu, S., Wu, Y., Jeyasingh, R., Kuzum, D., and Wong, H.-S. P. (2011). An Electronic synapse device based on metal oxide resistive switching memory for neuromorphic computation. IEEE Trans. Electr. Dev. 58, 2729. doi: 10.1109/TED.2011.2147791
Wang, Z. Q., Ambrogio, S., Balatti, S., and Ielmini, D. (2015). A 2-transistor/1-resistor artificial synapse capable of communication and stochastic learning in neuromorphic systems. Front. Neurosci. 8:438. doi: 10.3389/fnins.2014.00438
Wright, C. D., Liu, Y., Kohary, K. I., Aziz, M. M., and Hicken, R. J. (2011). Arithmetic and biologically-inspired computing using phase-change materials. Adv. Mater. 23, 3408. doi: 10.1002/adma.201101060
Keywords: neuromorphic circuits, spike timing dependent plasticity, phase change memory, neural network, memristor, pattern recognition, cognitive computing
Citation: Ambrogio S, Ciocchini N, Laudato M, Milo V, Pirovano A, Fantini P and Ielmini D (2016) Unsupervised Learning by Spike Timing Dependent Plasticity in Phase Change Memory (PCM) Synapses. Front. Neurosci. 10:56. doi: 10.3389/fnins.2016.00056
Received: 30 October 2015; Accepted: 08 February 2016;
Published: 08 March 2016.
Edited by: Themis Prodromakis, University of Southampton, UK
Reviewed by: Damien Querlioz, CNRS, University of Paris-Sud, France
Mostafa Rahimi Azghadi, The University of Sydney, Australia
Erika Covi, Institute for Microelectronics and Microsystems, CNR, Italy
Copyright © 2016 Ambrogio, Ciocchini, Laudato, Milo, Pirovano, Fantini and Ielmini. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Daniele Ielmini, email@example.com