Stochastic IMT (Insulator-Metal-Transition) Neurons: An Interplay of Thermal and Threshold Noise at Bifurcation

Artificial neural networks can harness stochasticity in multiple ways to enable a vast class of computationally powerful models. Boltzmann machines and other stochastic neural networks have been shown to outperform their deterministic counterparts by allowing dynamical systems to escape local energy minima. Electronic implementation of such stochastic networks is currently limited to addition of algorithmic noise to digital machines which is inherently inefficient; albeit recent efforts to harness physical noise in devices for stochasticity have shown promise. To succeed in fabricating electronic neuromorphic networks we need experimental evidence of devices with measurable and controllable stochasticity which is complemented with the development of reliable statistical models of such observed stochasticity. Current research literature has sparse evidence of the former and a complete lack of the latter. This motivates the current article where we demonstrate a stochastic neuron using an insulator-metal-transition (IMT) device, based on electrically induced phase-transition, in series with a tunable resistance. We show that an IMT neuron has dynamics similar to a piecewise linear FitzHugh-Nagumo (FHN) neuron and incorporates all characteristics of a spiking neuron in the device phenomena. We experimentally demonstrate spontaneous stochastic spiking along with electrically controllable firing probabilities using Vanadium Dioxide (VO2) based IMT neurons which show a sigmoid-like transfer function. The stochastic spiking is explained by two noise sources - thermal noise and threshold fluctuations, which act as precursors of bifurcation. As such, the IMT neuron is modeled as an Ornstein-Uhlenbeck (OU) process with a fluctuating boundary resulting in transfer curves that closely match experiments. 
The moments of interspike intervals are calculated analytically by extending the first-passage-time (FPT) models for the Ornstein-Uhlenbeck (OU) process to include a fluctuating boundary. We find that the coefficient of variation of interspike intervals depends on the relative proportion of thermal and threshold noise, where threshold noise is the dominant source in the current experimental demonstrations. As one of the first comprehensive studies of stochastic neuron hardware and its statistical properties, this article would enable efficient implementation of a large class of neuro-mimetic networks and algorithms.


INTRODUCTION
A growing need for efficient machine-learning in autonomous systems coupled with an interest in solving computationally hard optimization problems has led to active research in stochastic models of computing. Optimization techniques (Haykin, 2009) including Stochastic Sampling Machines (SSM), Simulated Annealing, and Stochastic Gradients are examples of such models. All these algorithms are currently implemented using digital hardware, which first creates a mathematically accurate platform for computing and later adds digital noise at the algorithm level. Hence, it is enticing to construct hardware primitives that can harness the already existing physical sources of noise to create a stochastic computing platform. The principal challenge with such efforts is the lack of stable or reproducible distributions, or functions of distributions, of physical noise. One basic stochastic unit which enables a systematic construction of stochastic hardware has long been known: the stochastic neuron (Gerstner and Kistler, 2002), which is also believed to be the unit of computation in the human brain. Moreover, recent studies (Buesing et al., 2011) have demonstrated practical applications like sampling using networks of such stochastic spiking neurons. There have been some attempts at building neuron hardware (Indiveri et al., 2006; Pickett et al., 2013; Mehonic and Kenyon, 2016; Sengupta et al., 2016; Tuma et al., 2016), but building a neuron with self-sustained spikes, or oscillations, which are stochastic in nature and where the probability of firing is controllable using a signal has been challenging. Here, we demonstrate and analytically study a true stochastic neuron (Jerry et al., 2017a) which is fabricated using oscillators (Shukla et al., 2014a,b; Parihar et al., 2015) based on insulator-metal transition (IMT) materials, e.g., Vanadium Dioxide (VO2), wherein the inherent physical noise in the dynamics is used to implement stochasticity. The firing probability, and not just the deterministic frequency of oscillations or spikes, is controllable using an electrical signal. We also show that such an IMT neuron has dynamics similar to those of a piecewise linear FitzHugh-Nagumo (FHN) neuron, with thermal noise along with threshold fluctuations as precursors of bifurcation, resulting in a sigmoid-like transfer function for the neural firing rates. By analyzing the variance of the interspike interval, we determine that for the range of thermal noise present in our experimental demonstrations, threshold fluctuations are responsible for most of the stochasticity compared to thermal noise.

IMT Phase Change Neuron Model
A stochastic IMT neuron is fabricated using relaxation oscillators (Shukla et al., 2014b; Parihar et al., 2015) composed of an IMT phase change device, e.g., Vanadium Dioxide (VO2), in series with a tunable resistance, e.g., a transistor (Shukla et al., 2014a) (Figure 1A). An IMT device is a two-terminal device with two resistive states, insulating (I) and metallic (M), and the device transitions between the two states based on the electric field applied across it (which in turn changes the current through the device and the corresponding temperature). The phase transitions are hysteretic in nature, which means that the IMT (insulator-to-metal) transition does not occur at the same voltage as the MIT (metal-to-insulator) transition. For a range of values of the series resistance, the resultant circuit shows spontaneous oscillations due to hysteresis and the lack of a stable point (Parihar et al., 2015). Overall, the series resistance acts as a parameter for bifurcation between a spiking (or oscillating) state and a resting state of an IMT neuron. The equivalent circuit model for an IMT oscillator is shown in Figure 1B with the hysteretic switching conductance $g_{v(m/i)}$ ($g_{vm}$ in the metallic and $g_{vi}$ in the insulating state), a series inductance $L$, and a parallel internal capacitance $C$. Let the IMT and MIT thresholds of the device be denoted by $v_h$ and $v_l$, respectively, with $v_h > v_l$, and let the voltage across the hysteretic conductance be $h(i_i, s)$, where $h$ is linear in $i_i$ and $s$ is the state, metallic (M) or insulating (I).
The system dynamics are then given by Equation (1), with $i_i$ and $v_o$ as shown in Figure 1B and $s$ treated as an independent variable.
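A sketch of the form these dynamics take can be reconstructed from the circuit description and from the nullclines named in the next subsection (the load line $v_o = i_i/g_s$ and the V-I curve $v_o = v_{dd} - h(i_i, s)$). It assumes a constant supply $v_{dd}$ across the device branch and the series conductance, and it may differ in notation or arrangement from the published Equation (1):

```latex
\begin{aligned}
L \,\frac{di_i}{dt} &= v_{dd} - h(i_i, s) - v_o, \\
C \,\frac{dv_o}{dt} &= i_i - g_s\, v_o .
\end{aligned}
```

Setting each right-hand side to zero recovers the two nullclines: the first gives the state-dependent V-I curve $v_o = v_{dd} - h(i_i, s)$, and the second gives the load line $v_o = i_i / g_s$.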

Mechanism of Oscillations and Spikes
In VO2, the IMT and MIT transitions are orders of magnitude faster than the RC time constants of the oscillations, as observed in frequency-domain (Kar et al., 2013) and time-domain measurements for voltage-driven (Jerry et al., 2016) and photoinduced transitions (Cocker et al., 2012). As such, the change in resistance of the IMT device is assumed to be instantaneous. Figure 2A shows the V-I curves of the IMT device in its two states, metallic (M) and insulating (I), and the steady-state load line $v_o = i_i/g_s$ of the series conductance, along with the fixed points of the system, $S_1$ and $S_2$, in the insulating and metallic states, respectively. The load line and V-I curves are essentially the nullclines of $v_o$ and $i_i$, respectively. The capacitance-inductance pair delays the transitions and slowly pulls the system toward the fixed points $S_1$ and $S_2$ even when the IMT device transitions instantaneously. For a small $L/C$ ratio, the eigenvector (of the coefficient matrix) with the large negative eigenvalue becomes parallel to the x-axis, whereas the other eigenvector becomes parallel to $AB'$ or $BA'$ depending on the state (M or I). When the system approaches $A$ from below (or $B$ from above) and the IMT device is insulating (or metallic) with fixed point $S_1$ (or $S_2$), the IMT device transitions into the metallic (or insulating) state, changing the fixed point to $S_2$ (or $S_1$). Two trajectories are shown starting from each of the points $A$ and $B$ for the system of Equation (1): one for a small $L/C$ value (solid) and the other for a large $L/C$ value (dashed). After a transition, the system moves parallel to the x-axis almost instantaneously and spends most of the time following the V-I curve toward the fixed point. Before the fixed point is reached, the MIT (or IMT) transition threshold is encountered, which switches the fixed point, and the cycle continues, resulting in sustained oscillations or spike generation.

Non-hysteretic Approximation
The model of Equation (1) is very similar to a piecewise linear caricature of the FitzHugh-Nagumo (FHN) neuron model (Gerstner and Kistler, 2002), also called McKean's caricature (McKean, 1970; Tonnelier, 2003). Mathematically, the FHN model is given by Equation (2), where $f(u)$ is a polynomial of third degree, e.g., $f(u) = u - u^3/3$, and $I_{ext}$ is the parameter for bifurcation, as opposed to $g_s$ in Equation (1). In the FHN model, one variable ($u$), possessing cubic nonlinearity, allows regenerative self-excitation via a positive feedback, and the second, a recovery variable ($w$), possessing linear dynamics, provides a slower negative feedback. It was reasoned in McKean (1970) that the essential features of the FHN model are retained in a "caricature" where the cubic nonlinearity is replaced by a piecewise linear function $f(u)$. Nullclines of Equation (2) with a piecewise linear $f(u)$ are shown in Figure 2B. A construction of $f(u)$ in the phase space is trivially possible such that it is equal to $v_{dd} - h(i_i, s)$ in the regions M and I, hence making the $u$-nullcline similar to the $i_i$-nullcline in those regions. In the region N, the difference between $f(u)$ and $v_{dd} - h(i_i, s)$ for any state $s$ does not result in a difference in the direction of system trajectories but only in their velocity, because for small $L/C$ the trajectories are almost parallel to the x-axis. Bifurcation in the VO2 neuron is achieved by tuning the load line using a tunable resistance ($g_s$), or a series transistor (Figure 3A). Figure 3B shows two load lines corresponding to different gate voltages ($v_{gs}$), where one gives rise to spikes while the other results in a resting state.
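For orientation, one common textbook form of the FHN equations is sketched below. The constants $\epsilon$, $b_0$, and $b_1$ are illustrative assumptions and need not match the parametrization used in Equation (2):

```latex
\begin{aligned}
\frac{du}{dt} &= f(u) - w + I_{ext}, \qquad f(u) = u - \frac{u^3}{3}, \\
\frac{dw}{dt} &= \epsilon \,\bigl(b_0 + b_1 u - w\bigr), \qquad 0 < \epsilon \ll 1 .
\end{aligned}
```

In this form the $u$-nullcline is $w = f(u) + I_{ext}$ and the $w$-nullcline is the straight line $w = b_0 + b_1 u$. The analogy with the circuit pairs the fast variable $u$ with $i_i$ and the slow recovery variable $w$ with $v_o$, so that the load line plays the role of the $w$-nullcline.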

Single Dimensional Approximation
Moreover, a one-dimensional piecewise approximation of the system can be obtained through dimensionality reduction, by replacing the movement along the eigenvector parallel to the x-axis with an instantaneous transition from $A$ to $A'$, or $B$ to $B'$. This leaves a 1-dimensional subsystem in each of M and I along the V-I curves $AB'$ and $BA'$. Experiments using VO2 show that the metallic-state conductance $g_{vm}$ is very high, which causes the charging cycle of $v_o$ to be almost instantaneous (Figure 4) and to resemble a spike of a biological neuron. As such, the spiking statistics can be studied by modeling just the discharge cycle of $v_o$. The inductance, being negligible, can be effectively removed, and only the capacitance is needed for modeling the 1D subsystem of the insulating state (Figure 6A).
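The inductance-free picture can be illustrated with a minimal simulation sketch. It assumes the output node obeys $C\,dv_o/dt = g_v(s)(v_{dd} - v_o) - g_s v_o$ and that the thresholds $v_h$ and $v_l$ apply to the device voltage $v_{dd} - v_o$. The function name and all numerical values are hypothetical, chosen only so that one value of $g_s$ spikes and the other rests; they are not the experimental parameters.

```python
def simulate_imt_oscillator(g_s, v_dd=1.0, g_vi=0.1, g_vm=10.0,
                            v_h=0.7, v_l=0.3, c=1.0,
                            t_end=10.0, dt=1e-3):
    """Forward-Euler simulation of an inductance-free IMT oscillator.

    All default parameter values are illustrative assumptions.
    Returns the times of insulator-to-metal (IMT) transitions,
    i.e. the onsets of spikes.
    """
    v_o = v_dd - v_l          # start just after a spike, at the MIT point
    metallic = False          # device state s: False = insulating (I)
    spike_times = []
    n_steps = int(round(t_end / dt))
    for k in range(n_steps):
        v_dev = v_dd - v_o    # voltage across the IMT device
        if not metallic and v_dev >= v_h:
            metallic = True   # IMT: the fixed point switches to S2
            spike_times.append(k * dt)
        elif metallic and v_dev <= v_l:
            metallic = False  # MIT: the fixed point switches back to S1
        g_v = g_vm if metallic else g_vi
        # KCL at the output node: C dv_o/dt = g_v (v_dd - v_o) - g_s v_o
        v_o += dt * (g_v * v_dev - g_s * v_o) / c
    return spike_times


spiking = simulate_imt_oscillator(g_s=1.0)   # fixed point S1 lies beyond v_h
resting = simulate_imt_oscillator(g_s=0.2)   # fixed point S1 is reached first
print(len(spiking), len(resting))
```

With a large $g_{vm}$ the charging phase is much shorter than the discharge phase, which is why only the discharge phase matters for the spiking statistics. Lowering $g_s$ moves the insulating-state fixed point below the IMT threshold, so the neuron rests.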

Noise Induced Stochastic Behavior
The two important noise sources which induce stochasticity in an IMT neuron are (a) $V_{IMT}$ ($v_h$) fluctuations (Zhang et al., 2016; Jerry et al., 2017b), and (b) thermal noise. Thermal noise $\eta(t)$ is modeled in the circuit (Figure 6A) as a white noise voltage $\eta(t)\,dt = \sigma_t\,dw_t$, where $w_t$ is the standard Wiener process and $\sigma_t^2$ is the infinitesimal thermal noise variance. The threshold $v_h$ is assumed constant during a spike, but varies from one spike to another. The distribution of $v_h$ from spike to spike is assumed to be Gaussian or sub-Gaussian, with parameters estimated from experimental observations of oscillations. If the series transistor always remains in saturation and shows a linear voltage-current relationship, as is the case in our VO2-based experiments, the discharge phase can be described by an Ornstein-Uhlenbeck (OU) process (Equation 3), where $\mu$, $\theta$, and $\sigma$ are functions of the circuit parameters of the series transistor, the IMT device, and $\sigma_t$. The interspike interval is thus the first-passage-time (FPT) of this OU process, but with a fluctuating boundary.
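The interplay of the two noise sources can be explored with a small Monte Carlo sketch. It assumes the standard OU form $dx = \theta(\mu - x)\,dt + \sigma\,dw_t$, integrated with the Euler-Maruyama scheme, and a Gaussian boundary drawn once per spike. The function names and all numerical values are hypothetical and are not fitted to the VO2 measurements.

```python
import numpy as np


def simulate_isi(n_spikes, mu, theta, sigma, s_mean, s_std,
                 x0=0.0, dt=1e-3, t_max=20.0, seed=0):
    """First-passage times of an OU process to a per-spike random boundary.

    Assumed model: dx = theta * (mu - x) dt + sigma dW, started at x0.
    The boundary is drawn once per spike from N(s_mean, s_std^2) and is
    held constant until it is crossed. Uncrossed paths are returned as NaN.
    """
    rng = np.random.default_rng(seed)
    boundary = rng.normal(s_mean, s_std, size=n_spikes)
    x = np.full(n_spikes, x0, dtype=float)
    fpt = np.full(n_spikes, np.nan)
    alive = np.ones(n_spikes, dtype=bool)
    sqrt_dt = np.sqrt(dt)
    for k in range(1, int(round(t_max / dt)) + 1):
        noise = rng.standard_normal(n_spikes)
        x = x + theta * (mu - x) * dt + sigma * sqrt_dt * noise
        crossed = alive & (x >= boundary)
        fpt[crossed] = k * dt       # record the first crossing only
        alive &= ~crossed
        if not alive.any():
            break
    return fpt


def coefficient_of_variation(samples):
    """Standard deviation divided by mean, ignoring uncrossed (NaN) paths."""
    return np.nanstd(samples) / np.nanmean(samples)


thermal_only = simulate_isi(2000, mu=1.2, theta=1.0, sigma=0.01,
                            s_mean=1.0, s_std=0.0)
with_threshold_noise = simulate_isi(2000, mu=1.2, theta=1.0, sigma=0.01,
                                    s_mean=1.0, s_std=0.04)
print(coefficient_of_variation(thermal_only),
      coefficient_of_variation(with_threshold_noise))
```

Because the fixed point $\mu$ sits only slightly beyond the mean boundary in this example, small shifts of the boundary translate into large shifts of the crossing time, so boundary fluctuations dominate the spread of the interspike intervals.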

OU Process With Constant Boundary
Analytical expressions for the FPT of the OU process (with $\mu = 0$) for a constant boundary were derived using the Laplace transform method in Ricciardi and Sato (1988). Reproducing some of its results, let the first passage time for the system of Equation (3), with $\mu = 0$, which starts at $x(0) = x_0$ and hits a boundary $S$, be denoted by the random variable $t_f(S, x_0)$, and its $m$th moment by $\tau_m(S, x_0)$. Also, let $\tilde{t}_f(S, x_0)$ be the FPT for another OU process with $\mu = 0$, $\theta = 1$, and $\sigma = \sqrt{2}$, and let $\tilde{\tau}_m(S, x_0)$ be its $m$th moment. Then time and space scaling for the OU process imply Equation (4), where $\alpha = \sqrt{2\theta/\sigma^2}$. The first two moments for the base-case OU process, $\tilde{\tau}_1$ and $\tilde{\tau}_2$, are given by Equation (5), where $\phi_k(z)$ can be written as an infinite sum (Equation 6), with $\rho(n, k)$ being a function of the digamma function (Ricciardi and Sato, 1988).
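The scaling argument can be sketched as follows, assuming the OU form $dx = -\theta x\,dt + \sigma\,dw_t$ for $\mu = 0$ and a base process with unit rate and infinitesimal variance 2. Tildes mark base-case quantities, and the variables $s$ and $y$ are introduced here only for the derivation:

```latex
\begin{aligned}
& dx = -\theta x\,dt + \sigma\,dw_t, \qquad s = \theta t, \quad y = \alpha x, \quad \alpha = \sqrt{2\theta/\sigma^2} \\
& \Rightarrow \quad dy = -y\,ds + \sqrt{2}\,dw_s, \\
& t_f(S, x_0) \overset{d}{=} \theta^{-1}\,\tilde{t}_f(\alpha S, \alpha x_0)
\quad \Rightarrow \quad
\tau_m(S, x_0) = \theta^{-m}\,\tilde{\tau}_m(\alpha S, \alpha x_0).
\end{aligned}
```

The second line uses $dw_t = \theta^{-1/2}\,dw_s$ under the time change $s = \theta t$, so the noise coefficient of $y$ is $\alpha \sigma / \sqrt{\theta} = \sqrt{2}$.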

OU Process With Fluctuating Boundary
We extend this framework for calculating the FPT statistics with a fluctuating boundary $S$ as follows. Let the IMT threshold be represented by the random variable $v_h$ with distribution $D$. For the VO2-based IMT neuron, the 1D subsystem in the insulating phase can be converted to the form of Equation (3) by a transformation $T$ of the voltage variable. For sub-Gaussian distributions we use the Exponential Power family EP[$\kappa$], $\kappa$ being the shape factor. Let the interspike interval of the IMT neuron be denoted by the marginal random variable $t_{imt}(D, v_l)$. Then $t_{imt}$ is obtained from $t_f$ in Equation (4), given common parameters $\theta$ and $\sigma$, by marginalizing over $v_h$, and the moments of $t_{imt}$ are given by Equation (7), where $\alpha = \sqrt{2\theta/\sigma^2}$. If $D$ is a Gaussian or EP[$\kappa$] distribution and $\alpha T$ is an affine transformation, then $\alpha T v_h$ also has a Gaussian or EP[$\kappa$] distribution.
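A sketch of the marginalization follows. It assumes that, conditional on the threshold, the interspike interval is the constant-boundary FPT from the transformed reset point $T v_l$ to the transformed threshold $T v_h$. This conditioning and the placement of $T$ are a reading of the setup above, not a quotation of Equation (7), and tildes again mark the base-case OU process:

```latex
\begin{aligned}
t_{imt}(D, v_l) \,\big|\, v_h \;&\overset{d}{=}\; t_f(T v_h,\, T v_l), \qquad v_h \sim D, \\
E\!\left[t_{imt}^{\,m}\right]
&= E_{v_h}\!\left[\tau_m(T v_h,\, T v_l)\right]
 = \theta^{-m}\, E_{v_h}\!\left[\tilde{\tau}_m(\alpha T v_h,\, \alpha T v_l)\right].
\end{aligned}
```

The first equality in the second line is the law of total expectation, and the second applies the time and space scaling of the constant-boundary case inside the expectation.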

Experiments
IMT devices are fabricated on a 10 nm VO2 thin film grown by reactive oxide molecular beam epitaxy on a (001) TiO2 substrate using a Veeco Gen10 system (Tashman et al., 2014). Planar two-terminal structures are formed by patterning contacts using standard electron beam lithography, which defines the device length ($L_{VO2}$). Pd (20 nm)/Au (60 nm) contacts are then deposited by electron beam evaporation and liftoff. The devices are then isolated and the widths ($W_{VO2}$) are defined using a CF4-based dry etch.
The IMT neuron is constructed using an externally connected n-channel MOSFET (ALD110802) and the fabricated VO2 device. A prototypical I-V curve is shown in Figure 5A. Within the experimental data, the current is limited to an arbitrarily chosen 200 µA to prevent thermal runaway and breakdown of the device while in the low-resistance metallic state. It should be noted that, as the metallic state corresponds to the abrupt charging cycle of $v_o$, limiting the current would not have a noticeable effect on the spiking statistics of the neuron.
Threshold voltage fluctuations (cycle to cycle) were observed in all devices which were tested (>10). The threshold voltage distribution was estimated using the varying cycle-to-cycle threshold voltages collected from a single device. Thermal noise is not measured directly, but is estimated approximately by matching the simulation waveforms from the circuit model (Figure 6A) with the observed experimental waveforms. It can be verified that thermal noise of the transistor is not the dominant noise source by measuring the threshold variation as a function of the transistor current (Figure 5B) and observing that the distribution of the switching threshold does not change with varying transistor current. Finally, the firing rate and its variation with $v_{gs}$ (Figure 6B) were measured for a single device.

First Moment and the Firing Rate
The first moment of $t_{imt}$ is calculated using Equations (5) and (7). The expansion for $\phi_k(z)$ in Equation (6) can be used to calculate $E_{v_h}[\phi_k(\alpha T v_h)]$ using the moments of $\alpha T v_h$.

FIGURE 5 | (A) The prototypical DC voltage-current characteristics for a single VO2 device exhibit abrupt threshold switching at $V_{IMT}$ and $V_{MIT}$. The current in the metallic state has been arbitrarily limited to a 200 µA compliance current. (B) $V_{IMT}$ distribution as a function of the peak current during oscillations (value is set by the MOSFET saturation current). $V_{IMT}$ is extracted from 300+ cycles.
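The idea of evaluating the expectation of an infinite sum term by term from the moments of its argument can be sketched in a few lines. The example assumes a Gaussian argument and uses the series of $\exp(z)$ as a stand-in with a known closed form. The coefficients are NOT those of $\phi_k$, and the function names are hypothetical.

```python
import math


def gaussian_raw_moments(mean, std, n_max):
    """Raw moments E[z^n], n = 0..n_max, of z ~ N(mean, std^2).

    Uses the recursion E[z^n] = mean * E[z^(n-1)] + (n-1) * std^2 * E[z^(n-2)].
    """
    moments = [1.0, mean]
    for n in range(2, n_max + 1):
        moments.append(mean * moments[n - 1]
                       + (n - 1) * std**2 * moments[n - 2])
    return moments[: n_max + 1]


def expected_power_series(coeffs, mean, std):
    """E[sum_n coeffs[n] * z^n], computed term by term from the moments of z."""
    moments = gaussian_raw_moments(mean, std, len(coeffs) - 1)
    return sum(a * m for a, m in zip(coeffs, moments))


# Stand-in series: exp(z) = sum_n z^n / n!  (not the phi_k coefficients).
coeffs = [1.0 / math.factorial(n) for n in range(40)]
estimate = expected_power_series(coeffs, mean=0.5, std=0.3)
exact = math.exp(0.5 + 0.3**2 / 2)   # closed form of E[exp(z)] for Gaussian z
print(estimate, exact)
```

The same pattern applies to any series whose coefficients are known and whose truncated sum converges for the moments involved. For an Exponential Power distribution only the moment routine would change.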

Higher Moments
For higher moments, higher-order terms are encountered. For example, in the case of the second moment, using Equations (5) and (7), we obtain an expression containing the higher-order term $\phi_1(\alpha T v_h)^2$. In the case of the third moment we obtain $\phi_1(\alpha T v_h)\,\phi_2(\alpha T v_h)$. As each $\phi_k$ term is an infinite sum, we construct a Cauchy product expansion for the higher-order term using the infinite sum expansions of the constituent $\phi_k$s and then distribute the expectation over addition. For example, if the expansions of $\phi_1(z)$ and $\phi_2(z)$ are $\sum_i a_i$ and $\sum_i b_i$, respectively, then the Cauchy product expansion of $\phi_1(z)\phi_2(z)$ can be calculated as $\sum_i c_i$, where $c_i$ is a function of $a_{1 \ldots i}$ and $b_{1 \ldots i}$, and the expectation is $E[\phi_1(z)\phi_2(z)] = \sum_i E[c_i]$. Since $c_i$ is a polynomial in $z$, $E[c_i]$ can be calculated using the moments of $z$.
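The Cauchy product itself is a discrete convolution of the coefficient sequences. The sketch below works with power-series coefficients indexed from 0 and checks the construction on a stand-in with a known answer, $\exp(z)\cdot\exp(z) = \exp(2z)$. The function name is hypothetical and the coefficients are not those of $\phi_1$ or $\phi_2$.

```python
import math


def cauchy_product(a, b):
    """Coefficients c[n] = sum_{i=0}^{n} a[i] * b[n - i] of the product series.

    a and b hold power-series coefficients indexed from 0. Only as many
    output terms are returned as are fully determined by both inputs.
    """
    n_terms = min(len(a), len(b))
    return [sum(a[i] * b[n - i] for i in range(n + 1))
            for n in range(n_terms)]


n_terms = 20
exp_coeffs = [1.0 / math.factorial(n) for n in range(n_terms)]
product = cauchy_product(exp_coeffs, exp_coeffs)
# Coefficients of exp(2z), the known product of the two stand-in series.
expected = [2.0**n / math.factorial(n) for n in range(n_terms)]
print(max(abs(p - e) for p, e in zip(product, expected)))
```

Once the product coefficients are available, the expectation of the product follows by weighting each coefficient with the corresponding moment of $z$, exactly as for a single series.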
If $\mu_{imt}$ and $\sigma_{imt}$ are the mean and standard deviation of the interspike intervals $t_{imt}$, the coefficient of variation ($\sigma_{imt}/\mu_{imt}$) varies with the relative proportion of the thermal and the threshold-induced noise. Figure 7 shows $\sigma_{imt}/\mu_{imt}$ (calculated using parameters matched with our VO2 experiments) plotted against $\sigma_t$ for various kinds of $v_h$ distributions fitted to experimental observations. The $\sigma_{imt}/\mu_{imt}$ observed in our VO2 experiments is about an order of magnitude larger than what would be calculated with only thermal noise for such a neuron, and hence threshold noise contributes significant stochasticity to the spiking behavior. As the IMT neuron is set up such that the stable point is close to the IMT transition point (Figure 3B), low $\sigma_t$ results in high and diverging $\sigma_{imt}/\mu_{imt}$ for any distribution of threshold noise, and $\sigma_{imt}/\mu_{imt}$ reduces with increasing $\sigma_t$ for the range shown. For a Normally distributed $v_h$ the variance diverges for $\sigma_t \lesssim 8$, but for Exponential Power (EP) distributions with lighter tails, the variance converges for smaller values of $\sigma_t$. Statistical measurements on experimental data, as indicated in Figure 7, provide measures of $\sigma_{imt}/\mu_{imt}$ (dotted line) and $\sigma_t$ (shaded region). We note that EP distributions provide a better approximation of the stochastic nature of experimentally demonstrated VO2 neurons, as the range of $\sigma_t$ is estimated to be $< 5$.

FIGURE 7 | $\sigma_{imt}/\mu_{imt}$ for the interspike interval plotted against $\sigma_t$ for $v_{gs} = 1.8$ V with Constant, Gaussian, and Exponential Power (EP[$\kappa$], where $\kappa$ is the shape factor) distributions of the threshold noise. The experimentally observed $\sigma_{imt}/\mu_{imt}$ for a VO2 neuron is shown with a dotted line. The shaded region shows the experimentally estimated range of $\sigma_t$ ($\sigma_t < 5$).

DISCUSSION
In this paper, we demonstrate and analyze IMT-based stochastic neuron hardware which relies on both threshold fluctuations and thermal noise as precursors to bifurcation. The IMT neuron emulates the functionality of theoretical neuron models completely by incorporating all neuron characteristics into device phenomena. Unlike other similar efforts, it does not need peripheral circuits alongside the core device circuit (an IMT device and a transistor) to emulate any sub-component of the spiking neuron model, such as thresholding or reset. Moreover, the neuron construction not only utilizes inherent physical noise sources for stochasticity, but also enables control of the firing probability using an analog electrical signal, namely the gate voltage of the series transistor. This is different from previous works, which control only a deterministic aspect of the firing rate such as the charging rate. A comparison of spiking neuron hardware characteristics in different works is shown in Table 1.
We also show that the neuron dynamics follow a linear "caricature" of the FitzHugh-Nagumo model with intrinsic stochasticity. The analytical models developed in this paper can also faithfully reproduce the experimentally observed transfer curve, which is a stochastic property. This work is among the first to provide such analytical verification of stochastic neuron experiments. It is an important result, as it indicates reproducibility of stochastic characteristics and helps in creating the pathway toward perfecting these devices. With a growing consensus that stochasticity will play a key role in solving hard computing tasks, we need efficient ways for controlled amplification and conversion of physical noise into a readable and computable form. In this regard, the IMT-based neuron represents a promising solution for a stochastic computational element. Such stochastic neurons have the potential to realize bio-mimetic computational kernels that can be employed to solve a large class of optimization and machine-learning problems.

AUTHOR CONTRIBUTIONS
AP worked on the development of theory, simulation frameworks, and mathematical models; MJ worked on the experiments; AR advised AP and participated in the problem formulation; SD advised MJ and also participated in the design of experiments and problem formulations.