A 22-pJ/spike 73-Mspikes/s 130k-compartment neural array transceiver with conductance-based synaptic and membrane dynamics

Neuromorphic cognitive computing offers a bio-inspired means to approach the natural intelligence of biological neural systems in silicon integrated circuits. Typically, such circuits either reproduce biophysical neuronal dynamics in great detail as tools for computational neuroscience, or abstract away the biology by simplifying the functional forms of neural computation in large-scale systems for machine intelligence with high integration density and energy efficiency. Here we report a hybrid which offers biophysical realism in the emulation of multi-compartmental neuronal network dynamics at very large scale with high implementation efficiency, and yet with high flexibility in configuring the functional form and the network topology. The integrate-and-fire array transceiver (IFAT) chip emulates the continuous-time analog membrane dynamics of 65 k two-compartment neurons with conductance-based synapses. Fired action potentials are registered as address-event encoded output spikes, while the four types of synapses coupling to each neuron are activated by address-event decoded input spikes for fully reconfigurable synaptic connectivity, facilitating virtual wiring as implemented by routing address-event spikes externally through a synaptic routing table. Peak conductance strength of synapse activation specified by the address-event input spans three decades of dynamic range, digitally controlled by pulse width and amplitude modulation (PWAM) of the drive voltage activating the log-domain linear synapse circuit. Two nested levels of micro-pipelining in the IFAT architecture improve both throughput and efficiency of synaptic input. This two-tier micro-pipelining results in a measured sustained peak throughput of 73 Mspikes/s and overall chip-level energy efficiency of 22 pJ/spike. Non-uniformity in digitally encoded synapse strength due to analog mismatch is mitigated through single-point digital offset calibration.
Combined with the flexibly layered and recurrent synaptic connectivity provided by hierarchical address-event routing of registered spike events through external memory, the IFAT lends itself to efficient large-scale emulation of general biophysical spiking neural networks, as well as rate-based mapping of rectified linear unit (ReLU) neural activations.


1. Introduction
Neuromorphic systems implementing spiking neural networks are promising research platforms for investigating and emulating the computational abilities of the brain (Mead, 1990; Indiveri et al., 2011; Thakur et al., 2018). The compactness and low power consumption of neuromorphic circuits make them highly suited for robotic and mobile applications emulating the dynamics of complex brain circuits in real-world environments (Badoni et al., 2006; Indiveri et al., 2006; Silver et al., 2007; Schemmel et al., 2010; Merolla et al., 2011; Ramakrishnan et al., 2012; Sharp et al., 2012; Imam and Cleland, 2020). Such complex real-life tasks require large-scale neuromorphic systems, and there are various approaches for their implementation. They range from implementations using microprocessor cores integrated with specialized network-on-chip routers (Furber et al., 2012; Sharp et al., 2012; Painkras et al., 2013), fully digital implementations with quasi-asynchronous elements to maintain synchrony (Merolla et al., 2011, 2014; Imam et al., 2012; Akopyan et al., 2015), SRAM-based implementations for programmable precision of neural and synaptic dynamics and connectivity in a core and supporting local learning rules (Davies et al., 2018; Detorakis et al., 2018; Frenkel et al., 2019), implementations using amplifier-based neuron circuits with wafer-scale integration and connectivity (Schemmel et al., 2010; Millner et al., 2011; Schmitt et al., 2017), analog quadratic integrate-and-fire neurons sharing synapses, axons, and dendrites with neighboring neurons implementing a diffusive neural network as layered in the cortex (Lin et al., 2006; Benjamin et al., 2014; Neckar et al., 2019), and subthreshold CMOS analog neurons with digitally controlled conductance-based synapses (Yu et al., 2012b; Park et al., 2014). Despite the success of large-scale implementations, achieving the synaptic density of the brain together with dynamic neuronal representations at low power consumption remains a challenge.
All these neuromorphic systems are built from basic neural computation units, that is, neurons and synapses, which are also the basic computational elements in the biological brain. A neuron processes incoming information and transmits its outputs using an electrical signal represented by an action potential to other neurons via synapses. A basic principle for the emulation of neural and synaptic dynamics is the integration of synaptic currents into the membrane potential and generation of action potentials. There are various models for emulating these principles (Destexhe et al., 1998). Some neuron models emulate neural dynamics in more biologically plausible ways, ranging from a model of ion channel kinetics with hundreds of differential equations and parameters (Hodgkin and Huxley, 1952) to models of simplified conductance-based differential equations for computational efficiency (Izhikevich, 2003; Mihalas and Niebur, 2009). However, the hardware complexity for the implementation of these neuron models limits the large-scale integration of neurons in a silicon die. Conversely, the leaky integrate-and-fire neuron model is a popular choice for large-scale implementation because of its relative simplicity and ability to emulate many dynamic features of biological neurons (Brette and Gerstner, 2005). The integrate-and-fire neuron models the synaptic current integration and the generation of the action potential. A neuron generates an action potential when the membrane potential exceeds a certain threshold voltage. This basic principle can be implemented using a comparator and an integrator; thus, this simplicity makes it suitable for large-scale implementation in a silicon die.
When a presynaptic neuron generates a spike, it releases neurotransmitters to the synapses connected to postsynaptic neurons. In the biological brain, a neuron is connected to 10,000 neurons on average. Achieving hard-wired synaptic connections to the level of the biological brain is highly challenging in neuromorphic hardware. This challenge can be addressed using the asynchronous address event representation (AER) protocol in neuromorphic systems. AER facilitates spike event communication between arrayed neurons using address events, each of which represents a target neuron address with synaptic parameters (Sivilotti, 1991; Lazzaro et al., 1993; Mahowald, 1994; Deiss et al., 1999; Boahen, 2000). When a neuron fires in an array, the spike is encoded as an address event representing the address of the neuron in the array. The event is translated to synaptic events through a synaptic routing table implemented in random access memory (RAM) or read-only memory (ROM), and these synaptic events are sent to postsynaptic neurons. In each postsynaptic neuron, an incoming synaptic event is integrated onto the membrane potential of the postsynaptic neuron.
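The translation from an output spike to its fan-out of synaptic events can be sketched as a table lookup. This is a minimal illustration only: the `SynapticEvent` fields, the `routing_table` contents, and the `route` function are hypothetical names chosen here, not the memory layout of any actual routing table.

```python
from typing import NamedTuple

class SynapticEvent(NamedTuple):
    target: int        # address of the postsynaptic neuron
    synapse_type: int  # which synapse of the target neuron is driven
    weight: int        # digital synaptic strength

# Hypothetical routing table: presynaptic address -> synaptic events.
# In hardware this role is played by external RAM or ROM.
routing_table = {
    3: [SynapticEvent(10, 0, 200), SynapticEvent(11, 1, 40)],
    10: [SynapticEvent(3, 2, 128)],
}

def route(spike_address):
    """Translate one address-event spike into its list of synaptic events."""
    return list(routing_table.get(spike_address, []))

# A spike from neuron 3 fans out to neurons 10 and 11.
events = route(3)
```

Because connectivity lives entirely in the table, rewiring the network amounts to rewriting table entries rather than changing any physical connection.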
An integrate-and-fire array transceiver (IFAT) is proposed and developed as a promising system platform for large-scale power-efficient neuromorphic processing. In our previous studies, integrate-and-fire neurons were arranged in a 2 k-neuron core (with 2,048 neurons), and each neuron used a simple analog switched-capacitor architecture to model membrane dynamics, resulting in a discrete-time version of synaptic current integration (Goldberg et al., 2001; Vogelstein et al., 2007). This demonstrated the ability to emulate a model of attractor dynamics and neural activity in the rat hippocampus. For a more compact form of synapses while further extending the linearity of the synaptic dynamics in continuous time, a single-transistor realization of a conductance-based synapse emulating the log-domain encoding of first-order linear dynamics of synaptic conductance was presented (Yu and Cauwenberghs, 2010). In addition, large-scale integration incorporating a hierarchical AER architecture has been realized (Yu et al., 2012a; Park et al., 2017). For address event routing, a synchronous AER circuit was placed for each 2 k-neuron core. In this scheme, an event holds the AER circuit until the event is delivered, thus resulting in a limited input event throughput. In this study, the AER protocol is implemented fully asynchronously, implying that there is no synchronized system clock. The AER protocol is only activated by address events with a "handshaking" protocol. When a sender and receiver are ready to communicate, they send and receive a request and acknowledge signal to deliver an event. This event-driven activation reduces the dynamic power consumption significantly (Martin and Nystrom, 2006), achieving sub-nanojoule energy efficiency for an asynchronous microcontroller (Martin et al., 2003), and it is also applied to neuromorphic systems for energy-efficient address event communication (Vogelstein et al., 2007; Merolla et al., 2011, 2014; Millner et al., 2011; Benjamin et al., 2014; Davies et al., 2018).
In this paper, we present a 65k-neuron IFAT as a computational building block for large-scale neuromorphic systems. An IFAT [...] circuits, resulting in low power consumption owing to its event-driven operation. To maximize the parallelism of the input event streams, an additional pipeline stage was added per row in the 2 k-neuron cores. This two-tier micro-pipeline scheme, designed using the asynchronous design principle, results in a sustained peak throughput of 73 Mspikes/s at an energy efficiency of 22 pJ/spike.

FIGURE 3
[...] and a three-transistor dynamic latch. (B) Timing diagram of the synapse and dynamic latch operation with two events. When the three-transistor dynamic latch is selected by a ROW and COL, it holds V_SEL active low to select one synapse in the neuron, while the pulse-width (Δt) modulated input with the amplitude-modulated voltage (V_s) at V_IN, which defines the update of synapse conductance (ΔG_syn), drives the activated synapse.
This paper extends a previous preliminary report (Park et al., 2014), which characterized a single neuron, to the complete characterization of the entire array of neurons. Additionally, this paper presents the calibration process and the mapping of a rate-based neural network onto the architecture with an example of a boundary detection application. The remainder of the paper is organized as follows. In Section 2, we describe the circuit implementation and the theoretical motivation behind the implementation. Section 3 presents the measurement results. We show the analysis of a single neuron response and the variability of the response across 2,048 neurons in one core. In addition, we demonstrate a potential application, image boundary detection, using IFAT neurons. Section 4 summarizes the related and prior works in a table and discusses potential extensions of the IFAT chip with emerging non-volatile memory devices. Finally, Section 5 concludes the contributions of the IFAT chip.
2. Implementation details
2.1. Two-compartment integrate-and-fire neuron model
The proposed IFAT chip emulates the detailed biological dynamics of neurons and synapses in integrated circuits. Figure 1A illustrates the neural synaptic transmission between neurons. When a presynaptic neuron generates an action potential, it releases neurotransmitters to the synapses, which integrate charges on the membrane of the postsynaptic neuron. When the membrane potential exceeds the firing threshold, the postsynaptic neuron generates an action potential. This neural activation and synaptic communication are emulated in the IFAT chip with a representation of connectivity information in address events, as shown in Figure 1B. With such address events and a synaptic routing table, which can be implemented in external memory such as RAM or ROM, dynamically reconfigurable synaptic connectivity is supported across IFAT chips in the hierarchical address-event routing (HiAER-IFAT) architecture (Park et al., 2017). When a presynaptic neural spike is received, the synaptic connection information between the presynaptic neuron and its connected postsynaptic neurons is read out from the synaptic routing table, and the resulting address events are routed to the postsynaptic neurons together with the other synaptic information encoded in the address events, such as synapse type and synaptic weight.
In the IFAT chip, each neuron is implemented using a two-compartment leaky integrate-and-fire neuron model, as shown in Figure 1C. In the neuron model, there are two compartments, called "distal" and "proximal," each with a membrane capacitor and a leak conductance. Each compartment also contains two synapse circuits, which are configured as excitatory or inhibitory synapses by programmable reversal potentials. The synaptic weight modulates the synaptic conductance, defining the amount of current injected into the membrane capacitor of a compartment. The two compartment capacitors are conductively coupled through a configurable conductance. When the proximal membrane potential exceeds the threshold voltage, the axon hillock circuit triggers an action potential, similar to the biological system. The dynamics of a two-compartment leaky integrate-and-fire neuron are formulated in terms of the following quantities: C_mem0 and C_mem1 are the distal and proximal membrane capacitances, respectively; V_mem0 and V_mem1 are the distal and proximal membrane voltages, respectively; I_fb is the nonlinear positive feedback current due to the spiking mechanism; G_syn is the synapse conductance; E_rev is the reversal potential; G_leak is the leak conductance; E_leak is the leak potential; G_comp is the inter-compartment conductance.
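A form of the membrane equations consistent with the quantities listed above is the standard pair of coupled conductance-based equations below. It is offered as a sketch under assumptions, not as a transcription of the authors' equations: the synapse index i, the per-compartment subscripts on G_syn, E_rev, and G_leak, and the placement of I_fb in the proximal compartment (where the threshold is evaluated) are notation and choices introduced here.

```latex
\begin{aligned}
C_{mem0}\,\frac{dV_{mem0}}{dt} &= \sum_{i} G_{syn0,i}\,\bigl(E_{rev0,i} - V_{mem0}\bigr)
  + G_{leak0}\,\bigl(E_{leak} - V_{mem0}\bigr)
  + G_{comp}\,\bigl(V_{mem1} - V_{mem0}\bigr) \\
C_{mem1}\,\frac{dV_{mem1}}{dt} &= \sum_{i} G_{syn1,i}\,\bigl(E_{rev1,i} - V_{mem1}\bigr)
  + G_{leak1}\,\bigl(E_{leak} - V_{mem1}\bigr)
  + G_{comp}\,\bigl(V_{mem0} - V_{mem1}\bigr) + I_{fb}
\end{aligned}
```

Each synaptic and leak term drives the membrane toward its own reversal or leak potential with a strength set by the conductance, and the G_comp terms have opposite signs in the two equations so that charge leaving one compartment enters the other.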
The input and output of a neuron are encoded as address events. A decoder routes an incoming address event to a destination postsynaptic neuron using the information on the synapse type and synaptic weight. Subsequently, an input AER circuit (AERin) stimulates the synapse in the destination neuron with a synaptic weight. On the output side, when an axon hillock circuit registers an event, the output AER circuit (AERout) raises the request signal. An encoder takes the request signal and converts it into an address event, indicating the address of the neuron in the arrayed neurons.
Figure 1D shows a transistor-level schematic of the implementation of the two-compartment conductance-based integrate-and-fire neuron. Two conductance-based synapse circuits are tied to each compartment, with programmable reversal potentials E_rev defining the synapse type and synaptic time constants controlled by V_τ. In the AERin circuit, an incoming event selects one of the four synapses using pairwise complementary signals: ROWA, ROWB, and COLA, COLB. Each compartment integrates currents from the synaptic conductances and discharges continuously through the leak conductance. In addition, the coupling conductance, which is controlled by V_COMP, couples the electrical charges between the proximal and distal compartments. When the proximal membrane potential exceeds the threshold voltage V_thresh, a self-timed axon hillock circuit (Vogelstein et al., 2007) generates an action potential and registers a neural spike event on the AERout circuit to the output AER bus while resetting the membrane potential.

2.2. Overall architecture
Figure 2A shows the overall architecture of the IFAT chip, which is equipped with 65 k integrate-and-fire neurons in a single chip. The 65 k neurons are divided into four independent and identical quadrants, each of which contains eight 2 k-neuron IFAT cores. Each quadrant has independent input and output ports for address event communication. Asynchronous splitters and mergers are placed at the center of each quadrant to control the address event streams from and to the eight 2 k-neuron IFAT cores. Each 2 k-neuron IFAT core comprises 2 k two-compartment leaky integrate-and-fire neurons and peripheral circuits, such as row and column decoders, pulse width and amplitude modulation (PWAM) circuits, an asynchronous AER communication circuit, a linear feedback shift register (LFSR), and row and column arbiters. The input and output AER buses are implemented by fully asynchronous communication circuits using a four-phase dual-rail encoding communication protocol. An address event encodes the location of the neuron within the quadrant of the IFAT chip. The previous synchronous pulse-width modulation circuit (Yu and Cauwenberghs, 2010), which incurs a long waiting time between consecutive events, is improved by adding a pipeline stage of row-wise PWAM circuits, which raises the input throughput of the 2 k-neuron IFAT core, while the additional amplitude modulation extends the dynamic range of synaptic strength exponentially.
Figure 2B shows a micrograph of the 4 × 4 mm² IFAT chip, which was fabricated using a 90-nm CMOS process. The chip has 436 staggered I/O pads and is packaged in a 35 × 35 mm² Fine Ball Grid Array (FBGA) package. The layouts of the 2 k-neuron IFAT core and neuron cell are shown in Figures 2C, D, respectively. A 2 k-neuron IFAT core occupies 415 × 810 µm² and a two-compartment neuron occupies 12.15 × 11.5 µm².

2.3. Conductance-based synapse
Figure 3A shows the single-transistor implementation of a conductance-based synapse (Yu and Cauwenberghs, 2010) incorporating a three-transistor dynamic latch, and Figure 3B shows the timing diagram for its operation. An incoming event drives COL and ROW and sets RST_LATCH high, holding V_SEL active low to select one active synapse in the neuron addressed by COL and ROW. The diode-connected pMOS input of the synapse is then driven by the source voltage V_s. This raises the gate voltage of the synapse V_g, increasing the synaptic conductance G_syn in the log domain while implementing a linear dynamical synapse with a time constant controlled by V_τ (Yu et al., 2012a). After a pulse width Δt, V_s returns to V_DL, RST_LATCH is activated to release V_SEL to passive high, and the synapse is ready to receive the next synaptic input event.
The single-transistor conductance-based synapse operates in the subthreshold regime of the MOS transistor. As explained above, synaptic input events change the conductance of the synapse transistors. The synaptic conductance modification in the log domain is formulated from the drain current of the nMOS transistor operating in the subthreshold regime:

$$I_{ds} = I_0\, e^{\kappa V_g / V_T} \left( e^{-V_s / V_T} - e^{-V_d / V_T} \right)$$

where I_0 is the dark current of the transistor, V_g is the gate voltage, V_d is the drain voltage, V_s is the source voltage, κ is the back-gate parameter, and V_T is the thermal voltage. This equation can be transformed to the "log domain" or "pseudo-voltage domain" with the definition of a pseudo-voltage and a pseudo-conductance (Fragnière et al., 1997), in which the pseudo-conductance is set by the gate voltage and the pseudo-voltage represents the membrane potential at the transistor terminals.
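One way to write the pseudo-voltage transformation, in the style of Fragnière et al. (1997), is sketched below. The scaling voltage V_0 is an arbitrary constant introduced for this illustration, and the exact normalization used by the authors is not recoverable from the text, so the definitions should be read as one consistent choice rather than as the paper's own.

```latex
I_{ds} = G^{*}\,\bigl(V_d^{*} - V_s^{*}\bigr)
       = I_0\, e^{\kappa V_g / V_T}\left(e^{-V_s / V_T} - e^{-V_d / V_T}\right),
\qquad
G^{*} = \frac{I_0}{V_0}\, e^{\kappa V_g / V_T},
\qquad
V^{*} = -V_0\, e^{-V / V_T}
```

In this form the transistor obeys an Ohm's-law relation in the pseudo-voltages, and because G* depends exponentially on V_g, a linear step in the gate voltage corresponds to a multiplicative step in conductance.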
Frontiers in Neuroscience, frontiersin.org, Park et al.

FIGURE 8
Measured throughput with respect to pulse width representing synaptic strength. The input events address neurons in the same row and in multiple rows.

FIGURE 9
Measured activity-dependent power consumption.

From the pseudo-conductance, we can derive the update of the synaptic conductance with respect to time, where the back-gate coefficient κ is taken to be the same for nMOS and pMOS. The synaptic strength is encoded by modulating the pulse width Δt and the amplitude V_s, and the resulting step in synaptic conductance ΔG_syn is approximately determined by the two four-bit fields W and A of the eight-bit synaptic strength:
• W is the relative pulse width of the stimulus, that is, the mantissa of the synaptic strength, an integer in [0, 15] given by the four least significant bits (LSBs) of the eight-bit synaptic strength.
• A is the pulse amplitude in the log domain, that is, the exponent of the synaptic strength, an integer in [0, 15] given by the four most significant bits (MSBs) of the eight-bit synaptic strength.
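The mantissa and exponent fields described above can be extracted from the eight-bit synaptic strength with simple bit operations. The sketch below is illustrative: `decode_strength` follows the bit assignment stated in the text, whereas `relative_step` assumes a step of the form W · base^A with a placeholder `base`, which is a hypothetical stand-in for the circuit-dependent exponential factor and not a measured value.

```python
def decode_strength(strength):
    """Split an eight-bit synaptic strength into (A, W)."""
    if not 0 <= strength <= 0xFF:
        raise ValueError("synaptic strength must fit in eight bits")
    amplitude = (strength >> 4) & 0xF  # A: four MSBs, exponent (pulse amplitude)
    width = strength & 0xF             # W: four LSBs, mantissa (pulse width)
    return amplitude, width

def relative_step(strength, base=1.5):
    """Illustrative conductance step W * base**A; `base` is an assumed value."""
    amplitude, width = decode_strength(strength)
    return width * base ** amplitude

# Example: strength 0xA7 has exponent A = 10 and mantissa W = 7.
a, w = decode_strength(0xA7)
```

Splitting the code into a linear mantissa and an exponential exponent is what lets eight bits cover a dynamic range of several decades.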

2.4. Asynchronous interface with four-phase dual-rail encoding
The AER circuits in the IFAT chip operate in a fully asynchronous way. Asynchronous circuits do not have a master clock for system synchronization. Instead, a "handshaking" protocol is used for reliable data communication between the sender and receiver. Handshaking protocols are implemented with two signals: request and acknowledge. A request signal indicates the sender's readiness to send a data packet. In response to the request signal, the receiver sends an acknowledge signal back to the sender if available. The sender then sends a data packet. This is an event-driven process. Among various handshaking protocols (Martin and Nystrom, 2006), the IFAT chip uses a four-phase dual-rail encoding protocol for more reliable asynchronous handshaking communication. "Four-phase" means that the whole process of request and acknowledge handshaking comprises four signal-transition phases. "Dual-rail" means that two complementary bit-lines are used to represent one-bit information.
A basic building block for the protocol is a C-element circuit (Muller circuit; Muller and Bartky, 1957). The circuit implementation, schematic symbol, and truth table of the C-element are presented in Figures 4A, B. Its output follows the inputs when both inputs are the same; otherwise, it holds its output value until both inputs agree again. Such an operation is required for delay-insensitive operations in asynchronous design. Figure 4C shows a schematic of the n-bit asynchronous pipeline stage for the four-phase dual-rail encoding protocol. This pipeline stage holds its data until one of the next pipeline stages is ready to collect the data. Its function is similar to that of a register in synchronous design. The four-phase dual-rail handshaking protocol does not have an explicit request signal; instead, the request is embedded in the dual-rail data. Each bit of the data is encoded in two complementary lines: TRUE and FALSE. The TRUE line represents the actual value of the data and the FALSE line is its complement. If TRUE and FALSE indicate different values, a valid value is loaded onto the dual rail, as given by TRUE. However, if both are the same, the bit lines are transitioning. The completion tree, the C-tree block shown in Figure 4C, validates that all the bit lines are properly latched. Upon validation, the output of the C-tree is used as an acknowledge signal, ACK_PRE, to the previous pipeline stage. The properly latched dual-rail-encoded output bits are considered as a request signal to the next pipeline stage.
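The behavior of the C-element and the validity rule of dual-rail encoding can be captured in a few lines of behavioral code. This is a software sketch for intuition only: the class and function names are chosen here, and the `completion` function models the outcome of the completion tree rather than its gate-level structure.

```python
class CElement:
    """Muller C-element: output follows the inputs when they agree, else holds."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:          # both inputs agree: output takes their value
            self.out = a
        return self.out     # inputs differ: previous output is held

def encode_dual_rail(bit):
    """Return the (TRUE, FALSE) rail pair for one valid data bit."""
    return (1, 0) if bit else (0, 1)

def is_valid(rails):
    """A bit is valid when its two rails carry different values."""
    true_rail, false_rail = rails
    return true_rail != false_rail

def completion(word):
    """Completion detection: every bit of the word carries valid data."""
    return all(is_valid(rails) for rails in word)

word = [encode_dual_rail(b) for b in (1, 0, 1)]
ready = completion(word)  # request to the next stage, acknowledge to the previous
```

Because validity is carried by the data rails themselves, no separate request wire or timing assumption is needed to know when a word has fully arrived.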

2.5. Asynchronous splitter and merger
Owing to the limited number of I/O pads on the chip, the input and output buses need to be shared by the eight 2 k-neuron IFAT cores in a quadrant. The input bus is designed to communicate 24-bit input synaptic address events. Each event comprises a three-bit destination core address, an 11-bit neuron address in the destination core, a two-bit synapse type, and an eight-bit synapse strength. Asynchronous splitters are implemented to route an input synaptic address event to its destination core. The asynchronous splitter has a binary tree structure of cascaded asynchronous pipeline stages. There are three stages from the input I/Os to the destination 2 k-neuron core. At each stage, the MSB of the input synaptic address event is decoded as a request signal to the next pipeline stage.
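The 24-bit input event format can be illustrated by packing and unpacking its four fields. The field widths come from the text; placing the core address in the three most significant bits, ahead of the neuron address, synapse type, and strength, is an assumption made here based on the splitter decoding the MSB at each stage. All function names are hypothetical.

```python
CORE_BITS, NEURON_BITS, TYPE_BITS, STRENGTH_BITS = 3, 11, 2, 8

def pack_event(core, neuron, synapse_type, strength):
    """Pack the four fields into one 24-bit word, core address in the MSBs."""
    fields = ((core, CORE_BITS), (neuron, NEURON_BITS),
              (synapse_type, TYPE_BITS), (strength, STRENGTH_BITS))
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word

def unpack_event(word):
    """Recover (core, neuron, synapse_type, strength) from a 24-bit word."""
    strength = word & 0xFF
    synapse_type = (word >> 8) & 0x3
    neuron = (word >> 10) & 0x7FF
    core = (word >> 21) & 0x7
    return core, neuron, synapse_type, strength

def splitter_path(word):
    """Branch taken at each of the three binary splitter stages, MSB first."""
    core = (word >> 21) & 0x7
    return [(core >> shift) & 1 for shift in (2, 1, 0)]

event = pack_event(core=5, neuron=1234, synapse_type=2, strength=200)
```

After the three splitter stages have consumed the core bits, the remaining 21 bits are what the destination core receives.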
FIGURE 11
Measured example of shunting inhibition, which blocks the upstream synaptic excitation effect. The distal compartment of the neuron is strongly excited by excitatory synaptic input events, which results in excitatory compartmental inputs coupled through compartmental conductive interactions to the proximal compartment and generation of neuron spikes. From 50 to 80 ms, the proximal compartment is inhibited, which blocks the upstream synaptic excitation.

On the shared output bus side, an asynchronous merger is designed to multiplex address events that are generated simultaneously from multiple IFAT neuron cores. The asynchronous merger comprises an arbiter and an asynchronous pipeline stage. Figure 5 shows the schematics of (A) the arbiter and (B) the asynchronous merger circuit. The arbiter circuit receives request signals REQ0 and REQ1 from two paths in the previous stage. Two cross-coupled NAND gates arbitrate between the two paths and select the one whose request is forwarded to the next stage. The selected request signal, either REQ0_SEL or REQ1_SEL, is encoded in the dual-rail encoding scheme. The dual-rail encoded bit is the MSB of the address event that is selected at the current stage. Additionally, the data from the selected path are properly latched at the asynchronous pipeline stage and acknowledged to be ready for the next event. There are eight 2 k-neuron IFAT cores in each 16 k-neuron quadrant, and two paths can be merged using an asynchronous merger. Hence, there are three stages of asynchronous mergers in each quadrant, which are binary-tree structured. When a neuron fires at a 2 k-neuron IFAT core, it is encoded as an 11-bit address event that represents the address of the neuron in the 2 k-neuron IFAT core. One MSB is added to the address event when it passes through each stage, resulting in a 14-bit address event at the output bus of the chip.
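The growth of the output address from 11 to 14 bits through the three merger stages can be sketched as follows. The mapping between the branch selected at each stage and the bits of the core index is an assumption made for illustration, as is the function name.

```python
def merge_address(core, neuron):
    """Build a 14-bit output address by prepending one MSB per merger stage."""
    if not 0 <= core < 8 or not 0 <= neuron < 2048:
        raise ValueError("core must fit in 3 bits and neuron in 11 bits")
    address, width = neuron, 11
    for stage in range(3):
        branch = (core >> stage) & 1   # path selected by the arbiter at this stage
        address |= branch << width     # the selected path becomes the new MSB
        width += 1
    return address

out = merge_address(core=5, neuron=1234)
```

With this branch assignment the result equals the core index placed above the 11-bit neuron address, which identifies the neuron uniquely within the 16 k-neuron quadrant.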

2.6. Two-tier micro-pipelining scheme
The communication of each address event at a 2 k-neuron IFAT core is implemented using on-chip asynchronous request (REQ) and acknowledge (ACK) signals. To increase the throughput of the input events, the input asynchronous AER distribution network of a 2 k-neuron IFAT core is pipelined in two stages, with an asynchronous AER communication circuit (shown in Figure 6A) and single-row PWAM circuits (shown in Figure 6B), as shown in Figure 2A. A 2 k-neuron IFAT core receives a 21-bit AER event, which comprises an 11-bit postsynaptic neuron address ([20:10]), a two-bit synapse type ([9:8]), and an eight-bit synapse strength ([7:0]). When a 21-bit AER event is received, the asynchronous AER communication circuit directs the event to the destination neuron address via the column and row decoders and to the synapse selected by the two-bit synapse type ([9:8]). The asynchronous AER communication circuit then requests the selected PWAM circuit with the eight-bit synapse strength. If the PWAM circuit is available, the eight bits for synapse strength are latched onto an eight-bit bus, which selects a comparator reference voltage (V_REF) defining the pulse width over the baseline by pulse amplitude (V_IN) in the log domain. If the PWAM circuit is held by a previous address event, the event is not acknowledged and waits until it is serviced.
Figure 7 shows a handshaking timing diagram of the two-tier micro-pipelining scheme when two consecutive events address neurons in the same row. It shows the asynchronous handshaking timing from the destination neuron address selection via column and row decoders to the selection of synapse types and data packet requests. T_latency is the latency of handshaking from the asynchronous AER circuit in a 2 k-neuron IFAT core to the destination neuron. If an event is input to the same row as the latest input event, which holds a PWAM circuit, it waits until that event is served to its destination neuron. T_wait represents the additional latency induced by consecutive input events.

3. Measurement results
In this section, we present the experimental results of the system on throughput, system-level energy efficiency, neural activation with respect to input spike strength, and variability due to transistor mismatches across a 2 k-neuron IFAT core. In addition, we present a linear synapse response model with a simple application of orientation tuning curves for boundary detection.

3.1. Event throughput
In the presented architecture, the throughput can be defined as follows:

$$\text{Throughput} = \frac{1}{T_{latency} + T_{wait}} \tag{7}$$

where T_latency is the average event handshaking latency, and T_wait is the average waiting time in cases where an incoming event addresses a neuron in the same row as the previous event, as shown in Figure 7. T_wait is proportional to Δt/N_interleave, where Δt is the input pulse width and N_interleave is the number of interleaved rows. Figure 8 shows the measurement results for event throughput. A spike input stream with the maximum pulse width for each input, addressing the 32 neurons in a single row, results in a throughput of 70.6 kevents/s. When the input event stream interleaves multiple rows, the waiting time in a row-wise PWAM circuit is avoided, resulting in higher throughput, as predicted by Equation (7). With this interleaving scheme enabled by the two-tier micro-pipelining stage, we measured 18.2 Mspikes/s per quadrant, and the total throughput of the IFAT chip is thus 73 Mspikes/s.
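The effect of row interleaving on throughput can be explored numerically. The sketch below takes throughput as the reciprocal of T_latency + T_wait and models T_wait as k · Δt / N_interleave; the proportionality constant `k` and the example latency and pulse-width values are assumptions chosen for illustration, not measured chip parameters.

```python
def throughput(t_latency, pulse_width, n_interleave, k=1.0):
    """Events per second for a given handshake latency and row interleaving.

    t_latency and pulse_width are in seconds; k is an assumed constant
    relating the waiting time to pulse_width / n_interleave.
    """
    if n_interleave < 1:
        raise ValueError("at least one row must be addressed")
    t_wait = k * pulse_width / n_interleave
    return 1.0 / (t_latency + t_wait)

# Hypothetical values: 55 ns handshake latency, 14 us maximum pulse width.
single_row = throughput(55e-9, 14e-6, n_interleave=1)
interleaved = throughput(55e-9, 14e-6, n_interleave=64)
```

In this model a single-row stream is dominated by the pulse width, whereas spreading events over many rows shrinks the waiting term until the handshake latency alone sets the limit.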

3.2. System-level spike event energy efficiency
In the brain, each neuron is connected to ∼10,000 neurons on average and fires spikes at an average firing rate of 5-10 Hz. Therefore, the power consumption and energy efficiency of biologically inspired neuromorphic systems are primarily determined by synaptic inputs. We therefore measured the system-level spike event energy efficiency as a function of the synapse input event rate, as shown in Figure 9. This shows that the power consumption increases linearly with the synaptic event input rate. We measured power consumption until the input event rate reached its maximum throughput capability (73 Mevents/s).
At the maximum throughput, we measured a current draw of 1.31 mA from a 1.2 V power supply. This resulted in a total power consumption of 1.572 mW. The slope of the graph, which indicates the overall energy efficiency for a spike operation, is measured to be 22 pJ/spike.
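The reported figures can be cross-checked with a short calculation. The function names below are chosen for this illustration; the inputs are the supply voltage, current, and event rate quoted above.

```python
def power_watts(supply_voltage, supply_current):
    """Total power drawn from the supply."""
    return supply_voltage * supply_current

def energy_per_event(power, event_rate):
    """Average energy per event when all power is attributed to events."""
    return power / event_rate

power = power_watts(1.2, 1.31e-3)          # 1.2 V x 1.31 mA
energy = energy_per_event(power, 73e6)     # at 73 Mevents/s
```

Dividing the total power at peak throughput by the event rate gives roughly 21.5 pJ per event, which is close to the 22 pJ/spike obtained from the slope of the power-versus-rate measurement.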

3.3. Neural activation function
Figure 10 shows the neural activation functions, which are defined as the output event rates in response to the input event rates, measured using Poisson and regular spike trains from one representative neuron. The two cases exhibited different activation function shapes. The shape of the function measured using regular input spikes is threshold-linear. This is consistent with the leaky integrate-and-fire neuron model. In the leaky integrate-and-fire neuron model, the threshold originates from the leak conductance of the membrane. In contrast, fluctuations in the Poisson spike trains tend to smooth the activation function, which is expected from studies of noisy integrate-and-fire neuron models (Fusi and Mattia, 1999). In addition, the activation function has a characteristic similar to that of the rectified linear unit model (Nair and Hinton, 2010), which has been widely used in deep neural networks, particularly in convolutional neural networks (CNNs), owing to its faster computation and ability to avoid the vanishing gradient problem.
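The origin of the threshold in the regular-input case can be reproduced with a toy leaky integrator. This is a deliberately simplified sketch, not a model of the IFAT circuit: each input spike adds a fixed increment `weight`, the membrane decays exponentially with time constant `tau` between spikes, and all parameter values are assumptions chosen for illustration.

```python
import math

def output_rate(input_rate, weight=0.3, tau=0.01, threshold=1.0, duration=1.0):
    """Output rate (Hz) of a leaky integrator driven by regular input spikes."""
    if input_rate <= 0:
        return 0.0
    n_inputs = int(round(input_rate * duration))
    decay = math.exp(-1.0 / (input_rate * tau))  # leak between two input spikes
    v, spikes = 0.0, 0
    for _ in range(n_inputs):
        v = v * decay + weight   # leak, then integrate one input spike
        if v >= threshold:       # fire and reset
            spikes += 1
            v = 0.0
    return spikes / duration

rates = [100, 200, 400, 1000, 2000]
curve = [output_rate(r) for r in rates]
```

At low input rates the leak removes charge faster than the inputs supply it, so the membrane never reaches threshold and the output stays at zero; above a critical input rate the output rate grows with the input rate, giving the rectified, threshold-linear shape.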

3.4. Multi-compartmental neural computation
A distinguishing feature of the neuron model implemented in the IFAT chip, compared to most existing leaky integrate-and-fire neurons, is its multi-compartmental implementation. Dendritic computation with proximal and distal compartments in neuroscience exhibits various mechanisms implementing elementary computation units for spatiotemporal information processing (Koch, 1999; London and Häusser, 2005). It provides a multiplication-like interaction between two time-varying signals within a single neuron, which requires fewer transistors and thus reduces the energy and area footprint. Moreover, such neuromorphic dendritic computation has various applications, ranging from configurable multi-layer neural network computation (Ramakrishnan et al., 2013) and spatiotemporal input pattern classification by temporal coincidence detection (Wang and Liu, 2013) to efficient learning for event-based sequential data (Yang et al., 2021).
The IFAT neuron comprises two compartments, distal and proximal, each with two conductance-based synapses. The compartmental conductances are configurable, implying that the strength of the interaction between the compartments is configurable. Figure 11 shows such interactions with an example of shunting inhibition, which is an important feature of dendritic computation (Nelson, 1994; Mitchell and Silver, 2003; Groschner et al., 2022). Excitatory and inhibitory synaptic inputs, indicated by red and blue bars, respectively, are applied to a neuron, as shown in the schematic. The distal compartment is strongly excited by excitatory synaptic inputs from a regular input spike train. This results in an excitatory compartmental input coupled through the compartment conductance to the proximal compartment of the neuron and the firing of the neuron, indicated by green bars in the figure. From 50 to 80 ms, the proximal compartment is inhibited at a reversal potential near rest, which blocks the effect of upstream excitation.

. . Input-output transfer function of neural response
To characterize the input-output transfer function of the neural response, we measured the output spike rates of one representative neuron over digital weights from 0 to 255 for input spike rates varying from 500 Hz to 10 kHz. The input spike trains were generated as Poisson processes with a constant mean rate, which determined their interspike intervals. Figure 12A shows the output spike rate of a representative neuron in response to varying digital weights and input spike rates. Figure 12B shows the gain of the neuron, which is the output spike rate normalized by the input spike rate. At a low input spike rate, the membrane potential leaks faster than the synaptic input is integrated, resulting in rare responses at lower digital weights (weak synaptic inputs). At high input strengths, because each input spike produces an output spike, the gain of the input-output transfer function saturates at one.
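As an illustration of the stimulus generation and the gain definition used above, the following sketch draws exponentially distributed interspike intervals, which is one standard way to realize a constant-rate Poisson process. The function names and the fixed seed are assumptions made for this example, not part of the measurement setup.

```python
import random


def poisson_spike_times(rate_hz, duration_s, seed=0):
    """Spike times of a constant-rate Poisson process (exponential intervals)."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_hz)   # mean interspike interval = 1 / rate_hz
        if t >= duration_s:
            return times
        times.append(t)


def gain(output_rate_hz, input_rate_hz):
    """Gain as defined in the text: output spike rate over input spike rate."""
    return output_rate_hz / input_rate_hz if input_rate_hz else 0.0


spikes = poisson_spike_times(rate_hz=10_000, duration_s=1.0)
measured_input_rate_hz = len(spikes) / 1.0
print(measured_input_rate_hz, gain(1_000.0, 10_000.0))
```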

. . Neuron mismatch variability
Analog neuron circuits designed with transistors operating in the subthreshold regime emulate biologically plausible neural systems efficiently with low power consumption, but they intrinsically exhibit large variations in neural responses owing to transistor mismatch. In the IFAT chip, one of the major sources of variation is the mismatch of the threshold voltage of a transistor in the axon hillock circuit. This mismatch results in a digital weight offset of the neural activation. Figure 13A shows the measured output spike rate responses of 32 representative neurons in the same row when the digital weights were varied from 0 to 255. Here, the input spike rate was 10,000 Hz, and the interspike intervals followed a Poisson process. The offset is taken as the digital weight at which the gain of the neural response is 0.1 (an output spike rate of $10^3$ Hz). The digital weight offset can be compensated by synaptic weight learning in the address-event domain (Park and Jung, 2020). Figure 13B shows the output spike rate responses when the weight offsets are compensated. The response curves are aligned to the mean of the 32 neural responses. The slope, defined as the increase in output spike rate in decades per unit of digital weight, indicates the linearity of the synapse response in the linear response regime of the input-output transfer function. For further analysis, we conducted measurements on a representative 2 k-neuron IFAT core, and the histograms of the offset and slope are shown in Figures 13C, D, respectively. The colormaps for the 2 k neurons (64 rows and 32 columns), drawn in the insets, represent the spatial distributions of the offset and slope in the array.
The calibration process described above effectively accommodates the relatively large variations in the subthreshold regime. Moreover, it imposes no hardware or software overhead at inference, because the calibration is done offline and the pre-distortion digital coefficients are stored externally, with the synapses dynamically instantiated (Park et al., 2017). In any case, the instantiation needs to be done as part of the HiAER-IFAT operation, and there is no additional cost for changing the digital entries in the lookup table based on the calibrated characteristics.
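The single-point offset calibration can be sketched as follows. This is an illustrative software model, not the authors' calibration code: the response curve `toy_rate`, its knee values, and the helper names are hypothetical, and the sketch assumes that the pre-distorted weight is simply the requested weight shifted by each neuron's deviation from the mean offset.

```python
def weight_offset(rates_hz, input_rate_hz, gain_threshold=0.1):
    """Smallest digital weight whose gain reaches the threshold, or None."""
    for weight, rate in enumerate(rates_hz):
        if rate / input_rate_hz >= gain_threshold:
            return weight
    return None


def calibrated_weight(weight, offset, mean_offset, w_max=255):
    """Pre-distorted weight for the lookup table, clipped to the 8-bit range."""
    return max(0, min(w_max, weight + round(offset - mean_offset)))


def toy_rate(weight, knee, input_rate_hz=10_000.0, decades_per_weight=0.02):
    """Hypothetical response: gain grows exponentially, then saturates at one."""
    return input_rate_hz * min(1.0, 10.0 ** ((weight - knee) * decades_per_weight))


# Three hypothetical neurons whose curves differ only by a weight offset.
knees = [140.5, 150.5, 160.5]
curves = [[toy_rate(w, k) for w in range(256)] for k in knees]
offsets = [weight_offset(curve, 10_000.0) for curve in curves]
mean_offset = sum(offsets) / len(offsets)

# Requesting a nominal weight of 80 yields a different table entry per neuron.
lut = [calibrated_weight(80, off, mean_offset) for off in offsets]
aligned = [toy_rate(w, k) for w, k in zip(lut, knees)]
print(offsets, lut, aligned)
```

Because the correction only changes the entries written to the external table, it adds no work at the time a synaptic event is delivered, which matches the argument in the text.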

. . Linear synapse response model
The current injection into the leaky integrate-and-fire neuron model is formulated as follows:

$$C_{mem}\frac{dV_{mem}}{dt} = g_{ext}\,(E_{ext} - V_{mem}) + g_{inh}\,(E_{inh} - V_{mem}) + g_{leak}\,(E_{L} - V_{mem})$$

where $C_{mem}$ is the membrane capacitance, $V_{mem}$ is the membrane voltage, $g_{ext}$ and $g_{inh}$ are the conductances of the excitatory and inhibitory synapses, $E_{ext}$ and $E_{inh}$ are the reversal potentials of the excitatory and inhibitory synapses, respectively, $g_{leak}$ is the leak conductance, and $E_{L}$ is the leak voltage. Using a mean-rate approximation on a time scale of multiple action potentials, the above terms can be approximated by a simple linear neural response model. With a first-order approximation, we assume that the conductance equals the nominal synapse weight multiplied by the total number of spikes in the input spike trains, where $g_{syn}$ is the conductance of the synapse, $f_{in,n}$ is the frequency of the $n$th input spike train, $w_n$ is the synapse weight of the $n$th input spike train, $f_{in,eff}$ is the sum of all the input spike train frequencies, and $w_{nom}$ is the nominal synapse weight. Under this first-order approximation, the output frequency is the sum of the excitatory and inhibitory synaptic input spike trains times the nominal synapse weight, as expressed in Equation (11), where $G_{w_{nom}}$ denotes the gain-scaling factor at $w_{nom}$. The gain-scaling factor is defined as the ratio of the frequency response gain to the digital weight. Figure 14 shows the measured (Figure 14A) and modeled (Figure 14B) output frequency response colormaps as the excitatory and inhibitory synapse input frequencies are varied from 0 to 2,000 Hz at a nominal digital weight of 80. We used this model of the neuron response for the orientation tuning curve and boundary detection shown in the following sections.
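A minimal sketch of such a linear mean-rate response model is given below. It assumes that inhibitory inputs enter with a negative sign, that the output rate is rectified at zero, and that a single gain-scaling factor applies; these choices, the function name, and the numerical value of `GAIN_SCALE` are assumptions for illustration and are not taken from the measured model.

```python
def linear_response_hz(exc_rates_hz, exc_weights, inh_rates_hz, inh_weights,
                       gain_scale):
    """Rectified linear output rate from weighted input rates.

    Assumption: excitation adds and inhibition subtracts, weight times rate,
    and the result is scaled by a single gain-scaling factor.
    """
    exc_drive = sum(w * f for w, f in zip(exc_weights, exc_rates_hz))
    inh_drive = sum(w * f for w, f in zip(inh_weights, inh_rates_hz))
    return max(0.0, gain_scale * (exc_drive - inh_drive))


GAIN_SCALE = 1e-3   # hypothetical gain-scaling factor, not a measured value

# One excitatory and one inhibitory input at a nominal digital weight of 80.
f_out = linear_response_hz([1000.0], [80], [500.0], [80], GAIN_SCALE)
f_blocked = linear_response_hz([500.0], [80], [1000.0], [80], GAIN_SCALE)
print(f_out, f_blocked)
```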

. . Orientation tuning curve
An orientation tuning curve shows the firing rate selectivity of a neuron to stimuli with different orientations. It is a typical measurement used to characterize orientation selectivity in visual cortical neurons. Figure 15 shows the measured tuning curves of the IFAT chip. Each output neural response is the measured result of the convolution of a stimulus with an oriented filter. Each data point is the mean of 30 measurements, each with a 1 s projection to a neuron. We used 15 × 15-pixel bar stimuli with rotations ranging from 0 to 180° in 5° steps. These stimuli were convolved with Gabor patches at four orientations (0, 45, 90, and 135°). The pixel intensity of the stimuli is converted to input spike rates ranging from 0 (darkest) to 63 (brightest). The pixel intensity of a Gabor patch is translated into the synaptic strength of the input. Using Equation (11), the output spike rate $f_{out}$ is calculated from the input synaptic spike rates $f_{in}$ and the input synapse weights $w$, summed over the pixel positions indexed by $i$ and $j$. Figure 15 shows that the simulation results, drawn as solid lines, lie within one standard deviation of the measured data points.
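The tuning-curve simulation can be sketched in software as a weighted sum of pixel rates per stimulus orientation. The sketch below is illustrative only: the bar width, the Gabor envelope width and wavelength, the rectification at zero, and the unit gain-scaling factor are assumptions, and negative filter values stand in for inhibitory synaptic input. Only the 15 × 15-pixel size, the 0 to 63 rate range, the four filter orientations, and the 5° steps follow the text.

```python
import math

SIZE = 15        # 15 x 15-pixel stimulus and filter, as in the text
HALF = SIZE // 2
MAX_RATE = 63    # brightest pixel maps to the highest input rate, as in the text


def coords():
    """Pixel coordinates centered on the patch, in row-major order."""
    return [(x, y) for y in range(-HALF, HALF + 1) for x in range(-HALF, HALF + 1)]


def bar_stimulus(theta_deg, half_width=1.5):
    """Input rate per pixel for a bar through the center at angle theta."""
    t = math.radians(theta_deg)
    return [MAX_RATE if abs(-x * math.sin(t) + y * math.cos(t)) <= half_width else 0
            for x, y in coords()]


def gabor_weights(phi_deg, sigma=3.0, wavelength=8.0):
    """Gabor patch whose positive central lobe runs along angle phi."""
    p = math.radians(phi_deg)
    weights = []
    for x, y in coords():
        across = -x * math.sin(p) + y * math.cos(p)   # offset across the lobe
        envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
        weights.append(envelope * math.cos(2.0 * math.pi * across / wavelength))
    return weights


def response(stimulus, weights, gain_scale=1.0):
    """Rectified weighted sum of pixel rates (assumed linear response model)."""
    return max(0.0, gain_scale * sum(w * f for w, f in zip(weights, stimulus)))


orientations = list(range(0, 181, 5))     # 0 to 180 degrees in 5-degree steps
tuning = {}
for phi in (0, 45, 90, 135):
    filt = gabor_weights(phi)
    tuning[phi] = [response(bar_stimulus(theta), filt) for theta in orientations]

for phi, curve in tuning.items():
    print(phi, round(curve[phi // 5], 1), round(curve[((phi + 90) % 180) // 5], 1))
```

Each printed row compares the response at the filter's own orientation with the response to the orthogonal bar, which is the contrast a tuning curve is meant to expose.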

. Boundary detection
Gabor-like local receptive fields are used to extract elementary visual features, such as oriented edges and corners, from images. This is an essential step for CNNs, a type of biologically inspired multilayer feedforward neural network widely used in image recognition systems (Lecun et al., 1998). The layers in a CNN comprise feature maps and a subsequent spatial subsampling layer that down-samples the raw image data. Here, we present an example of image boundary detection, which is an elementary component of a CNN. Image boundary detection was performed on an input image with a size of 113 × 75 pixels, as shown in Figure 16A. We used four edge detection kernels, each a 15 × 15-pixel patch, as shown in the first column of Figure 16B. The experimental procedure was the same as that of the orientation tuning curve measurements. The stimulus was a 15 × 15 patch of a region in the image, and each pixel intensity of the patch was converted to an input synaptic event rate. The pixel intensity of an edge detection kernel was translated into a synaptic weight. The convolution result of the image patch and an edge detection kernel was projected onto the representative neuron, and the output spike rate of the neuron was measured to reconstruct the filtered image output. The second column of Figure 16B shows the expected images, which were simulated using Equation (11). The third column of Figure 16B shows the measurement results for the IFAT neuron. The images reconstructed from the measured output match the expected images well. This shows that the IFAT neuron can be used as an essential unit for CNNs.
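The patch-by-patch procedure above amounts to sliding a kernel over the image and evaluating one neuron response per position. A minimal sketch follows, using a small synthetic image and a hypothetical 3 × 3 vertical-edge kernel instead of the 113 × 75 image and 15 × 15 kernels of the experiment; the rectified linear response and unit gain scale are assumptions, and the kernel is applied without flipping, as is usual in CNNs.

```python
def filter_image(image, kernel, gain_scale=1.0):
    """Slide the kernel over the image; one rectified response per position."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - k_rows + 1):
        row = []
        for c in range(len(image[0]) - k_cols + 1):
            # Weighted sum of pixel rates; negative weights act as inhibition.
            drive = sum(kernel[i][j] * image[r + i][c + j]
                        for i in range(k_rows) for j in range(k_cols))
            row.append(max(0.0, gain_scale * drive))
        out.append(row)
    return out


# Synthetic 8 x 10 image: dark left half (rate 0), bright right half (rate 63).
image = [[0] * 5 + [63] * 5 for _ in range(8)]
kernel = [[-1, 0, 1]] * 3     # hypothetical vertical-edge kernel
edges = filter_image(image, kernel)
print(edges[0])
```

Only the positions whose patch straddles the dark-to-bright step produce a nonzero response, which is the boundary map the measurement reconstructs from output spike rates.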

. Discussion
Recently, many large-scale neuromorphic systems have been presented using design approaches ranging from FPGAs and asynchronous digital circuits to subthreshold analog design (Thakur et al., 2018). Such diverse approaches, each with its own design objectives, make it difficult to compare large-scale neuromorphic systems quantitatively. We therefore compare neuromorphic processors that are designed to scale to large-scale neuromorphic systems with a multi-chip routing architecture. Table 1 summarizes the measured characteristics of the IFAT chip in comparison with state-of-the-art works. It shows that the IFAT chip achieves good area density and energy efficiency.
The IFAT has been designed with analog neuron and synapse circuits implemented with CMOS transistors operating in subthreshold conduction. It achieves efficiency in power and area with biologically plausible continuous-time analog dynamics. However, the synaptic weight, digitally encoded with an address event, is stored in synaptic routing tables implemented in external memory, which is supported by the HiAER-IFAT architecture (Park et al., 2017). This requires additional memory access to instantiate synaptic events, degrading energy efficiency. To address this issue, the synapse can be replaced with emerging non-volatile memory devices such as ReRAM and magnetoresistive random access memory, which have recently been proposed as potential synaptic devices in analog neuromorphic hardware (Ielmini and Wong, 2018; Sun et al., 2018; Wang et al., 2018; Luo et al., 2020; Jang and Park, 2022; Tang et al., 2022; Wan et al., 2022). These emerging memory devices typically feature low power and high density compared with silicon-based CMOS logic circuits: a ReRAM device consumes about 0.1 pJ per switching operation (Ielmini and Wong, 2018). ReRAMs can be integrated with silicon-based CMOS logic by using monolithic 3D integration (Li et al., 2021).

. Conclusion
In this paper, we presented a general-purpose neuromorphic processor that can serve as a basic computational building block for large-scale neuromorphic systems. The chip was fabricated in a 90-nm CMOS process and occupies a 4 × 4 mm² die area. It is equipped with 65 k two-compartment leaky integrate-and-fire neurons. Event-driven, fully asynchronous circuits minimize the event communication latency, which is not bound to any synchronized clock speed. In addition, the two-tier asynchronous micro-pipelining scheme maximizes the parallelization of event delivery to neurons in multiple rows, resulting in a sustained throughput of 18.2 Mspikes/s per quadrant and 73 Mspikes/s for the chip. A high density of synapses and neurons was achieved by the single-transistor synapse implementation and virtual synaptic wiring supported by the AER, resulting in an 11.5 × 12.15 µm² footprint for a neuron with four synapse types. The activity-driven asynchronous design achieves a system-level energy efficiency of 22 pJ per spike event. The proposed processor implements biophysical detail in compartmental conductance-based dynamics without compromising area density or energy efficiency.
FIGURE (A) Block diagram of the IFAT chip, comprising four identical quadrants, each with eight 2 k-neuron IFAT cores, an asynchronous splitter, and an asynchronous merger for event communication. (B) Chip micrograph of the IFAT chip. One quadrant, comprising eight 2 k-neuron IFAT cores and an asynchronous AER merger and splitter, is indicated. (C) 2 k-neuron IFAT core and (D) two-compartment integrate-and-fire neuron cell layout.
FIGURE (A) Circuit implementation, (B) schematic symbol, and truth table of the C-element, which is also called a Muller circuit. (C) Schematic of the n-bit asynchronous pipeline stage. A one-bit latch with C-elements in dual-rail encoding is shown in the bottom-left box. When ACK is low, the current stage can latch an input bit. A completion tree (C-tree), which is a tree of C-elements, detects the completion of the latched data lines and asserts the active-high acknowledge signal, ACK_PRE, to the previous stage. The current stage holds the latched data until the next stage acknowledges via the active-high ACK signal.
FIGURE (A) Schematic of the arbitration circuit comprising two cross-coupled NAND gates. Two request signals, REQ and REQ , compete to activate one of two cross-coupled NAND-gate paths. The selected request signal enables a path to deliver an acknowledge signal (ACK) to the selected previous stage. (B) Block diagram of the asynchronous merger circuit comprising an arbitration circuit and an n-bit asynchronous pipeline stage (shown in Figure C). N bits are transferred from the selected previous stage, and the selected request signal (REQ SEL or REQ SEL ) is added to the transferred data as the MSB to indicate the source of the data.
$I_n$ and $I_p$ are the subthreshold pre-exponential current factors of nMOS and pMOS, respectively, $I_{pmos} = I_p\, e^{V_s / V_T}\, e^{-\kappa V_g / V_T}$, and $C_{syn}$ is the synapse capacitor.

FIGURE Timing diagram for the input asynchronous AER distribution (Figure A) and single-row PWAM (Figure B) circuits when two consecutive events address neurons on the same row.

FIGURE Neural activation functions measured with input spike trains, each comprising Poisson (green) and regular (blue) spike trains with varying input event rates. Representative measured membrane potentials, shown in the log domain, for Poisson and regular inputs are plotted in the top-left and bottom-right insets, respectively. In the insets, input and output spikes are indicated by bars in the top and middle rows, respectively.
FIGURE (A) Measured input-output transfer function of neural responses. The input spike rate is varied from 500 to 10,000 Hz, with interspike intervals following a Poisson process. (B) Measured gain of the input-output transfer function of the neuron, defined as the ratio of the output and input spike rates.

FIGURE (A) Measured output frequency response curves as a function of the eight-bit synaptic digital weight, measured from 32 neurons in the representative row. The input spike train was a 10,000 Hz mean-rate Poisson spike train in s measurement. The result shows the offset of neuron activation caused by the threshold voltage mismatch of the transistor in the axon hillock circuit. (B) Offset-compensated neuron responses aligned to the mean response. The slope is defined as the ratio of the output spike rate increment in decades to the unit of digital weight. (C) Histogram of the offsets across a representative 2 k-neuron core. It shows a normal distribution with a wide variance across sample counts, while the inset shows a colormap representing the spatial distribution of the offset; the brightest dot represents the most positive offset and the darkest dot represents the most negative offset. (D) Histogram of the slopes across the representative 2 k-neuron core. It has a normal distribution with a mean of . and standard deviation of ., and its spatial distribution is drawn in the inset.
FIGURE (A) Measured and (B) modeled output frequency while varying the excitatory and inhibitory input frequencies from 0 to 2,000 Hz at a digital weight of 80.

FIGURE Measured tuning curves from the representative neuron with a 15 × 15-pixel bar stimulus rotated in orientation from 0 to 180° in 5° steps and four 15 × 15-pixel Gabor filters with orientations of 0, 45, 90, and 135°. The pixel intensity of the stimulus is translated into a synaptic input frequency ranging from 0 (darkest) to 63 (brightest). The pixel intensity of the filter is translated into a synapse weight. Each data point is the mean of 30 measurements, each with 1 s of stimulation. The solid lines show simulations based on the output frequency response model shown in Figure 14.

FIGURE (A) Raw input image with a size of 113 × 75 pixels. (B) Boundary detection with the simulated model and measurement results. The 15 × 15-pixel kernels used for the boundary detection are shown in the first column. For each simulation and measurement, the kernel shown in the same row is used. The simulation results and measured outputs are shown in the second and third columns, respectively.
Block diagram and schematic of two-compartment conductance-based leaky integrate-and-fire neuron circuit with AER interface circuits.The proximal and distal compartments, each comprising a conductively leaky membrane with two single-transistor conductance-based synapse circuits, are conductively coupled.A three-transistor dynamic latch holds V SEL to active low to select one synapse in the selected neuron while a pulse width modulated synaptic input at voltage V IN activates the synapse.An axon hillock circuit generates action potential and registers output events resetting the membrane potential of proximal compartment V mem .
FIGURE (A) Biological neural systems illustrating neural synaptic transmission. An incoming action potential causes a presynaptic neuron to release neurotransmitters at synapses, stimulating a postsynaptic neuron. (B) Emulation of the biological neural systems in electronics. Dynamically reconfigurable synaptic connectivity across IFAT arrays uses virtual synaptic connections represented in neural spike events through a RAM/ROM synaptic routing table. (C) Block diagram of the two-compartment leaky integrate-and-fire neuron model with conductance-based synapses. (D) neuron comprises two conductively coupled compartments, each with two single-transistor conductance-based synapses.
The compact form of single-transistor conductance-based synapses enables the dense integration of 65,536 neurons in a single chip. The IFAT neuron is suitable for continuous-time dynamical emulation of biologically realistic neuronal networks. We demonstrated the proof-of-principles with examples such
a Software-instantiated leaky integrate-and-fire neuron. b Internal connectivity. c By multiplexing the neuron 256 times. d When a core emulates 1,024 neural units. e Simulation results.
This means that synapses implemented with ReRAMs can be integrated on top of the IFAT neurons and HiAER architecture, resulting in higher density and lower energy consumption.