
ORIGINAL RESEARCH article

Front. Neurosci., 12 September 2025

Sec. Neuromorphic Engineering

Volume 19 - 2025 | https://doi.org/10.3389/fnins.2025.1656892

This article is part of the Research Topic "Advancing Adaptive and Energy-Efficient Neuromorphic Computing for Real-Time Edge AI and Robotics".

Spike-based time-domain analog weighted-sum calculation model for extremely low power VLSI implementation of multi-layer neural networks

  • 1Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology, Kitakyushu, Japan
  • 2Research Center for Neuromorphic AI Hardware, Kyushu Institute of Technology, Kitakyushu, Japan

In deep neural network (DNN) models, the weighted summation, or multiply-and-accumulate (MAC) operation, is an essential and heavy calculation task, which leads to high power consumption in current digital processors. The use of analog operation in complementary metal-oxide-semiconductor (CMOS) very-large-scale integration (VLSI) circuits is a promising method for achieving extremely low-power operation for such calculation tasks. In this paper, a time-domain analog weighted-sum calculation model is proposed based on an integrate-and-fire-type spiking neuron model. The proposed calculation model is applied to multi-layer feedforward networks, in which weighted summations with positive and negative weights are performed separately, and two timings proportional to the positive and negative results, respectively, are produced in each layer. The timings are then fed into the next layer without a subtraction operation. We also propose VLSI circuits to implement the proposed model. Unlike conventional analog voltage- or current-mode circuits, the time-domain analog circuits use transient charging/discharging processes of capacitors. Since the circuits can be designed without operational amplifiers, they can operate with extremely low power consumption. We designed a proof-of-concept (PoC) CMOS circuit to verify weighted-sum operation with identical weights. Simulation results showed that the precision was above 4 bits and that the energy efficiency of the weighted-sum calculation was 237.7 Tera Operations Per Second Per Watt (TOPS/W), more than one order of magnitude higher than that of state-of-the-art digital AI processors. Our model promises to be a suitable approach for performing intensive in-memory computing (IMC) of DNNs with moderate precision and very high energy efficiency while reducing analog-to-digital converter (ADC) overhead.

1 Introduction

Artificial neural networks (ANNs) or deep neural networks (DNNs), such as convolutional neural networks (CNNs) (LeCun et al., 2002) and fully connected multi-layer perceptrons (MLPs) (Cireşan et al., 2010), have shown excellent performance on various intelligent tasks, such as object detection and image classification (Cireşan et al., 2010; Krizhevsky et al., 2012; LeCun et al., 2015). However, DNNs require an enormous number of parameters and substantial computational capability, resulting in heavy computation and data movement, which leads to high power consumption in current digital computers, and even in highly parallel coprocessors such as graphics processing units (GPUs). To implement ANNs on edge devices such as mobile phones and personal service robots, very low-power operation is required.

In ANN models, the weighted summation, or multiply-and-accumulate (MAC) operation, is an essential and heavy calculation task (Shukla et al., 2019), and dedicated complementary metal-oxide-semiconductor (CMOS) very-large-scale integration (VLSI) processors have been developed to accomplish it (Chen et al., 2020). As an implementation approach other than digital processors, the use of analog operation in CMOS VLSI circuits is a promising method for achieving extremely low power-consumption operation for such calculation tasks (Hasler and Marr, 2013; Fick et al., 2017; Mahmoodi and Strukov, 2018; Bavandpour et al., 2020).

From the perspective of computing architecture, it is well known that traditional von Neumann architectures require the movement of weights and intermediate computing results between memory and processing units, resulting in extra latency and energy consumption, which is further aggravated in the data-intensive applications of DNNs (Horowitz, 2014; Sze et al., 2017). To reduce or eliminate the power consumption and latency of data movement to and from memory, in-memory computing (IMC) with SRAM or memristive devices, which act as analog memory, is becoming a promising paradigm for accelerating DNNs. Analog in-memory computing (AIMC), which combines analog computation with the IMC architecture, can provide better energy efficiency by performing MACs in parallel, also known as matrix-vector multiplications (MVMs), within the memory array in a single step (Verma et al., 2019; Valavi et al., 2019; Jiang et al., 2020; Jia et al., 2021a; Prezioso et al., 2015; Shafiee et al., 2016; Tsai et al., 2018; Khaddam-Aljameh et al., 2022). Typically, an AIMC core executes analog MVMs for a single ANN layer, multiplying the stationary weight matrix stored in the core with the activation vector applied at its input. SRAM-based implementations cannot hold all the weights of larger networks on-chip because of the large area required to store multi-bit weights (Jia et al., 2021b). As a result, off-chip weight buffers are needed to store network weights and transfer them partially to AIMC cores, which further reduces energy efficiency. Additionally, SRAM's volatility means that on-chip weights are lost when power is turned off. Analog non-volatile memory (NVM) technologies, such as resistive memory and flash memory, offer multiple bits per device, high density, and non-volatility, making it possible to store entire network weights on-chip. This has led to an emerging trend in NVM-based analog AI systems, where an increasing number of AIMC cores or tiles are deployed to efficiently conduct inference tasks (Wan et al., 2022; Fick et al., 2022; Le Gallo et al., 2023; Ambrogio et al., 2023).

Despite the exciting opportunities for energy-efficient ANN processing, AIMC-based AI systems also present unique challenges that must be addressed to realize their full potential. Beyond the limitations in computation accuracy, a critical challenge is the additional need for digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) to transfer intermediate data between layers or tiles in digital form and to interface with digital peripheral circuits when MAC computations are performed in the current or voltage domain. These converters significantly limit the energy efficiency and scalability of AIMC systems (Shafiee et al., 2016; Marinella et al., 2018; Tsai et al., 2018; Verma et al., 2019; Jia et al., 2021a). To mitigate or overcome the limitations of DACs and ADCs, AIMC cores are increasingly adopting time-domain (TD) computing for MAC operations and interfacing with digital systems through digital-to-time converters (DTCs) and time-to-digital converters (TDCs) (Bavandpour et al., 2019a,b, 2021; Yang et al., 2019; Freye et al., 2022; Wu et al., 2022; Al Maharmeh et al., 2023; Choi et al., 2023). TD computing offers better technology scaling than voltage- and current-domain approaches (Al Maharmeh et al., 2020; Freye et al., 2024; Al Maharmeh et al., 2024), while DTCs and TDCs are generally more energy- and area-efficient than DACs and ADCs (Chen et al., 2022; Khaddam-Aljameh et al., 2022). Most TD schemes represent data using pulse-width modulation (PWM) or delay. In multi-core TD AIMC systems, time-based communication, where pulses are transmitted directly from tile to tile or layer to layer in an analog manner, is emerging as another key trend (Lim et al., 2020; Narayanan et al., 2021; Jiang et al., 2022; Seo et al., 2022; Nägele et al., 2023; Ambrogio et al., 2023). Because most DTCs and TDCs are eliminated, system energy efficiency can be further improved. However, TD analog computation is inherently susceptible to analog non-idealities, such as process, voltage, and temperature (PVT) variations, limiting the computation precision. Regarding accuracy, it is now widely recognized that 4-8 bits provide stable inference performance for most mainstream applications (Gupta et al., 2015; McKinstry et al., 2018; Choi et al., 2018), and this level of precision can typically be achieved through carefully designed analog circuits.

Successful DNNs are based on the second-generation artificial neuron model, which processes real-valued data and uses nonlinear activation functions (Roy et al., 2019). Most DNN architectures require signed computations. Since the rectified linear unit (ReLU) activation function (Nair and Hinton, 2010) is commonly used and the inputs of the first layer can be normalized to be non-negative, IMC architectures have primarily been explored for two-quadrant MACs. However, to accommodate a wider range of applications, AIMC systems need to support four-quadrant MACs (Khaddam-Aljameh et al., 2022; Le Gallo et al., 2023; Ambrogio et al., 2023; Le Gallo et al., 2024).

The time-domain weighted-sum calculation model was initially proposed based on the third-generation neuron model, i.e., spiking neurons inspired by the behavior of biological neurons (Maass, 1997a,b, 1999), to implement real-valued MACs in ANNs. In this model, inputs and outputs are encoded as spike timings, and weights are represented by the rising slope of the post-synaptic potential (PSP).

Subsequent research has simplified and extended this model under the assumption of operation in analog circuits with transient states (Morie et al., 2016, 2010; Tohara et al., 2016). However, these studies have been limited to one-quadrant weighted-sum models, in which all weights for a single neuron share the same sign, and they did not address how to apply their models to neural networks. The proposed analog circuit, consisting of multiple input resistive elements and a capacitor (an RC circuit), enables extremely low-power operation, with energy consumption potentially reduced to the order of 0.1 fJ per operation. Throughout this paper, we refer to this VLSI implementation approach as "time-domain analog computing with transient states (TACT)." Unlike conventional weighted-sum operations in analog voltage or current modes, the TACT approach is well suited to achieving much lower power consumption in CMOS VLSI implementations of ANNs. In this work, we extend the model to address the above-described challenges associated with NVM-based AIMC AI systems. Our primary contributions are summarized as follows.

1) We extend the time-domain one-quadrant MAC calculation model to a four-quadrant one, where signed inputs are encoded using a differential pair of spikes, and signed weights are implemented through a dummy-weight scheme. The output is represented by a pair of spikes whose timing difference is proportional to the MAC result, enabled by the added dummy weights. Since both inputs and outputs are encoded in a timing format, the AIMC core can be seamlessly integrated with efficient DTCs and TDCs.

2) We theoretically demonstrate how MAC output spikes can be directly transferred to the next layer, potentially eliminating the need for DTCs and TDCs between tiles. Additionally, we provide a clear explanation of the challenges associated with spike transfer in this process.

3) We propose two sets of analog circuits, each consisting of multiple input resistive elements and a capacitor (an RC circuit), to implement the four-quadrant MAC computation. Additionally, we describe a proof-of-concept (PoC) CMOS circuit equivalent to the RC circuit, with a preliminary estimation suggesting that the energy efficiency could reach hundreds of TOPS/W (Tera Operations Per Second Per Watt) and the precision could be 4 bits or higher.

4) We propose architectures for an analog NVM-based AIMC core and system. In the core, signed weights can be implemented using either a complementary scheme or a differential scheme, depending on how dummy weights are introduced. At the system level, tile-to-tile communication is achieved through spike-based transmission in an analog manner.

2 Spike-based time-domain weighted-sum calculation model

2.1 Time-domain weighted-sum calculation with same-signed weights

A simple spiking neuron model, also known as an integrate-and-fire-type (IF) neuron model, is shown in Figure 1 (Maass, 1999). In this model, a neuron receives spike pulses via synapses. A spike pulse only indicates the input timing, and its pulse width and amplitude do not affect the following processing. A spike generates a temporal voltage change, which is called a post-synaptic potential (PSP), and the internal potential of the n-th neuron, Vn(t), is equal to the spatiotemporal summation of all PSPs. When Vn(t) reaches the firing threshold θ, the neuron outputs a spike, and Vn(t) then settles back to the steady state.


Figure 1. IF neuron model for weighted-sum operation: schematic of the model and weighted-sum operation using the rise timing of PSPs.

Based on the model proposed in Maass (1997a), we propose a simplified weighted-sum operation model using IF neurons. A time span Tin is defined, during which only one spike is fed from each neuron, and it is assumed that a PSP generated by a spike from neuron i increases linearly with slope ki from the timing of the spike input, ti, as shown in Figure 1.

The required weighted-sum operation multiplies normalized variables xi (0 ≤ xi ≤ 1, i = 1, 2, ⋯, N) by weight coefficients ai and sums the products over i, where N is the number of inputs. This weighted-sum operation can be performed using the rise timing of PSPs in the IF neuron model. The input spike timing ti is determined from xi using the following relation:

$t_i = T_{in}(1 - x_i),$    (1)
$x_i = 1 - \frac{t_i}{T_{in}}.$    (2)

Coefficients ai are transformed into the PSP slopes ki:

$k_i = \lambda a_i,$    (3)

where λ is a positive constant. If the firing time of the neuron is defined as tν, we easily obtain the equation

$\sum_{i=1}^{N} k_i \left(t_\nu - t_i\right) = \theta.$    (4)

If we define the following parameter:

$\beta = \sum_{i=1}^{N} a_i,$    (5)

we obtain

$\sum_{i=1}^{N} a_i x_i = \frac{\theta/\lambda + \beta\left(T_{in} - t_\nu\right)}{T_{in}}$    (6)
$= \frac{\theta}{\lambda T_{in}} + \beta\left(1 - \frac{t_\nu}{T_{in}}\right).$    (7)

Here, we assume that all the weights in the calculation have the same sign, i.e., $a_i \geq 0$ or $a_i \leq 0$ for all i. This differs from previous similar work, in which only the sum of the weights was restricted to be positive for firing (Zhang et al., 2021). When all inputs are minimum (∀i xi = 0), the left side of Equation 6 is zero. Then, the output timing tν is given by

$t_\nu^{min} = \frac{\theta}{\lambda\beta} + T_{in}.$    (8)

On the other hand, when all inputs are maximum (∀i xi = 1), the left side of Equation 6 is β, and the output timing tν is given by

$t_\nu^{max} = \frac{\theta}{\lambda\beta}.$    (9)

The time span during which $t_\nu$ can be output is $[t_\nu^{max}, t_\nu^{min}]$, and its length is

$T_{out} \equiv t_\nu^{min} - t_\nu^{max} = T_{in}.$    (10)

Thus, the time span of output spikes is the same as that of input spikes, Tin.

In this model, since the normalization of the sum of ai (β = 1) is not required [unlike in the previous work (Maass, 1997a, 1999; Tohara et al., 2016)], the calculation process becomes much simpler. When implementing the time-domain weighted-sum operation, setting the threshold potential θ properly is the key to making the operation work appropriately. As shown in Figure 1, the earliest output spike timing has to be later than the latest input spike timing Tin; that is, $t_\nu^{max} \geq T_{in}$. Thus,

$\frac{\theta}{\lambda\beta} \geq T_{in}.$    (11)

Also, we can rewrite Equation 11 as

$\theta = \lambda\beta T_{in} + \delta,$    (12)
$\delta = \epsilon\,\lambda\beta T_{in},$    (13)

where ϵ≥0 is an arbitrarily small value. By substituting Equations 12, 13 into Equations 8, 9, we obtain

$t_\nu^{min} = (2 + \epsilon)\,T_{in},$    (14)
$t_\nu^{max} = (1 + \epsilon)\,T_{in},$    (15)

where ϵTin is considered as a time slot between input and output timing spans, as shown in Figure 1, and ϵ determines the length of the slot.
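
To make the mapping concrete, the following minimal NumPy sketch (our illustration, not the authors' simulation code; all sizes and random values are arbitrary) encodes inputs as timings, solves Equation 4 for the firing time, and recovers the weighted sum via Equation 7.

```python
# Minimal sketch of the one-quadrant time-domain weighted sum (Equations 1-15).
# All names and values here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
T_in, lam, eps = 1.0, 1.0, 0.01

x = rng.uniform(0.0, 1.0, 100)            # normalized inputs, 0 <= x_i <= 1
a = rng.uniform(0.0, 1.0, 100)            # same-signed (here positive) weights

t = T_in * (1.0 - x)                      # Equation 1: input spike timings
k = lam * a                               # Equation 3: PSP slopes
beta = a.sum()                            # Equation 5
theta = lam * beta * T_in * (1.0 + eps)   # Equations 12-13

# Equation 4, sum_i k_i (t_v - t_i) = theta, solved for the firing time t_v
t_v = (theta + np.dot(k, t)) / k.sum()

# Equation 7: the firing time encodes the weighted sum
recovered = theta / (lam * T_in) + beta * (1.0 - t_v / T_in)
assert np.isclose(recovered, np.dot(a, x))
assert (1.0 + eps) * T_in <= t_v <= (2.0 + eps) * T_in   # Equations 14-15
```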

2.2 Time-domain weighted-sum calculation with different-signed weights

We propose a time-domain weighted-sum calculation model with two spiking neurons, one for all the positive weights and the other for all the negative ones. We apply Equation 6 to each neuron, and the two results are summed as the final result of the original weighted sum. Here, we show the details of the model.

Let $a_i^+$ and $a_i^-$ indicate the positive and negative weights, respectively. We define

$\beta^+ = \sum_{i=1}^{N^+} a_i^+ \geq 0, \qquad \beta^- = \sum_{i=1}^{N^-} a_i^- \leq 0,$    (16)

where $N^+$ and $N^-$ are the numbers of positive and negative weights, respectively:

$N = N^+ + N^-, \qquad \sum_{i=1}^{N} a_i = \sum_{i=1}^{N^+} a_i^+ + \sum_{i=1}^{N^-} a_i^-, \qquad \beta = \beta^+ + \beta^-.$    (17)

Thus, assuming λ = 1, Equation 4 is rewritten for the positive and negative weighted-sum operations as

$\sum_{i=1}^{N^+} a_i^+ \left(t_\nu^+ - t_i\right) = \theta^+,$    (18)
$\sum_{i=1}^{N^-} a_i^- \left(t_\nu^- - t_i\right) = \theta^-,$    (19)

where $\theta^+\,(>0)$ and $\theta^-\,(<0)$ indicate the threshold values, and $t_\nu^+$ and $t_\nu^-$ the output timings, for the positively and negatively weighted-sum operations, respectively. We obtain

$\sum_{i=1}^{N^+} a_i^+ x_i = \frac{\theta^+ + \beta^+\left(T_{in} - t_\nu^+\right)}{T_{in}},$    (20)
$\sum_{i=1}^{N^-} a_i^- x_i = \frac{\theta^- + \beta^-\left(T_{in} - t_\nu^-\right)}{T_{in}}.$    (21)

Therefore, we can obtain the original weighted-sum result:

$\sum_{i=1}^{N} a_i x_i = \sum_{i=1}^{N^+} a_i^+ x_i + \sum_{i=1}^{N^-} a_i^- x_i = \frac{\theta^+ + \theta^- + \beta T_{in} - \left(\beta^+ t_\nu^+ + \beta^- t_\nu^-\right)}{T_{in}}.$    (22)

Let us define a dummy weight $a_0$ as the difference between the absolute values of $\beta^\pm$:

$a_0 = -\left(\beta^+ + \beta^-\right).$    (23)

If $\beta^+ \geq -\beta^-$, then $a_0 \leq 0$, and this dummy weight is incorporated into the negative weight group, and vice versa. This dummy weight is associated with a zero input, $x_0 = 0$, which means $t_0 = T_{in}$. By using the dummy weight, we can make the absolute values of $\beta^\pm$ identical (β = 0), and we define

$\beta_o = \beta^+ = -\beta^-.$    (24)

Also, according to Equations 12, 13, the absolute values of $\theta^+$ and $\theta^-$ can be made the same, so that $\theta^+ + \theta^- = 0$. Therefore, Equation 22 can be rewritten as

$\sum_{i=1}^{N} a_i x_i = \frac{\beta_o\left(t_\nu^- - t_\nu^+\right)}{T_{in}}.$    (25)
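
As a sanity check of Equations 16-25, the following sketch (again our illustrative NumPy code, with arbitrary sizes and random weights) splits the weights by sign, adds the dummy weight of Equation 23 with a zero input, and reads the signed weighted sum off the timing difference.

```python
# Sketch of the four-quadrant weighted sum with a dummy weight (Equations 16-25);
# names and values are illustrative assumptions.
import numpy as np

def fire(t, slopes, theta):
    """Firing time of an IF neuron with non-negative PSP slopes (Equation 4, lam = 1)."""
    return (theta + np.dot(slopes, t)) / slopes.sum()

rng = np.random.default_rng(1)
T_in, eps = 1.0, 0.01
x = rng.uniform(0.0, 1.0, 500)
a = rng.uniform(-1.0, 1.0, 500)           # mixed-sign weights
t = T_in * (1.0 - x)                      # Equation 1

a_pos, t_pos = a[a >= 0], t[a >= 0]
a_neg, t_neg = a[a < 0], t[a < 0]

# Equation 23: a dummy weight with zero input (t_0 = T_in) equalizes |beta+| and |beta-|
a0 = -(a_pos.sum() + a_neg.sum())
if a0 <= 0:
    a_neg, t_neg = np.append(a_neg, a0), np.append(t_neg, T_in)
else:
    a_pos, t_pos = np.append(a_pos, a0), np.append(t_pos, T_in)

beta_o = a_pos.sum()                      # Equation 24: beta_o = beta+ = -beta-
theta = beta_o * T_in * (1.0 + eps)       # |theta+| = |theta-| (Equations 12-13)
t_vp = fire(t_pos, a_pos, theta)
t_vn = fire(t_neg, -a_neg, theta)         # the negative group uses |a_i^-| as slopes

# Equation 25: the timing difference encodes the signed weighted sum
assert np.isclose(beta_o * (t_vn - t_vp) / T_in, np.dot(a, x))
```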

3 Time domain neural network model

3.1 Neuron model

The typical neuron model of ANNs is shown in Figure 2a; it has N inputs $x_i$ with weights $w_i$ and a bias $b$:

$y = f\left(\sum_{i=1}^{N} w_i x_i + b\right),$    (26)

Figure 2. Neuron model: (a) typical neuron model; (b) neuron model for the time-domain weighted-sum operation with a dummy weight, wn+1; (c) neuron model for the time-domain weighted-sum operation in which each synapse has two sets of inputs and weights: one set is (xi, wi) and the other is (0, −wi), i.e., (ti, wi) and (Tin, −wi) according to Equation 2.

where y is the output of the neuron, and f is an activation function. We can consider the bias as a weight whose input is always unity and regard the 0 index weight as the bias throughout this paper. Therefore, our time-domain weighted-sum calculation model with the dummy weight can be applied to this neuron model, as shown in Figure 2b. According to Equation 25,

$\sum_{i=0}^{N} w_i x_i = \frac{\beta\left(t_\nu^- - t_\nu^+\right)}{T_{in}}.$    (27)

Based on Equation 27, we propose another model, shown in Figure 2c, in which each synapse has two sets of inputs and weights: one is (xi, wi) and the other is (0, −wi). In this model, it is not necessary to add a dummy weight, because the sum of the positive weights automatically equals the absolute value of the sum of the negative weights, i.e., $\beta = \sum_{i=0}^{N} |w_i|$.

As the activation function f, we often use the rectified linear unit called “ReLU” (Nair and Hinton, 2010), which is defined as follows:

$f(x) = \mathrm{ReLU}(x) = \begin{cases} x & \text{if } x \geq 0, \\ 0 & \text{otherwise.} \end{cases}$    (28)

We can implement the ReLU function by comparing the output timings $t_\nu^-$ and $t_\nu^+$ in the time-domain weighted-sum calculation as follows:

$f\left(\sum_{i=0}^{N} w_i x_i\right) = \mathrm{ReLU}\left(\frac{\beta\left(t_\nu^- - t_\nu^+\right)}{T_{in}}\right) = \frac{\beta\left(t_\nu^- - t_\nu^+\right)}{T_{in}},$    (29)

where, if $t_\nu^- > t_\nu^+$, the difference between the two timing values is regarded as the output transferred to the neurons in the next layer, and, if $t_\nu^- < t_\nu^+$, we set $t_\nu^-$ and $t_\nu^+$ to be identical to make the output zero, because the weighted-sum result is negative. Its circuit implementation will be shown later.
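
A one-line sketch of this clamping rule (our illustration; the names are arbitrary):

```python
def relu_timings(t_vp, t_vn):
    # Equation 29: a negative weighted sum (t_vn < t_vp) is clamped to a zero output
    return (t_vp, t_vn) if t_vn > t_vp else (t_vp, t_vp)
```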

3.2 Neural network model

In this section, we extend our time-domain neuron model shown in Figure 2c to the neural network and theoretically show the intermediate timing-transfer mechanism between layers. We first apply the procedure, as an example, to a two-layer MLP that has one hidden layer and two sets of inputs and weights for each neuron, shown in Figure 3, and then generalize it.


Figure 3. General neural network model with two inputs and outputs for time-domain weighted-sum calculation with positive and negative weights.

In this MLP, according to Equation 27, the weighted-sum result of the j-th neuron in the hidden layer, labeled n, can be expressed as

$\sum_{i=0}^{N} w_{ij}^{(n)} x_i = \frac{\beta_j^{(n)}}{T_{in}}\left(t_{vj}^{(n)-} - t_{vj}^{(n)+}\right),$    (30)

where $\beta_j^{(n)} = \sum_{i=0}^{N} |w_{ij}^{(n)}|$, and $t_{vj}^{(n)-}$ and $t_{vj}^{(n)+}$ are the timings generated at the j-th neuron in the n-th layer. The output $y_k^{(p)}$ of the k-th neuron in the output layer, labeled p (= n+1), is

$y_k^{(p)} = \sum_{j=1}^{N} w_{jk}^{(p)} f\left(\sum_{i=0}^{N} w_{ij}^{(n)} x_i\right) + b_k^{(p)} = \sum_{j=1}^{N} w_{jk}^{(p)} f\left(\frac{\beta_j^{(n)}}{T_{in}}\left(t_{vj}^{(n)-} - t_{vj}^{(n)+}\right)\right) + b_k^{(p)} = \sum_{j=0}^{N} w_{jk}^{(p)}\,\frac{\beta_j^{(n)}}{T_{in}}\left(t_{vj}^{(n)-} - t_{vj}^{(n)+}\right),$    (31)

where ReLU is used as the activation function, and the bias $b_k^{(p)} = w_{0k}^{(p)}$ is represented in the time-domain model by

$b_k^{(p)} = w_{0k}^{(p)} \cdot 1 = w_{0k}^{(p)}\,\frac{\beta_0^{(n)}}{T_{in}}\left(t_{v0}^{(n)-} - t_{v0}^{(n)+}\right),$    (32)

in which $t_{v0}^{(n)+} = 0$ is paired to $w_{0k}^{(p)}$, $t_{v0}^{(n)-} = T_{in}$ is paired to $-w_{0k}^{(p)}$, and $\beta_0^{(n)} = 1$, as there is no input to the bias.

In the MLP shown in Figure 3, we transfer the output timings $t_{vj}^{(n)+}$ and $t_{vj}^{(n)-}$ generated in layer n to the neurons in layer p and perform the time-domain weighted-sum operation. The timings $t_{vk}^{(p)+}$ and $t_{vk}^{(p)-}$ are assumed to be produced at the k-th neuron of layer p. We relate timing $t_{vj}^{(n)+}$ to weight $w_{jk}^{(p)}$ and $t_{vj}^{(n)-}$ to $-w_{jk}^{(p)}$. We also assume here that N = 3 and that $w_{1k}^{(p)} \geq 0$, $w_{2k}^{(p)} < 0$, $w_{3k}^{(p)} \geq 0$, $b_k^{(p)} \geq 0$, and $\theta_k^{(p)+} = -\theta_k^{(p)-}$, where $\theta_k^{(p)+}$ and $\theta_k^{(p)-}$ are the threshold values for the positively and negatively weighted-sum operations, respectively. Thus, according to Equation 4, we can obtain

$w_{1k}^{(p)}\left(t_{vk}^{(p)+} - t_{v1}^{(n)+}\right) + \left(-w_{2k}^{(p)}\right)\left(t_{vk}^{(p)+} - t_{v2}^{(n)-}\right) + w_{3k}^{(p)}\left(t_{vk}^{(p)+} - t_{v3}^{(n)+}\right) + b_k^{(p)}\left(t_{vk}^{(p)+} - t_{v0}^{(n)+}\right) = \theta_k^{(p)+},$    (33)
$\left(-w_{1k}^{(p)}\right)\left(t_{vk}^{(p)-} - t_{v1}^{(n)-}\right) + w_{2k}^{(p)}\left(t_{vk}^{(p)-} - t_{v2}^{(n)+}\right) + \left(-w_{3k}^{(p)}\right)\left(t_{vk}^{(p)-} - t_{v3}^{(n)-}\right) + \left(-b_k^{(p)}\right)\left(t_{vk}^{(p)-} - t_{v0}^{(n)-}\right) = \theta_k^{(p)-}.$    (34)

By adding Equation 33 to Equation 34 on the left and right sides, respectively, the following relationship is obtained:

$\sum_{j=0}^{N=3}\left|w_{jk}^{(p)}\right|\left(t_{vk}^{(p)+} - t_{vk}^{(p)-}\right) + \sum_{j=0}^{N=3} w_{jk}^{(p)}\left(t_{vj}^{(n)-} - t_{vj}^{(n)+}\right) = 0.$    (35)

Thus, we can obtain the following simple expression:

$\sum_{j=0}^{N=3} w_{jk}^{(p)}\left(t_{vj}^{(n)-} - t_{vj}^{(n)+}\right) = \left(t_{vk}^{(p)-} - t_{vk}^{(p)+}\right)\sum_{j=0}^{N=3}\left|w_{jk}^{(p)}\right|.$    (36)

Therefore, we generalize the number of neurons from N = 3 back to N and substitute Equation 36 into Equation 31. The output $y_k^{(p)}$ in Equation 31 can finally be written as

$y_k^{(p)} = \sum_{j=1}^{N} w_{jk}^{(p)}\,\mathrm{ReLU}\left(\sum_{i=0}^{N} w_{ij}^{(n)} x_i\right) + b_k^{(p)} = \left(t_{vk}^{(p)-} - t_{vk}^{(p)+}\right)\frac{\sum_{j=0}^{N}\left|w_{jk}^{(p)}\right|\,\beta_j^{(n)}}{T_{in}}.$    (37)

As a result, for neurons in the hidden layer n, we apply the time-domain weighted-sum operation to generate the timings $t_{vj}^{(n)+}$ and $t_{vj}^{(n)-}$ for the positively and negatively weighted-sum calculations from the input layer, respectively. These timings are then directly transferred to the neurons in the next layer p, where the timings $t_{vk}^{(p)+}$ and $t_{vk}^{(p)-}$ are obtained. Finally, we calculate the final outputs of the MLP using Equation 37, without explicitly calculating the middle layers' weighted-sum results via Equation 27.

We summarize the above mathematical time-domain operations in general MLPs graphically in Figure 4. In the remainder of this paper, the "sum of the weights" refers to the sum of the weights' absolute values. Note that, during the time-domain process, the weights in the middle and output layers are replaced by the products of the original weight and the sum of the corresponding neuron's weights in the previous layer. We denote the sum of these reconfigured weights by B, instead of the aforementioned β, which denotes the sum of the original weights, as follows:

$\beta_j^{(1)} = \sum_{i=0}\left|w_{ij}^{(1)}\right|,$    (38)
$B_j^{(2)} = \sum_{i=0}\beta_i^{(1)}\left|w_{ij}^{(2)}\right|,$    (39)
$B_j^{(n)} = \sum_{i=0}B_i^{(n-1)}\left|w_{ij}^{(n)}\right|.$    (40)

Figure 4. Summary of the general scheme of the time-domain weighted-sum neural network model.

Note that the bias of neuron j in layer n, whose index is i = 0, is denoted $B_0^{(n-1)} w_{0j}^{(n)}$, in which $B_0^{(n-1)} \equiv 1$. Therefore, the new weight connecting neuron i in layer n−1 to neuron j in layer n becomes $B_i^{(n-1)} w_{ij}^{(n)}$. The original weighted-sum result of the j-th neuron in the n-th layer, denoted $y_j^{(n)}$, can then be expressed as follows:

$y_j^{(n)} = \frac{B_j^{(n)}}{T_{in}}\left(t_{\nu j}^{(n)-} - t_{\nu j}^{(n)+}\right).$    (41)
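
The following NumPy sketch illustrates this transfer mechanism end to end for a bias-free two-layer network; the layer sizes, random weights, and helper names are our illustrative assumptions, and the final assertion checks Equation 41 against a conventional forward pass.

```python
# Sketch of direct timing transfer between layers (Equations 30-41), for the
# model of Figure 2c; biases are omitted for brevity.
import numpy as np

T_in, eps = 1.0, 0.01

def encode(x):
    """Figure 2c input pairs: (t_i, w_i) and (T_in, -w_i), i.e., a zero co-input."""
    return T_in * (1.0 - x), np.full_like(x, T_in)

def fire_pair(tp, tn, W):
    """Firing-time pair (t+, t-) per neuron; W holds the signed (reconfigured) slopes."""
    A = np.abs(W)
    B = A.sum(axis=0)
    theta = B * T_in * (1.0 + eps)                       # Equations 12-13, lam = 1
    tau_p = np.where(W >= 0, tp[:, None], tn[:, None])   # timings on the + dendrite
    tau_n = np.where(W >= 0, tn[:, None], tp[:, None])   # timings on the - dendrite
    return ((theta + (A * tau_p).sum(axis=0)) / B,
            (theta + (A * tau_n).sum(axis=0)) / B)

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 784)
W1 = rng.uniform(-1.0, 1.0, (784, 100))
W2 = rng.uniform(-1.0, 1.0, (100, 10))

tp1, tn1 = fire_pair(*encode(x), W1)
tn1 = np.maximum(tn1, tp1)                           # time-domain ReLU (Equation 29)

beta1 = np.abs(W1).sum(axis=0)                       # Equation 38
tp2, tn2 = fire_pair(tp1, tn1, W2 * beta1[:, None])  # reconfigured weights (Figure 4)

B2 = (np.abs(W2) * beta1[:, None]).sum(axis=0)       # Equation 40
y_td = B2 * (tn2 - tp2) / T_in                       # Equation 41
assert np.allclose(y_td, np.maximum(x @ W1, 0.0) @ W2)
```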

3.3 Numerical simulations of neural networks

We performed numerical simulations to verify our weighted-sum calculation model. First, in order to verify the model for weighted-sum calculation with different-signed weights, we conducted a simulation of a weighted-sum calculation with 501 pairs of inputs and weights, consisting of 249 positive and 252 negative weights. We added a dummy weight to make the sum of the positive weights equal to the absolute sum of the negative ones. Figure 5 shows the simulation results of the time-domain weighted-sum calculation with a dummy weight wn+1. The results show that the weighted summation is calculated correctly from the different firing-timing inputs for the negative and positive groups, each of which is multiplied by the corresponding signed weights.


Figure 5. Simulation results for the time-domain weighted-sum calculation model applied to the neuron shown in Figure 2b: (a) PSP of the positively weighted-sum operation with 249 inputs, in which $T_{in} = 1$, $\lambda = 1$, $\epsilon = 0.01$, $\beta^+ = 24.01$, and $\theta^+ = 24.25$. The output spike timing is $t_\nu^+ = 1.6356$. (b) PSP of the negatively weighted-sum operation with 253 inputs, in which $w_0 = -0.06$, $w_{n+1} = -2.819$, $T_{in} = 1$, $\lambda = 1$, $\epsilon = 0.01$, $\beta^- = -24.01$, and $\theta^- = -24.25$. The output spike timing is $t_\nu^- = 1.8321$. Thus, the result of the weighted-sum calculation is $|\beta^\pm|(t_\nu^- - t_\nu^+)/T_{in} = 4.718$.

Then, we applied our model to a four-layer MLP (784-100-100-100-10) to classify the MNIST digit character set. We trained the MLP and then performed inference according to Equation 41 with the obtained weights, which were either binary (Courbariaux et al., 2015) or floating-point values. As described above, the output spike timings at each neuron in the previous layer were directly conveyed to the neurons in the next layer without obtaining weighted-sum results in the middle layers. We obtained the same weighted-sum calculation results in the last layer, and the same recognition accuracies in both NNs, as in the numerically calculated ones.

4 Issues about time-domain weighted-sum models toward VLSI implementation

We have established our time-domain weighted-sum neural network model in a general form, summarized it in Figure 4, and conducted numerical simulations in Section 3 that verified its effectiveness on pre-trained ANN models. In this section, we discuss some issues with the model when it is implemented in analog VLSI circuits.

4.1 Weights and biases

In the general time-domain weighted-sum neural network model shown in Figure 4 and Equation 41, the weights must be reconfigured as $B_i^{(n-1)} w_{ij}^{(n)}$ in order to generate two timings whose interval is proportional to the original weighted-sum result, which yields the same recognition accuracy as the original ANN. The reconfigured weights correspond to the PSP slopes in the IF neuron model. The slopes are greatly increased by the reconfiguration, which may result in very high potentials that do not satisfy the hardware system criteria. To solve this problem, we introduce a scaling factor $\Gamma^{(n)}$ for the n-th layer, as shown in Figure 6, to adjust the PSP slopes to a reasonable level. Note that every neuron in the same layer has the same scaling factor.


Figure 6. Weight scaling: a scaling factor $\Gamma^{(n)}$, expressed in Equation 52, is introduced for the n-th layer to adjust the reconfigured large PSP slopes shown in Figure 4 to a reasonable level. After scaling, the slopes are expressed as $B_{si}^{(n-1)} w_{ij}^{(n)} / \Gamma^{(n)}$, where $B_{si}^{(n-1)}$ represents the scaled sum of weights in the previous layer n−1, scaled by $\Gamma^{(n-1)}$.

The reconfigured weight then becomes $B_{si}^{(n-1)} w_{ij}^{(n)} / \Gamma^{(n)}$, where $B_{si}^{(n-1)}$ represents the scaled sum of weights in the previous layer n−1, and the scaled sum of the reconfigured weights in layer n, which can also be interpreted as the total PSP slope, is expressed as

$B_{sj}^{(1)} = \frac{1}{\Gamma^{(1)}}\sum_{i=0}\left|w_{ij}^{(1)}\right|,$    (42)
$B_{sj}^{(2)} = \frac{1}{\Gamma^{(2)}}\sum_{i=0}B_{si}^{(1)}\left|w_{ij}^{(2)}\right|,$    (43)
$B_{sj}^{(n)} = \frac{1}{\Gamma^{(n)}}\sum_{i=0}B_{si}^{(n-1)}\left|w_{ij}^{(n)}\right|,$    (44)

so we can obtain

$B_{sj}^{(1)} = \frac{1}{\Gamma^{(1)}}\,\beta_j^{(1)},$    (45)
$B_{sj}^{(2)} = \frac{1}{\prod_{l=1}^{2}\Gamma^{(l)}}\,B_j^{(2)},$    (46)
$B_{sj}^{(n)} = \frac{1}{\prod_{l=1}^{n}\Gamma^{(l)}}\,B_j^{(n)}.$    (47)

Note that the bias bj(n) is reconfigured as

$b_j^{(n)} = B_{s0}^{(n-1)}\,w_{0j}^{(n)} / \Gamma^{(n)},$    (48)

where $B_{s0}^{(n-1)} = \frac{1}{\prod_{l=1}^{n-1}\Gamma^{(l)}}$. Accordingly, the original weighted-sum result is expressed as

$y_j^{(n)} = \prod_{l=1}^{n}\Gamma^{(l)}\,\frac{B_{sj}^{(n)}}{T_{in}}\left(t_{\nu j}^{(n)-} - t_{\nu j}^{(n)+}\right) = \frac{B_j^{(n)}}{T_{in}}\left(t_{\nu j}^{(n)-} - t_{\nu j}^{(n)+}\right).$    (49)

From Equations 41 and 49, we can see that the difference between the timings $t_{\nu j}^{(n)-}$ and $t_{\nu j}^{(n)+}$ remains the same before and after the scaling operations.
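
A short sketch of the scaled recursion (Equations 42-47); the layer shapes and Γ values are illustrative choices of ours, picked only to keep the scaled slope sums near unity.

```python
# Sketch of per-layer slope scaling; verifies Equation 47, Bs = B / prod(Gamma).
import numpy as np

rng = np.random.default_rng(3)
Ws = [rng.uniform(-1, 1, (784, 100)),
      rng.uniform(-1, 1, (100, 100)),
      rng.uniform(-1, 1, (100, 10))]
Gammas = [400.0, 50.0, 50.0]                        # illustrative scaling factors

B = np.abs(Ws[0]).sum(axis=0)                       # Equation 38
Bs = B / Gammas[0]                                  # Equation 42
prod = Gammas[0]
for W, G in zip(Ws[1:], Gammas[1:]):
    B = (np.abs(W) * B[:, None]).sum(axis=0)        # Equation 40
    Bs = (np.abs(W) * Bs[:, None]).sum(axis=0) / G  # Equation 44
    prod *= G
    assert np.allclose(Bs, B / prod)                # Equation 47
```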

So far, we have shown the general scaling process for the weights' reconfiguration. Next, we show some special cases that simplify the weights' reconfiguration. Suppose that

$\sum_{i\neq 0}\left|w_{i1}^{(n)}\right| = \sum_{i\neq 0}\left|w_{i2}^{(n)}\right| = \cdots = \sum_{i\neq 0}\left|w_{ij}^{(n)}\right|,$    (50)
$\left|w_{01}^{(n)}\right| = \left|w_{02}^{(n)}\right| = \cdots = \left|w_{0j}^{(n)}\right|,$    (51)

meaning that the bias and the sum of the original weights are the same for every neuron in layer n, as in the BinaryConnect NN model (Courbariaux et al., 2015), whose weights and biases are binary values. Let the scaling factor $\Gamma^{(n)}$ be

$\Gamma^{(1)} = \beta_j^{(0)},\quad \Gamma^{(2)} = \beta_j^{(1)},\quad \cdots,\quad \Gamma^{(n)} = \beta_j^{(n-1)},$    (52)

where $\beta_j^{(l)}$ denotes the sum of the weights of layer l (= 0, 1, 2, ⋯, n), expressed as follows:

$\beta_j^{(0)} = 1,\quad \beta_j^{(1)} = \sum_{i\neq 0}\left|w_{ij}^{(1)}\right| + \frac{1}{\beta_{j=0}^{(0)}}\left|w_{0j}^{(1)}\right|,\quad \beta_j^{(2)} = \sum_{i\neq 0}\left|w_{ij}^{(2)}\right| + \frac{1}{\beta_{j=0}^{(1)}}\left|w_{0j}^{(2)}\right|,\quad \cdots,\quad \beta_j^{(n)} = \sum_{i\neq 0}\left|w_{ij}^{(n)}\right| + \frac{1}{\beta_{j=0}^{(n-1)}}\left|w_{0j}^{(n)}\right|,$    (53)

where $\beta_{j=0}^{(n)} = 1$. Note that $\beta_1^{(n)} = \beta_2^{(n)} = \cdots = \beta_j^{(n)}$ under the assumptions of Equations 50, 51. Then we can generate the desired timings using only the original weights $w_{ij}^{(n)}, i \neq 0$, as shown in Figure 7a, without reconfiguring the weights (biases not included) as $B_i^{(n-1)} w_{ij}^{(n)}, i \neq 0$, as shown in Figure 4. However, the bias $b_j^{(n)}$ must be reconfigured as

$b_j^{(n)} = \frac{1}{\beta_j^{(n-1)}}\,w_{0j}^{(n)}.$    (54)

Figure 7. Derivations from the general time-domain weighted-sum process in MLPs whose weights (biases not included) involved in the time-domain process, i.e., $B_i^{(n-1)} w_{ij}^{(n)}, i \neq 0$, are scaled back to the original weights $w_{ij}^{(n)}, i \neq 0$: (a) the model with biases: the biases are scaled as $\frac{\beta_0^{(n-1)}}{\beta_{i\neq 0}^{(n-1)}}\,w_{0j}^{(n)}$ under the assumptions of Equations 50, 51, accompanying the weights' scaling operation; (b) the model without biases: we add an extra dummy weight $d_j^{(n)} = \beta^{(n)} - \sum_i |w_{ij}^{(n)}|$, whose two input timings are the same (i.e., whose input is 0), to neuron j in layer n to make the sum of the weights of every neuron in layer n identical, denoted $\beta^{(n)}$.

Accordingly, the original weighted-sum result will be

$y_j^{(n)} = \frac{\prod_{l=1}^{n}\beta^{(l)}}{T_{in}}\left(t_{\nu j}^{(n)-} - t_{\nu j}^{(n)+}\right),$    (55)

where β(l) denotes the identical value of the sum of weights among neurons in layer l.

In modern deep neural networks with many layers and a large number of parameters, experiments using models without biases showed accuracy degradations of 3.9% and 4% on the CIFAR10 and CIFAR100 datasets, respectively (Wang et al., 2019). Such degradation of less than 5% is generally considered acceptable when deploying DNNs on resource-constrained edge devices, where trade-offs among accuracy, latency, and energy efficiency need to be carefully considered (Shuvo et al., 2022; Ngo et al., 2025).

We also trained a four-layer MLP (784-100-100-100-10) with and without biases on the MNIST and Fashion-MNIST datasets and compared the results for both the floating-point and BinaryConnect models, as shown in Figure 8. The results showed that the accuracies with and without biases were comparable. Therefore, in certain cases, the bias can be removed so that the reconfiguration cost of the bias shown in Equations 48, 54 is saved.


Figure 8. The MLP recognition accuracy comparison between models with and without biases on the (a) MNIST dataset and (b) Fashion-MNIST dataset. The MLP models are the BinaryConnect model and the general floating-point connection model.

We have shown a case in which the weights and biases are restricted by the conditions in Equations 50, 51, so that the cost of the weights' reconfiguration can be saved. Next, we propose a method to satisfy the restriction of Equation 50 for a more general ANN model, again saving the weights' reconfiguration, as shown in Figure 7b. Note that, for simplicity, we discuss the method for the model without biases. We add a dummy weight to every neuron in layer n to make the sums of the weights identical; we denote this identical value in layer n by β(n). The dummy weight of neuron j in layer n, denoted $d_j^{(n)}$, is allocated as

$d_j^{(n)} = \beta^{(n)} - \sum_i\left|w_{ij}^{(n)}\right|.$    (56)

Note that the input timings for the dummy weights $d_j^{(n)}$ and $-d_j^{(n)}$ are identical (e.g., Tin), meaning a 0 input.
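
A small sketch of this equalization (our illustration; choosing β(n) as the largest column sum is an assumption that keeps every dummy weight non-negative):

```python
# Sketch of Equation 56: per-neuron dummy weights equalize the slope sums of layer n.
import numpy as np

W = np.random.default_rng(4).uniform(-1, 1, (128, 64))  # illustrative layer weights
col_sums = np.abs(W).sum(axis=0)
beta_n = col_sums.max()          # the common value beta^(n); max keeps d_j >= 0
d = beta_n - col_sums            # Equation 56: one dummy weight per neuron
# With identical input timings on d_j and -d_j, each dummy adds slope (and thus
# equalizes beta) without changing the weighted-sum result.
assert np.allclose(col_sums + d, beta_n)
```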

4.2 Output timing difference

From Equations 40, 41, we can see that the coefficients applied to the output timing difference increase monotonically as the layers go deeper. It has generally been observed that the outputs of every neuron, i.e., the weighted-sum results, converge to a certain range in a well-trained ANN. Therefore, the timing difference decreases monotonically as the layers go deeper. This effect essentially results from calculating the positively and negatively signed weighted sums separately.

We demonstrate the timing-difference issue with a case study in which we perform time-domain weighted-sum inference in a well-trained four-layer MLP (784-100-100-100-10). We collected the distributions of the output timing differences in every layer of the MLP when evaluating the 10,000-sample test set of MNIST. Figure 9a shows the distributions, where the histograms show only 100 of the total 10,000 samples, but the standard deviations σ are calculated over all samples. The σ of the output timing differences in the 1st, 2nd, 3rd, and 4th layers are 5.89 × 10−8 s, 5.91 × 10−9 s, 8.42 × 10−10 s, and 1.08 × 10−10 s, respectively, under the assumption of Tin = 1 μs. If we assume a resolution time step of 10 ns, taking the noise of analog circuits into account, the great majority of the timing differences in the 2nd and subsequent layers are smaller than the time resolution. Therefore, such a time-domain multi-layer model cannot be implemented in analog VLSI directly. We also conducted an experiment to evaluate the noise tolerance of the above model. In the experiment, we injected noise with different standard deviations σ into the output timings $t_\nu^-$ and $t_\nu^+$ and evaluated the MLP recognition accuracy. The results are shown in Figure 10a. The accuracy deteriorates when the noise level is around 0.5 ns, near the distribution σ of the 3rd layer. The model does not work when σ is over 5 ns, near that of the 2nd layer.

Figure 9. The distribution of the output timing difference, $t_\nu^- - t_\nu^+$, in every layer of the 784-100-100-100-10 four-layer MLP model, where Tin is assumed to be 1 μs: (a) the original distributions; (b) the distributions in which the timing differences are amplified with a gain of 10 (the corresponding σ values for layers 1-4 are 5.89 × 10−8 s, 5.94 × 10−8 s, 8.30 × 10−8 s, and 1.07 × 10−7 s).


Figure 10. Noise tolerance examinations in the four-layer MLP: (a) recognition accuracy of the model without timing difference amplification evaluated under different noise levels; (b) recognition accuracy under the different amplification gain conditions with the critical noise injected; (c) the output timing difference distributions in every layer under different amplifying gain conditions.

In order to solve the problem of the decreasing output timing difference, we can introduce an amplification component into our model, which amplifies the timing difference just before the timings are transferred to the next layer. To examine the effectiveness of this amplification function, we performed experiments in which we amplified the timing difference with different gains and evaluated the recognition accuracy with the critical noise injected. The results are shown in Figure 10b. We also plotted the output timing difference distributions in every layer under different amplification gain conditions. Figure 10c shows the distribution standard deviations, and Figure 9b shows the histograms with a gain of 10. Note that in these experiments we simply set the same amplification gain in every layer without optimizing the gains. We found that the recognition accuracy is comparable to that of the model without injected noise if we select a gain that makes the output timing difference distribution σ of the last layer larger than the critical noise level, such as 10 ns. For more robustness, however, the distribution σ should be much larger than the resolution time step, so the gain can be 8-10. We thus verified the effectiveness of the amplification function and established the time-domain weighted-sum model with amplification components. In VLSI circuits, a time-difference amplifier (TDA) (Abas et al., 2002; Asada et al., 2018) can be introduced to amplify the output timing difference.
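
The sketch below illustrates this noise-injection-plus-amplification idea in NumPy; the timing values, noise level, and the idealized amplify() helper are our illustrative assumptions, not the simulation code used for Figures 9, 10.

```python
import numpy as np

rng = np.random.default_rng(5)

def amplify(tp, tn, gain):
    # Idealized TDA: scale each pair's timing difference about its midpoint
    mid, half = 0.5 * (tp + tn), 0.5 * (tn - tp)
    return mid - gain * half, mid + gain * half

# Illustrative hidden-layer timing pairs with a few-nanosecond mean difference
tp = rng.uniform(1.0e-6, 2.0e-6, 100)
tn = tp + np.abs(rng.normal(6e-9, 2e-9, 100))

sigma = 0.5e-9                                   # near the critical noise level
tp_noisy = tp + rng.normal(0.0, sigma, tp.shape)
tn_noisy = tn + rng.normal(0.0, sigma, tn.shape)
tp_amp, tn_amp = amplify(tp_noisy, tn_noisy, gain=10.0)  # gain of 8-10 (Section 4.2)
```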

5 Circuits and architectures for TACT-based neural networks

As a VLSI implementation of our time-domain weighted-sum calculation based on the TACT approach, we propose an RC circuit in which a capacitor is connected to multiple resistors, as shown in Figure 11a. Theoretical estimations have indicated that this circuit can perform weighted-sum calculations with extremely low energy consumption (Tohara et al., 2016; Wang et al., 2016).


Figure 11. Synapse circuit: (a) step voltage input and a resistance-capacitance (RC) circuit, in which a pMOSFET acts as the resistance R, and the parasitic capacitance of the interconnection and the gate capacitance of MOSFETs act as C in a VLSI circuit; (b) approximately linear response to a step voltage input at timing ti, with a slope determined by the gate voltage Vki; (c) an operating example explaining the pMOSFET synapse's rectification function.

In CMOS VLSI implementation, the resistance R can be replaced by a p-type MOS field-effect transistor (pMOSFET), as shown in Figure 11b. The approximately linear slope k is generated by the capacitance C and the ON resistance of a pMOSFET with a step voltage input Vin, where we use step voltages instead of spike pulses as inputs. Each resistance should have a rectification function to prevent an inverse current. The rectification function is automatically realized by the FET operation as follows. When a pMOSFET receives a step-voltage input, the terminal voltage of the input is higher than that at C; therefore, the input-side terminal of the pMOSFET is the "source," and the capacitor-side terminal is the "drain." In this state, if the gate-source voltage (Vgs) of the pMOSFET is set to exceed its threshold voltage, the pMOSFET turns on, and C is charged up. On the other hand, when a pMOSFET receives no input, the terminal voltage of the input is lower than that at C; therefore, the source-drain roles in the pMOSFET are reversed, i.e., the input-side terminal of the pMOSFET is the "drain," and the capacitor-side terminal is the "source." In this state, if the Vgs of the pMOSFET is set not to exceed its threshold voltage, the pMOSFET turns off, and the charges stored at C do not flow back to the input side. An operating example is shown in Figure 11c, in which the synapse pMOSFET without input (i.e., the input voltage is 0 V here), denoted S2, is strongly off because its gate-source voltage is positive.
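
Behaviorally, each step input switches on one roughly constant current source, so the dendrite-line voltage is a piecewise-linear ramp V(t) = (Is/C) Σi max(0, t − ti). The sketch below (ours, assuming ideal constant current sources; the parameter values follow the nominal Table 1 numbers used later in Section 5.2) computes the threshold-crossing time of such a ramp.

```python
# Event-driven sketch of TACT dendrite-line charging and firing (Figure 11).
import numpy as np

def firing_time(t_in, I_s=11.5e-9, C=0.92e-12, V_th=0.4):
    """Time at which the piecewise-linear charge ramp crosses C * V_th."""
    t_events = np.append(np.sort(t_in), np.inf)   # sentinel: after all inputs
    q, slope, t_prev = 0.0, 0.0, 0.0              # accumulated charge and current
    for t_i in t_events:
        if slope > 0.0 and q + slope * (t_i - t_prev) >= C * V_th:
            return t_prev + (C * V_th - q) / slope
        if not np.isfinite(t_i):
            break                                 # threshold never reached
        q += slope * (t_i - t_prev)
        slope += I_s                              # one more synapse starts charging
        t_prev = t_i
    return np.inf

t_in = np.random.default_rng(6).uniform(0.0, 640e-9, 50)  # 50 inputs within T_in
print(firing_time(t_in))   # expected to land in the ~640-1280 ns output window
```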

5.1 Architectures

We propose a circuit architecture for a neural network based on our established weighted-sum calculation model, which is suitable for the TACT approach and accommodates both positive and negative weights. The architecture is shown in Figure 12 and is composed of a crossbar synapse array acting as resistive elements, a neuron part performing thresholding and nonlinear activation, and a configuration part controlling the synapses.


Figure 12. TACT-based MLP architecture. (a) A two-layer MLP architecture, described in Figure 3, whose input layer is modeled as in Figure 2c, in which each synapse has two sets of inputs and weights. (b) Another type of input-layer architecture for the MLP, in which there is one input and one weight for each synapse, while a row of dummy cells with a dummy input is added.

Figure 12a shows a two-layer MLP architecture, described in Figure 3, whose input layer is modeled as in Figure 2c, in which each synapse has two sets of inputs and weights. Another type of input-layer architecture for the MLP is shown in Figure 12b, as described in Figure 2b.

In Figure 12a, there are two inputs for each synapse circuit: ti as the signal input and tdmy as a dummy input in the first layer, and $t_{\nu i}^+$ and $t_{\nu i}^-$ in the subsequent layers. Pairs of positive and negative timings are directly connected to the next layer, without subtracting the negatively signed weighted results from the positively signed ones, according to the theory explained in Section 3.2 and Figure 4. In the synapse array, the horizontal and vertical lines are referred to as the "axons" of the previous neurons and the "dendrites" of the post-neurons, respectively. Suppose that each axon line drives M synapse circuits and each dendrite line receives N synapse outputs. A synapse cell is designed with two resistive elements and two pairs of switches. A set of two identical resistances represents the weight value. The resistive elements are expected to be replaced by resistance-based analog memories to store multi-bit weights (Sebastian et al., 2020). We can assume that the upper-side axon is for $t_{\nu i}^+$ and the other is for $t_{\nu i}^-$, and that the left-side dendrite is for a positive weight connection while the other is for a negative one. The two switches are exclusively controlled according to the sign of the corresponding weight, which is managed by the weight control circuit. By contrast, in the input layer shown in Figure 12b, there is one input and one weight for each synapse, while a row of dummy cells with a dummy input is conceptually added to the synapse array according to Section 3.1. The resistances (di) of the dummy cells are theoretically set according to Equation 56.

In AIMC, the most common implementation of a signed weight $w_i$ uses a differential scheme with two subweights $w_i^+$ and $w_i^-$ such that

$w_i = w_i^+ - w_i^-,$    (57)

in which one subweight is for the positive weighted sum and the other is for the negative one (Xiao et al., 2023; Aguirre et al., 2024). Here, we treat the following signed-weight configuration as a special differential scheme (Yamaguchi et al., 2020; Kingra et al., 2022):

$w_i = \begin{cases} w_i^+ - 0, & \text{where } w_i^- \text{ is disabled, indicating } w_i \geq 0, \\ 0 - w_i^-, & \text{where } w_i^+ \text{ is disabled, indicating } w_i \leq 0. \end{cases}$    (58)

Subtraction to obtain the final weighted-sum result is commonly performed in either differential mode or common mode. In differential mode, the operation is carried out at two separate nodes within the peripheral circuitry (Guo et al., 2017; Joshi et al., 2020; Yamaguchi et al., 2020; Sahay et al., 2020). In common mode, the subtraction is performed at a single node based on Kirchhoff's law, using either bipolar (Wan et al., 2022; Aguirre et al., 2024) or unipolar inputs (Wang et al., 2021; Khaddam-Aljameh et al., 2022; Le Gallo et al., 2023). It is worth noting that common-mode subtraction with unipolar inputs generally requires a bi-directional peripheral circuit capable of providing two voltages: one higher and one lower than the common-node voltage.

Signed inputs for four-quadrant computation can be implemented by applying voltages of opposite polarity (Marinella et al., 2018; Le Gallo et al., 2024) or by using differential pairs when the input is unipolar (Schlottmann and Hasler, 2011; Bavandpour et al., 2019a). Additionally, signed computations that adopt neither of these two designs need multiple phase modulations, such as two phases in Kingra et al. (2022) for two-quadrant MACs and four phases in Le Gallo et al. (2023) for four-quadrant MACs.

With respect to the signed-weight representation in our approaches, the configuration in Figure 12b is regarded as the special differential scheme described in Equation 58, while that in Figure 12a restricts the two subweights to be identical. We term the latter configuration a complementary scheme, distinguishing it from the general differential scheme. With respect to the signed-input representation, we adopt differential pairs as in Bavandpour et al. (2019a). Our model with the complementary scheme can perform four-quadrant MAC computation in a single modulation without bipolar or bi-directional peripheral requirements.
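
The contrast between the two schemes can be stated in a few lines (our illustration with arbitrary weight values):

```python
import numpy as np

w = np.array([0.7, -0.3, 0.0, -1.2])                 # illustrative signed weights

# Special differential scheme (Equation 58): the unused subweight is disabled (0)
w_plus = np.where(w >= 0, w, 0.0)
w_minus = np.where(w < 0, -w, 0.0)
assert np.allclose(w, w_plus - w_minus)              # Equation 57

# Complementary scheme (Figure 12a): both subweights store the same magnitude;
# the stored sign only selects which dendrite line each timing pair drives.
magnitude, sign = np.abs(w), np.sign(w)
```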

The neuron part shown in Figure 12a consists of a thresholding block, such as a comparator, a ReLU block, and a post-processing block (PPB) after the ReLU block. The ReLU block with its input and output timings is shown in Figure 13a. The relationships between inputs and outputs are illustrated in Figure 13b, in which both output timings are set identical to $t_{\nu i}^+$ when $t_{\nu i}^+ > t_{\nu i}^-$. The truth table is shown in Figure 13c, and accordingly the ReLU activation function can easily be implemented by logic gates, as shown in Figure 13d. With such circuits, the nonlinear ReLU activation function can be implemented with low energy consumption. The PPB can be either a TDA circuit or a set of TDC and DTC to address the issue of timing-difference shrinkage discussed in Section 4.2. The TDA is introduced to transmit the timings to the next layer directly in an analog manner, while the TDC and DTC are introduced to communicate between layers digitally. We leave a high-performance PPB implementation, with high precision and low power, as an open design problem in this paper.


Figure 13. ReLU block: (a) symbol of the block; (b) timing chart of the ReLU function; (c) truth table; and (d) circuit implementation by simple logic gates.

We summarize the main differences between our complementary weight approach and previous similar research (Bavandpour et al., 2019a; Sahay et al., 2020), which was also partially inspired by our models (Morie et al., 2016; Tohara et al., 2016; Wang et al., 2018), as follows:

• Input and output information are encoded as the timing of step voltages, rather than using a PWM scheme.

• The response in the output line to every input step voltage is continuous until the firing threshold of the post-neuron is reached, instead of being discrete.

• We represent signed weights using a complementary scheme, where the two sub-weights are identical, rather than using a differential scheme.

5.2 Circuits

In order to evaluate the energy consumption and computation precision of our TACT-based circuit, we designed a PoC CMOS circuit equivalent to the RC circuit to perform the one-column (i.e., N inputs and 1 output) signed weighted-sum calculation whose synapses are in a complementary scheme. The resistive elements in the synapses are replaced by pMOSFETs. The comparator function in the neuron part is implemented by an S-R latch.

We propose an SRAM-based synapse circuit to implement an IMC circuit for the computation, shown in Figure 14a. It consists of a 1-bit standard 6T SRAM cell that stores the sign of a weight, a pair of pMOSFETs, M1 and M2, that implement the weight value, and four pMOSFETs, M3–M6, functioning as switches controlled by the SRAM state. M1 and M2 are identical transistors biased with the same gate voltage, implementing the complementary dummy-weight concept introduced in the previous section. They serve as current sources operating in the subthreshold saturation region, showing high impedance. M3–M6 steer the current to the dendrite line determined by the weight's sign, according to the diagram shown in Figure 12a. As a PoC circuit, we implemented the BinaryConnect NN (Courbariaux et al., 2015) by setting all the synapse pMOSFET biases to be the same.


Figure 14. SRAM-based synapse circuit and the main analog nonidealities (ANIs): (a) SRAM-based synapse circuit; (b) main ANIs of the synapse circuit; (c) output timing jitters induced by the ANIs; (d) summary of the main ANIs.

The main design parameters and simulation conditions are summarized in Table 1. We used the predictive technology model (PTM) 45 nm SPICE model for the design and simulation. Both the gate length and width of the synapse pMOSFET were 0.45 μm. Based on the size of the synapse pMOSFET, we estimated the parasitic capacitances of the axon line and the dendrite line from 65 nm SRAM-based IMC circuits (Kneip and Bol, 2021). As a result, the parasitic capacitance of the axon line per cell, denoted Cal, is around 0.88 fF, and the parasitic capacitance of the dendrite line per cell, denoted Cdl, is around 0.87 fF. The Vgs of the synapse pMOSFET is fixed at −0.34 V so that a single synapse current Is is around 11.5 nA under typical conditions. We set the typical supply voltages of the synapse array and the neuron part to 1.1 and 0.75 V, respectively. The threshold (VTH) of the post-neuron (i.e., the S-R latch) is typically around 0.4 V.


Table 1. Simulation conditions.

With respect to computation precision, we set the full-scale time window (Tin) to 640 ns and the effective number of bits (ENOB) to 4 bits as the design target. The total capacitance of the dendrite line (CDL) for MAC computation can then be obtained by

$C_{DL} = \frac{N I_s T_{in}}{V_{TH}}.$    (59)

CDL includes the total parasitic capacitance of the dendrite line, $N C_{dl}$, the input capacitance of the post-neuron, Ci, and an extra load capacitor (Cl), which is needed under the given Tin and Is conditions.
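
As a worked sizing example, the following lines evaluate Equation 59 with the nominal Table 1 values; the post-neuron input capacitance Ci is a placeholder value we assume purely for illustration.

```python
N, I_s, T_in, V_TH = 50, 11.5e-9, 640e-9, 0.4   # nominal values from Table 1
C_dl, C_i = 0.87e-15, 5e-15                     # per-cell parasitic; C_i assumed

C_DL = N * I_s * T_in / V_TH                    # Equation 59: about 0.92 pF
C_l = C_DL - N * C_dl - C_i                     # extra load capacitor (the balance)
print(f"C_DL = {C_DL * 1e12:.2f} pF, C_l = {C_l * 1e15:.1f} fF")
```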

Analog computation suffers from analog nonidealities (ANIs) (Kneip and Bol, 2021). These ANIs limit the computation precision, leading to degradation of the inference accuracy. We sketch the main ANIs of our synapse circuit in Figure 14b. All of these ANIs may cause the output jitters illustrated in Figure 14c, resulting in computation errors. We summarize them in Figure 14d, classifying them by their stochastic or deterministic nature.

Because time-domain computation is sensitive to PVT variations (Seo et al., 2022), and local device mismatch is more dominant than intrinsic noise (Kneip and Bol, 2021; Gonugondla et al., 2021), we mainly considered local mismatch and PVT variations here.

We conducted Monte Carlo simulations to evaluate the errors induced by local mismatches. A set of Monte Carlo simulation waveforms is shown in Figure 15a, and the standard deviations (σ) vs. the number of inputs, N, are shown in Figure 15b. The results showed that σ scales roughly as $1/\sqrt{N}$, leading to higher precision for larger N (Bavandpour et al., 2019a).


Figure 15. TACT-based time-domain weighted-sum simulation results: (a) operational waveforms in a Monte Carlo simulation; (b) Monte Carlo simulation results of the positive/negative dendrite-line firing timings $t_\nu^+/t_\nu^-$ and their difference $\Delta t_\nu = t_\nu^+ - t_\nu^-$ vs. the number of MAC inputs (Nin); (c) comparison of the simulated PVT variations between $t_\nu^+/t_\nu^-$ and $\Delta t_\nu$; (d) simulated weighted-sum linearity and its PVT variation, in which the maximum peak-to-peak variation ($\Delta t_\nu^{pp}$) is 8.33 ns in the case of Nin = 50; (e) the distribution of the 50-input weighted-sum results ($\Delta t_\nu$), which is well divided into 16 levels based on the predefined 4-bit output time resolution.

Our approach is expected to have good immunity to PVT variations. The error induced by PVT variation in both the synapse array and the neuron part is common to the positive and negative dendrite lines, and thus largely cancels in the timing difference. To verify this tolerance, we simulated the case N = 50 over the process and temperature (PT) corners listed in Table 1, while changing the supply voltage of the neuron part to 0.65, 0.75, and 0.85 V. We assumed that the gate voltage of the synapse pMOSFET tracks the supply voltage of the synapse part, so that its Vgs remains fixed. One MAC PVT simulation result is shown in Figure 15c. The synapse current Is varied widely across the PT corners in Table 1, resulting in a large variation of the single dendrite-line output timings tν+/tν−; however, the variation of their difference was much smaller. We also checked the weighted-sum linearity against the ideal 16 levels with 1 LSB = 40 ns under the PVT simulations. The simulated linearity, shown in Figure 15d, is good, with a maximum peak-to-peak variation of 8.33 ns. Finally, we combined the PVT and Monte Carlo simulation results into the distribution shown in Figure 15e, indicating that the target ENOB of 4 bits was well achieved. Table 2 summarizes the potentially achievable MAC computation precision as the number of inputs increases to 256.
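The cancellation mechanism can be seen in a crude first-order model: if a global PVT factor k scales all synapse currents, both firing times scale together, so the large common timing component moves with k while the signed difference moves only in proportion to its own (small) value. The 5% normalized MAC result below is an assumed illustrative signal, and the model ignores the neuron-part details.

```python
# Toy model of common-mode PVT cancellation in the timing difference.
N, I_s, T_in, V_TH = 50, 11.5e-9, 640e-9, 0.4
C_DL = N * I_s * T_in / V_TH
signal = 0.05                          # ASSUMED small normalized MAC result
I_pos = N * I_s * (1 + signal)         # positive dendrite-line current
I_neg = N * I_s * (1 - signal)         # negative dendrite-line current

for k in (0.8, 1.0, 1.2):              # global PVT current scaling
    t_pos = C_DL * V_TH / (k * I_pos)  # firing time, positive line
    t_neg = C_DL * V_TH / (k * I_neg)  # firing time, negative line
    print(f"k={k:.1f}: t+ = {t_pos * 1e9:6.1f} ns, t- = {t_neg * 1e9:6.1f} ns, "
          f"dt = {(t_pos - t_neg) * 1e9:6.1f} ns")
```

Across k = 0.8–1.2, t+ and t− each shift by hundreds of nanoseconds, while the difference shifts by only tens, an order of magnitude less, which is qualitatively the behavior seen in Figure 15c.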

Table 2. The MAC computation precision.

With respect to energy consumption, an input voltage first charges the parasitic capacitance of the axon line, Cal, and then charges the capacitance CDL via the synapse pMOSFET. The total energy consumption due to dendrite-line (DL) charge and discharge is therefore $E_{DL} = C_{DL} V_{TH}^2$, which was 80.59 fJ for VTH = 0.3 V and 143.27 fJ for VTH = 0.4 V. The total energy consumption per DL due to axon-line (AL) charge and discharge is $E_{AL} = N C_{al} V_{dd}^2$, which was 53.24 fJ for Vdd = 1.1 V.

As for the neuron part, which consists of an S-R latch and the output buffer, the energy consumption, ENP, was about 216.91 fJ at a supply voltage of 0.75 V and decreased to about 76.49 fJ at 0.65 V.

As a result, the overall energy consumption was 210.32 fJ per MAC with N = 50 when the supply voltage of the neuron part was 0.65 V, which implies an energy efficiency of 237.74 TOPS/W (Tera Operations Per Second per Watt). This efficiency is comparable to state-of-the-art analog MAC macros (Seo et al., 2022; Choi et al., 2023). Table 3 summarizes the potentially achievable energy efficiency as the number of inputs increases to 256.
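The breakdown can be reproduced directly from the formulas above; the sketch below uses the VTH = 0.3 V dendrite case and the simulated ENP figure quoted above, and the operation-counting convention (one operation per input) is inferred from the reported 237.74 TOPS/W value rather than stated explicitly.

```python
# Energy breakdown for one N = 50 MAC (0.65 V neuron supply, VTH = 0.3 V).
N, C_al = 50, 0.88e-15
V_dd, V_TH = 1.1, 0.3
C_DL = 895e-15               # total dendrite-line capacitance (Table 3)

E_DL = C_DL * V_TH ** 2      # dendrite-line charge/discharge: ~80.6 fJ
E_AL = N * C_al * V_dd ** 2  # axon-line charge/discharge:     ~53.2 fJ
E_NP = 76.49e-15             # neuron part at 0.65 V (simulated value)

E_total = E_DL + E_AL + E_NP         # ~210.3 fJ per 50-input MAC
tops_per_watt = N / E_total / 1e12   # one operation counted per input
print(f"E = {E_total * 1e15:.1f} fJ -> {tops_per_watt:.1f} TOPS/W")
```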

Table 3. The energy consumption breakdown and the energy efficiency of one MAC computation.

Since our purpose is to show the potential energy efficiency and computation precision of the TACT-based circuit, we did not perform further design-space exploration to optimize the performance of the proposed circuit.

6 Discussion

Here we discuss some possible improvements to our PoC circuit design.

In our PoC circuit design, we used a relatively long time window, 640 ns, to guarantee a moderate time resolution of 4–7 bits. The single-MAC operation time is 1,300 ns, consisting of 20 ns for resetting the neuron part and 640 ns each for the input and output windows. To improve system latency, massively parallel MAC operations can compensate for the relatively slow single MAC computation, thanks to our simple readout circuit, which is area-efficient enough to allow one readout per column, as in Khaddam-Aljameh et al. (2022) and Wan et al. (2022). At the system level, a pipeline scheme that uses the output timing window as the input window of the subsequent computation can also be effective, as sketched below (Lim et al., 2020; Seo et al., 2022; Ambrogio et al., 2023).
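A rough throughput estimate under these latency figures follows; the array width M = 128 is an assumed value for illustration, and the pipelined case simply overlaps each output window with the next input window.

```python
# Latency/throughput sketch for the timing figures quoted above.
t_reset, t_window = 20e-9, 640e-9
t_mac = t_reset + 2 * t_window           # 1,300 ns single-MAC latency
M = 128                                  # ASSUMED number of parallel columns

naive = M / t_mac                        # MAC/s without pipelining
pipelined = M / (t_reset + t_window)     # output window doubles as next input
print(f"latency = {t_mac * 1e9:.0f} ns; "
      f"{naive / 1e6:.0f} -> {pipelined / 1e6:.0f} MMAC/s")
```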

We also designed a relatively large capacitance for the DL, which could degrade the area efficiency of the AIMC system. The total DL capacitance, CDL, is about 895 fF for 50 inputs, as shown in Table 3. Because the wiring parasitic capacitance of the dendrite line per cell, Cdl, is around 0.87 fF, the extra load capacitor, Cl, must be about 850 fF, leading to area inefficiency in the neuron part. According to Equation 59, shortening the full-scale time window, decreasing the total integration current, or setting a higher VTH will reduce this capacitance and improve area efficiency. Regarding the decrease of the total integration current, we can exploit sparsity-aware optimizations such as weight pruning. The sparsity, which we define as the ratio of zero weights to total weights, is typically 20%–50% (Sze et al., 2017; Deng et al., 2020). Assuming a sparsity of 40%, CDL would be about 540 fF. To further reduce the neuron-part area overhead, the capacitors can be implemented in a multi-layer metal-oxide-metal (MOM) structure lying on top of the transistors in the synapse cell (Valavi et al., 2019; Seo et al., 2022), with a typical capacitance of 1–3 fF per cell area. By this means, Cl can be reduced to about 340 fF, and such a capacitance can be efficiently implemented by a MOSCAP (Bavandpour et al., 2019a), as estimated below.
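The arithmetic behind these area estimates is straightforward; the sketch below assumes 3 fF/cell of MOM capacitance (the top of the 1–3 fF range quoted above) to reproduce the ≈340 fF figure.

```python
# Area-optimization arithmetic: sparsity shrinks C_DL, and stacked MOM
# capacitance over the synapse cells absorbs part of the remaining load.
N, C_dl = 50, 0.87e-15
C_DL_full = 895e-15                        # baseline total DL capacitance
sparsity = 0.40                            # assumed fraction of pruned weights

C_DL_sparse = C_DL_full * (1 - sparsity)   # ~537 fF ("about 540 fF")
C_mom = N * 3e-15                          # ASSUMED 3 fF/cell MOM stack
C_l = C_DL_sparse - N * C_dl - C_mom       # residual explicit load, ~340 fF
print(f"C_DL = {C_DL_sparse * 1e15:.0f} fF, residual C_l = {C_l * 1e15:.0f} fF")
```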

Shortening the full-scale time window also helps reduce CDL, but at the cost of computation precision. Because it simultaneously improves system latency and lowers the energy consumption of the neuron part, the resulting trade-off is worth estimating. When the number of inputs is increased to 256, the ENOB can reach 5.4 bits, as shown in Table 2. If we keep the 4-bit ENOB target, the full-scale time window can be shortened to about 250 ns: CDL decreases to about 1,100 fF, and Cl can be reduced to below 100 fF, given a weight sparsity of 40%, Is = 11.5 nA, and VTH = 0.4 V. Accordingly, the energy consumption of the DL, EDL, and of the neuron part, ENP, is reduced to about 176.6 and 29.9 fJ, respectively, indicating an energy efficiency of 534.3 TOPS/W. We compare our work with previous AIMC designs in Table 4; our work shows favorable performance.
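A back-of-the-envelope reproduction of this design point, using Equation 59 and the same energy conventions as the N = 50 breakdown, is given below; the ENP value is the simulated figure quoted above, not derived by the sketch.

```python
# Shortened-window design point: N = 256, 4-bit target, 40% sparsity.
N, I_s, V_TH, V_dd = 256, 11.5e-9, 0.4, 1.1
C_al = 0.88e-15
T_in, sparsity = 250e-9, 0.40

C_DL = (1 - sparsity) * N * I_s * T_in / V_TH   # ~1,104 fF
E_DL = C_DL * V_TH ** 2                         # ~176.6 fJ
E_AL = N * C_al * V_dd ** 2                     # ~272.6 fJ
E_NP = 29.9e-15                                 # simulated neuron-part figure

eff = N / (E_DL + E_AL + E_NP) / 1e12           # ~534 TOPS/W
print(f"C_DL = {C_DL * 1e15:.0f} fF, efficiency = {eff:.0f} TOPS/W")
```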

Table 4. Performance summary and comparison with previous AIMC designs.

When deploying DNNs to resource-constrained edge devices, trade-offs between accuracy, model size, latency, and energy efficiency need to be optimized, which is typically achieved by means of algorithm-hardware co-design (Shuvo et al., 2022; Ngo et al., 2025).

Our future work includes the design and fabrication of a fully parallel MVM AIMC core or macro, and the measurement of DNN inference accuracy, latency, and energy efficiency on more realistic datasets such as CIFAR-10 and CIFAR-100. With respect to NN model optimization for the hardware, improving the accuracy of the bias-free NN model discussed in Section 4.1 will also be an important effort.

7 Conclusions

We introduced a time-domain four-quadrant MAC calculation model in which signed inputs are encoded using a differential pair of spikes and signed weights are implemented through a dummy-weights scheme. The output is represented by a pair of spikes whose timing difference is proportional to the MAC result, enabled by the added dummy weights. Since both inputs and outputs are encoded in a timing format, an AIMC core based on this model can be seamlessly integrated with efficient DTCs and TDCs. We proposed architectures for our TACT-based MLP with the weights configured in a complementary scheme, and we demonstrated a proof-of-concept (PoC) CMOS circuit equivalent to the previously proposed RC circuit, with preliminary simulations suggesting that the energy efficiency could reach hundreds of Tera Operations Per Second Per Watt (TOPS/W) and the precision could be 4 bits or higher.

Our proposed time-domain weighted-sum calculation model promises to be a suitable approach for intensive in-memory computing (IMC) of deep neural networks (DNNs) with moderate multi-bit inputs/outputs and weights, while avoiding or reducing ADC overhead, ultimately enabling energy-efficient DNN inference on edge devices.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

QW: Investigation, Software, Conceptualization, Writing – review & editing, Visualization, Validation, Writing – original draft. HT: Project administration, Resources, Writing – review & editing, Supervision. TM: Writing – review & editing, Project administration, Conceptualization, Funding acquisition, Supervision.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by JSPS KAKENHI Grant Nos. 22240022 and 15H01706. Part of the work was carried out under project JPNP16007 commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Abas, A., Bystrov, A., Kinniment, D., Maevsky, O., Russell, G., Yakovlev, A., et al. (2002). Time difference amplifier. Electron. Lett. 38, 1437–1438. doi: 10.1049/el:20020961

Aguirre, F., Sebastian, A., Le Gallo, M., Song, W., Wang, T., Yang, J. J., et al. (2024). Hardware implementation of memristor-based artificial neural networks. Nat. Commun. 15:1974. doi: 10.1038/s41467-024-45670-9

Al Maharmeh, H., Ismail, M., Alhawari, M., et al. (2024). Energy-efficient time-domain computation for edge devices: challenges and prospects. Found. Trends Integr. Circuits Syst. 3, 1–50. doi: 10.1561/3500000013

Al Maharmeh, H., Sarhan, N. J., Hung, C.-C., Ismail, M., and Alhawari, M. (2020). “Compute-in-time for deep neural network accelerators: challenges and prospects,” in 2020 IEEE 63rd International Midwest Symposium on Circuits and Systems (MWSCAS) (Springfield, MA: IEEE), 990–993. doi: 10.1109/MWSCAS48704.2020.9184470

Al Maharmeh, H., Sarhan, N. J., Ismail, M., and Alhawari, M. (2023). A 116 tops/w spatially unrolled time-domain accelerator utilizing laddered-inverter dtc for energy-efficient edge computing in 65 nm. IEEE Open J. Circuits Syst. 4, 308–323. doi: 10.1109/OJCAS.2023.3332853

Ambrogio, S., Narayanan, P., Okazaki, A., Fasoli, A., Mackin, C., Hosokawa, K., et al. (2023). An analog-AI chip for energy-efficient speech recognition and transcription. Nature 620, 768–775. doi: 10.1038/s41586-023-06337-5

Asada, K., Nakura, T., Iizuka, T., and Ikeda, M. (2018). Time-domain approach for analog circuits in deep sub-micron LSI. IEICE Electron. Express 15:20182001. doi: 10.1587/elex.15.20182001

Bavandpour, M., Mahmoodi, M. R., and Strukov, D. B. (2019a). Energy-efficient time-domain vector-by-matrix multiplier for neurocomputing and beyond. IEEE Trans. Circuits Syst. II: Express Briefs 66, 1512–1516. doi: 10.1109/TCSII.2019.2891688

Bavandpour, M., Mahmoodi, M. R., and Strukov, D. B. (2020). Acortex: an energy-efficient multipurpose mixed-signal inference accelerator. IEEE J. Explor. Solid-State Comput. Devices Circuits 6, 98–106. doi: 10.1109/JXCDC.2020.2999581

Bavandpour, M., Sahay, S., Mahmoodi, M. R., and Strukov, D. (2019b). Efficient mixed-signal neurocomputing via successive integration and rescaling. IEEE Trans Very Large Scale Integr. Syst. 28, 823–827. doi: 10.1109/TVLSI.2019.2946516

Bavandpour, M., Sahay, S., Mahmoodi, M. R., and Strukov, D. B. (2021). 3D-acortex: an ultra-compact energy-efficient neurocomputing platform based on commercial 3D-nand flash memories. Neuromorphic Comput. Eng. 1:014001. doi: 10.1088/2634-4386/ac0775

Chen, Y., Xie, Y., Song, L., Chen, F., and Tang, T. (2020). A survey of accelerator architectures for deep neural networks. Engineering 6, 264–274. doi: 10.1016/j.eng.2020.01.007

Chen, Z., Jin, Q., Yu, Z., Wang, Y., and Yang, K. (2022). “DCT-RAM: a driver-free process-in-memory 8t sram macro with multi-bit charge-domain computation and time-domain quantization,” in 2022 IEEE Custom Integrated Circuits Conference (CICC) (Newport Beach, CA: IEEE), 1–2. doi: 10.1109/CICC53496.2022.9772826

Choi, E., Choi, I., Lukito, V., Choi, D.-H., Yi, D., Chang, I.-J., et al. (2023). “A 333tops/w logic-compatible multi-level embedded flash compute-in-memory macro with dual-slope computation,” in 2023 IEEE Custom Integrated Circuits Conference (CICC) (San Antonio, TX: IEEE), 1–2. doi: 10.1109/CICC57935.2023.10121209

Choi, J., Wang, Z., Venkataramani, S., Chuang, P. I.-J., Srinivasan, V., Gopalakrishnan, K., et al. (2018). Pact: parameterized clipping activation for quantized neural networks. arXiv [Preprint]. arXiv:1805.06085. doi: 10.48550/arXiv.1805.06085

Cireşan, D. C., Meier, U., Gambardella, L. M., and Schmidhuber, J. (2010). Deep, big, simple neural nets for handwritten digit recognition. Neural Comput. 22, 3207–3220. doi: 10.1162/NECO_a_00052

Courbariaux, M., Bengio, Y., and David, J.-P. (2015). “Binaryconnect: training deep neural networks with binary weights during propagations,” in Advances in Neural Information Processing System, 28 (Red Hook, NY: Curran Associates).

Deng, L., Li, G., Han, S., Shi, L., and Xie, Y. (2020). Model compression and hardware acceleration for neural networks: a comprehensive survey. Proc. IEEE 108, 485–532. doi: 10.1109/JPROC.2020.2976475

Fick, L., Blaauw, D., Sylvester, D., Skrzyniarz, S., Parikh, M., Fick, D., et al. (2017). “Analog in-memory subthreshold deep neural network accelerator,” in 2017 IEEE Custom Integrated Circuits Conference (CICC) (Austin, TX: IEEE), 1–4. doi: 10.1109/CICC.2017.7993629

Fick, L., Skrzyniarz, S., Parikh, M., Henry, M. B., and Fick, D. (2022). “Analog matrix processor for edge AI real-time video analytics.” in 2022 IEEE International Solid-State Circuits Conference (ISSCC), Vol. 65 (San Francisco, CA: IEEE), 260–262. doi: 10.1109/ISSCC42614.2022.9731773

Freye, F., Lou, J., Bengel, C., Menzel, S., Wiefels, S., Gemmeke, T., et al. (2022). Memristive devices for time domain compute-in-memory. IEEE J. Explor. Solid-State Comput. Devices Circuits 8, 119–127. doi: 10.1109/JXCDC.2022.3217098

Freye, F., Lou, J., Lanius, C., and Gemmeke, T. (2024). “Merits of time-domain computing for vmm-a quantitative comparison,” in 2024 25th International Symposium on Quality Electronic Design (ISQED) (San Francisco, CA: IEEE), 1–8. doi: 10.1109/ISQED60706.2024.10528682

Gonugondla, S. K., Sakr, C., Dbouk, H., and Shanbhag, N. R. (2021). Fundamental limits on energy-delay-accuracy of in-memory architectures in inference applications. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 41, 3188–3201. doi: 10.1109/TCAD.2021.3124757

Guo, X., Bayat, F. M., Bavandpour, M., Klachko, M., Mahmoodi, M., Prezioso, M., et al. (2017). “Fast, energy-efficient, robust, and reproducible mixed-signal neuromorphic classifier based on embedded nor flash memory technology,” in 2017 IEEE International Electron Devices Meeting (IEDM) (San Francisco, CA: IEEE), 6–5. doi: 10.1109/IEDM.2017.8268341

Gupta, S., Agrawal, A., Gopalakrishnan, K., and Narayanan, P. (2015). “Deep learning with limited numerical precision,” in International Conference on Machine Learning (Lille: JMLR.org), 1737–1746.

Hasler, J., and Marr, B. (2013). Finding a roadmap to achieve large neuromorphic hardware systems. Front. Neurosci. 7:118. doi: 10.3389/fnins.2013.00118

Horowitz, M. (2014). “1.1 computing's energy problem (and what we can do about it),” in 2014 IEEE international solid-state circuits conference digest of technical papers (ISSCC) (San Francisco, CA: IEEE), 10–14. doi: 10.1109/ISSCC.2014.6757323

Jia, H., Ozatay, M., Tang, Y., Valavi, H., Pathak, R., Lee, J., et al. (2021a). “15.1 a programmable neural-network inference accelerator based on scalable in-memory computing,” in 2021 IEEE International Solid-State Circuits Conference (ISSCC), Vol. 64 (San Francisco, CA: IEEE), 236–238. doi: 10.1109/ISSCC42613.2021.9365788

Jia, H., Ozatay, M., Tang, Y., Valavi, H., Pathak, R., Lee, J., et al. (2021b). Scalable and programmable neural network inference accelerator based on in-memory computing. IEEE J. Solid-State Circuits 57, 198–211. doi: 10.1109/JSSC.2021.3119018

Jiang, H., Huang, S., Li, W., and Yu, S. (2022). Enna: An efficient neural network accelerator design based on adc-free compute-in-memory subarrays. IEEE Trans. Circuits Syst. I: Regul. Papers 70, 353–363. doi: 10.1109/TCSI.2022.3208755

Jiang, Z., Yin, S., Seo, J.-S., and Seok, M. (2020). C3sram: an in-memory-computing sram macro based on robust capacitive coupling computing mechanism. IEEE J. Solid-State Circuits 55, 1888–1897. doi: 10.1109/JSSC.2020.2992886

Joshi, V., Le Gallo, M., Haefeli, S., Boybat, I., Nandakumar, S. R., Piveteau, C., et al. (2020). Accurate deep neural network inference using computational phase-change memory. Nat. Commun. 11:2473. doi: 10.1038/s41467-020-16108-9

Khaddam-Aljameh, R., Stanisavljevic, M., Mas, J. F., Karunaratne, G., Brändli, M., Liu, F., et al. (2022). Hermes-core—a 1.59-tops/mm 2 pcm on 14-nm cmos in-memory compute core using 300-ps/lsb linearized cco-based adcs. IEEE J. Solid-State Circuits 57, 1027–1038. doi: 10.1109/JSSC.2022.3140414

Kingra, S. K., Parmar, V., Sharma, M., and Suri, M. (2022). Time-multiplexed in-memory computation scheme for mapping quantized neural networks on hybrid cmos-oxram building blocks. IEEE Trans. Nanotechnol. 21, 406–412. doi: 10.1109/TNANO.2022.3193921

Kneip, A., and Bol, D. (2021). Impact of analog non-idealities on the design space of 6t-sram current-domain dot-product operators for in-memory computing. IEEE Trans. Circuits Sys. I: Regul. Papers 68, 1931–1944. doi: 10.1109/TCSI.2021.3058510

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing System (Red Hook, NY: Curran Associates), 25.

Le Gallo, M., Hrynkevych, O., Kersting, B., Karunaratne, G., Vasilopoulos, A., Khaddam-Aljameh, R., et al. (2024). Demonstration of 4-quadrant analog in-memory matrix multiplication in a single modulation. Npj Unconv. Comput. 1:11. doi: 10.1038/s44335-024-00010-4

Le Gallo, M., Khaddam-Aljameh, R., Stanisavljevic, M., Vasilopoulos, A., Kersting, B., Dazzi, M., et al. (2023). A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference. Nat. Electron. 6, 680–693. doi: 10.1038/s41928-023-01010-1

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521, 436–444. doi: 10.1038/nature14539

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (2002). Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324. doi: 10.1109/5.726791

Lim, J., Choi, M., Liu, B., Kang, T., Li, Z., Wang, Z., et al. (2020). “AA-ResNet: energy efficient all-analog resnet accelerator,” in 2020 IEEE 63rd International Midwest Symposium on Circuits and Systems (MWSCAS) (Springfield, MA: IEEE), 603–606. doi: 10.1109/MWSCAS48704.2020.9184587

Maass, W. (1997a). Fast sigmoidal networks via spiking neurons. Neural Comput. 9, 279–304. doi: 10.1162/neco.1997.9.2.279

Maass, W. (1997b). Networks of spiking neurons: the third generation of neural network models. Neural Netw. 10, 1659–1671. doi: 10.1016/S0893-6080(97)00011-7

Maass, W. (1999). Computing with spiking neurons. Pulsed Neural Netw. 2, 55–85. doi: 10.7551/mitpress/5704.003.0006

Mahmoodi, M. R., and Strukov, D. (2018). “An ultra-low energy internally analog, externally digital vector-matrix multiplier based on nor flash memory technology,” in Proceedings of the 55th Annual Design Automation Conference (New York, NY: ACM), 1–6. doi: 10.1145/3195970.3195989

Marinella, M. J., Agarwal, S., Hsia, A., Richter, I., Jacobs-Gedrim, R., Niroula, J., et al. (2018). Multiscale co-design analysis of energy, latency, area, and accuracy of a reram analog neural training accelerator. IEEE J. Emerg. Sel. Top. Circuits Syst. 8, 86–101. doi: 10.1109/JETCAS.2018.2796379

McKinstry, J. L., Esser, S. K., Appuswamy, R., Bablani, D., Arthur, J. V., Yildiz, I. B., et al. (2018). Discovering low-precision networks close to full-precision networks for efficient embedded inference. arXiv [preprint] arXiv:1809.04191. doi: 10.48550/arXiv.1809.04191

Morie, T., Liang, H., Tohara, T., Tanaka, H., Igarashi, M., Samukawa, S., et al. (2016). “Spike-based time-domain weighted-sum calculation using nanodevices for low power operation,” in 2016 IEEE 16th International Conference on Nanotechnology (IEEE-NANO) (Sendai: IEEE), 390–392. doi: 10.1109/NANO.2016.7751490

Morie, T., Sun, Y., Liang, H., Igarashi, M., Huang, C.-H., Samukawa, S., et al. (2010). “A 2-dimensional si nanodisk array structure for spiking neuron models,” in Proceedings of 2010 IEEE International Symposium on Circuits and Systems (Paris: IEEE), 781–784. doi: 10.1109/ISCAS.2010.5537456

Nägele, R., Finkbeiner, J., Stadtlander, V., Grözing, M., and Berroth, M. (2023). Analog multiply-accumulate cell with multi-bit resolution for all-analog AI inference accelerators. IEEE Trans. Circuits Syst. I: Regul. Papers 70, 3509–3521. doi: 10.1109/TCSI.2023.3268728

Nair, V., and Hinton, G. E. (2010). “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10) (Madison, WI: Omnipress), 807–814.

Narayanan, P., Ambrogio, S., Okazaki, A., Hosokawa, K., Tsai, H., Nomura, A., et al. (2021). Fully on-chip mac at 14 nm enabled by accurate row-wise programming of pcm-based weights and parallel vector-transport in duration-format. IEEE Trans. Electron Devices 68, 6629–6636. doi: 10.1109/TED.2021.3115993

Ngo, D., Park, H.-C., and Kang, B. (2025). Edge intelligence: a review of deep neural network inference in resource-limited environments. Electronics 14:2495. doi: 10.3390/electronics14122495

Prezioso, M., Merrikh-Bayat, F., Hoskins, B. D., Adam, G. C., Likharev, K. K., Strukov, D. B., et al. (2015). Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature 521, 61–64. doi: 10.1038/nature14441

Roy, K., Jaiswal, A., and Panda, P. (2019). Towards spike-based machine intelligence with neuromorphic computing. Nature 575, 607–617. doi: 10.1038/s41586-019-1677-2

Sahay, S., Bavandpour, M., Mahmoodi, M. R., and Strukov, D. (2020). Energy-efficient moderate precision time-domain mixed-signal vector-by-matrix multiplier exploiting 1t-1r arrays. IEEE J. Explor. Solid-State Comput. Devices Circuits 6, 18–26. doi: 10.1109/JXCDC.2020.2981048

Schlottmann, C. R., and Hasler, P. E. (2011). A highly dense, low power, programmable analog vector-matrix multiplier: The FPAA implementation. IEEE J. Emer. Sel. Top. Circuits Syst. 1, 403–411. doi: 10.1109/JETCAS.2011.2165755

Sebastian, A., Le Gallo, M., Khaddam-Aljameh, R., and Eleftheriou, E. (2020). Memory devices and applications for in-memory computing. Nat. Nanotechnol. 15, 529–544. doi: 10.1038/s41565-020-0655-z

Seo, J.-O., Seok, M., and Cho, S. (2022). “Archon: a 332.7 tops/w 5b variation-tolerant analog cnn processor featuring analog neuronal computation unit and analog memory,” in 2022 IEEE International Solid-State Circuits Conference (ISSCC), Volume 65 (San Francisco, CA: IEEE), 258–260. doi: 10.1109/ISSCC42614.2022.9731654

Shafiee, A., Nag, A., Muralimanohar, N., Balasubramonian, R., Strachan, J. P., Hu, M., et al. (2016). ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars. ACM SIGARCH Comput. Archit. News 44, 14–26. doi: 10.1145/3007787.3001139

Shukla, S., Fleischer, B., Ziegler, M., Silberman, J., Oh, J., Srinivasan, V., et al. (2019). A scalable multi-teraops core for AI training and inference. IEEE Solid-State Circuits Lett. 1, 217–220. doi: 10.1109/LSSC.2019.2902738

Shuvo, M. M. H., Islam, S. K., Cheng, J., and Morshed, B. I. (2022). Efficient acceleration of deep learning inference on resource-constrained edge devices: a review. Proc. IEEE 111, 42–91. doi: 10.1109/JPROC.2022.3226481

Sze, V., Chen, Y.-H., Yang, T.-J., and Emer, J. S. (2017). Efficient processing of deep neural networks: a tutorial and survey. Proc. IEEE 105, 2295–2329. doi: 10.1109/JPROC.2017.2761740

Tohara, T., Liang, H., Tanaka, H., Igarashi, M., Samukawa, S., Endo, K., et al. (2016). Silicon nanodisk array with a fin field-effect transistor for time-domain weighted sum calculation toward massively parallel spiking neural networks. Appl. Phys. Express 9:034201. doi: 10.7567/APEX.9.034201

Tsai, H., Ambrogio, S., Narayanan, P., Shelby, R. M., and Burr, G. W. (2018). Recent progress in analog memory-based accelerators for deep learning. J. Phys. D Appl. Phys. 51:283001. doi: 10.1088/1361-6463/aac8a5

Valavi, H., Ramadge, P. J., Nestler, E., and Verma, N. (2019). A 64-tile 2.4-mb in-memory-computing cnn accelerator employing charge-domain compute. IEEE J. Solid-State Circuits 54, 1789–1799. doi: 10.1109/JSSC.2019.2899730

Verma, N., Jia, H., Valavi, H., Tang, Y., Ozatay, M., Chen, L.-Y., et al. (2019). In-memory computing: advances and prospects. IEEE Solid-State Circuits Mag. 11, 43–55. doi: 10.1109/MSSC.2019.2922889

Wan, W., Kubendran, R., Schaefer, C., Eryilmaz, S. B., Zhang, W., Wu, D., et al. (2022). A compute-in-memory chip based on resistive random-access memory. Nature 608, 504–512. doi: 10.1038/s41586-022-04992-8

Wang, L., Ye, W., Dou, C., Si, X., Xu, X., Liu, J., et al. (2021). Efficient and robust nonvolatile computing-in-memory based on voltage division in 2t2r rram with input-dependent sensing control. IEEE Trans. Circuits Syst. II: Express Briefs 68, 1640–1644. doi: 10.1109/TCSII.2021.3067385

Wang, Q., Tamukoh, H., and Morie, T. (2016). “Time-domain weighted-sum calculation for ultimately low power vlsi neural networks,” in International Conference on Neural Information Processing (Cham: Springer), 240–247. doi: 10.1007/978-3-319-46687-3_26

Wang, Q., Tamukoh, H., and Morie, T. (2018). A time-domain analog weighted-sum calculation model for extremely low power vlsi implementation of multi-layer neural networks. arXiv [Preprint]. arXiv:1810.06819. doi: 10.48550/arXiv.1810.06819

Wang, S., Zhou, T., and Bilmes, J. (2019). “Bias also matters: bias attribution for deep neural network explanation,” in International Conference on Machine Learning (Long Beach, CA), 6659–6667.

Wu, P.-C., Su, J.-W., Chung, Y.-L., Hong, L.-Y., Ren, J.-S., Chang, F.-C., et al. (2022). “A 28nm 1mb time-domain computing-in-memory 6t-sram macro with a 6.6 ns latency, 1241gops and 37.01 tops/w for 8b-mac operations for edge-AI devices,” in 2022 IEEE International Solid-State Circuits Conference (ISSCC), Volume 65 (San Francisco, CA: IEEE), 1–3. doi: 10.1109/ISSCC42614.2022.9731681

Xiao, T. P., Feinberg, B., Bennett, C. H., Prabhakar, V., Saxena, P., Agrawal, V., et al. (2023). On the accuracy of analog neural network inference accelerators. IEEE Circuits Syst. Mag. 22, 26–48. doi: 10.1109/MCAS.2022.3214409

Yamaguchi, M., Iwamoto, G., Nishimura, Y., Tamukoh, H., and Morie, T. (2020). An energy-efficient time-domain analog cmos binaryconnect neural network processor based on a pulse-width modulation approach. IEEE Access 9, 2644–2654. doi: 10.1109/ACCESS.2020.3047619

Yang, J., Kong, Y., Wang, Z., Liu, Y., Wang, B., Yin, S., et al. (2019). “24.4 sandwich-ram: an energy-efficient in-memory bwn architecture with pulse-width modulation,” in 2019 IEEE International Solid-State Circuits Conference-(ISSCC) (San Francisco, CA: IEEE), 394–396. doi: 10.1109/ISSCC.2019.8662435

Zhang, M., Wang, J., Wu, J., Belatreche, A., Amornpaisannon, B., Zhang, Z., et al. (2021). Rectified linear postsynaptic potential function for backpropagation in deep spiking neural networks. IEEE Trans. Neural Netw. Learn. Syst. 33, 1947–1958. doi: 10.1109/TNNLS.2021.3110991

Keywords: time-domain analog computing, weighted sum, spike-based computing, deep neural networks, multi-layer perceptron, artificial intelligence hardware, AI processor, matrix-vector multiplication

Citation: Wang Q, Tamukoh H and Morie T (2025) Spike-based time-domain analog weighted-sum calculation model for extremely low power VLSI implementation of multi-layer neural networks. Front. Neurosci. 19:1656892. doi: 10.3389/fnins.2025.1656892

Received: 30 June 2025; Accepted: 14 August 2025;
Published: 12 September 2025.

Edited by:

Thomas Martin McGinnity, Ulster University, United Kingdom

Reviewed by:

A. N. M. Nafiul Islam, The Pennsylvania State University (PSU), United States
Zihao Xuan, The Hong Kong University of Science and Technology, Hong Kong SAR, China

Copyright © 2025 Wang, Tamukoh and Morie. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Takashi Morie, morie@brain.kyutech.ac.jp
