Sharing leaky-integrate-and-fire neurons for memory-efficient spiking neural networks

Spiking Neural Networks (SNNs) have gained increasing attention as energy-efficient neural networks owing to their binary and asynchronous computation. However, their non-linear activation, the Leaky-Integrate-and-Fire (LIF) neuron, requires additional memory to store a membrane voltage that captures the temporal dynamics of spikes. Although the memory cost for LIF neurons increases significantly as the input dimension grows, techniques to reduce the memory of LIF neurons have not been explored so far. To address this, we propose a simple and effective solution, EfficientLIF-Net, which shares the LIF neurons across different layers and channels. Our EfficientLIF-Net achieves accuracy comparable to standard SNNs while bringing up to ∼4.3× forward memory efficiency and ∼21.9× backward memory efficiency for LIF neurons. We conduct experiments on various datasets including CIFAR10, CIFAR100, TinyImageNet, ImageNet-100, and N-Caltech101. Furthermore, we show that our approach also offers advantages on Human Activity Recognition (HAR) datasets, which heavily rely on temporal information. The code has been released at https://github.com/Intelligent-Computing-Lab-Yale/EfficientLIF-Net.


1. Introduction
Spiking Neural Networks (SNNs) have gained significant attention as a promising candidate for low-power machine intelligence (Wu et al., 2018, 2019; Roy et al., 2019; Fang et al., 2021a; Kundu et al., 2021; Christensen et al., 2022). By mimicking biological neuronal mechanisms, Leaky-Integrate-and-Fire (LIF) neurons in SNNs convey visual information with temporal binary spikes over time. The LIF neuron (Liu and Wang, 2001) captures temporal dynamics by accumulating incoming spikes in a membrane potential, and generates output spikes when the membrane potential exceeds a firing threshold. Such binary and asynchronous operation of SNNs yields energy-efficiency benefits on low-power neuromorphic hardware (Furber et al., 2014; Akopyan et al., 2015; Davies et al., 2018; Orchard et al., 2021).
Although SNNs bring computational efficiency benefits, the memory overhead caused by LIF neurons can be problematic. As shown in Figure 1, LIF neurons require additional memory for storing the membrane potential value, which changes over time. This is not the case for traditional Artificial Neural Networks (ANNs), where most non-linear activation functions are parameter-free (e.g., ReLU, Sigmoid). At the same time, LIF neurons occupy a large portion of memory for high-resolution input images (Figure 1). For instance, LIF memory accounts for 53% of the memory overhead of ResNet19 (He et al., 2016) with a 224×224 image. This analysis assumes 32-bit weight parameters, 1-bit spike activations, and 32-bit allocation for membrane potentials. A more comprehensive analysis is provided in the "Memory cost breakdown" subsection of Section 5.3. Unfortunately, the LIF memory overhead has been overlooked so far in SNN studies.
To address this, we propose EfficientLIF-Net, in which we share the LIF neurons across different layers and channels. By sharing the memory, we do not need to assign separate memory to each layer and channel. For layer-wise sharing, we use common LIF neurons across layers with the same activation size, such as the layers in one ResNet block (He et al., 2016). For channel-wise sharing, we equally divide the LIF neurons into multiple groups along the channel dimension and share common LIF neurons across the different groups. Surprisingly, our EfficientLIF-Net provides performance similar to standard SNN models in which each layer and channel has independent LIF neurons. We show that gradients can successfully flow back through all layers, so the weights can be trained to consider the temporal dynamics of spike information.
Furthermore, the proposed EfficientLIF-Net brings huge benefits in saving memory costs during training. The spatio-temporal operation inside SNNs incurs a huge computational graph for computing backward gradients. Each LIF neuron needs to store its membrane potential to make gradients flow back, and the training memory increases as the SNN goes deeper and uses more timesteps. This huge computational graph is often difficult to train within limited GPU memory (Singh et al., 2022; Yin et al., 2022). In this context, since our architecture shares the membrane potential across layers, we can compute each layer's membrane potential from the next layer's membrane potential in real time during the backward step. This enables us to perform backpropagation without storing/caching the membrane potentials of all layers in memory (from the forward step).
Our contributions can be summarized as follows:
• We expose the memory overhead problem of LIF neurons in SNNs, where the memory cost increases significantly as the image size grows.
• To address this, we propose a simple and effective architecture, EfficientLIF-Net, which shares the LIF neurons across different layers and channels.
• EfficientLIF-Net also reduces memory cost during training by computing each layer's (channel's) membrane potential from the next layer's (channel's) membrane potential in real time during the backward step, drastically reducing the caching of membrane potentials.
• We conduct experiments on five public datasets, validating that EfficientLIF-Net achieves performance comparable to standard SNNs while bringing up to ∼4.3× forward memory efficiency and up to ∼21.9× backward memory efficiency for LIF neurons.
• We also observe that the LIF memory cost problem persists in pruned SNNs; in fact, the LIF memory overhead percentage grows as the weight sparsity increases. Our EfficientLIF-Net reduces the LIF memory cost to ∼23% in pruned SNNs while achieving iso-accuracy with the pruned baseline.
2. Related work

2.1. Spiking neural networks

Different from standard Artificial Neural Networks (ANNs), Spiking Neural Networks (SNNs) convey information with temporal spikes (Christensen et al., 2022). Here, the Leaky-Integrate-and-Fire (LIF) neuron plays an important role as the non-linear activation. LIF neurons have a "memory" called the membrane potential, in which incoming spikes are accumulated. Output spikes are generated when the membrane potential exceeds a firing threshold, after which the membrane potential resets to zero. This firing operation of LIF neurons is non-differentiable, so previous SNN literature has focused on resolving the gradient problem. A widely-used training technique is converting pre-trained ANNs to SNNs using weight or threshold balancing (Diehl et al., 2015; Rueckauer et al., 2017; Sengupta et al., 2019; Han et al., 2020; Li et al., 2021a). However, such methods require a large number of timesteps to emulate float activations with binary spikes. Recently, a line of work proposes to circumvent the non-differentiable backpropagation problem by defining a surrogate function (Lee et al., 2016, 2020; Shrestha and Orchard, 2018; Wu et al., 2018, 2020, 2021; Neftci et al., 2019; Li et al., 2021b; Kim et al., 2022a). As the weights are trained to consider temporal dynamics, these methods show both high performance and short latency. Recent studies have expanded the understanding of SNNs and proposed novel approaches to overcome some of their inherent challenges. Hao et al. (2023) explore the issue of unevenness error in the conversion of ANNs to SNNs and introduce an optimization strategy based on residual membrane potential to mitigate this error, achieving state-of-the-art performance on several datasets. Li and Zeng (2022)

2.2. Compression methods for efficient SNNs
Owing to their energy-efficiency benefits, SNNs are well suited to edge devices with limited memory storage (Skatchkovsky et al., 2020; Venkatesha et al., 2021; Yang et al., 2022). Therefore, a line of work has proposed various methods to reduce the memory cost of SNNs using compression techniques. Neural pruning is one of the most effective methods for SNN compression. Several works (Neftci et al., 2016; Rathi et al., 2018) have proposed post-training pruning techniques using a threshold value. Unsupervised online adaptive weight pruning (Guo et al., 2020) dynamically prunes trivial weights over time. Shi et al. (2019) prune weight connections during training with a soft mask. Recently, deeper SNNs have been pruned with an ADMM optimization tool, gradient-based rewiring, and the lottery ticket hypothesis. Meanwhile, various quantization techniques have also been proposed to compress SNNs (Datta et al., 2022; Guo et al., 2022b; Li et al., 2022a; Meng et al., 2022). Schaefer and Joshi (2020) propose integer fixed-point representations for neural dynamics, weights, loss, and gradients. Recent work (Chowdhury et al., 2021a) performs quantization through the temporal dimension for low-latency SNNs. Lui and Neftci (2021) propose a quantization technique based on the Hessian of weights. Nonetheless, no prior work has explicitly addressed the memory overhead caused by LIF neurons. Our method effectively reduces this overhead by modifying the architecture and is orthogonal to previous methods; thus, combining EfficientLIF-Net with compression techniques will further compound the benefits.

3. Preliminaries
3.1. Leaky integrate-and-fire neuron

In this paper, we mainly address the memory cost of the Leaky-Integrate-and-Fire (LIF) neuron, which is widely adopted in SNN works (Wu et al., 2018; Lee et al., 2020; Fang et al., 2021a,b; Li et al., 2021a,b; Kim et al., 2022a). Suppose the LIF neurons in the l-th layer have membrane potential U_l^t at timestep t; we can formulate the LIF neuron dynamics as:

U_l^t = λ U_l^{t−1} + W_l O_{l−1}^t,    (1)

where W_l is the weight parameters of layer l, O_{l−1}^t represents the spikes from the previous layer, and λ is a decay factor on the membrane potential. Note that we use uppercase letters for matrix notation. The LIF neuron generates an output spike O_l^t when the membrane potential exceeds the firing threshold θ. Here, we define the spike firing function as:

O_l^t = Θ(U_l^t − θ),    (2)

where Θ(·) is the Heaviside step function. After firing, the membrane potential can be reset to zero (i.e., hard reset) or reduced by the threshold value (i.e., soft reset). Thus, a LIF neuron always stores the membrane potential to capture the temporal information of spikes. The memory cost for LIF neurons is proportional to the input dimension, which poses a huge memory overhead for high-resolution data such as ImageNet (Deng et al., 2009).
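As a concrete reference, the discrete-time dynamics above can be sketched in a few lines of NumPy (a minimal illustration with soft reset; the function name and the decay/threshold values are our own choices, not from the paper):

```python
import numpy as np

def lif_step(u, weighted_input, lam=0.9, theta=1.0):
    """One LIF timestep: leaky integration (Eq. 1), firing (Eq. 2), soft reset.

    u              : membrane potential carried over from timestep t-1
    weighted_input : W_l @ O_{l-1}^t, the weighted incoming spikes
    lam            : decay factor lambda
    theta          : firing threshold
    """
    u = lam * u + weighted_input            # leaky integration
    spikes = (u >= theta).astype(u.dtype)   # Heaviside firing
    u = u - theta * spikes                  # soft reset: subtract threshold
    return u, spikes

# The buffer `u` is the per-neuron state that must persist across
# timesteps -- exactly the memory overhead this paper targets.
u = np.zeros(4)
for t in range(3):
    u, s = lif_step(u, np.array([0.6, 0.2, 1.2, 0.0]))
```

The persistent buffer `u` has the same shape as the layer's activation, which is why the LIF memory cost scales directly with the input resolution.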

3.2. Gradient backpropagation in SNNs
For the class probability prediction, we accumulate the final-layer activations across all timesteps, followed by the Softmax function. We apply the cross-entropy loss L for training the weight parameters. The backward gradients are calculated along both the spatial and temporal axes (Wu et al., 2018; Neftci et al., 2019) according to the chain rule:

∂L/∂W_l = Σ_t ( ∂L/∂O_l^t · ∂O_l^t/∂U_l^t + ∂L/∂U_l^{t+1} · ∂U_l^{t+1}/∂U_l^t ) · ∂U_l^t/∂W_l.    (3)

Here, the gradient of the output spikes with respect to the membrane potential, ∂O_l^t/∂U_l^t, is non-differentiable. Following previous work (Fang et al., 2021a), we use arctan() to approximate it, i.e., we use the approximate function f(x) = (1/π) arctan(πx) + 1/2 for computing the gradients of the firing function. The overall computational graph is illustrated in Figure 3A.
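The arctan surrogate has the convenient closed-form derivative f′(x) = 1/(1 + (πx)²), which peaks at the firing threshold. A minimal sketch (the function name is ours):

```python
import numpy as np

def surrogate_spike_grad(u, theta=1.0):
    """Approximate dO/dU with the derivative of the arctan surrogate
    f(x) = (1/pi) * arctan(pi * x) + 1/2, i.e. f'(x) = 1 / (1 + (pi*x)**2),
    evaluated at x = u - theta."""
    x = np.asarray(u, dtype=float) - theta
    return 1.0 / (1.0 + (np.pi * x) ** 2)
```

The surrogate replaces the ill-defined derivative of the Heaviside firing function only in the backward pass; the forward pass still emits binary spikes.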

4. Methodology: EfficientLIF-Net
In this section, we first describe the details of how we reduce the memory cost of LIF neurons across layers and channels. The overall concept of EfficientLIF-Net is illustrated in Figure 2. After that, we provide the analysis of the backward gradient in EfficientLIF-Net for training, which shows our EfficientLIF-Net successfully considers the entire time horizon. Finally, we show the memory advantage of our EfficientLIF-Net during backpropagation.

4.1. Sharing memory of LIF neurons

4.1.1. Cross-layer sharing
The key idea here is sharing the LIF neurons across different layers where they have the same output activation size. Thus, LIF neurons are shared across multiple subsequent layers before the layer increases channel size or reduces spatial resolution. Such architecture design can be easily observed in CNN architectures such as ResNet (He et al., 2016).
Let us assume the network has the same activation size from the l-th layer to the (l + m)-th layer. The membrane potential of the (l + 1)-th layer is calculated by adding the previous layer's membrane potential and the weighted spike output from the previous layer:

U_{l+1}^t = λ(U_l^t − θ O_l^t) + W_{l+1} O_l^t.    (4)

Here, the previous layer's membrane potential U_l^t is decreased by the threshold for the soft reset (the firing threshold is set to 1) after it generates spikes O_l^t. After that, the decay factor λ is applied to the previous layer's membrane potential, since we aim to dilute the previous layers' information as the network goes deeper. Layer (l + 1) then generates output spikes following Eq. 2:

O_{l+1}^t = Θ(U_{l+1}^t − θ).    (5)

Within the same timestep, the spike information goes through all layers (from layer l to layer l + m) with the dynamics of Eqs. 4 and 5. Then, the membrane potential of layer l + m is shared with layer l at the next timestep (purple arrow in Figure 3B):

U_l^{t+1} = λ(U_{l+m}^t − θ O_{l+m}^t) + W_l O_{l−1}^{t+1},    (6)

where the soft reset and decay are applied to U_{l+m}^t, and the weighted input comes from layer l − 1.
Overall, we require only one layer's worth of LIF memory for the computation of layers l ∼ (l + m), which is shared across all layers and timesteps. Thus, the LIF memory of layers l ∼ (l + m) is reduced to 1/m of the original. The overall computational graph is illustrated in Figure 3B.
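The cross-layer forward pass can be sketched as follows (our own minimal NumPy illustration of the Eqs. 4∼6 dynamics; a single buffer `u` serves every layer in the sharing block and is carried from the last layer at time t to the first layer at t + 1):

```python
import numpy as np

def cross_layer_shared_forward(weights, input_spikes, lam=0.9, theta=1.0):
    """Run a sharing block of len(weights) layers over T timesteps with
    ONE shared membrane buffer instead of one buffer per layer.

    weights      : list of (d, d) matrices for the block's layers (same size)
    input_spikes : list of T spike vectors feeding the first layer
    """
    u = np.zeros(weights[0].shape[0])   # the single shared buffer
    o_fired = np.zeros_like(u)          # spikes last fired from the buffer
    outputs = []
    for o_in in input_spikes:                            # time loop
        for w in weights:                                # layer loop
            u = lam * (u - theta * o_fired) + w @ o_in   # Eqs. 4 / 6
            o_fired = (u >= theta).astype(u.dtype)       # Eqs. 2 / 5
            o_in = o_fired                               # feed next layer
        outputs.append(o_fired)  # buffer now carries over to timestep t+1
    return outputs
```

Note that only one membrane buffer is ever allocated, regardless of how many layers the block contains.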

4.1.2. Cross-channel sharing
We also explore the neuron-sharing scheme in the channel dimension. Let X_l^t be the weighted input spikes, i.e., X_l^t = W_l O_{l−1}^t; we first divide the weighted input tensor into N groups along the channel axis. Suppose X_l^t ∈ R^{C×H×W}; then each group can be represented as:

X_l^{t,(i)} ∈ R^{(C/N)×H×W}, i ∈ {1, 2, ..., N},    (7)

where C, H, W represent the size of the channel, height, and width dimensions, respectively. The LIF neurons are then shared sequentially across the different groups (i.e., different channels) of the weighted input. The membrane potential of the (i + 1)-th group at layer l can be formulated as:

U_l^{t,(i+1)} = λ(U_l^{t,(i)} − θ O_l^{t,(i)}) + X_l^{t,(i+1)},    (8)

where U_l^{t,(i)} is the membrane potential of the previous group and X_l^{t,(i+1)} is the incoming weighted spike input of the (i + 1)-th group from the previous layer. Here, the soft reset and decay are also applied. The output spikes of each group are generated by the standard firing dynamics (Eq. 2):

O_l^{t,(i+1)} = Θ(U_l^{t,(i+1)} − θ).    (9)

We concatenate the output spikes of all groups along the channel dimension to compute the output at timestep t:

O_l^t = Concat[O_l^{t,(1)}, O_l^{t,(2)}, ..., O_l^{t,(N)}].    (10)

After completing the LIF sharing in timestep t, we share the last group's (i.e., group N's) membrane potential with the first group at the next timestep t + 1:

U_l^{t+1,(1)} = λ(U_l^{t,(N)} − θ O_l^{t,(N)}) + X_l^{t+1,(1)}.    (11)
By using cross-channel sharing, the memory cost for the LIF neurons of one layer is reduced to 1/N, where N is the number of groups. Thus, memory efficiency increases as we use a larger number of groups.
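The sequential reuse of one group-sized buffer can be sketched as follows (our illustration of the dynamics above for one layer and one timestep, using 1-D channels for brevity):

```python
import numpy as np

def cross_channel_shared_lif(x, u, o_last, n_groups=2, lam=0.9, theta=1.0):
    """Cross-channel sharing for one layer, one timestep.

    x      : weighted input spikes of shape (C,), split into n_groups
    u      : shared buffer of shape (C // n_groups,); on entry it holds
             the last group's potential from the previous timestep
    o_last : output spikes of that last group (needed for the soft reset)
    Returns (concatenated output spikes, updated buffer, last spikes).
    """
    group_outputs = []
    for xg in np.split(x, n_groups):           # sequential over groups
        u = lam * (u - theta * o_last) + xg    # decay, soft-reset, integrate
        o_last = (u >= theta).astype(u.dtype)  # firing
        group_outputs.append(o_last)
    return np.concatenate(group_outputs), u, o_last  # concat over channels
```

Only a C/N-sized buffer is kept per layer; the returned `u` and `o_last` are what must persist into the next timestep.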

4.1.3. Cross-layer & channel sharing
The cross-layer and cross-channel sharing methods are complementary to each other, therefore they can be used together to bring further memory efficiency. The LIF neurons are shared across channels and layers as shown in Figure 2D. The neuron-sharing mechanism can be obtained by combining cross-layer and cross-channel sharing methods.
Let us assume the network has the same activation size from the l-th layer to the (l + m)-th layer. The sharing mechanism within one layer is the same as in channel sharing (Eqs. 7∼9). The output spikes of each group are then concatenated along the channel dimension to compute the output at timestep t (Eq. 10). After completing the LIF sharing at layer l, we share the last group's (i.e., group N's) membrane potential of the l-th layer with the first group of the (l + 1)-th layer. Here, X_l^t stands for the weighted input spikes, i.e., X_l^t = W_l O_{l−1}^t. Within the same timestep, the spike information goes through all layers (from layer l to layer l + m) with the same dynamics. Then, the last group's (i.e., group N's) membrane potential of layer l + m is shared with the first group of layer l at the next timestep.
By combining cross-layer and cross-channel sharing, the memory cost of the LIF neurons for the computation of layers l ∼ (l + m) is reduced to 1/(mN), where N is the number of groups. Our experimental results show that although we combine the two sharing methods, we still obtain iso-accuracy with standard SNNs.

4.2. Gradient analysis
Sharing LIF neurons leads to different gradient paths compared to standard SNNs. Therefore, we provide the gradient analysis for EfficientLIF-Net.

4.2.1. Gradient of cross-layer sharing
Suppose that we compute the gradients for m subsequent layers with the same activation size. For simplicity, we call these m subsequent layers a "sharing block". The unrolled computational graph is illustrated in Figure 3B.
For the intermediate layers of the sharing block, the gradients flow back from the next layer (marked as 1 in Figure 3B), which can be formulated as:

∂L/∂U_l^t = ∂L/∂U_{l+1}^t · ∂U_{l+1}^t/∂U_l^t,  with  ∂U_{l+1}^t/∂U_l^t = λ(1 − θ · ∂O_l^t/∂U_l^t) + W_{l+1} · ∂O_l^t/∂U_l^t,

where both terms are derived from the forward dynamics in Eq. 4. For the final layer of the sharing block, the gradients flow back through both the layer and temporal axes:

∂L/∂U_{l+m}^t = ∂L/∂O_{l+m}^t · ∂O_{l+m}^t/∂U_{l+m}^t + ∂L/∂U_l^{t+1} · ∂U_l^{t+1}/∂U_{l+m}^t.

The first term is the gradient from the next layer (marked as 2 in Figure 3B), and the second term comes from the first layer of the sharing block at the next timestep (marked as 3 in Figure 3B). The last layer of the sharing block thus obtains gradients from the next timestep (marked as 3), which are then propagated through the intermediate layers.
This allows the weight parameters to be trained with temporal information, achieving similar performance as the standard SNN architecture.

4.2.2. Gradient of cross-channel sharing
Assume that we divide the channels into N groups and define an index set G = {1, 2, ..., N}. Then, the gradients of the weight parameters in layer l consist of three terms. The first term represents the gradient from the next layer (marked as 1 in Figure 3C). The second term is the gradient from the next group's membrane potential, for all groups except the last (marked as 2 in Figure 3C). The last term represents the gradient from the first group at the next timestep (marked as 3 in Figure 3C). Thus, the gradients propagate through both the temporal and spatial dimensions, training the weight parameters to consider the temporal information.

4.3. Memory-efficient backpropagation
In addition to the memory efficiency in forward propagation, our EfficientLIF-Net saves memory costs during backward gradient computation. As shown in Figure 4A, standard SNNs need to store all membrane potentials to compute gradients such as ∂U_l^{t+1}/∂U_l^t in Eq. 3. However, saving the full-precision membrane potentials of LIF neurons is costly.

4.3.1. Backpropagation in cross-layer sharing
The key idea here is that the membrane potential of the previous layer can be computed from the next layer's membrane potential in a reverse manner (Figure 4B). Thus, we can compute the backward gradients without storing the membrane potentials of the intermediate layers during the forward pass. By reorganizing Eqs. 4 and 6, we obtain the membrane potential of the previous layer or the previous timestep:

U_l^t = (U_{l+1}^t − W_{l+1} O_l^t)/λ + θ O_l^t,    (18)

U_{l+m}^t = (U_l^{t+1} − W_l O_{l−1}^{t+1})/λ + θ O_{l+m}^t.    (19)
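The reverse computation can be checked numerically: running the Eq. 4 dynamics forward and then inverting them recovers the previous layer's potential exactly, given only the 1-bit spikes (a sketch with our own function names; the spikes are assumed to be cached, which is cheap at 1 bit per neuron):

```python
import numpy as np

def forward_next_layer(u, w_next, lam=0.9, theta=1.0):
    # Eq. 4: fire layer l, soft-reset and decay its potential, then
    # integrate the weighted spikes into layer (l+1)'s potential.
    o = (u >= theta).astype(u.dtype)
    return lam * (u - theta * o) + w_next @ o, o

def reverse_prev_layer(u_next, o, w_next, lam=0.9, theta=1.0):
    """Invert Eq. 4 (cf. Eq. 18): recover layer l's potential from layer
    (l+1)'s potential and layer l's cached 1-bit output spikes."""
    return (u_next - w_next @ o) / lam + theta * o
```

Because only the shared buffer plus the binary spike maps are needed, the full-precision potentials of intermediate layers never have to be cached for the backward pass.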

4.3.2. Backpropagation in cross-channel sharing
In a similar manner, we can also reduce memory cost along the channel dimension by performing a reverse computation on the membrane potentials of the channel groups (Figure 4C). Instead of storing memory for all channels, we keep only a partial buffer holding the membrane potential of the last channel group of each layer. From Eqs. 8 and 11, we calculate the membrane potential of the previous channel group or the previous timestep.

4.4. Hardware discussion
In this section, we aim to provide insights into the role that EfficientLIF-Net will play during hardware deployment.

4.4.1. Cross-layer sharing
Cross-layer-sharing EfficientLIF-Net can substantially benefit hardware implementations by reducing memory communication. When deploying an SNN on hardware, one can either process through all layers and then repeat for all timesteps (standard), or first process through all timesteps and then proceed to the next layer [tick-batch (Narayanan et al., 2020)]. While tick-batching can reduce memory communication across timesteps, it requires more hardware resources. On the other hand, with a proper processing pipeline across layers, the standard way of processing SNNs has smaller hardware resource requirements and larger throughput. Cross-layer sharing can further reduce the memory communication overhead of standard SNN processing.
As we show in Figure 5A, instead of writing the membrane potential to memory for every layer at every timestep, layer-sharing EfficientLIF-Net requires only a single memory write per block of shared layers at each timestep.

4.4.2. Cross-channel sharing
We focus on examining the effects of cross-channel sharing in EfficientLIF-Net for ASIC systolic-array-based inference accelerators for SNNs (Narayanan et al., 2020; Lee et al., 2022; Yin et al., 2022), owing to the high level of parallelism and data reuse in these designs. The key idea behind this group of designs is to broadcast input spikes and weights to an array of processing elements (PEs), where accumulators perform the convolution operations. Each post-synaptic neuron's entire convolution operation is mapped to one dedicated PE. Once the convolution results are ready, they are sent to the LIF units inside the PE to generate the output spikes. LIF units are notorious for their high hardware overheads, because at least one buffer is needed to hold the full-precision membrane potential of each neuron; these buffers contribute heavily to the hardware cost of LIF units. Originally, prior designs (Narayanan et al., 2020; Lee et al., 2022; Yin et al., 2022) equipped each PE with an LIF unit to match the design's throughput requirements; that means a 128-PE array needs 128 LIF units. Even if the number of LIF units is reduced, there is no way to reduce the number of buffers required to hold the unique membrane potentials of each LIF neuron. Given this design problem, we can immediately see one advantage that cross-channel-sharing EfficientLIF-Net brings to these hardware platforms: depending on the number of cross-channel-shared LIF neurons, we obtain the same ratio of LIF-unit and buffer reduction at the hardware level, as shown in Figure 5B. For example, in the case of C#4-shared networks, we can reduce the 128 LIF units in a 128-PE array (Narayanan et al., 2020; Lee et al., 2022; Yin et al., 2022) to 32. However, the shared LIF units bring longer latency as a trade-off. In the case of C#4, one cycle was originally needed to generate spikes from 128 post-synaptic neurons for one timestep; now we need 4 cycles instead. However, the major portion of latency still lies in the convolution and memory operations, which typically take hundreds of times more cycles than generating spikes through the LIF units. We provide experimental results in Section 5.3 to further illustrate the effects of EfficientLIF-Net on hardware.

5.3. Performance comparison
Across all experiment sections, EfficientLIF-Net[L] denotes the cross-layer sharing scheme, EfficientLIF-Net[C#N] stands for the cross-channel sharing scheme with N channel groups. EfficientLIF-Net[L+C#N] means the cross-layer & channel sharing method.
In Table 1, we show the memory benefits of EfficientLIF-Net. We assume a 32-bit representation for the membrane potential in LIF neurons. Regarding the backward LIF memory of the baseline, we consider the standard backpropagation method, which stores membrane potentials across all timesteps (Singh et al., 2022; Yin et al., 2022).
The experimental results show the following: (1) EfficientLIF-Net based on ResNet19 achieves performance similar to the baseline, which implies that the proposed membrane-sharing strategy can still learn temporal information from spikes.
(2) The EfficientLIF-Net can also be applied to DVS datasets. (3) The ResNet19 EfficientLIF-Net shows less performance degradation than VGG16, which implies that skip connections improve training capability in EfficientLIF-Net. Furthermore, ResNet19 brings higher memory efficiency since it has more layers with similarly sized activations. (4) As expected, large-resolution image datasets benefit more than small-resolution ones. For instance, EfficientLIF-Net [L+C#2] saves 108.72 MB and 672.24 MB for the forward and backward paths, respectively, on ImageNet-100, which consists of 224 × 224 resolution images, whereas the same architecture saves considerably less on small-resolution datasets.

Note that approaches to memory reduction proposed by other works, such as reducing the number of simulation timesteps (Chowdhury et al., 2021b) and reducing SNN time dependence (Meng et al., 2023), can be combined with our layer/channel-wise sharing technique. This would lead to an even more significant decrease in memory usage, demonstrating the compatibility and potential of our method when integrated with other optimization strategies.

In our method section, we showed that the backward gradients of each method are different. To further analyze this, we investigate whether the trained weight parameters are compatible with other architectures. We expect that weights transferred to a different architecture may show performance degradation, since each architecture has different training dynamics (e.g., gradient paths). To this end, we train a standard ResNet19-SNN (i.e., the baseline), EfficientLIF-Net [L], EfficientLIF-Net [C#2], and EfficientLIF-Net [L+C#2]. In Figure 6, we report the accuracy of various weight-architecture configurations on CIFAR10 and TinyImageNet. We observe the following: (1) As expected, transferring weights to a different architecture brings performance degradation. This supports our statement that each architecture has different training dynamics.
(2) In particular, the baseline shows a huge performance drop compared to the other architectures. Thus, EfficientLIF-Net needs to be trained from scratch with gradient-based training.

5.3.1. Ablation studies on #group
In the cross-channel sharing scheme, we can further reduce LIF memory cost by increasing #group. Table 2 shows the accuracy and LIF memory cost with respect to #group. Interestingly, EfficientLIF-Net with a high #group almost maintains performance while significantly minimizing the LIF memory cost. For example, on the ImageNet-100 dataset, EfficientLIF-Net [C#8] incurs only a 0.8% accuracy drop with 75% higher memory savings. Thus, one can further reduce LIF memory cost by increasing #group based on the hardware requirements. We hypothesize that the observed decrease in performance could be attributed to the mixing of information across channels during sharing. It is a widely recognized phenomenon in neural networks that preserving discriminative representations across channels is crucial for performance; however, when we share the membrane potential across channels, subsequent groups may be influenced by information from prior groups due to the sequential nature of the sharing process. While we have suggested a potential cause, we aim to delve deeper into this issue in future research.

5.3.2. Combining with group convolution
To further enhance the efficiency of cross-channel sharing, we explore the feasibility of combining a group convolution layer with cross-channel sharing. Since group convolution splits the input and output channels into multiple groups, it can be applied to each channel-group spike (O_l^{t,(i)} in Eq. 10). In Table 3, we observe that accuracy does not drop much with two convolution groups. However, as the number of groups increases, performance degrades drastically due to the smaller number of parameters available for training convergence.

5.3.3. Soft reset vs. hard reset
We also conduct experiments on the reset scheme in our EfficientLIF-Net. The membrane potential can be reset to zero (i.e., hard reset) or decreased by the threshold value (i.e., soft reset).
In Table 4, we compare the accuracy of both reset schemes on the ResNet19 architecture, where we observe that the hard reset achieves accuracy similar to the soft reset. However, the hard reset does not allow reverse computation of the previous layer's or timestep's membrane potential (Eqs. 18 and 19) during backpropagation, because it discards the residual membrane potential that is used in the reverse computation. Therefore, our EfficientLIF-Net is based on the soft reset, so that we obtain memory savings during both the forward and backward passes.
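Why the soft reset is required can be shown with a two-line example (ours): two distinct above-threshold potentials stay distinct after a soft reset, but collapse to the same value under a hard reset, so the forward step is no longer invertible.

```python
import numpy as np

theta = 1.0
fire = lambda u: (u >= theta).astype(float)

# Two different membrane potentials that both exceed the threshold.
u_a, u_b = np.array([1.2]), np.array([1.7])

# Soft reset subtracts the threshold: the residual potential survives.
soft_a, soft_b = u_a - theta * fire(u_a), u_b - theta * fire(u_b)

# Hard reset zeroes firing neurons: the residual is destroyed.
hard_a, hard_b = u_a * (1 - fire(u_a)), u_b * (1 - fire(u_b))
```

Since both hard-reset results are identically zero, no reverse formula can tell the two original states apart.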
5.3.4. Analysis on spike rate

In Figure 7A, we compare the spike rate across all LIF sharing schemes in ResNet19 on four datasets. Note that a higher spike rate implies a larger computational cost. The experimental results show that all LIF sharing schemes have a spike rate similar to the baseline. This demonstrates that EfficientLIF-Net does not add computational overhead while saving memory cost by sharing the membrane potential.

FIGURE 8 | Experiments on ResNet EfficientLIF-Net with weight pruning methods on CIFAR10. Left: Most LIF neurons generate output spikes even as the weight sparsity increases; therefore, the LIF memory cost cannot be reduced by weight pruning. Right: Accuracy and LIF memory cost comparison between the baseline and EfficientLIF-Net. The weight memory cost across all models is indicated with a gray dotted line.

5.3.5. Time overhead analysis
We measure the time overhead on a V100 GPU with a batch size of 128, using VGG16 on the CIFAR10 and ImageNet-100 datasets with image sizes of 32 × 32 and 224 × 224, respectively. Table 5 shows the latency results for each method. Interestingly, we find that our method improves computation time, implying that LIF layer sharing reduces the time spent on DRAM accesses, which originally account for a significant fraction of the computation time. As a result, our method can be implemented without a significant computational burden.

5.3.6. Memory cost breakdown
In Figure 7B, we compare the memory cost breakdown between the SNN baseline and EfficientLIF-Net for both the forward and backward passes. In this comparison, we consider memory for weight parameters (32-bit), spike activations (1-bit), and LIF neurons (32-bit). In the baseline SNN, LIF neurons take a dominant portion of both the forward and backward memory cost; for the backward pass in particular, LIF neurons occupy around 7× more memory than the weights or activations. Our EfficientLIF-Net significantly reduces the LIF memory cost, resulting in less memory overhead than the weight parameters (in both forward and backward) and the activations (in backward only).
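The scaling behind this breakdown can be sketched with a rough calculator (illustrative only; it follows the 32-bit weight / 1-bit spike / 32-bit potential convention above but uses made-up counts, not the paper's actual architectures):

```python
def snn_memory_bytes(n_weights, n_neurons, timesteps,
                     weight_bits=32, spike_bits=1, lif_bits=32):
    """Rough forward/backward memory (bytes). The forward pass keeps one
    potential per neuron; standard backprop additionally caches spikes and
    potentials for every timestep, so LIF memory grows linearly with T."""
    weight_mem = n_weights * weight_bits / 8
    forward = weight_mem + n_neurons * (spike_bits + lif_bits) / 8
    backward = weight_mem + timesteps * n_neurons * (spike_bits + lif_bits) / 8
    return forward, backward
```

Because the per-timestep term dominates the backward pass, reducing the number of live potential buffers (as EfficientLIF-Net does) pays off most during training.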

5.3.7. EfficientLIF-Net with weight pruning
As pruning for SNNs is popular due to its use on edge devices (Neftci et al., 2016; Shi et al., 2019; Guo et al., 2020; Chen et al., 2021; Kim et al., 2022b), it is important to determine whether the advantages of EfficientLIF-Net remain in sparse SNNs.
Before exploring the effectiveness of the LIF sharing method in sparse SNNs, we first investigate whether LIF neurons still require substantial memory in sparse SNNs. This is because some LIF neurons might not generate output spikes in the high weight-sparsity regime (≥90%), in which case the memory for such dead neurons could be eliminated. To this end, we prune the SNN model to varied sparsity using magnitude-based pruning (Han et al., 2015). Interestingly, as shown in Figure 8 Left, only ∼3% of neurons do not generate spikes (i.e., dead neurons) across all sparsity levels. This implies that the LIF memory cost remains problematic in sparse SNNs. Based on this observation, we prune EfficientLIF-Net and compare its memory cost and accuracy with the standard SNN baseline. Here, we prune all architectures to 94.94% weight sparsity. In Figure 8 Right, the baseline architecture requires 2.9 MB for LIF neurons, which is equivalent to ∼60% of the memory cost of the weight parameters. With cross-layer (denoted as L in Figure 8) and cross-channel sharing (denoted as C#2 in Figure 8), we can reduce the LIF memory cost by about half compared to the baseline. Cross-layer & channel sharing (denoted as L+C#2 in Figure 8) further reduces the memory cost, taking only ∼23% of the baseline's memory. Overall, the results demonstrate that LIF memory reduction is important not only for high-resolution images but also for relatively low-resolution images such as CIFAR10, especially when considering pruned SNNs.
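For reference, magnitude-based pruning in the style of Han et al. (2015) can be sketched as a global threshold on |w| (a minimal version; the function name is ours):

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w).ravel())[k - 1]   # k-th smallest magnitude
    return np.where(np.abs(w) > threshold, w, 0.0)
```

Note that zeroing weights does not by itself remove LIF state: a neuron keeps its membrane-potential buffer as long as it still fires, which is consistent with the ∼3% dead-neuron fraction observed in Figure 8.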

. . . Hardware evaluation
As discussed in Section 4.4, both cross-channel and cross-layer sharing can significantly enhance hardware efficiency during deployment. The top portion of Figure 9 shows that cross-channel sharing in EfficientLIF-Net considerably decreases the number of required LIF units. Specifically, with C#4 cross-channel sharing, our approach reduces the compute requirement of LIF units inside the PE from 61.6% to 28.6% of the total PE computation.
The bottom part of Figure 9 indicates that cross-layer sharing can effectively reduce the number of DRAM accesses, the most energy-consuming operation during on-chip SNN inference. For single-batch scenarios, the reduction is modest, since weight data movement dominates DRAM accesses, as outlined in Yin et al. (2022). With mini-batches, however, the reduction becomes substantial: we observe 23% and 25% reductions in total DRAM accesses on CIFAR10 and TinyImageNet, respectively, at a mini-batch size of 64, and the reduction grows further with larger mini-batch sizes.
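The batch-size dependence can be seen in a crude first-order traffic model (an illustrative assumption, not the paper's hardware simulator): weights are fetched from DRAM once and reused across the mini-batch, while membrane-state traffic scales with batch size and timesteps, so shrinking the state pool pays off more as the batch grows:

```python
def dram_words(n_weights, n_lif_states, batch_size, timesteps=4):
    """Toy DRAM-traffic model: one weight fetch amortized over the batch,
    state traffic proportional to batch size and timesteps."""
    return n_weights + n_lif_states * batch_size * timesteps

def sharing_reduction(n_weights, n_states, share_factor, batch, timesteps=4):
    """Relative traffic saved when cross-layer sharing divides the number
    of stored LIF states by `share_factor`."""
    base = dram_words(n_weights, n_states, batch, timesteps)
    shared = dram_words(n_weights, n_states // share_factor, batch, timesteps)
    return 1.0 - shared / base

# Hypothetical sizes: 11M weights, 500k LIF states, 2x cross-layer sharing.
r1 = sharing_reduction(11_000_000, 500_000, 2, batch=1)
r64 = sharing_reduction(11_000_000, 500_000, 2, batch=64)
```

Under these assumptions the reduction at batch 1 is small (weight traffic dominates) but grows steadily with batch size, mirroring the trend reported for CIFAR10 and TinyImageNet.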

. . Evaluation on human activity recognition datasets
To further validate our method on datasets that rely heavily on temporal information, we conduct experiments using Human Activity Recognition (HAR) datasets obtained from wearable devices. Descriptions of these datasets are provided below:
• UCI-HAR (Anguita et al., 2013) consists of 10.3k instances collected from 30 subjects, involving six different activities: walking, walking upstairs, walking downstairs, sitting, standing, and lying. The dataset employs sensors such as a 3-axis accelerometer and a 3-axis gyroscope (both at 50 Hz) from a Samsung Galaxy SII.
• HHAR (Stisen et al., 2015) is collected from nine subjects and encompasses six daily activities: biking, sitting, standing, walking, stair ascent, and stair descent. The dataset utilizes accelerometers from eight smartphones and four smartwatches (with sampling rates ranging from 50 to 200 Hz).
Following previous work, we split both datasets into 64% for the training set, 16% for the validation set, and 20% for the test set. We report test accuracy when the model achieves its best validation accuracy.
(2) Comparing the different configurations of EfficientLIF-Net against the baseline Spiking MLP, we see that EfficientLIF-Net maintains a similar level of accuracy on both datasets. These results suggest that our LIF-sharing method also works well on tasks that rely heavily on temporal information. Overall, our empirical results support the observation that gradients propagate through both the temporal and spatial dimensions, effectively training the weight parameters to account for temporal information, as demonstrated in Eqs. 15, 16, and 17.
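The cross-layer sharing through which these gradients flow can be sketched as a single membrane tensor reused by every layer at a given timestep (leak and threshold values here are hypothetical; Eqs. 15–17 in the paper give the exact gradient paths):

```python
import numpy as np

def lif_step(v, x, leak=0.5, thresh=1.0):
    """One LIF update: leak the membrane, integrate input, fire, soft-reset."""
    v = leak * v + x
    s = (v >= thresh).astype(x.dtype)   # binary spike
    v = v - thresh * s                  # soft reset by the threshold
    return v, s

def shared_lif_forward(xs, leak=0.5, thresh=1.0):
    """Cross-layer sharing sketch: the same membrane `v` is carried through
    every layer's pre-activation, so only the final state must be kept for
    the backward pass (earlier states are recoverable by reverse computation)."""
    v = np.zeros_like(xs[0])
    spikes = []
    for x in xs:                        # xs: per-layer inputs at one timestep
        v, s = lif_step(v, x, leak, thresh)
        spikes.append(s)
    return v, spikes
```

With inputs `[1.2]` then `[0.3]`, the shared membrane fires at the first layer (1.2 ≥ 1.0), soft-resets to 0.2, then leaks and integrates to 0.4 at the second layer without firing, showing how one state captures dynamics across layers.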

. Conclusion
In this paper, we highlight and tackle the problem of LIF memory cost in SNNs, which becomes severe as image resolution increases. To address it, we propose EfficientLIF-Net, which shares the membrane potential across layers and channels and thereby effectively reduces memory usage. During backpropagation, EfficientLIF-Net recovers the membrane potentials of earlier layers and channels through reverse computation, so only the membrane potential of the last layer/channel needs to be stored during the forward pass. In our experiments, EfficientLIF-Net achieves performance and computational cost similar to the standard SNN baseline while significantly reducing memory cost. We also find that the LIF memory problem persists in sparse-weight SNNs, where even low-resolution datasets incur a notable LIF memory overhead. The memory benefit of EfficientLIF-Net carries over to pruned SNNs, implying that our method is complementary to existing compression methods.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Author contributions
YK and PP conceived the work. YK, YL, AM, and RY carried out experiments. YK, RY, and PP contributed to the writing of the paper. All authors contributed to the article and approved the submitted version.

Funding
This work was supported in part by CoCoSys, a JUMP2.0 center sponsored by DARPA and SRC, Google Research Scholar Award, the National Science Foundation CAREER Award, TII (Abu Dhabi), the DARPA AI Exploration (AIE) program, and the DoE MMICC center SEA-CROGS (Award #DE-SC0023198).