<?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0">
      <channel xmlns:content="http://purl.org/rss/1.0/modules/content/">
        <title>Frontiers in Electronics | Integrated Circuits and VLSI section | New and Recent Articles</title>
        <link>https://www.frontiersin.org/journals/electronics/sections/integrated-circuits-and-vlsi</link>
        <description>RSS Feed for Integrated Circuits and VLSI section in the Frontiers in Electronics journal | New and Recent Articles</description>
        <language>en-us</language>
        <generator>Frontiers Feed Generator,version:1</generator>
        <pubDate>13 May 2026 15:23:41 +0000</pubDate>
        <ttl>60</ttl>
        <item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2026.1743265</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2026.1743265</link>
        <title><![CDATA[Adiabatic capacitive neuron: an energy-efficient functional unit for artificial neural networks]]></title>
        <pubDate>24 Feb 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Sachin Maheshwari</author><author>Mike Smart</author><author>Himadri Singh Raghav</author><author>Themis Prodromakis</author><author>Alexander Serb</author>
        <description><![CDATA[This paper presents an adiabatic capacitive neuron (ACN): a hardware implementation of an artificial neuron (AN) with improved energy efficiency, robustness, and scalability over previous work. A single-neuron ACN with 12 one-bit capacitive synapses is implemented in 0.18 μm CMOS technology, supporting both positive and negative synaptic weights. A novel threshold logic (TL) circuit is introduced to realize the binary AN activation function, explicitly designed to minimize input-referred offset and ensure robust decision making under dynamic adiabatic operation. The TL performance is evaluated across three process corners and five temperatures ranging from –55 °C to 125 °C. Post-layout simulations show that the proposed TL achieves a maximum rising and falling offset voltage of 9 mV, compared to 27 mV (rising) and 5 mV (falling) for a conventional TL implementation across process and temperature variations. The proposed ACN achieves over 90% total synapse energy savings (over 12× improvement) relative to an equivalent non-adiabatic CMOS capacitive neuron (CCN) over operating frequencies from 500 kHz to 100 MHz. A 1000-sample Monte Carlo analysis incorporating process variation and mismatch confirms consistent energy savings exceeding 90% in the synapse energy profile. Supply voltage scaling further demonstrates sustained energy savings above 90%, except for the all-zero input condition, without loss of functionality. These results demonstrate that adiabatic charge recovery, combined with a robust low-offset threshold logic design, enables substantial energy reduction while maintaining reliable neuron operation across wide operating conditions.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2025.1645594</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2025.1645594</link>
        <title><![CDATA[An overview of advanced instruments for magnetic characterization and measurements]]></title>
        <pubDate>01 Sep 2025 00:00:00 +0000</pubDate>
        <category>Review</category>
        <author>Junbiao Zhao</author><author>Ligang Bai</author><author>Shen Li</author><author>Zhiqiang Cao</author><author>Yi Peng</author><author>Jinrui Bai</author><author>Xudong Cai</author><author>Xinmin Shi</author><author>Xiaoyang Lin</author><author>Guodong Wei</author><author>Xueying Zhang</author>
        <description><![CDATA[Magnetic materials play a pivotal role in emerging fields such as new energy, information technology, and biomedicine, where accurate magnetic characterization is essential for material innovation and device engineering. Notably, with the burgeoning development of nanomaterials and spintronics, the importance of magnetic characterization has grown significantly, accompanied by increasingly higher requirements for precision and multi-dimensional analysis. This paper elaborates on the working principles and structural components of static magnetic measurement techniques, including the Vibrating Sample Magnetometer (VSM), Alternating Gradient Magnetometer (AGM), Magneto-Optical Kerr Effect (MOKE) Microscope, Magnetic Force Microscope (MFM), and Superconducting Quantum Interference Device (SQUID) Magnetometer, as well as dynamic magnetic measurement techniques such as Alternating Current (AC) susceptometry and Ferromagnetic Resonance (FMR). In addition, this review introduces emerging techniques relevant to spintronics, including magnetometers based on negatively charged nitrogen-vacancy (NV−) centers in diamond, the Spin-Polarized Scanning Tunneling Microscope (SP-STM), the Lorentz Transmission Electron Microscope (LTEM), and soft X-ray-based techniques, highlighting their principles and applications in quantum sensing, magnetic imaging, and element-specific spin analysis. This overview emphasizes the unique capabilities and measurement principles of each magnetic characterization instrument, providing users with practical guidance to identify the most appropriate tool based on specific research objectives, material properties, and experimental requirements, thereby improving characterization efficiency and accuracy.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2025.1469802</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2025.1469802</link>
        <title><![CDATA[Quantized convolutional neural networks: a hardware perspective]]></title>
        <pubDate>03 Jul 2025 00:00:00 +0000</pubDate>
        <category>Review</category>
        <author>Li Zhang</author><author>Olga Krestinskaya</author><author>Mohammed E. Fouda</author><author>Ahmed M. Eltawil</author><author>Khaled Nabil Salama</author>
        <description><![CDATA[With the rapid development of machine learning, Deep Neural Networks (DNNs) exhibit superior performance in solving complex problems such as computer vision and natural language processing compared with classic machine learning techniques. At the same time, the rise of the Internet of Things (IoT) and edge computing creates a demand to execute these complex tasks on the corresponding devices. As the name suggests, deep neural networks are sophisticated models with complex structures and millions of parameters, which overwhelm the capacity of IoT and edge devices. To facilitate deployment, quantization is one of the most promising methods for alleviating this challenge in terms of memory usage and computational complexity: both the parameters and the data flow in the DNN model are quantized into formats with shorter bit-widths. In parallel, dedicated hardware accelerators have been developed to further boost the execution efficiency of DNN models. In this work, we focus on the Convolutional Neural Network (CNN) as an example of DNNs and conduct a comprehensive survey of various quantization and quantized training methods. We also discuss various hardware accelerator designs for quantized CNNs (QCNNs). Based on the review of both algorithm and hardware design, we provide general software-hardware co-design considerations. From this analysis, we discuss open challenges and future research directions for both the algorithms and the corresponding hardware designs of quantized neural networks (QNNs).]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2025.1608122</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2025.1608122</link>
        <title><![CDATA[Interface engineering induced Dzyaloshinskii-Moriya interaction enhancement in Py/Ti/CoFeB/MgO heterostructures]]></title>
        <pubDate>19 May 2025 00:00:00 +0000</pubDate>
        <category>Brief Research Report</category>
        <author>Xiaoyue Song</author><author>Zhiqiang Cao</author><author>Ligang Bai</author><author>Junbiao Zhao</author><author>Xudong Cai</author><author>Dapeng Zhu</author><author>Guodong Wei</author>
        <description><![CDATA[The Dzyaloshinskii-Moriya interaction (DMI) is a key driver of chiral magnetism and has garnered significant interest in applied magnetism and spintronics. Interface engineering has been demonstrated to effectively enhance the DMI in many traditional heterostructures. The regulation of DMI is highly dependent on interface properties, which vary significantly across different material systems. Therefore, determining the optimal interface structure to maximize the DMI value presents a complex challenge. In this work, Brillouin light-scattering (BLS) spectroscopy quantitatively reveals a strong interfacial DMI of 17 μJ/m² in Py/Ti (tTi)/CoFeB/MgO heterostructures with robust perpendicular magnetic anisotropy when the thickness of the Ti layer is 2 nm. Furthermore, we employed a field-modulated magneto-optical Kerr-effect (MOKE) microscope to visualize the existence of stable labyrinth domains in real space in the Py/Ti (2 nm)/CoFeB/MgO systems, which may in turn enable the formation of skyrmions. By optimizing the thickness of a specific membrane configuration, this paper offers a critical materials foundation for advancing spintronics applications.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2025.1567562</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2025.1567562</link>
        <title><![CDATA[SPIKA: an energy-efficient time-domain hybrid CMOS-RRAM compute-in-memory macro]]></title>
        <pubDate>30 Apr 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Khaled Humood</author><author>Yihan Pan</author><author>Grahame Reynolds</author><author>Mohammed Mughal</author><author>Shiwei Wang</author><author>Alexander Serb</author><author>Themis Prodromakis</author>
        <description><![CDATA[The increasing significance of machine learning (ML) has led to the development of circuit architectures suited to handling its multiply-accumulate-heavy computational load, such as Compute-In-Memory (CIM). A large class of such architectures uses resistive RAM (RRAM) devices, typically in the role of neural weights, to save power and area. In this work, we introduce SPIKA, a novel RRAM-based ML accelerator implemented in 180 nm CMOS technology. The design features a 64×128 crossbar array and supports 4-bit inputs, ternary weights, and 5-bit outputs. Post-layout analysis indicates remarkable performance compared to the state of the art, with a peak throughput of 1092 GOPS and energy efficiency of 195 TOPS/W. The key innovation of SPIKA lies in its natural signal domain crossing, which eliminates the need for power-hungry data converters. Specifically, digital input signals are converted to pulse-width-modulated (time-domain) signals, then applied to the RRAM weights, which convert them to analog currents that are finally aggregated into digital values using a simple switched-capacitor read-out system.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2025.1513127</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2025.1513127</link>
        <title><![CDATA[Energy-efficient analog-domain aggregator circuit for RRAM-based neural network accelerators]]></title>
        <pubDate>04 Feb 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Khaled Humood</author><author>Yihan Pan</author><author>Shiwei Wang</author><author>Alexander Serb</author><author>Themis Prodromakis</author>
        <description><![CDATA[Recently, there has been notable progress in the advancement of RRAM-based Compute-In-Memory (CIM) architectures, showing promise in accelerating neural networks with remarkable energy efficiency and parallelism. However, challenges persist in fully integrating large-scale networks onto a chip, particularly when the weights of a layer exceed the capacity of the RRAM crossbar. In such cases, weights are distributed across smaller RRAM crossbars and aggregated using tree adders and shifters in a digital flow, increasing the system complexity and energy consumption of hardware accelerators. In this work, we introduce a novel energy-efficient analog-domain aggregator system designed for RRAM-based CIM systems. The proposed circuit has been verified and tested using Cadence Virtuoso circuit tools in 180 nm CMOS technology with post-layout simulations and analysis. Compared with the digital adder-tree approach, the proposed analog aggregator offers improvements in three key areas: it handles an arbitrary number of inputs (not just powers of 2), achieves lower error through better rounding, and improves power efficiency (2.15× lower consumption). These findings mark a substantial advancement towards the full implementation of efficient on-chip hardware accelerator systems.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2024.1377080</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2024.1377080</link>
        <title><![CDATA[Compact grounded memristor model with resistorless and tunability features]]></title>
        <pubDate>13 Nov 2024 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Ankit Mehta</author><author>Arash Ahmadi</author><author>Majid Ahmadi</author>
        <description><![CDATA[This research article provides a circuit illustration of a grounded memristor emulator. An operational transconductance amplifier (OTA) is one of its active components, along with two transistors and one capacitor. With a simple flip of the input ports, the incremental and decremental settings for the proposed memristor may be preserved. Capable of operating in the megahertz band, the circuit offers resistorless and tunable operation. Using the Cadence Virtuoso EDA tool in an analog design environment (ADE), PSPICE simulation with 0.18 µm TSMC technology parameters has been used to illustrate the viability of the suggested memristor. It has been confirmed in the simulation section that the operating frequency and tunability responses in the current-voltage (I-V) plane are in reasonable agreement with the theory. The suggested memristor model’s resilience has also been tested using process-corner, Monte Carlo, and temperature analyses, as well as single and parallel-connected structures. The suggested memristor model is simple and does not need additional sub-circuit components, making it appropriate for implementation in integrated circuits. The experimental demonstration has been carried out by making a prototype on a breadboard using ICs, which exhibits good agreement with theoretical and simulation results. Single/parallel combinations of the memristor, a chaotic oscillator, and a high-pass filter have been presented to demonstrate its application.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2024.1366299</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2024.1366299</link>
        <title><![CDATA[LIF neuron—a memristive realization]]></title>
        <pubDate>06 Sep 2024 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Khalid Alammari</author><author>Moslem Heidarpur</author><author>Majid Ahmadi</author><author>Arash Ahmadi</author>
        <description><![CDATA[This study introduces a pioneering design for leaky integrate-and-fire (LIF) neurons by integrating memristor devices with CMOS transistors, thereby forming an innovative hybrid CMOS/memristor neuron circuit. Employing Pt/TaOx/Ta as the memristor device, the proposed model was meticulously implemented and rigorously evaluated using the Cadence Virtuoso simulation environment. The simulation outcomes affirm the effective functionality of the design, marking a significant advancement in hybrid circuit engineering. Notably, the proposed neuron circuit exhibits a compact footprint, attributed to the efficient utilization of hybrid CMOS/memristor gates. This characteristic is poised to address the critical challenge of scaling in current neuromorphic systems, offering a viable pathway to substantially augment density and cater to the escalating demands of advanced computational architectures. The findings of this research hold promising implications for enhancing the efficiency and scalability of neuromorphic systems, setting a new benchmark for future developments in this domain.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2024.1409548</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2024.1409548</link>
        <title><![CDATA[S-Tune: SOT-MTJ manufacturing parameters tuning for securing the next generation of computing]]></title>
        <pubDate>07 Aug 2024 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Muhtasim Alam Chowdhury</author><author>Mousam Hossain</author><author>Christopher Mastrangelo</author><author>Ronald F. DeMara</author><author>Soheil Salehi</author>
        <description><![CDATA[Hardware-based acceleration approaches for Machine Learning (ML) workloads have been embracing the significant potential of post-CMOS switching devices to attain reduced footprint and/or energy-efficient execution relative to transistor-based GPU and/or TPU-based accelerator architectures. Meanwhile, the promulgation of fabless IC chip manufacturing paradigms has heightened the hardware security concerns inherent in such approaches. Namely, unauthorized access to various supply chain stages may expose significant vulnerabilities, resulting in malfunctions, including subtle adversarial outcomes via the malicious generation of differentially corrupted outputs. While the Spin-Orbit Torque Magnetic Tunnel Junction (SOT-MTJ) is a leading spintronic device for use in ML accelerators, as well as for holding security tokens, its manufacturing-only security exposures are identified and evaluated herein. Results indicate a novel vulnerability profile whereby an adversary without access to the circuit netlist could differentially influence the machine learning application’s behavior. Specifically, ML recognition outputs can be significantly swayed via a global modification of oxide thickness (Tox), resulting in bit-flips of the weights in the crossbar array and corrupting the recognition of selected digits in the MNIST dataset differentially, thus creating an opportunity for an adversary. With just 0.05% of the bits in the crossbar having a flipped resistance state, digits “4” and “5” show the highest overall error rates and digit “9” exhibits the lowest impact, while the recognition accuracy of digits “2,” “3,” and “8” is unaffected, all by changing the oxide thickness of SOT-MTJs uniformly from 0.75 nm to 1.2 nm without modifying the netlist or even having access to the circuit design itself. Exposures and mitigation approaches to such novel and potentially damaging manufacturing-side intrusions are identified, postulated, and quantitatively assessed.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2023.1343612</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2023.1343612</link>
        <title><![CDATA[Hardware acceleration of DNA pattern matching using analog resistive CAMs]]></title>
        <pubDate>12 Feb 2024 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Jinane Bazzi</author><author>Jana Sweidan</author><author>Mohammed E. Fouda</author><author>Rouwaida Kanj</author><author>Ahmed M. Eltawil</author>
        <description><![CDATA[DNA pattern matching is essential for many widely used bioinformatics applications. Disease diagnosis is one of these applications since analyzing changes in DNA sequences can increase our understanding of possible genetic diseases. The remarkable growth in the size of DNA datasets has resulted in challenges in discovering DNA patterns efficiently in terms of run time and power consumption. In this paper, we propose an efficient pipelined hardware accelerator that determines the chance of the occurrence of repeat-expansion diseases using DNA pattern matching. The proposed design parallelizes the DNA pattern matching task using associative memory realized with analog content-addressable memory and implements an algorithm that returns the maximum number of consecutive occurrences of a specific pattern within a DNA sequence. We fully implement all the required hardware circuits with PTM 45-nm technology, and we evaluate the proposed architecture on a practical human DNA dataset. The results show that our design is energy-efficient and accelerates the DNA pattern matching task by more than 100× compared to the approaches described in the literature.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2023.1331280</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2023.1331280</link>
        <title><![CDATA[Demonstration of transfer learning using 14 nm technology analog ReRAM array]]></title>
        <pubDate>15 Jan 2024 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Fabia Farlin Athena</author><author>Omobayode Fagbohungbe</author><author>Nanbo Gong</author><author>Malte J. Rasch</author><author>Jimmy Penaloza</author><author>SoonCheon Seo</author><author>Arthur Gasasira</author><author>Paul Solomon</author><author>Valeria Bragaglia</author><author>Steven Consiglio</author><author>Hisashi Higuchi</author><author>Chanro Park</author><author>Kevin Brew</author><author>Paul Jamison</author><author>Christopher Catano</author><author>Iqbal Saraf</author><author>Claire Silvestre</author><author>Xuefeng Liu</author><author>Babar Khan</author><author>Nikhil Jain</author><author>Steven McDermott</author><author>Rick Johnson</author><author>I. Estrada-Raygoza</author><author>Juntao Li</author><author>Tayfun Gokmen</author><author>Ning Li</author><author>Ruturaj Pujari</author><author>Fabio Carta</author><author>Hiroyuki Miyazoe</author><author>Martin M. Frank</author><author>Antonio La Porta</author><author>Devi Koty</author><author>Qingyun Yang</author><author>Robert D. Clark</author><author>Kandabara Tapily</author><author>Cory Wajda</author><author>Aelan Mosden</author><author>Jeff Shearer</author><author>Andrew Metz</author><author>Sean Teehan</author><author>Nicole Saulnier</author><author>Bert Offrein</author><author>Takaaki Tsunomura</author><author>Gert Leusink</author><author>Vijay Narayanan</author><author>Takashi Ando</author>
        <description><![CDATA[Analog memory presents a promising solution in the face of the growing demand for energy-efficient artificial intelligence (AI) at the edge. In this study, we demonstrate efficient deep neural network transfer learning utilizing hardware and algorithm co-optimization in an analog resistive random-access memory (ReRAM) array. For the first time, we illustrate that in open-loop deep neural network (DNN) transfer learning for image classification tasks, convergence rates can be accelerated by approximately 3.5 times through the utilization of co-optimized analog ReRAM hardware and the hardware-aware Tiki-Taka v2 (TTv2) algorithm. A simulation based on statistical 14 nm CMOS ReRAM array data provides insights into the performance of transfer learning on larger network workloads, exhibiting notable improvement over conventional training with random initialization. This study shows that analog DNN transfer learning using an optimized ReRAM array can achieve faster convergence with a smaller dataset compared to training from scratch, thus augmenting AI capability at the edge.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2023.1129675</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2023.1129675</link>
        <title><![CDATA[Straightforward data transfer in a blockwise dataflow for an analog RRAM-based CIM system]]></title>
        <pubDate>17 Apr 2023 00:00:00 +0000</pubDate>
        <category>Methods</category>
        <author>Yuyi Liu</author><author>Bin Gao</author><author>Peng Yao</author><author>Qi Liu</author><author>Qingtian Zhang</author><author>Dong Wu</author><author>Jianshi Tang</author><author>He Qian</author><author>Huaqiang Wu</author>
        <description><![CDATA[Analog resistive random-access memory (RRAM)-based computation-in-memory (CIM) technology is promising for constructing artificial intelligence (AI) with high energy efficiency and excellent scalability. However, the large overhead of analog-to-digital converters (ADCs) is a key limitation. In this work, we propose a novel LINKAGE architecture that eliminates PE-level ADCs and leverages an analog data transfer module to implement inter-array data processing. A blockwise dataflow is further proposed for accelerating convolutional neural networks (CNNs), speeding up compute-intensive layers and solving the unbalanced-pipeline problem. To obtain accurate and reliable benchmark results, key component modules, such as straightforward link (SFL) modules and Tile-level ADCs, are designed in standard 28 nm CMOS technology. The evaluation shows that LINKAGE outperforms the conventional ADC/DAC-based architecture with a 2.07×∼11.22× improvement in throughput, 2.45×∼7.00× in energy efficiency, and a 22%–51% reduction in area overhead while maintaining accuracy. Our LINKAGE architecture can achieve 22.9∼24.4 TOPS/W energy efficiency (4b-IN/4b-W) and 1.82∼4.53 TOPS throughput with the blockwise method. This work demonstrates a new method for significantly improving the energy efficiency of CIM chips, which can be applied to general CNNs/FCNNs.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2022.1091369</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2022.1091369</link>
        <title><![CDATA[CoFHE: Software and hardware Co-design for FHE-based machine learning as a service]]></title>
        <pubDate>12 Jan 2023 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Mengxin Zheng</author><author>Lei Ju</author><author>Lei Jiang</author>
        <description><![CDATA[Introduction: Privacy concerns arise whenever sensitive data is outsourced to untrusted Machine Learning as a Service (MLaaS) platforms. Fully Homomorphic Encryption (FHE) emerges as one of the most promising solutions for implementing privacy-preserving MLaaS. But prior FHE-based MLaaS faces challenges from both software and hardware perspectives. First, FHE can be implemented by various schemes, including BGV, BFV, and CKKS, which are good at different FHE operations, e.g., additions, multiplications, and rotations. Different neural network architectures require different numbers of FHE operations, thereby preferring different FHE schemes. However, state-of-the-art MLaaS just naïvely chooses one FHE scheme to build FHE-based neural networks without considering other FHE schemes. Second, state-of-the-art MLaaS uses power-hungry hardware accelerators to process FHE-based inferences. Typically, prior high-performance FHE accelerators consume >160 Watt, due to their huge-capacity (e.g., 512 MB) on-chip SRAM scratchpad memories. Methods: In this paper, we propose a software and hardware co-designed FHE-based MLaaS framework, CoFHE. From the software perspective, we propose an FHE compiler to select the best FHE scheme for a network architecture. We also build a low-power and high-density NAND-SPIN and SRAM hybrid scratchpad memory system for FHE hardware accelerators. Results: On average, under the same security and accuracy constraints, CoFHE accelerates various FHE-based inferences by 18% and reduces their energy consumption by 26%. Discussion: CoFHE greatly improves the latency and energy efficiency of FHE-based MLaaS.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2022.1032485</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2022.1032485</link>
        <title><![CDATA[XMA2: A crossbar-aware multi-task adaption framework via 2-tier masks]]></title>
        <pubDate>20 Dec 2022 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Fan Zhang</author><author>Li Yang</author><author>Jian Meng</author><author>Jae-sun Seo</author><author>Yu Cao</author><author>Deliang Fan</author>
        <description><![CDATA[Recently, ReRAM crossbar-based deep neural network (DNN) accelerator has been widely investigated. However, most prior works focus on single-task inference due to the high energy consumption of weight reprogramming and ReRAM cells’ low endurance issue. Adapting the ReRAM crossbar-based DNN accelerator for multiple tasks has not been fully explored. In this study, we propose XMA2, a novel crossbar-aware learning method with a 2-tier masking technique to efficiently adapt a DNN backbone model deployed in the ReRAM crossbar for new task learning. During the XMA2-based multi-task adaption (MTA), the tier-1 ReRAM crossbar-based processing-element- (PE-) wise mask is first learned to identify the most critical PEs to be reprogrammed for essential new features of the new task. Subsequently, the tier-2 crossbar column-wise mask is applied within the rest of the weight-frozen PEs to learn a hardware-friendly and column-wise scaling factor for new task learning without modifying the weight values. With such crossbar-aware design innovations, we could implement the required masking operation in an existing crossbar-based convolution engine with minimal hardware/memory overhead to adapt to a new task. The extensive experimental results show that compared with other state-of-the-art multiple-task adaption methods, XMA2 achieves the highest accuracy on all popular multi-task learning datasets.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2022.877629</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2022.877629</link>
        <title><![CDATA[Energy-efficient neural network design using memristive MAC unit]]></title>
        <pubDate>26 Sep 2022 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Shengqi Yu</author><author>Thanasin Bunnam</author><author>Sirichai Triamlumlerd</author><author>Manoch Pracha</author><author>Fei Xia</author><author>Rishad Shafik</author><author>Alex Yakovlev</author>
        <description><![CDATA[Artificial intelligence applications implemented with neural networks require extensive arithmetic capabilities through multiply-accumulate (MAC) units. Traditional designs based on voltage-mode circuits feature complex logic chains for such purposes as carry processing. Additionally, as a separate memory block is used (e.g., in a von Neumann architecture), data movements incur on-chip communication bottlenecks. Furthermore, conventional multipliers have both operands encoded in the same physical quantity, which is either low cost to update or low cost to hold, but not both. This may be significant for low-energy edge operations. In this paper, we propose and present a mixed-signal multiply-accumulate unit design with in-memory computing to improve both latency and energy. This design is based on a single-bit multiplication cell consisting of a number of memristors and a single transistor switch (1TxM), arranged in a crossbar structure implementing the long-multiplication algorithm. The key innovation is that one of the operands is encoded in easy-to-update voltage and the other is encoded in non-volatile memristor conductance. This targets operations, such as machine learning, which feature asymmetric requirements for operand updates. Ohm’s Law and KCL take care of the multiplication in analog. When implemented as part of an NN, the MAC unit incorporates a current-to-digital stage to produce multi-bit voltage-mode output, in the same format as the input. The computation latency consists of memory writing and result encoding operations, with the Ohm’s Law and KCL operations contributing negligible delay. When compared with other memristor-based multipliers, the proposed work shows an order-of-magnitude latency improvement in 4-bit implementations, partly because of the Ohm’s Law and KCL time savings and partly because of the short writing operations for the frequently updated operand represented by voltages. In addition, the energy consumption per multiplication cycle of the proposed work is shown to improve by 74%–99% in corner cases. To investigate the usefulness of this MAC design in machine learning applications, its input/output relationship is characterized using multi-layer perceptrons to classify the well-known handwritten-digit dataset MNIST. This case study implements quantization-aware training and includes the non-ideal effects of our MAC unit to allow the NN to learn and preserve its high accuracy. The simulation results show that the NN using the proposed MAC unit yields an accuracy of 93%, which is only 1% lower than its baseline.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2022.898273</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2022.898273</link>
        <title><![CDATA[AI-PiM—Extending the RISC-V processor with Processing-in-Memory functional units for AI inference at the edge of IoT]]></title>
        <pubdate>2022-08-11T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Vaibhav Verma</author><author>Mircea R. Stan</author>
        <description><![CDATA[The recent advances in Artificial Intelligence (AI) achieving “better-than-human” accuracy in a variety of tasks such as image classification and the game of Go have come at the cost of an exponential increase in the size of artificial neural networks. This has led to AI hardware solutions becoming severely memory-bound and scrambling to keep up with the ever-increasing “von Neumann bottleneck”. Processing-in-Memory (PiM) architectures offer an excellent solution to ease the von Neumann bottleneck by embedding compute capabilities inside the memory and reducing the data traffic between the memory and the processor. But PiM accelerators break the standard von Neumann programming model by fusing memory and compute operations together, which impedes their integration in the standard computing stack. There is an urgent requirement for system-level solutions to take full advantage of PiM accelerators for end-to-end acceleration of AI applications. This article presents AI-PiM as a solution to bridge this research gap. AI-PiM proposes a hardware, ISA and software co-design methodology which allows integration of PiM accelerators in the RISC-V processor pipeline as functional execution units. AI-PiM also extends the RISC-V ISA with custom instructions which directly target the PiM functional units, resulting in their tight integration with the processor. This tight integration is especially important for edge AI devices, which need to process both AI and non-AI tasks on the same hardware due to area, power, size and cost constraints. AI-PiM ISA extensions expose the PiM hardware functionality to software programmers, allowing efficient mapping of applications to the PiM hardware.
AI-PiM adds support for custom ISA extensions to the complete software stack, including compiler, assembler, linker, simulator and profiler, to ensure programmability and evaluation with popular AI domain-specific languages and frameworks such as TensorFlow, PyTorch, MXNet and Keras. AI-PiM improves the performance of the vector-matrix multiplication (VMM) kernel by 17.63x and provides a mean speed-up of 2.74x on the MLPerf Tiny benchmark compared to an RV64IMC RISC-V baseline. AI-PiM also speeds up MLPerf Tiny benchmark inference cycles by 2.45x (average) compared to the state-of-the-art Arm Cortex-A72 processor.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2022.792326</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2022.792326</link>
        <title><![CDATA[Design and Analysis of a Resistive Sensor Interface With Phase Noise-Energy-Resolution Scalability for a Time-Based Resistance-to-Digital Converter]]></title>
        <pubdate>2022-04-25T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Dong-Hyun Seo</author><author>Baibhab Chatterjee</author><author>Sean M. Scott</author><author>Daniel J. Valentino</author><author>Dimitrios Peroulis</author><author>Shreyas Sen</author>
        <description><![CDATA[This article presents the design and analysis of a resistive sensor interface with three different designs of phase noise-energy-resolution scalability in time-based resistance-to-digital converters (RDCs), including test chip implementations and measurements, targeted toward either minimizing the energy/conversion step or maximizing bit-resolution. The implemented RDCs consist of a three-stage differential ring oscillator, which is current-starved using the resistive sensor, a differential-to-single-ended amplifier, and digital modules with a serial interface. The first RDC design (baseline) included the basic structure of the time-based RDC and targeted a low energy/conversion step. The second RDC design (goal: higher resolution) aimed to improve the rms jitter/phase noise of the oscillator with the help of speed-up latches, to achieve higher bit-resolution than the first RDC design. The third RDC design (goal: process portability) reduced the power consumption by scaling the technology with the improved phase-noise design, achieving 1-bit better resolution than the second RDC design. Using the time-based implementation, the RDCs exhibit energy-resolution scalability and consume a measured power of 861 nW with 18-bit resolution in design 1 in TSMC 0.35 μm technology (with 10 ms read-time, with one readout every second). Measurements of designs 2 and 3 demonstrate power consumption of 19.2 μW with 20-bit resolution using TSMC 0.35 μm and 17.6 μW with 20-bit resolution using TSMC 0.18 μm, respectively (both with 10 ms read-time, repeated every second). With 30 ms read-time, design 3 achieves 21-bit resolution, which is the highest resolution reported for a time-based ADC. The 0.35-μm time-based RDC is the lowest-power time-based ADC reported, while the 0.18-μm time-based RDC with speed-up latch offers the highest resolution. The active chip area for all three designs is less than 1.1 mm².]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2022.856284</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2022.856284</link>
        <title><![CDATA[Statistical Analysis Based Feature Selection Enhanced RF-PUF With >99.8% Accuracy on Unmodified Commodity Transmitters for IoT Physical Security]]></title>
        <pubdate>2022-04-25T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Md Faizul Bari</author><author>Parv Agrawal</author><author>Baibhab Chatterjee</author><author>Shreyas Sen</author>
        <description><![CDATA[Due to the diverse and mobile nature of the deployment environment, smart commodity devices are vulnerable to various spoofing attacks which can allow a rogue device to get access to a large network. The vulnerability of the traditional digital signature-based authentication system lies in the fact that it uses only a key/pin, ignoring the device fingerprint. To circumvent the inherent weakness of the traditional system, various physical signature-based RF fingerprinting methods have been proposed in the literature, and RF-PUF is a promising choice among them. RF-PUF utilizes the inherent nonidealities of the traditional RF communication system as features at the receiver to uniquely identify a transmitter. It is resilient to key-hacking methods due to the absence of secret key requirements and does not require any additional circuitry on the transmitter end (no additional power, area, or computational burden). However, the concept of RF-PUF was proposed using MATLAB-generated data, which cannot ensure the presence of device entropy mapped to the system-level nonidealities. Hence, an experimental validation using commercial devices is necessary to prove its efficacy. In this work, for the first time, we analyze the effectiveness of RF-PUF on commodity devices, purchased off-the-shelf, without any modifications whatsoever. We have collected data from 30 Xbee S2C modules used as transmitters and released it as a public dataset. A new feature has been engineered through PCA and statistical property analysis. With a new and robust feature set, it has been shown that 95% accuracy can be achieved using only ∼1.8 ms of test data fed into a neural network of 10 neurons in 1 layer, reaching >99.8% accuracy with a network of higher model capacity, for the first time in the literature without any assisting digital preamble. The design space has been explored in detail and the effect of the wireless channel has been investigated.
The performance of some popular machine learning algorithms has been tested and compared with the neural network approach. A thorough investigation of various PUF properties has been carried out. With extensive testing of 41,238,000 cases, the detection probability for RF-PUF on our data is found to be 0.9987, which, for the first time, experimentally establishes RF-PUF as a strong authentication method. Finally, the potential attack models and the robustness of RF-PUF against them are discussed.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2022.847069</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2022.847069</link>
        <title><![CDATA[Hardware-Software Co-Design of an In-Memory Transformer Network Accelerator]]></title>
        <pubdate>2022-04-11T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Ann Franchesca Laguna</author><author>Mohammed Mehdi Sharifi</author><author>Arman Kazemi</author><author>Xunzhao Yin</author><author>Michael Niemier</author><author>X. Sharon Hu</author>
        <description><![CDATA[Transformer networks have outperformed recurrent and convolutional neural networks in terms of accuracy in various sequential tasks. However, memory and compute bottlenecks prevent transformer networks from scaling to long sequences due to their high execution time and energy consumption. Different neural attention mechanisms have been proposed to lower the computational load but still suffer from the memory bandwidth bottleneck. In-memory processing can help alleviate memory bottlenecks by reducing the transfer overhead between the memory and compute units, thus allowing transformer networks to scale to longer sequences. We propose an in-memory transformer network accelerator (iMTransformer) that uses a combination of crossbars and content-addressable memories to accelerate transformer networks. We accelerate transformer networks by (1) computing in-memory, thus minimizing the memory transfer overhead, (2) caching reusable parameters to reduce the number of operations, and (3) exploiting the available parallelism in the attention mechanism computation. To reduce energy consumption, the following techniques are introduced: (1) a configurable attention selector is used to choose different sparse attention patterns, (2) a content-addressable memory aided by locality-sensitive hashing helps to filter the sequence elements by their importance, and (3) FeFET-based crossbars are used to store projection weights while CMOS-based crossbars are used as an attentional cache to store attention scores for later reuse. The CMOS-FeFET hybrid iMTransformer provides a significant energy improvement over the CMOS-only iMTransformer. The CMOS-FeFET hybrid iMTransformer achieved an 8.96× delay improvement and 12.57× energy improvement for the Vanilla transformers compared to the GPU baseline at a sequence length of 512.
Implementing BERT using the CMOS-FeFET hybrid iMTransformer achieves a 13.71× delay improvement and an 8.95× energy improvement compared to the GPU baseline at a sequence length of 512. The hybrid iMTransformer also achieves a throughput of 2.23 K samples/s and 124.8 samples/s/W on the MLPerf benchmark with BERT-large and the SQuAD 1.1 dataset, an 11× speedup and 7.92× energy improvement compared to the GPU baseline.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/felec.2022.834146</guid>
        <link>https://www.frontiersin.org/articles/10.3389/felec.2022.834146</link>
        <title><![CDATA[CIDAN-XE: Computing in DRAM with Artificial Neurons]]></title>
        <pubdate>2022-02-18T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Gian Singh</author><author>Ankit Wagle</author><author>Sunil Khatri</author><author>Sarma Vrudhula</author>
        <description><![CDATA[This paper presents a DRAM-based processing-in-memory (PIM) architecture, called CIDAN-XE. It contains a novel computing unit called the neuron processing element (NPE). Each NPE can perform a variety of operations that include logical, arithmetic, relational, and predicate operations on multi-bit operands. Furthermore, NPEs can be reconfigured to switch operations during run-time without increasing the overall latency or power of the operation. Since NPEs consume a small area and can operate at very high frequencies, they can be integrated inside the DRAM without disrupting its organization or timing constraints. Simulation results on a set of operations such as AND, OR, XOR, addition, and multiplication show that CIDAN-XE achieves an average throughput improvement of 72×/5.4× and an energy efficiency improvement of 244×/29× over CPU/GPU. To further demonstrate the benefits of using CIDAN-XE, we implement several convolutional neural networks and show that CIDAN-XE improves throughput and energy efficiency over the latest PIM architectures.]]></description>
      </item>
      </channel>
    </rss>