AUTHOR=Raha Arnab, Mathaikutty Deepak A., Kundu Shamik, Ghosh Soumendu K.
TITLE=FlexNPU: a dataflow-aware flexible deep learning accelerator for energy-efficient edge devices
JOURNAL=Frontiers in High Performance Computing
VOLUME=Volume 3 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/high-performance-computing/articles/10.3389/fhpcp.2025.1570210
DOI=10.3389/fhpcp.2025.1570210
ISSN=2813-7337
ABSTRACT=This paper introduces FlexNPU, a Flexible Neural Processing Unit that adopts agile design principles to enable versatile dataflows and enhance energy efficiency. Unlike conventional convolutional neural network accelerator architectures, which adhere to fixed dataflows (such as input, weight, output, or row stationary) to transfer activations and weights between storage and compute units, our design enables adaptable dataflows of any type through configurable software descriptors. Because data movement costs considerably outweigh compute costs from an energy perspective, this dataflow flexibility allows us to optimize the movement per layer for minimal data transfer and energy consumption, a capability unattainable in fixed-dataflow architectures. To further enhance throughput and reduce energy consumption in the FlexNPU architecture, we propose a novel sparsity-based acceleration logic that exploits fine-grained sparsity in both the activation and weight tensors to bypass redundant computations, thereby optimizing the convolution engine within the hardware accelerator. Extensive experimental results underscore a significant improvement in the performance and energy efficiency of FlexNPU compared to existing DNN accelerators.
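
The abstract's fine-grained sparsity acceleration can be illustrated with a small sketch. The Python snippet below is not FlexNPU's actual hardware logic; it is a hypothetical software model, assuming bitmap-encoded activation and weight vectors (names such as sparse_dot are invented for illustration), that shows how combining the two non-zero bitmaps lets a multiply-accumulate loop bypass every position where either operand is zero.

```python
import numpy as np


def sparse_dot(activations: np.ndarray, weights: np.ndarray) -> tuple[float, int]:
    """Hypothetical model of two-sided fine-grained sparsity skipping.

    A bitmap marks the non-zero positions of each operand vector; the AND of
    the two bitmaps identifies the only positions that contribute to the dot
    product, so all other multiply-accumulates are skipped.
    """
    act_bitmap = activations != 0                      # non-zero map of activations
    wgt_bitmap = weights != 0                          # non-zero map of weights
    useful = np.flatnonzero(act_bitmap & wgt_bitmap)   # positions kept by the combined map

    acc = 0.0
    for i in useful:                                   # only the surviving MACs execute
        acc += float(activations[i]) * float(weights[i])
    return acc, len(useful)                            # result plus MACs actually performed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal(64) * (rng.random(64) < 0.4)   # roughly 60% zero activations
    w = rng.standard_normal(64) * (rng.random(64) < 0.5)   # roughly 50% zero weights
    result, macs = sparse_dot(a, w)
    print(f"dot = {result:.4f}, MACs executed = {macs} / {a.size}")
```

In the paper, this skipping is performed in hardware inside the convolution engine; the sketch only conveys why exploiting sparsity in both tensors reduces the number of computations, and hence energy, compared with a dense MAC loop.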