AUTHOR=Raha Arnab, Mathaikutty Deepak A., Kundu Shamik, Ghosh Soumendu K.
TITLE=FlexNPU: a dataflow-aware flexible deep learning accelerator for energy-efficient edge devices
JOURNAL=Frontiers in High Performance Computing
VOLUME=Volume 3 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/high-performance-computing/articles/10.3389/fhpcp.2025.1570210
DOI=10.3389/fhpcp.2025.1570210
ISSN=2813-7337
ABSTRACT=This paper introduces FlexNPU, a Flexible Neural Processing Unit that adopts agile design principles to enable versatile dataflows and enhance energy efficiency. Unlike conventional convolutional neural network accelerator architectures, which adhere to fixed dataflows (such as input, weight, output, or row stationary) to transfer activations and weights between storage and compute units, our design enables adaptable dataflows of any type through configurable software descriptors. Because data movement costs considerably outweigh compute costs from an energy perspective, this dataflow flexibility allows us to optimize the movement per layer for minimal data transfer and energy consumption, a capability unattainable in fixed-dataflow architectures. To further enhance throughput and reduce energy consumption in the FlexNPU architecture, we propose a novel sparsity-based acceleration logic that exploits fine-grained sparsity in both the activation and weight tensors to bypass redundant computations, thereby optimizing the convolution engine within the hardware accelerator. Extensive experimental results underscore a significant improvement in the performance and energy efficiency of FlexNPU compared to existing DNN accelerators.
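
The abstract's fine-grained sparsity acceleration can be illustrated with a small sketch. The Python snippet below is not FlexNPU's actual hardware logic; it is a hypothetical software model, assuming bitmap-encoded activation and weight vectors (names such as sparse_dot are invented for illustration), that shows how combining the two non-zero bitmaps lets a multiply-accumulate loop bypass every position where either operand is zero.

```python
import numpy as np


def sparse_dot(activations: np.ndarray, weights: np.ndarray) -> tuple[float, int]:
    """Hypothetical model of two-sided fine-grained sparsity skipping.

    A bitmap marks the non-zero positions of each operand vector; the AND of
    the two bitmaps identifies the only positions that contribute to the dot
    product, so all other multiply-accumulates are skipped.
    """
    act_bitmap = activations != 0                      # non-zero map of activations
    wgt_bitmap = weights != 0                          # non-zero map of weights
    useful = np.flatnonzero(act_bitmap & wgt_bitmap)   # positions kept by the combined map

    acc = 0.0
    for i in useful:                                   # only the surviving MACs execute
        acc += float(activations[i]) * float(weights[i])
    return acc, len(useful)                            # result plus MACs actually performed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal(64) * (rng.random(64) < 0.4)   # roughly 60% zero activations
    w = rng.standard_normal(64) * (rng.random(64) < 0.5)   # roughly 50% zero weights
    result, macs = sparse_dot(a, w)
    print(f"dot = {result:.4f}, MACs executed = {macs} / {a.size}")
```

In the paper, this skipping is performed in hardware inside the convolution engine; the sketch only conveys why exploiting sparsity in both tensors reduces the number of computations, and hence energy, compared with a dense MAC loop.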