ORIGINAL RESEARCH article
Front. Neurosci.
Sec. Neuromorphic Engineering
Volume 19 - 2025 | doi: 10.3389/fnins.2025.1567347
This article is part of the Research Topic: Theoretical Advances and Practical Applications of Spiking Neural Networks, Volume II
Three-Stage Hybrid Spiking Neural Networks Fine-Tuning for Speech Enhancement
Provisionally accepted
1 Ohio University, Athens, Ohio, United States
2 University of Kentucky, Lexington, Kentucky, United States
In the past decade, artificial neural networks (ANNs) have revolutionized many AI-related fields, including speech enhancement (SE). However, achieving high performance with ANNs often requires substantial power and memory resources. Recently, spiking neural networks (SNNs) have emerged as a promising low-power alternative, leveraging their inherent sparsity to enable efficient computation while maintaining performance. While SNNs offer improved energy efficiency, they are generally more challenging to train than ANNs.

In this study, we propose a three-stage hybrid ANN-to-SNN fine-tuning scheme and apply it to Wave-U-Net and Conv-TasNet, two major network solutions for speech enhancement. Our framework first trains the ANN models and then converts them into their corresponding spiking versions. The converted SNNs are subsequently fine-tuned with a hybrid training scheme in which the forward pass uses spiking signals and the backward pass uses ANN signals to enable backpropagation. To maintain the performance of the original ANN models, various modifications to the original network architectures have been made.

Our SNN models operate entirely in the temporal domain, eliminating the need to convert waveform signals into the spectral domain for input and back to the waveform for output. Moreover, our models use spiking neurons throughout, setting them apart from many models that incorporate regular ANN neurons in their architectures. Experiments on the noisy VCTK and TIMIT datasets demonstrate the effectiveness of the hybrid training: the fine-tuned SNNs show significant improvement and robustness over the baseline models.

Combining the direct training and ANN-SNN conversion approaches, Baltes et al. (2023) proposed a hybrid SNN fine-tuning pipeline to achieve comparable performance in SNNs.
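The hybrid training scheme described above, where the forward pass emits binary spikes while the backward pass uses ANN-style signals for backpropagation, resembles surrogate-gradient (straight-through) training of spiking neurons. A minimal NumPy sketch of that idea follows; the threshold, the piecewise-linear surrogate, and its width are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def spike_forward(v, threshold=1.0):
    """Forward pass: binary spikes from membrane potential (Heaviside step)."""
    return (v >= threshold).astype(np.float64)

def spike_backward(grad_out, v, threshold=1.0, width=0.5):
    """Backward pass: the undefined Heaviside derivative is replaced by the
    gradient of an ANN-style piecewise-linear activation, nonzero only
    within `width` of the threshold (a common surrogate choice)."""
    surrogate = (np.abs(v - threshold) < width).astype(np.float64) / (2 * width)
    return grad_out * surrogate

# Membrane potentials for four neurons at one time step.
v = np.array([0.2, 0.9, 1.3, 2.0])
spikes = spike_forward(v)                    # -> [0., 0., 1., 1.]
grads = spike_backward(np.ones_like(v), v)   # -> [0., 1., 1., 0.]
```

In a fine-tuning pipeline of this kind, the converted SNN layers would use `spike_forward` during inference while the optimizer backpropagates through `spike_backward`, letting gradient descent adjust weights despite the non-differentiable spike function.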
Keywords: Spiking neural network (SNN), Wave-U-Net, Speech enhancement, Conv-TasNet, ANN-SNN conversion
Received: 27 Jan 2025; Accepted: 08 Apr 2025.
Copyright: © 2025 Abuhajar, Wang, Baltes, Yue, Xu, Karanth, Smith and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Jundong Liu, Ohio University, Athens, Ohio 45701, United States
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.