ORIGINAL RESEARCH article
Front. Mech. Eng.
Sec. Vibration Systems
Volume 11 - 2025 | doi: 10.3389/fmech.2025.1631818
This article is part of the Research Topic Technical Briefs in Mechanical Engineering: Volume 2
A Multi-Modal Deep Learning Framework for Intelligent Mechanical Fault Diagnosis: Fusion of Acoustic and Vibration Sensor Data
Provisionally accepted. Guizhou University, Guiyang, China
Accurate fault diagnosis is essential for ensuring the safety and reliability of mechanical systems. This study presents a multi-modal deep learning framework that integrates acoustic and vibration sensor data for intelligent fault detection. The proposed model employs a hybrid CNN-LSTM architecture with an attention mechanism to extract both spatial and temporal features. A two-stage fusion strategy is designed, combining feature-level adaptive weighting with decision-level cross-modal attention to enhance discriminative capability. The model is trained with a phased strategy: single-modal pretraining followed by joint fine-tuning. Experiments on two public datasets (CWRU and IEEE DataPort) yielded accuracies of 99.3% and 97.1%, respectively. The model demonstrated strong generalization in cross-dataset evaluation, remained robust under noisy conditions, and provided interpretable decision logic through attention visualizations. These results indicate that the proposed approach outperforms traditional and unimodal methods in accuracy, robustness, and interpretability, and is suitable for practical industrial deployment.
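The two-stage fusion described above can be illustrated with a minimal sketch. The abstract does not specify how the adaptive weights or the cross-modal attention are computed, so this example assumes a softmax gate over per-modality scores for the feature-level stage and scaled dot-product attention for the decision-level stage. The function names, the `gate` vector, and the per-modality logits are hypothetical; learned parameters are replaced by fixed inputs, and the CNN-LSTM encoders are omitted (their output embeddings are assumed to be precomputed). This is not the author's implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def adaptive_feature_fusion(acoustic: Vector, vibration: Vector,
                            gate: Vector) -> Tuple[Vector, List[float]]:
    """Stage 1 (feature level, assumed form): score each modality's embedding
    with a hypothetical gate vector, normalize the two scores with softmax,
    and return the weighted sum of the embeddings plus the modality weights."""
    weights = softmax([dot(gate, acoustic), dot(gate, vibration)])
    fused = [weights[0] * a + weights[1] * v for a, v in zip(acoustic, vibration)]
    return fused, weights


def cross_modal_attention(query: Vector, keys: List[Vector],
                          values: List[Vector]) -> Tuple[Vector, List[float]]:
    """Stage 2 (decision level, assumed form): scaled dot-product attention.
    The query attends over one key per modality; each value is that
    modality's class-logit vector, so the output is an attention-weighted
    combination of the per-modality decisions."""
    d = len(query)
    attn = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    out = [sum(w * v[j] for w, v in zip(attn, values)) for j in range(len(values[0]))]
    return out, attn


if __name__ == "__main__":
    # Hypothetical 3-dim embeddings from the acoustic and vibration branches.
    acoustic = [0.2, 0.9, 0.1]
    vibration = [0.8, 0.1, 0.4]
    gate = [1.0, 0.5, -0.5]  # stands in for learned gating parameters
    fused, modality_weights = adaptive_feature_fusion(acoustic, vibration, gate)

    # Hypothetical per-modality logits over three fault classes.
    logits_acoustic = [0.1, 2.0, 0.3]
    logits_vibration = [1.5, 0.2, 0.1]
    final_logits, attn = cross_modal_attention(
        fused, keys=[acoustic, vibration], values=[logits_acoustic, logits_vibration])

    predicted_class = max(range(len(final_logits)), key=final_logits.__getitem__)
    print("modality weights:", modality_weights)
    print("decision attention:", attn)
    print("predicted class:", predicted_class)
```

The attention weights returned by the second stage are the kind of quantity that could be plotted to show which modality drove a given decision, in the spirit of the attention visualizations mentioned above.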
Keywords: Multi-modal data fusion, deep learning, mechanical fault diagnosis, attention mechanism, CNN-LSTM, sensor robustness
Received: 20 May 2025; Accepted: 04 Aug 2025.
Copyright: © 2025 Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Hengyi Zhang, Guizhou University, Guiyang, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.