
ORIGINAL RESEARCH article

Front. Phys.

Sec. Radiation Detectors and Imaging

Volume 13 - 2025 | doi: 10.3389/fphy.2025.1576591

This article is part of the Research Topic "Multi-Sensor Imaging and Fusion: Methods, Evaluations, and Applications, Volume III".

Multi-Modal Action Recognition via Advanced Image Fusion Techniques for Cyber-Physical Systems

Provisionally accepted
  • Daoyu Zhu, Xinyang Normal University, Xinyang, China

The final, formatted version of the article will be published soon.

The increasing complexity of cyber-physical systems (CPS) demands robust and efficient action recognition frameworks capable of seamlessly integrating multi-modal data. Traditional methods often lack adaptability and perform poorly when combining heterogeneous information sources, such as spatial and temporal cues drawn from different imaging modalities. To address these limitations, we propose a novel Multi-Scale Attention-Guided Fusion Network (MSAF-Net), which leverages advanced image fusion techniques to significantly enhance action recognition performance in CPS environments. Our approach capitalizes on multi-scale feature extraction and attention mechanisms to dynamically adjust the contributions of individual modalities, ensuring optimal preservation of both structural and textural information. Unlike conventional spatial- or transform-domain fusion methods, MSAF-Net integrates adaptive weighting schemes and perceptual consistency measures, effectively mitigating challenges such as over-smoothing, noise sensitivity, and poor generalization to unseen scenarios. The model is designed to handle the dynamic and evolving nature of CPS data, making it particularly suitable for applications such as surveillance, autonomous systems, and human-computer interaction. Extensive experimental evaluations demonstrate that our approach not only outperforms state-of-the-art benchmarks in accuracy and robustness but also exhibits superior scalability across diverse CPS contexts. This work marks a significant advance in multi-modal action recognition, paving the way for more intelligent, adaptable, and resilient CPS frameworks. MSAF-Net also shows strong potential for medical imaging, particularly multi-modal diagnostic tasks such as combining MRI, CT, or PET scans to improve lesion detection and image clarity, which is essential for clinical decision-making.
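The abstract's core mechanism — per-modality attention weights computed at multiple scales, combined as a convex (adaptively weighted) fusion — can be illustrated with a minimal NumPy sketch. This is not the authors' MSAF-Net implementation (which is not reproduced on this page); the scoring function, scale set, and function names below are hypothetical, chosen only to show the general attention-guided fusion pattern.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax: weights sum to 1 along `axis`.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(feat_a, feat_b):
    """Fuse two same-shape feature maps with per-pixel softmax attention.

    The attention score here is a simple activation-magnitude proxy
    (hypothetical); a learned network would produce these scores instead.
    """
    scores = np.stack([np.abs(feat_a), np.abs(feat_b)], axis=0)  # (2, H, W)
    weights = softmax(scores, axis=0)        # convex per-pixel weights
    fused = weights[0] * feat_a + weights[1] * feat_b
    return fused, weights

def multiscale_fuse(img_a, img_b, scales=(1, 2)):
    """Fuse at several resolutions, upsample, and average the results.

    Downsampling by striding and upsampling by pixel replication stand in
    for the learned multi-scale feature pyramid described in the abstract.
    """
    h, w = img_a.shape
    levels = []
    for s in scales:
        fused, _ = attention_fuse(img_a[::s, ::s], img_b[::s, ::s])
        # np.kron replicates each pixel into an s-by-s block (nearest upsample).
        levels.append(np.kron(fused, np.ones((s, s)))[:h, :w])
    return np.mean(levels, axis=0)
```

At each pixel the modality with the stronger response receives the larger weight, and the weights always sum to one, so the fusion preserves the overall intensity range while favoring the more informative source — the basic property an adaptive weighting scheme is meant to provide.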

Keywords: Multi-modal fusion, Action recognition, Cyber-physical Systems, attention mechanisms, Image fusion techniques

Received: 14 Feb 2025; Accepted: 16 Jun 2025.

Copyright: © 2025 Zhu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Daoyu Zhu, Xinyang Normal University, Xinyang, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.