Multi Modal Fusion of Medical Imaging and Biomechanical Data Using Attention based Swin-Unet and LSTM for Sports Injury Prediction

Li, Siyuan; Hou, Ziyu; Amjad, Kamran; Mushtaq, Husnain

doi:10.3389/fphys.2025.1687895

ORIGINAL RESEARCH article

Front. Physiol.

Sec. Medical Physics and Imaging

Provisionally accepted

Siyuan Li¹

Ziyu Hou²*

Kamran Amjad³

Husnain Mushtaq⁴*

¹Wuhan Wuchang Shouyi College, Wuhan, China
²Hunan University of Science and Technology, Xiangtan, China
³Central South University School of Automation, Changsha, China
⁴South China University of Technology, Guangzhou, China

The final, formatted version of the article will be published soon.

Background: Accurately predicting sports injuries remains a significant challenge because of the complexity of the factors involved, including anatomical structures and movement mechanics. Traditional approaches often rely on a single data source and fail to provide personalized risk assessments, which limits their effectiveness.

Methodology: This study introduces a multimodal approach to predicting sports injuries by combining high-resolution computed tomography (CT) scans with biomechanical data from motion capture systems, wearable inertial measurement units (IMUs), and force-sensitive insoles. CT images were denoised and contrast-enhanced before being analyzed with the Swin-UNet architecture, which captures both fine structural details and broader spatial patterns. In parallel, biomechanical signals such as joint movement, ground reaction forces, and loading patterns were processed using orthogonal component decomposition and analyzed with a Long Short-Term Memory (LSTM) network to capture changes over time. The outputs of the two models were combined through a decision-level fusion method, producing a single injury-risk score. By integrating anatomical and functional data, the framework provides a more accurate and timely assessment of injury risk, supporting early intervention and improved athlete safety.

Results: The proposed model demonstrated strong predictive performance, achieving an accuracy of 94%, precision of 91%, recall of 92%, and an F1 score of 91%. These results highlight the advantage of combining high-resolution imaging with biomechanical measurements through an advanced deep learning framework, which outperformed traditional methods.

Conclusion: By integrating CT imaging and biomechanical data within a Swin-UNet-based framework, this study offers a precise and personalized approach to sports injury prediction. The inclusion of real-time monitoring further enhances the practical value of the model, supporting early intervention and improving athlete safety and training efficiency.
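
The decision-level fusion step described in the methodology can be illustrated with a minimal sketch. The abstract does not state which fusion rule the authors used, so the weighted average below is an assumption chosen for illustration, and the names `fuse_risk_scores`, `classify_risk`, `imaging_weight`, and `threshold` are hypothetical. The two inputs stand in for the per-athlete injury probabilities that the imaging branch (Swin-UNet) and the biomechanical branch (LSTM) would each produce.

```python
def fuse_risk_scores(
    imaging_prob: float,
    biomech_prob: float,
    imaging_weight: float = 0.5,
) -> float:
    """Combine two per-branch injury probabilities into one risk score.

    Assumption: a convex weighted average. The fusion rule actually used
    in the paper is not specified in the abstract.
    """
    for name, p in (("imaging_prob", imaging_prob), ("biomech_prob", biomech_prob)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")
    if not 0.0 <= imaging_weight <= 1.0:
        raise ValueError(f"imaging_weight must lie in [0, 1], got {imaging_weight}")
    # The biomechanical branch receives the remaining weight, so the fused
    # score always stays inside [0, 1].
    return imaging_weight * imaging_prob + (1.0 - imaging_weight) * biomech_prob


def classify_risk(score: float, threshold: float = 0.5) -> str:
    """Map a fused score to a coarse label (hypothetical threshold)."""
    return "high" if score >= threshold else "low"


if __name__ == "__main__":
    fused = fuse_risk_scores(imaging_prob=0.8, biomech_prob=0.6)
    print(f"fused risk score: {fused:.2f}, label: {classify_risk(fused)}")
```

In such a scheme each branch is trained and evaluated on its own, and only the final probabilities are merged, which is what distinguishes decision-level fusion from feature-level fusion. The weight and threshold would in practice need to be tuned on held-out validation data.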

Keywords: attention mechanism, sports injury prediction, multimodal data fusion, real-time monitoring, personalized risk assessment, deep learning framework

Received: 26 Sep 2025; Accepted: 27 Nov 2025.

Copyright: © 2025 Li, Hou, Amjad and Mushtaq. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Ziyu Hou
Husnain Mushtaq

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.