ORIGINAL RESEARCH article

Front. Comput. Neurosci.

Volume 19 - 2025 | doi: 10.3389/fncom.2025.1604399

ModFus-PD: synergizing cross-modal attention and contrastive learning for enhanced multimodal diagnosis of Parkinson's disease

Provisionally accepted
  • 1 Shandong University of Traditional Chinese Medicine, Jinan, China
  • 2 Center for Medical Artificial Intelligence, Shandong University of Traditional Chinese Medicine, Qingdao, China
  • 3 Qingdao Academy of Chinese Medical Sciences, Shandong University of Traditional Chinese Medicine, Qingdao, China
  • 4 Qingdao Key Laboratory of Artificial Intelligence Technology for Chinese Medicine, Qingdao, China

The final, formatted version of the article will be published soon.

Parkinson's disease (PD) is a complex neurodegenerative disorder characterized by a high rate of misdiagnosis, underscoring the critical importance of early and accurate diagnosis. Although existing computer-aided diagnostic systems integrate clinical assessment scales with neuroimaging data, they typically rely on superficial feature concatenation, which fails to capture the deep inter-modal dependencies essential for effective multimodal fusion. To address this limitation, we propose ModFus-PD, a multimodal diagnostic framework that synergizes contrastive learning and cross-modal attention. Contrastive learning aligns the heterogeneous imaging and clinical-text modalities, while the cross-modal attention mechanism further exploits semantic interactions between them to enhance feature fusion. The framework comprises three key components: 1) a contrastive learning-based feature alignment module that projects MRI data and clinical text prompts into a unified embedding space via pretrained image and text encoders; 2) a bidirectional cross-modal attention module in which textual semantics guide MRI feature refinement for improved sensitivity to PD-related brain regions, while MRI features simultaneously enrich the contextual understanding of the clinical text; 3) a hierarchical classification module that integrates the fused representations through two fully connected layers to produce the final PD classification probabilities. Experiments on the PPMI dataset demonstrate the superior performance of ModFus-PD, which achieves an accuracy of 0.903, an AUC of 0.892, and an F1 score of 0.840, surpassing several state-of-the-art baselines. These results validate the effectiveness of our cross-modal fusion strategy, which enables interpretable and reliable diagnostic support and holds promise for future clinical translation.
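Since the abstract is the only technical description available here, the sketch below is a minimal, hypothetical PyTorch rendering of the three components it describes: a CLIP-style contrastive alignment of pooled MRI and text embeddings, bidirectional cross-attention between token-level features, and a two-layer fully connected classifier over the fused representation. All module names, encoder dimensions, and hyperparameters (embedding size, temperature, number of attention heads) are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the three modules described in the abstract.
# Dimensions, names, and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveAlignment(nn.Module):
    """Project pooled MRI and text features into a shared space and compute
    a symmetric InfoNCE-style (CLIP-like) alignment loss."""
    def __init__(self, img_dim=2048, txt_dim=768, embed_dim=512, temperature=0.07):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)
        self.temperature = temperature

    def forward(self, img_feat, txt_feat):
        z_img = F.normalize(self.img_proj(img_feat), dim=-1)   # (B, D)
        z_txt = F.normalize(self.txt_proj(txt_feat), dim=-1)   # (B, D)
        logits = z_img @ z_txt.t() / self.temperature          # (B, B)
        targets = torch.arange(logits.size(0), device=logits.device)
        loss = 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))
        return z_img, z_txt, loss


class BidirectionalCrossAttention(nn.Module):
    """Text tokens attend over image tokens and vice versa, so each modality
    refines the other before fusion."""
    def __init__(self, embed_dim=512, num_heads=8):
        super().__init__()
        self.txt_to_img = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.img_to_txt = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, img_tokens, txt_tokens):
        # MRI tokens refined under textual guidance
        img_refined, _ = self.txt_to_img(img_tokens, txt_tokens, txt_tokens)
        # text tokens enriched with imaging context
        txt_refined, _ = self.img_to_txt(txt_tokens, img_tokens, img_tokens)
        return img_refined, txt_refined


class FusionClassifier(nn.Module):
    """Two fully connected layers over the pooled, concatenated features."""
    def __init__(self, embed_dim=512, hidden_dim=256, num_classes=2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, img_refined, txt_refined):
        fused = torch.cat([img_refined.mean(dim=1), txt_refined.mean(dim=1)], dim=-1)
        return self.mlp(fused)  # class logits


# Toy forward pass with random tensors standing in for encoder outputs.
if __name__ == "__main__":
    B, N_img, N_txt = 4, 49, 16
    align = ContrastiveAlignment()
    attn = BidirectionalCrossAttention()
    head = FusionClassifier()

    img_feat = torch.randn(B, 2048)          # pooled MRI encoder output (assumed)
    txt_feat = torch.randn(B, 768)           # pooled text encoder output (assumed)
    _, _, align_loss = align(img_feat, txt_feat)

    img_tokens = torch.randn(B, N_img, 512)  # token-level features (assumed)
    txt_tokens = torch.randn(B, N_txt, 512)
    img_r, txt_r = attn(img_tokens, txt_tokens)
    logits = head(img_r, txt_r)
    print(logits.shape, align_loss.item())
```

In this reading, the contrastive loss would be combined with the classification loss during training, while the cross-attention and classifier modules carry the fused representation to the final PD prediction; the exact loss weighting and encoder backbones are not specified in the abstract.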

Keywords: early diagnosis of Parkinson's disease, multimodal representation learning, cross-modal attention, contrastive learning, multimodal fusion

Received: 01 Apr 2025; Accepted: 17 Jun 2025.

Copyright: © 2025 Teng, Li and WEI. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Benzheng WEI, Center for Medical Artificial Intelligence, Shandong University of Traditional Chinese Medicine, Qingdao, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.