AUTHOR=Teng Xiangze, Li Xiang, Wei Benzheng TITLE=ModFus-PD: synergizing cross-modal attention and contrastive learning for enhanced multimodal diagnosis of Parkinson’s disease JOURNAL=Frontiers in Computational Neuroscience VOLUME=Volume 19 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2025.1604399 DOI=10.3389/fncom.2025.1604399 ISSN=1662-5188 ABSTRACT=Parkinson’s disease (PD) is a complex neurodegenerative disorder characterized by a high rate of misdiagnosis, underscoring the critical importance of early and accurate diagnosis. Although existing computer-aided diagnostic systems integrate clinical assessment scales with neuroimaging data, they typically rely on superficial feature concatenation, which fails to capture the deep inter-modal dependencies essential for effective multimodal fusion. To address this limitation, we propose ModFus-PD, a multimodal framework that combines contrastive learning with cross-modal attention. Contrastive learning effectively aligns heterogeneous modalities such as imaging and clinical text, while the cross-modal attention mechanism further exploits semantic interactions between them to enhance feature fusion. The framework comprises three key components: (1) a contrastive learning-based feature alignment module that projects MRI data and clinical text prompts into a unified embedding space via pretrained image and text encoders; (2) a bidirectional cross-modal attention module in which textual semantics guide MRI feature refinement for improved sensitivity to PD-related brain regions, while MRI features simultaneously enhance the contextual understanding of clinical text; (3) a hierarchical classification module that integrates the fused representations through two fully connected layers to produce final PD classification probabilities. Experiments on the PPMI dataset demonstrate the superior performance of ModFus-PD, achieving an accuracy of 0.903, AUC of 0.892, and F1 score of 0.840, surpassing several state-of-the-art baselines.
These results validate the effectiveness of our cross-modal fusion strategy, which enables interpretable and reliable diagnostic support, holding promise for future clinical translation.
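
The three components described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no layer sizes, loss formulation, or pooling scheme, so everything here is an assumption chosen for illustration. In particular, the symmetric InfoNCE-style loss, the temperature of 0.07, the projection-free attention, the residual connections, the mean pooling, and all function and variable names are hypothetical, and random arrays stand in for the outputs of the pretrained image and text encoders.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """(1) Alignment: symmetric InfoNCE-style loss over paired embeddings.

    Assumed formulation; the abstract only says contrastive learning is used.
    """
    logits = l2_normalize(img_emb) @ l2_normalize(txt_emb).T / temperature  # (B, B)
    idx = np.arange(logits.shape[0])  # matching pairs lie on the diagonal
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)

def cross_attention(queries, context):
    """Scaled dot-product attention in which queries attend to context tokens.

    Learned query/key/value projections are omitted to keep the sketch short.
    """
    d = queries.shape[-1]
    scores = queries @ np.swapaxes(context, -1, -2) / np.sqrt(d)  # (B, Nq, Nc)
    return softmax(scores, axis=-1) @ context                     # (B, Nq, d)

def fuse(img_tokens, txt_tokens):
    """(2) Bidirectional cross-modal attention with residual connections."""
    img_refined = img_tokens + cross_attention(img_tokens, txt_tokens)  # text guides MRI
    txt_refined = txt_tokens + cross_attention(txt_tokens, img_tokens)  # MRI informs text
    # Mean-pool each token sequence, then concatenate (assumed pooling scheme).
    return np.concatenate([img_refined.mean(axis=1), txt_refined.mean(axis=1)], axis=-1)

def classify(fused, w1, b1, w2, b2):
    """(3) Two fully connected layers producing class probabilities."""
    hidden = np.maximum(fused @ w1 + b1, 0.0)  # ReLU (assumed activation)
    return softmax(hidden @ w2 + b2, axis=-1)

# Random stand-ins for encoder outputs: batch of 4, 16-dimensional tokens.
rng = np.random.default_rng(0)
B, N_IMG, N_TXT, D, H = 4, 6, 5, 16, 8
img_tokens = rng.normal(size=(B, N_IMG, D))
txt_tokens = rng.normal(size=(B, N_TXT, D))

loss = contrastive_loss(img_tokens.mean(axis=1), txt_tokens.mean(axis=1))
fused = fuse(img_tokens, txt_tokens)
probs = classify(
    fused,
    rng.normal(size=(2 * D, H)) * 0.1, np.zeros(H),
    rng.normal(size=(H, 2)) * 0.1, np.zeros(2),
)
print("contrastive loss:", float(loss))
print("fused shape:", fused.shape, "probabilities shape:", probs.shape)
```

In this sketch the contrastive loss approaches zero when each image embedding is most similar to its own paired text embedding, which is the alignment property the first component relies on. A trained system would add learned projections to the attention step and optimize the alignment and classification objectives on real data.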