MDL-CA: A Multimodal Deep Learning Approach with a Cross Attention Mechanism for Accurate Brain Cancer Diagnosis

Sarwar, Sumaira; Majeed, Saqib; Nawaz, Asif; Bibi, Ruqia; Lee, Seung Won

doi:10.3389/fpubh.2025.1687335

ORIGINAL RESEARCH article

Front. Public Health

Sec. Digital Public Health

Provisionally accepted

Sumaira Sarwar¹

Saqib Majeed¹

Asif Nawaz¹*

Ruqia Bibi¹

Seung Won Lee²*

¹Pir Mehr Ali Shah Arid Agriculture University, Rawalpindi, Pakistan
²Sungkyunkwan University, Jongno-gu, Republic of Korea

The final, formatted version of the article will be published soon.

In medical imaging and genomics, brain cancer diagnosis remains a critical challenge due to the complex interplay between underlying molecular mechanisms and anatomical abnormalities. Conventional diagnostic methods, such as invasive biopsies, isolated genomic assays, and standalone Magnetic Resonance Imaging (MRI), suffer from limitations including procedural risks, insufficient sensitivity, and incomplete characterization of tumor heterogeneity. These shortcomings often lead to delayed diagnosis, inaccurate tumor grading, and suboptimal treatment planning. Single-modality data, such as MRI scans or genomic profiles alone, tend to yield suboptimal accuracy and limited biological interpretability in computer-aided diagnostic models. To overcome these gaps, this study introduces MDL-CA, a Multimodal Deep Learning framework with a Cross-Attention mechanism that integrates genomic data with brain MRI to enable biologically informed and highly accurate brain cancer diagnosis. The genomic graph embeddings are fused into intermediate MRI feature maps through a cross-modal attention fusion mechanism, enabling the model to capture intricate biological relationships and spatial patterns. This fusion is further optimized to enhance interactions between molecular and anatomical features, resulting in a biologically informed representation. The Entmax sigmoid function is employed to promote sparsity and enhance interpretability in the final classification stage. Data were collected from The Cancer Imaging Archive (TCIA) and The Cancer Genome Atlas (TCGA). Modality-specific feature extraction is performed using a 3D DenseNet for MRI data and a Graph Attention Network (GAT) for genomic data, following rigorous preprocessing.
Extensive experiments across four datasets demonstrate that MDL-CA achieves superior performance, with accuracies of 96.22%, 97.14%, 98.46%, and 98.21%, and F1-scores ranging from 95.95% to 98.40%, validating its robustness and generalizability.
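
To make the fusion step described above more concrete, the following is a minimal NumPy sketch of one plausible reading of it, not the authors' implementation. It assumes that flattened MRI feature-map positions act as queries and genomic node embeddings act as keys and values, that the fused features are added back residually, and that the output layer uses sparsemax, which is the alpha = 2 member of the entmax family. All function names, dimensions, random weights, and the three-class output are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(mri_tokens, gene_nodes, w_q, w_k, w_v):
    """Fuse genomic node embeddings into MRI tokens (assumed direction).

    mri_tokens: (N, d_m) flattened positions of an intermediate MRI feature map
    gene_nodes: (G, d_g) node embeddings from a genomic graph encoder
    """
    q = mri_tokens @ w_q                      # (N, d) queries from MRI
    k = gene_nodes @ w_k                      # (G, d) keys from genomics
    v = gene_nodes @ w_v                      # (G, d_m) values from genomics
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (N, G) scaled dot products
    attn = softmax(scores, axis=-1)           # each MRI position attends over genes
    fused = mri_tokens + attn @ v             # residual fusion (an assumption)
    return fused, attn


def sparsemax(z):
    """Sparsemax for a 1-D score vector (entmax with alpha = 2)."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum       # entries that stay non-zero
    k_z = k[support][-1]
    tau = (cumsum[k_z - 1] - 1) / k_z         # threshold subtracted from scores
    return np.maximum(z - tau, 0.0)


# Hypothetical sizes: 8 MRI positions x 16 channels, 5 genes x 12 features.
rng = np.random.default_rng(0)
mri_tokens = rng.normal(size=(8, 16))
gene_nodes = rng.normal(size=(5, 12))
w_q = rng.normal(size=(16, 8))
w_k = rng.normal(size=(12, 8))
w_v = rng.normal(size=(12, 16))
w_cls = rng.normal(size=(16, 3))              # 3 illustrative output classes

fused, attn = cross_attention_fuse(mri_tokens, gene_nodes, w_q, w_k, w_v)
pooled = fused.mean(axis=0)                   # global average over positions
probs = sparsemax(pooled @ w_cls)             # sparse class probabilities
```

In this sketch each row of `attn` shows how strongly one MRI position draws on each gene, which is the kind of cross-modal weight that could be inspected for interpretability. Unlike softmax, sparsemax can assign exactly zero probability to some classes, which is the sparsity property the abstract attributes to the Entmax stage.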

Keywords: 3D DenseNet, brain cancer, deep learning, Entmax, GAT, multimodality

Received: 17 Aug 2025; Accepted: 28 Nov 2025.

Copyright: © 2025 Sarwar, Majeed, Nawaz, Bibi and Lee. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Asif Nawaz
Seung Won Lee

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.