ORIGINAL RESEARCH article
Front. Artif. Intell.
Sec. Medicine and Public Health
This article is part of the Research Topic "Advances in Artificial Intelligence for Early Cancer Detection and Precision Oncology".
Explainable Multi-modal Deep Learning for Transparent Cancer Diagnosis: Integrating Radiology, Clinical Features, and Decision Visualization
Provisionally accepted
1 Vishwakarma University, Pune, India
2 Vishwakarma Institute of Technology, Pune, India
3 MIT Art Design and Technology University, Pune, India
4 San Jose State University, San Jose, United States
5 Walmart, California, Canada
6 Amazon, New York, United States
Abstract

Introduction: Although artificial intelligence–based cancer diagnostic models have demonstrated strong predictive performance, their lack of transparency and reliance on single-modality data continue to limit clinical trust and adoption. Effectively integrating multi-modal data with interpretable decision-making remains a key challenge.

Methods: We propose an explainable multi-modal deep learning framework that integrates radiological imaging and structured clinical features through attention-based fusion. Image-level explanations are generated with Grad-CAM++, while SHAP quantifies the contribution of each clinical feature, enabling a unified, cross-modally aligned interpretation rather than independent uni-modal explanations. The framework was evaluated on publicly available datasets, including CBIS-DDSM mammography, the Duke Breast Cancer MRI collection, and TCGA cohorts (BRCA, LUAD, and GBM), comprising a total of 3,842 images from 2,917 patients.

Results: The proposed model consistently outperformed uni-modal approaches and simple fusion baselines, achieving an improved balance between sensitivity and specificity. Attention-based fusion outperformed plain feature concatenation, and integrating explainability did not compromise predictive accuracy. Visual and clinical explanations highlighted diagnostically relevant tumor regions and established oncological risk factors. Stable performance across datasets indicates strong generalization capability.

Discussion: These results demonstrate that explainable multi-modal learning can combine accuracy, interpretability, and robustness, supporting the development of reliable AI-based decision-support systems for cancer diagnosis.
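To make the fusion step concrete, below is a minimal PyTorch sketch of attention-based fusion of an imaging embedding with a structured clinical-feature embedding. The backbone choice (ResNet-50), the embedding size, the clinical feature count, and the two-way softmax gate are all illustrative assumptions; the article's exact architecture is not reproduced here.

import torch
import torch.nn as nn
import torchvision.models as models

class AttentionFusion(nn.Module):
    def __init__(self, num_clinical_features=20, embed_dim=256, num_classes=2):
        super().__init__()
        # Image branch: ImageNet-pretrained ResNet-50 with its classifier head removed.
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        backbone.fc = nn.Identity()
        self.image_branch = backbone
        self.image_proj = nn.Linear(2048, embed_dim)
        # Clinical branch: a small MLP over the structured features (hypothetical sizes).
        self.clinical_branch = nn.Sequential(
            nn.Linear(num_clinical_features, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),
        )
        # Attention gate: softmax weights over the two modality embeddings.
        self.attn = nn.Sequential(nn.Linear(2 * embed_dim, 2), nn.Softmax(dim=-1))
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, image, clinical):
        z_img = self.image_proj(self.image_branch(image))   # (B, embed_dim)
        z_cli = self.clinical_branch(clinical)              # (B, embed_dim)
        w = self.attn(torch.cat([z_img, z_cli], dim=-1))    # (B, 2) modality weights
        fused = w[:, :1] * z_img + w[:, 1:] * z_cli         # attention-weighted sum
        return self.classifier(fused)

model = AttentionFusion()
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 20))  # (4, 2)

The learned gate lets the model down-weight a modality per patient (e.g., when clinical features are uninformative), which is the usual motivation for attention-based fusion over plain concatenation.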
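In the same spirit, here is a hedged sketch of the two explanation streams, continuing from the AttentionFusion sketch above: Grad-CAM++ via the open-source grad-cam package for the image branch, and KernelSHAP via the shap package for the clinical branch, each computed while holding the other modality fixed. The single-input wrapper, the choice of target layer, and the placeholder background matrix are illustrative assumptions; the article does not specify how it aligns the two streams.

import numpy as np
import shap
import torch
from pytorch_grad_cam import GradCAMPlusPlus
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

model.eval()
image = torch.randn(1, 3, 224, 224)
clinical = torch.randn(1, 20)

# Image explanation: Grad-CAM++ on the last ResNet block, with the clinical
# vector held fixed so the wrapped model is single-input as the CAM API expects.
class ImageOnly(torch.nn.Module):
    def __init__(self, fusion_model, fixed_clinical):
        super().__init__()
        self.fusion_model = fusion_model
        self.fixed_clinical = fixed_clinical
    def forward(self, x):
        return self.fusion_model(x, self.fixed_clinical.expand(x.shape[0], -1))

cam = GradCAMPlusPlus(model=ImageOnly(model, clinical),
                      target_layers=[model.image_branch.layer4[-1]])
heatmap = cam(input_tensor=image, targets=[ClassifierOutputTarget(1)])  # (1, H, W)

# Clinical explanation: KernelSHAP over the clinical features, with the image
# held fixed. The zero background is a placeholder; in practice one would
# sample real training rows.
def clinical_predict(x_np):
    with torch.no_grad():
        x = torch.as_tensor(x_np, dtype=torch.float32)
        logits = model(image.expand(x.shape[0], -1, -1, -1), x)
        return torch.softmax(logits, dim=1)[:, 1].numpy()

background = np.zeros((10, 20), dtype=np.float32)
explainer = shap.KernelExplainer(clinical_predict, background)
shap_values = explainer.shap_values(clinical.numpy())  # per-feature contributions

Presenting the resulting heatmap and SHAP values side by side for a single patient is one plausible way to realize the unified, cross-modally aligned interpretation described in the Methods.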
Keywords: Attention-based Fusion, cancer diagnosis, Clinical data integration, Explainable artificial intelligence, medical imaging, Model interpretability, Multi-modal deep learning
Received: 14 Dec 2025; Accepted: 27 Jan 2026.
Copyright: © 2026 Dash, Bewoor, Dongre, Bhosle, Patil, Jadhav, Mohapatra and Walia. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Sital Dash
Kailas Patil
Shrikant Jadhav
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
