ORIGINAL RESEARCH article
Front. Oncol.
Sec. Cancer Imaging and Image-directed Interventions
ViTCNN: A Robust Hybrid CNN–Vision Transformer Based Deep Learning framework for Multi-Disease Diagnosis in Women's Healthcare
Provisionally accepted
- 1 Indian Institute of Information Technology Sonepat, Sonipat, India
- 2 Yuan Ze University, Zhongli District, Taiwan
- 3 Mittuniversitetet, Sundsvall, Sweden
Accurate and efficient detection of multiple diseases from diagnostic images remains a major challenge, especially for women's health conditions such as breast cancer, cervical cancer, and Polycystic Ovary Syndrome (PCOS). Each of these diseases has its own imaging characteristics and visual patterns, which makes detecting all of them with a single model highly challenging. To address this, we propose a hybrid deep learning framework that combines EfficientNetB0 and a Vision Transformer for multi-disease detection. Its shared-backbone, multi-head architecture integrates the strong spatial feature extraction of EfficientNetB0 with the contextual reasoning of the Vision Transformer, allowing the model to capture both local and global disease features. The framework was trained on a diverse dataset containing several thousand annotated diagnostic images using a two-stage learning strategy: 70 epochs of initial training followed by 30 epochs of fine-tuning. Experimental results show strong diagnostic performance: after the initial stage, the approach achieved accuracies of 96.8% for breast cancer, 95% for cervical cancer, and 99.03% for PCOS, and fine-tuning raised these to 98.5%, 96.5%, and 99.07%, respectively. Future work will focus on dataset expansion and clinical validation for real-world diagnostic deployment.
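The abstract names two mechanisms without giving implementation details: a shared backbone that feeds one head per disease, and a two-stage schedule of 70 initial epochs followed by 30 fine-tuning epochs. The framework-free Python below is a minimal sketch of how those two pieces could be organized. Only the epoch counts come from the abstract. The parameter-group names (`backbone.efficientnet_b0`, `backbone.vit`, `head.breast`, `head.cervical`, `head.pcos`), the choice to freeze EfficientNetB0 during the first stage, and the learning rates are hypothetical assumptions, not the authors' reported setup.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

# Epoch counts taken from the abstract; everything else below is assumed.
INITIAL_EPOCHS = 70
FINE_TUNE_EPOCHS = 30
TOTAL_EPOCHS = INITIAL_EPOCHS + FINE_TUNE_EPOCHS

# Hypothetical parameter groups: a shared hybrid backbone plus one head per disease.
BACKBONE_CNN = "backbone.efficientnet_b0"
BACKBONE_VIT = "backbone.vit"
HEADS = ("head.breast", "head.cervical", "head.pcos")


@dataclass(frozen=True)
class Stage:
    name: str
    trainable: frozenset[str]  # parameter groups that receive gradient updates
    learning_rate: float       # hypothetical value; not reported in the abstract


# Assumed recipe: keep the pretrained CNN frozen while the ViT and heads adapt,
# then unfreeze every group at a smaller learning rate for fine-tuning.
INITIAL = Stage("initial", frozenset({BACKBONE_VIT, *HEADS}), 1e-3)
FINE_TUNE = Stage("fine-tune", frozenset({BACKBONE_CNN, BACKBONE_VIT, *HEADS}), 1e-5)


def stage_for_epoch(epoch: int) -> Stage:
    """Return the training stage for a 1-based epoch index."""
    if not 1 <= epoch <= TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in 1..{TOTAL_EPOCHS}, got {epoch}")
    return INITIAL if epoch <= INITIAL_EPOCHS else FINE_TUNE


def route_batch(samples: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group (task, image_id) pairs by disease head.

    Each group can then be passed through the shared backbone and scored
    only by its own head, so a PCOS image never contributes to the
    breast-cancer head's loss.
    """
    groups: defaultdict[str, list[int]] = defaultdict(list)
    for task, image_id in samples:
        head = f"head.{task}"
        if head not in HEADS:
            raise KeyError(f"unknown task: {task!r}")
        groups[head].append(image_id)
    return dict(groups)


if __name__ == "__main__":
    print(stage_for_epoch(70).name)  # initial
    print(stage_for_epoch(71).name)  # fine-tune
    print(route_batch([("pcos", 1), ("breast", 2), ("pcos", 3)]))
    # {'head.pcos': [1, 3], 'head.breast': [2]}
```

Separating "which groups are trainable" from the training loop keeps the stage switch at epoch 71 in one place; in a real framework the `trainable` set would map onto whichever mechanism that framework uses to freeze parameters or assign per-group learning rates.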
Keywords: breast cancer, cervical cancer, diagnostic image classification, EfficientNetB0, hybrid deep learning, multi-disease identification, Polycystic Ovary Syndrome (PCOS), Vision Transformer (ViT)
Received: 25 Nov 2025; Accepted: 09 Feb 2026.
Copyright: © 2026 Juneja, Bhati, Tejani and Mousavirad. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Sonam Juneja
Bhoopesh Singh Bhati
Seyed Jalaleddin Mousavirad
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
