ORIGINAL RESEARCH article
Front. Artif. Intell.
Sec. Medicine and Public Health
Volume 8 - 2025 | doi: 10.3389/frai.2025.1679310
Early‑fusion hybrid CNN‑Transformer models for multiclass ovarian tumor ultrasound classification
Provisionally accepted
- 1 Escuela Politécnica Superior, Universidad Católica de Murcia (UCAM), Av. de los Jerónimos 135, 30107 Murcia, Spain
- 2 Facultad de Medicina, Universidad Católica de Murcia (UCAM), Av. de los Jerónimos 135, 30107 Murcia, Spain
- 3 Centro de Investigación en Ciencia Aplicada y Tecnología Avanzada, Instituto Politécnico Nacional, Boulevard de la Tecnología 1036 Z-1, P 2/2, 62790 Atlacholoaya, Xochitepec, Morelos, Mexico
Ovarian cancer remains the deadliest gynecologic malignancy, and transvaginal ultrasound (TVS), the first-line test, still suffers from limited specificity and operator dependence. We introduce a learned early-fusion (joint projection) hybrid that couples EfficientNet-B7 (local descriptors) with a Swin Transformer (hierarchical global context) to classify eight ovarian tumor categories from 2D TVS. Using the public, de-identified OTU-2D dataset (n=1,469 images across eight histopathologic classes), we conducted patient-level, stratified 5-fold cross-validation repeated 10×. To address class imbalance while preventing leakage, training used train-only oversampling, ultrasound-aware augmentations, and strong regularization; validation/test folds were never resampled. The hybrid achieved AUC 0.9904, accuracy 92.13%, sensitivity 92.38%, and specificity 98.90%, outperforming single CNN or ViT baselines. A soft ensemble of the top hybrids further improved performance to AUC 0.991, accuracy 93.3%, sensitivity 93.6%, and specificity 99.0%. Beyond discrimination, we provide deployment-oriented evaluation: isotonic calibration yielded reliable probabilities, decision-curve analysis showed net clinical benefit across 5–20% risk thresholds, entropy-based uncertainty supported confidence-based triage, and Grad-CAM highlighted clinically salient regions. All metrics are reported with 95% bootstrap confidence intervals, and the evaluation protocol preserves real-world data distributions. Taken together, this work advances ovarian ultrasound AI from accuracy-only reporting to calibrated, explainable, and uncertainty-aware decision support, offering a reproducible reference framework for multiclass ovarian ultrasound and a clear path toward clinical integration and prospective validation.
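The abstract's evaluation protocol hinges on one detail that is easy to get wrong: oversampling to correct class imbalance is applied to the training split only, so validation/test folds keep the real-world class distribution and no duplicated image can leak across the split. The sketch below illustrates that protocol in minimal, stdlib-only Python; the function names (`stratified_folds`, `oversample`) and the toy eight-class label distribution are illustrative assumptions, not the authors' code, and patient-level grouping is omitted for brevity.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Group indices by class, shuffle within each class, and deal them
    round-robin into k folds so every fold preserves class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds

def oversample(train_idx, labels, seed=0):
    """Duplicate minority-class TRAINING indices up to the majority count.
    Called on the training split only, never on validation/test folds,
    so evaluation sees the unresampled distribution (no leakage)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in train_idx:
        by_class[labels[i]].append(i)
    target = max(len(v) for v in by_class.values())
    out = []
    for idxs in by_class.values():
        out.extend(idxs)
        out.extend(rng.choices(idxs, k=target - len(idxs)))
    return out

# Toy imbalanced label set: 8 classes with uneven counts (illustrative only).
labels = [c for c, n in enumerate([400, 300, 200, 150, 150, 120, 90, 59])
          for _ in range(n)]
folds = stratified_folds(labels, k=5)
val = set(folds[0])                                   # held-out fold: untouched
train = [i for i in range(len(labels)) if i not in val]
train_os = oversample(train, labels)                  # resampled training set
print(Counter(labels[i] for i in train_os))           # balanced training classes
```

In a real run the grouping would be done per patient rather than per image before splitting, and the same discipline applies to the ultrasound-aware augmentations: they are fit and applied inside the training fold only.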
Keywords: ovarian cancer, ultrasound imaging, deep learning, CNN, vision transformer, hybrid model, early diagnosis
Received: 04 Aug 2025; Accepted: 17 Sep 2025.
Copyright: © 2025 Garcia-Atutxa, Martínez-Más, Bueno-Crespo and Villanueva Flores. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Francisca Villanueva Flores, fvillanuevaf@ipn.mx
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.