AUTHOR=Yuan Bo, Hu Yudie, Liang Yan, Zhu Yutong, Zhang Lingyu, Cai Shimin, Peng Rui, Wang Xianbin, Yang Zheng, Hu Jinhui
TITLE=Comparative analysis of convolutional neural networks and transformer architectures for breast cancer histopathological image classification
JOURNAL=Frontiers in Medicine
VOLUME=12
YEAR=2025
URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2025.1606336
DOI=10.3389/fmed.2025.1606336
ISSN=2296-858X
ABSTRACT=Background: Breast cancer remains the most prevalent malignancy in women globally, representing 11.7% of all new cancer cases (2.3 million annually) and causing approximately 685,000 deaths in 2020 (GLOBOCAN 2020). This multifactorial disease, influenced by genetic, hormonal, and lifestyle factors, often presents with nonspecific early symptoms that delay detection. While mammography, ultrasound, and MRI serve as primary screening modalities, histopathological examination remains the diagnostic gold standard, though it is subject to interpretation variability. Recent advances in deep learning show promising potential to improve diagnostic accuracy, reduce false positives and false negatives, and alleviate radiologists' workload, thereby enhancing clinical decision-making in breast cancer management.

Methods: This study trains and evaluates 14 deep learning models (AlexNet, VGG16, InceptionV3, ResNet50, DenseNet121, MobileNetV2, ResNeXt, RegNet, EfficientNet_B0, ConvNeXt, ViT, DINOv2, UNI, and GigaPath) on the BreakHis v1 dataset. These models span both CNN-based and Transformer-based architectures. The study assesses their performance in breast cancer diagnosis using key evaluation metrics: accuracy, specificity, recall (sensitivity), F1-score, Cohen's Kappa coefficient, the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC).

Results: In the binary classification task, given its relatively low complexity, most models achieved excellent performance. CNN-based models such as ResNet50, RegNet, and ConvNeXt, as well as the Transformer-based foundation model UNI, all reached an AUC of 0.999. The best overall performance was achieved by ConvNeXt, which attained an accuracy of 99.2% (95% CI: 98.3%–100%), a specificity of 99.6% (95% CI: 99.1%–100%), an F1-score of 99.1% (95% CI: 98.0%–100%), a Cohen's Kappa coefficient of 0.983 (95% CI: 0.960–1), and an AUC of 0.999 (95% CI: 0.999–1). In the eight-class classification task, the increased complexity produced more pronounced performance differences among models, with CNN- and Transformer-based architectures performing comparably overall. The best-performing model was the fine-tuned foundation model UNI, which attained an accuracy of 95.5% (95% CI: 94.4%–96.6%), a specificity of 95.6% (95% CI: 94.2%–96.9%), an F1-score of 95.0% (95% CI: 93.9%–96.1%), a Cohen's Kappa coefficient of 0.939 (95% CI: 0.926–0.952), and an AUC of 0.998 (95% CI: 0.997–0.999). Additionally, using foundation model encoders directly, without fine-tuning, resulted in generally poor performance on the classification task.

Conclusion: Our findings suggest that deep learning models are highly effective at classifying breast cancer pathology images, particularly in binary tasks, where multiple models reach near-perfect performance. Although recent Transformer-based foundation models such as UNI possess strong feature extraction capabilities, their zero-shot performance on this specific task was limited. However, with simple fine-tuning they quickly achieved excellent results, indicating that with minimal adaptation, foundation models can be valuable tools in digital pathology, especially in complex multi-class scenarios.
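
Note: the Methods list standard classification metrics. The sketch below shows how they could be computed for the binary (benign vs. malignant) task, assuming scikit-learn; this is an illustration, not the authors' code. The arrays y_true, y_pred, and y_score are placeholders, and specificity is derived from the confusion matrix since scikit-learn provides no direct specificity function.

    import numpy as np
    from sklearn.metrics import (accuracy_score, recall_score, f1_score,
                                 cohen_kappa_score, roc_auc_score,
                                 confusion_matrix)

    # Placeholder outputs for a binary benign (0) vs. malignant (1) task.
    y_true  = np.array([0, 0, 1, 1, 1, 0, 1, 0])               # ground truth
    y_pred  = np.array([0, 0, 1, 1, 0, 0, 1, 0])               # hard predictions
    y_score = np.array([0.1, 0.2, 0.9, 0.8, 0.4, 0.3, 0.95, 0.05])  # P(malignant)

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("accuracy:   ", accuracy_score(y_true, y_pred))
    print("recall:     ", recall_score(y_true, y_pred))   # sensitivity
    print("specificity:", tn / (tn + fp))                 # from confusion matrix
    print("F1-score:   ", f1_score(y_true, y_pred))
    print("Cohen kappa:", cohen_kappa_score(y_true, y_pred))
    print("ROC AUC:    ", roc_auc_score(y_true, y_score))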
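Note: the Results contrast foundation-model encoders used as-is (poor) with the same encoders after simple fine-tuning (excellent). A minimal sketch of that adaptation pattern, assuming PyTorch and timm with a generic pretrained ViT encoder standing in for a pathology foundation model such as UNI (whose gated weights follow the same recipe); the encoder name, learning rates, and class count are illustrative assumptions, not the paper's training configuration.

    import torch
    import torch.nn as nn
    import timm

    # Generic ViT stand-in for a foundation encoder; num_classes=0 yields
    # pooled features instead of logits.
    encoder = timm.create_model("vit_base_patch16_224", pretrained=True,
                                num_classes=0)

    num_classes = 8  # eight-class BreakHis subtype task
    head = nn.Linear(encoder.num_features, num_classes)
    model = nn.Sequential(encoder, head)

    # "Simple fine-tuning": update the pretrained encoder gently while the
    # freshly initialized head learns quickly.
    optimizer = torch.optim.AdamW([
        {"params": encoder.parameters(), "lr": 1e-5},
        {"params": head.parameters(),    "lr": 1e-3},
    ])
    criterion = nn.CrossEntropyLoss()

    def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
        """One fine-tuning step on a batch of 224x224 RGB patch tensors."""
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        return loss.item()

The two-tier learning rate is one common way to realize "minimal adaptation": the encoder's pretrained features are largely preserved while the task-specific head absorbs most of the change.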