ORIGINAL RESEARCH article

Front. Med.

Sec. Pathology

Volume 12 - 2025 | doi: 10.3389/fmed.2025.1606336

This article is part of the Research Topic: Evaluating Foundation Models in Medical Imaging

Comparative Analysis of Convolutional Neural Networks and Transformer Architectures for Breast Cancer Histopathological Image Classification

Provisionally accepted
Bo Yuan, Yudie Hu, Yan Liang, Yutong Zhu, Lingyu Zhang, Shimin Cai, Rui Peng, Xianbin Wang*, Zheng Yang*, Jinhui Hu*
  • The First Hospital of Hunan University of Chinese Medicine, Hunan University of Chinese Medicine, Changsha, China

The final, formatted version of the article will be published soon.

Background: Breast cancer remains the most prevalent malignancy in women globally, representing 11.7% of all new cancer cases (2.3 million annually) and causing approximately 685,000 deaths in 2020 (GLOBOCAN 2020). This multifactorial disease, influenced by genetic, hormonal and lifestyle factors, often presents with nonspecific early symptoms that delay detection. While mammography, ultrasound and MRI serve as the primary screening modalities, histopathological examination remains the diagnostic gold standard, although it is subject to interpretation variability. Recent advances in deep learning show potential to improve diagnostic accuracy, reduce false positives and false negatives, and alleviate radiologists' workload, thereby enhancing clinical decision-making in breast cancer management.

Methods: This study trains and evaluates fourteen deep learning models on the BreakHis v1 dataset: AlexNet, VGG16, InceptionV3, ResNet50, DenseNet121, MobileNetV2, ResNeXt, RegNet, EfficientNet_B0, ConvNeXT, ViT, DINOv2, UNI and GigaPath. These models cover both CNN-based and Transformer-based architectures. Their performance in breast cancer diagnosis is assessed using accuracy, specificity, recall, F1-score, Cohen’s Kappa coefficient, ROC curves and AUC.

Results: In the binary classification task, most models achieved excellent performance owing to the task's relatively low complexity. The CNN-based models ResNet50, RegNet and ConvNeXT, as well as the Transformer-based foundation model UNI, all reached an AUC of 0.999. The best overall performance was achieved by ConvNeXT, which attained an accuracy of 99.2%, a specificity of 99.6%, an F1-score of 99.1%, a Cohen’s Kappa coefficient of 0.983, and an AUC of 0.999. In the eight-class classification task, the increased complexity led to more pronounced performance differences among models, with CNN- and Transformer-based architectures performing comparably overall. The best-performing model was the fine-tuned foundation model UNI, which attained an accuracy of 95.5%, a specificity of 95.6%, an F1-score of 95.0%, a Cohen’s Kappa coefficient of 0.939, and an AUC of 0.998. Using foundation model encoders directly, without fine-tuning, resulted in generally poor performance on the classification task.

Conclusion: Our findings suggest that deep learning models are highly effective at classifying breast cancer pathology images, particularly in binary tasks, where multiple models reach near-perfect performance. Although recent Transformer-based foundation models such as UNI possess strong feature extraction capabilities, their zero-shot performance on this specific task was limited. With simple fine-tuning, however, they quickly achieved excellent results. This indicates that, with minimal adaptation, foundation models can be valuable tools in digital pathology, especially in complex multi-class scenarios.
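
The threshold-free metrics named in the Methods (accuracy, specificity, recall, F1-score and Cohen’s Kappa) can all be derived from a confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' evaluation code. The function name `classification_metrics`, the macro-averaging over classes, and the example counts are assumptions made for illustration; the abstract does not state how per-class values were averaged.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute summary metrics from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Per-class values are macro-averaged, which is
    an assumption; the paper may use a different averaging scheme.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    true_counts = [sum(row) for row in cm]                           # row sums
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums

    recalls, specificities, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        fn = true_counts[c] - tp
        fp = pred_counts[c] - tp
        tn = total - tp - fn - fp
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        recalls.append(recall)
        specificities.append(specificity)
        f1s.append(f1)

    # Observed agreement (accuracy) and agreement expected by chance.
    accuracy = sum(cm[c][c] for c in range(k)) / total
    expected = sum(true_counts[c] * pred_counts[c] for c in range(k)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0

    return {
        "accuracy": accuracy,
        "macro_recall": sum(recalls) / k,
        "macro_specificity": sum(specificities) / k,
        "macro_f1": sum(f1s) / k,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical binary counts (rows: true benign/malignant), not study data.
    example = [[45, 5], [10, 40]]
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Cohen’s Kappa corrects accuracy for the agreement expected by chance given the class frequencies, which is why it is reported alongside accuracy on an imbalanced dataset. ROC curves and AUC require predicted scores rather than hard labels, so they are outside the scope of this sketch.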

Keywords: breast cancer, deep learning (DL), pathological tissue section, artificial intelligence (AI), transformer, convolutional neural networks

Received: 05 Apr 2025; Accepted: 22 May 2025.

Copyright: © 2025 Yuan, Hu, Liang, Zhu, Zhang, Cai, Peng, Wang, Yang and Hu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Xianbin Wang, The First Hospital of Hunan University of Chinese Medicine, Hunan University of Chinese Medicine, Changsha, China
Zheng Yang, The First Hospital of Hunan University of Chinese Medicine, Hunan University of Chinese Medicine, Changsha, China
Jinhui Hu, The First Hospital of Hunan University of Chinese Medicine, Hunan University of Chinese Medicine, Changsha, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.