Enhancing Preoperative HER2 Status Classification of Invasive Breast Cancers Using Machine Learning Models Based on Clinicopathological and MRI Features: A Multicenter Study

Zhao, Suhong; Li, Zhaohua; Wang, Yanan; Zhao, Fang; Chen, Peipei; Pang, Guodong

doi:10.3389/fcell.2025.1669651

ORIGINAL RESEARCH article

Front. Cell Dev. Biol.

Sec. Molecular and Cellular Pathology


Provisionally accepted

Suhong Zhao¹

Zhaohua Li¹

Yanan Wang²

Fang Zhao³

Peipei Chen¹

Guodong Pang¹*

¹Department of Radiology, The Second Hospital of Shandong University, Jinan, China
²Department of Radiology, Linglong Yingcheng Hospital, Yantai, China
³Department of Radiology, Qilu Hospital of Shandong University, Jinan, China

The final, formatted version of the article will be published soon.

Rationale and Objectives: Human epidermal growth factor receptor 2 (HER2) gene status is crucial for predicting treatment efficacy in breast cancer. This study assessed preoperative HER2 status classification in breast cancer using machine learning models based on clinicopathological and MRI characteristics.

Materials and Methods: This retrospective study included 1015 patients (1030 lesions) from two centers, who were divided into training, internal validation, and external validation sets. Nomograms were developed from clinicopathological and MRI features, and predictive models were constructed using decision trees (DT), support vector machines (SVM), k-nearest neighbors (k-NN), artificial neural networks (ANN), and multivariable logistic regression (LR). Model performance was evaluated with receiver operating characteristic (ROC) curves, decision curve analysis, and calibration curves. Model interpretability was provided by the nomograms and by SHapley Additive exPlanations (SHAP) analysis.

Results: Key variables for distinguishing HER2-positive from HER2-negative cases were regional N category, estrogen receptor (ER) status, progesterone receptor (PR) status, Ki-67 status, lesion number, distribution quadrant, and accompanying signs. The SVM model achieved the highest area under the curve (AUC) in the training set (0.86; 95% confidence interval (CI): 0.81-0.90), and the ANN model achieved an AUC of 0.77 (95% CI: 0.67-0.86) in the internal validation set. In the external validation set, the LR model achieved the highest AUC (0.66; 95% CI: 0.56-0.76), although overall performance was modest. For differentiating HER2-low from HER2-zero tumors, Ki-67 status, lesion number, distribution quadrant, mass shape, early enhancement rate, and apparent diffusion coefficient (ADC) were significant. The SVM model attained the highest AUC in the training set (0.87; 95% CI: 0.83-0.91), whereas the LR model generalized best, yielding the highest AUCs in both the internal (0.67; 95% CI: 0.58-0.76) and external (0.74; 95% CI: 0.65-0.83) validation sets. The nomogram improved radiologists' diagnostic accuracy, particularly that of junior radiologists. SHAP analysis showed that PR status was the most influential feature for HER2-positive classification, whereas mass shape and ADC values dominated the identification of HER2-low status.

Conclusion: Integrating machine learning with clinicopathological and MRI characteristics improves the accuracy of HER2 status classification in breast cancer and enhances radiologists' diagnostic capabilities in clinical practice.
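The abstract reports each AUC with a 95% CI and uses decision curve analysis, but it does not say how the intervals or net benefit were computed. The following is a minimal Python sketch of both metrics under stated assumptions. It computes the AUC as the Mann-Whitney probability that a positive case outscores a negative one. It derives the CI from a percentile bootstrap, which is an assumption, since the authors may have used DeLong's method or another approach. Net benefit uses the standard decision-curve formula TP/n - FP/n × pt/(1 - pt). The function names (`auc_mann_whitney`, `bootstrap_auc_ci`, `net_benefit`) and the toy data are hypothetical and are not taken from the study.

```python
import random
from typing import Sequence, Tuple


def auc_mann_whitney(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true: Sequence[int], scores: Sequence[float],
                     n_boot: int = 2000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float, float]:
    """Point AUC plus a percentile-bootstrap (1 - alpha) CI (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    point = auc_mann_whitney(y_true, scores)
    boot = []
    while len(boot) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            boot.append(auc_mann_whitney(yb, [scores[i] for i in idx]))
    boot.sort()
    lo = boot[int((alpha / 2) * n_boot)]
    hi = boot[int((1 - alpha / 2) * n_boot) - 1]
    return point, lo, hi


def net_benefit(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit of treating everyone whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Toy labels (1 = HER2-positive) and model scores: illustrative only, not study data.
    y = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
    p = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.65, 0.3, 0.35]
    auc, lo, hi = bootstrap_auc_ci(y, p, n_boot=500)
    print(f"AUC {auc:.2f} (95% CI {lo:.2f}-{hi:.2f})")
    for pt in (0.2, 0.4, 0.6):
        model_nb = net_benefit(y, p, pt)
        treat_all_nb = net_benefit(y, [1.0] * len(y), pt)  # "treat all" reference curve
        print(f"pt={pt}: model {model_nb:.3f}, treat all {treat_all_nb:.3f}")
```

With only ten toy cases, expect a wide bootstrap interval. A decision curve plots the model's net benefit against the "treat all" and "treat none" (net benefit 0) strategies across a range of threshold probabilities.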

Keywords: magnetic resonance imaging, clinicopathological features, breast cancer, human epidermal growth factor receptor 2, machine learning

Received: 20 Jul 2025; Accepted: 10 Nov 2025.

Copyright: © 2025 Zhao, Li, Wang, Zhao, Chen and Pang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Guodong Pang, pgd226@aliyun.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.