
ORIGINAL RESEARCH article

Front. Oncol.

Sec. Breast Cancer

Volume 15 - 2025 | doi: 10.3389/fonc.2025.1683164

This article is part of the Research Topic AI-Powered Insights: Predicting Treatment Response and Prognosis in Breast Cancer.

Machine Learning Model for Predicting Epidermal Growth Factor Receptor Expression Status in Breast Cancer Using Ultrasound Radiomics

Provisionally accepted
Zhirong Xu, Jiayi Ye, Huohu Zhong, Jiemin Chen, Han Wang, Xiaoqian Zhang, Guorong Lyu*, Shanshan Su*
  • The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China

The final, formatted version of the article will be published soon.

Background/Objectives: The epidermal growth factor receptor (EGFR) is a clinically important therapeutic target, and its expression in breast cancer is associated with both overall and disease-free survival. Current methods for assessing a patient's EGFR expression status require invasive tissue sampling. In this study, we therefore developed a machine learning approach based on ultrasound radiomics to predict EGFR expression status non-invasively in patients with breast cancer.

Methods: Radiomic features were extracted from grayscale and wavelet-transformed ultrasound images of 321 patients. The dataset was randomly split at a 7:3 ratio into training (n = 225) and test (n = 96) sets, using stratified sampling to preserve the EGFR+/– ratio. Key predictors were identified with a multi-step procedure: reproducibility filtering (ICC > 0.75), univariate F-test filtering (p < 0.05), and L1-regularized selection via LASSO regression. Seven machine learning models were trained, and model interpretability was assessed with SHAP (SHapley Additive exPlanations). In addition to the hold-out evaluation, stratified 10-fold cross-validation was performed to reduce the dependence of the performance estimates on a single random split.

Results: The random forest model performed best, with an area under the receiver operating characteristic curve (AUC) of 0.86 in the training set and 0.70 in the test set, and it significantly outperformed the other models (P < 0.001). SHAP analysis identified original_ngtdm_Coarseness, original_ngtdm_Strength, and wavelet.LL_glcm_ClusterProminence as the top predictors. These features reflect the structural compactness and heterogeneity associated with EGFR overexpression.

Conclusions: We present a reliable and interpretable tool for the non-invasive assessment of EGFR expression status in patients with breast cancer. The most important predictors captured tumor heterogeneity and microstructural uniformity, highlighting the biological relevance of radiomic patterns in EGFR-positive tumors. By integrating advanced imaging analysis with machine learning, this model underscores the potential of radiomics to advance precision oncology.
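The selection and validation procedure summarized in the Methods can be illustrated with a short Python sketch built on NumPy and scikit-learn. This is not the authors' code, and several details are assumptions: the data are synthetic (generated with `make_classification`), the second segmentation used for the ICC step is simulated by adding noise, the ICC form is assumed to be ICC(2,1) (two-way random effects, absolute agreement) because the abstract does not state it, LASSO is read as a linear `LassoCV` fitted to the 0/1 labels (an L1-penalized logistic regression would be an equally plausible reading), and all hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFpr, SelectFromModel, f_classif
from sklearn.linear_model import LassoCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has shape (n_subjects, k_raters). Assumed ICC form, see lead-in.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between subjects
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between raters
    resid = ratings - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))          # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Synthetic stand-in for the radiomic feature matrix (NOT the study's data):
# 321 lesions, 30 features, the first 5 informative, roughly 60/40 class balance.
X, y = make_classification(n_samples=321, n_features=30, n_informative=5,
                           n_redundant=0, weights=[0.6], shuffle=False,
                           random_state=0)

# Hypothetical second segmentation: features 0-19 are re-measured with small
# noise, features 20-29 with large noise (poor reproducibility).
rng = np.random.default_rng(0)
noise_sd = np.where(np.arange(X.shape[1]) < 20, 0.1, 2.0)
X_reader2 = X + rng.normal(0.0, noise_sd, size=X.shape)

# Step 1: reproducibility filter, keep features with ICC > 0.75.
icc = np.array([icc_2_1(np.column_stack([X[:, j], X_reader2[:, j]]))
                for j in range(X.shape[1])])
kept = np.flatnonzero(icc > 0.75)
X = X[:, kept]

# Stratified split into 225 training and 96 test cases, preserving the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=96, stratify=y, random_state=0)

# Steps 2-4: F-test filter (p < 0.05), LASSO selection, random forest.
model = Pipeline([
    ("scale", StandardScaler()),
    ("ftest", SelectFpr(f_classif, alpha=0.05)),
    ("lasso", SelectFromModel(LassoCV(cv=5))),
    ("rf", RandomForestClassifier(n_estimators=500, random_state=0)),
])

# Stratified 10-fold CV on the training set; selection is refit in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")

# Hold-out evaluation on the test set.
model.fit(X_train, y_train)
auc_test = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"features kept after ICC filter: {len(kept)}")
print(f"10-fold CV AUC: {cv_scores.mean():.2f} +/- {cv_scores.std():.2f}")
print(f"test AUC: {auc_test:.2f}")
```

In this sketch the F-test and LASSO steps sit inside the `Pipeline`, so they are refit within each cross-validation fold and the CV AUC is not inflated by selecting features on the same samples used to score them; whether the study did the same is not stated in the abstract. SHAP values for the fitted forest could then be computed with a tree explainer from the separate `shap` package, a step omitted here to keep the example dependent only on NumPy and scikit-learn.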
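The top predictor, original_ngtdm_Coarseness, is derived from the neighbouring gray tone difference matrix (NGTDM). To illustrate what it measures, the following simplified Python sketch computes coarseness on a small 2D integer image using the standard NGTDM definition (the one also described in the PyRadiomics feature documentation): for each gray level i, s_i sums |i − Ā_i| over all pixels of level i, where Ā_i is the mean gray level of that pixel's 8-connected neighbours, and coarseness is 1 / Σ p_i s_i, with p_i the fraction of pixels at level i. The simplifications are assumptions of this sketch: the whole image is the region of interest, neighbours outside the image are ignored, no gray-level discretization is applied, and a perfectly uniform image returns infinity. It is not the PyRadiomics implementation and will not reproduce its values exactly.

```python
import math

import numpy as np


def ngtdm_coarseness(img) -> float:
    """Simplified 2D NGTDM coarseness with an 8-connected neighbourhood."""
    img = np.asarray(img, dtype=np.int64)
    h, w = img.shape
    s = {}       # s_i: summed |i - mean(neighbours)| for gray level i
    counts = {}  # n_i: pixels of gray level i that have at least one neighbour
    for r in range(h):
        for c in range(w):
            # Neighbours inside the image, excluding the centre pixel itself.
            neighbours = [
                img[rr, cc]
                for rr in range(max(r - 1, 0), min(r + 2, h))
                for cc in range(max(c - 1, 0), min(c + 2, w))
                if (rr, cc) != (r, c)
            ]
            if not neighbours:
                continue
            level = int(img[r, c])
            s[level] = s.get(level, 0.0) + abs(level - float(np.mean(neighbours)))
            counts[level] = counts.get(level, 0) + 1
    n_valid = sum(counts.values())
    if n_valid == 0:
        return math.inf
    # Coarseness = 1 / sum_i p_i * s_i
    total = sum(counts[i] / n_valid * s[i] for i in counts)
    return math.inf if total == 0 else 1.0 / total


coarse = np.array([[0, 0, 1, 1]] * 4)        # two large uniform blocks
fine = np.indices((4, 4)).sum(axis=0) % 2     # 4x4 checkerboard
print(ngtdm_coarseness(coarse), ngtdm_coarseness(fine))  # roughly 0.645 and 0.211
```

The two toy images show the intended behaviour: large uniform regions produce small gray-tone differences and therefore high coarseness, while a rapidly alternating pattern produces low coarseness.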

Keywords: breast cancer, machine learning, epidermal growth factor receptor, ultrasound, radiomics

Received: 10 Aug 2025; Accepted: 06 Oct 2025.

Copyright: © 2025 Xu, Ye, Zhong, Chen, Wang, Zhang, Lyu and Su. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Guorong Lyu, lgr-feus@sina.com
Shanshan Su, susan@fjmu.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.