ORIGINAL RESEARCH article
Front. Oncol.
Sec. Breast Cancer
This article is part of the Research Topic: AI-Powered Insights: Predicting Treatment Response and Prognosis in Breast Cancer
Development and Validation of a Pathomics-driven Machine Learning Model for Individualized Prediction of Neoadjuvant Chemotherapy Response and Early Recurrence in HR-positive, HER2-negative Breast Cancer
Provisionally accepted
- 1 Cancer Hospital, Chinese Academy of Medical Sciences, Beijing, China
- 2 Beijing Tsinghua Changgung Hospital, Beijing, China
- 3 Thorough Lab, Thorough Future, Beijing, China
Background: Hormone receptor (HR)-positive, human epidermal growth factor receptor 2 (HER2)-negative breast cancer is the most prevalent subtype among women but responds only modestly to neoadjuvant chemotherapy (NAC). Accurately predicting NAC efficacy and recurrence risk remains challenging because conventional clinical and molecular markers have limited predictive power. Advances in digital pathology and artificial intelligence now enable quantitative pathomics analysis, offering new opportunities for precise prediction and prognostic assessment.
Methods: This retrospective study included 162 patients with HR-positive, HER2-negative breast cancer who were treated with NAC between 2014 and 2021. Hematoxylin and eosin (H&E)-stained pretreatment biopsy slides were digitized and analyzed using Vision Transformer (ViT) and Unified Network for Image (UNI) deep learning models to extract pathomic features. Thirteen clinical variables were collected. After least absolute shrinkage and selection operator (LASSO)-based feature selection, multiple machine learning models were developed for both response prediction and prognostic evaluation of recurrence. Performance was evaluated using receiver operating characteristic (ROC) curves, area under the curve (AUC), sensitivity, specificity, confusion matrices, calibration curves, and decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) was used to rank feature importance in each model.
Results: The CatBoost model achieved the best predictive performance (AUC = 0.900 in training and 0.848 in validation) when clinical and pathomics-derived variables were combined. Key predictive factors included Ki-67 expression, age, histological grade, PR status, and prominent pathomic features. Kaplan–Meier analysis showed no significant difference in recurrence status or survival outcomes between groups in this cohort, whether patients were stratified by Miller-Payne (MP) grade or by pathological complete response (pCR) status. The recurrence models, developed mainly from pathomic features, predicted 1-year recurrence with an AUC of 0.907 in training and 0.769 in validation.
Conclusions: Integrating pathomic features with clinical variables via machine learning enables robust pretreatment prediction of NAC efficacy and short-term recurrence in HR-positive, HER2-negative breast cancer. This approach could offer a clinically practical tool to optimize individualized therapy and improve patient management, highlighting the translational value of AI-powered digital pathology in breast cancer care.
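For readers unfamiliar with the evaluation metrics named in the Methods, the following is a minimal sketch of how AUC, sensitivity, specificity, and the net benefit used in decision curve analysis can be computed from predicted probabilities. It is not the authors' pipeline: the function names (`roc_auc`, `confusion_counts`, `sensitivity_specificity`, `net_benefit`) and the eight-patient toy data are hypothetical and chosen only for illustration. Net benefit follows the standard definition, TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_counts(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def net_benefit(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> float:
    """Decision-curve net benefit at threshold probability pt (0 < pt < 1)."""
    tp, fp, _, _ = confusion_counts(y_true, scores, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = responder, 0 = non-responder (not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

auc = roc_auc(y_true, scores)
sens, spec = sensitivity_specificity(y_true, scores, threshold=0.5)
nb_model = net_benefit(y_true, scores, threshold=0.5)
# "Treat all" reference strategy: every patient is predicted positive.
nb_treat_all = net_benefit(y_true, [1.0] * len(y_true), threshold=0.5)
```

In a decision curve, `net_benefit` would be evaluated over a range of thresholds and compared with the "treat all" and "treat none" (net benefit of zero) strategies; a model is useful at thresholds where its curve lies above both.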
Keywords: breast cancer, HR-positive/HER2-negative, neoadjuvant chemotherapy, pathomics, artificial intelligence, machine learning, prediction
Received: 17 Dec 2025; Accepted: 09 Feb 2026.
Copyright: © 2026 Yue, Liu, Kang, Yuan, Wang, Wang, Shang, Shang, Li, Dong, Wang, Yang, Wang, Yang, Ying and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Xin Wang
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
