ORIGINAL RESEARCH article

Front. Oncol.

Sec. Breast Cancer

This article is part of the Research Topic AI-Powered Insights: Predicting Treatment Response and Prognosis in Breast Cancer.

Development and Validation of a Pathomics-driven Machine Learning Model for Individualized Prediction of Neoadjuvant Chemotherapy Response and Early Recurrence in HR-positive, HER2-negative Breast Cancer

Provisionally accepted
Jiaxian Yue¹, Jiaxiang Liu¹,², Xiyu Kang¹, Pei Yuan¹, Wei Wang³, Zhanyu Wang¹, Chao Shang¹, Qingyao Shang¹, Guangyu Li¹, Xubin Dong¹, Tianxiao Wang¹, Dongmin Yang¹, Shuhao Wang³, Chenxuan Yang¹, Jianming Ying¹, Xin Wang¹*
  • ¹Cancer Hospital, Chinese Academy of Medical Sciences, Beijing, China
  • ²Beijing Tsinghua Changgung Hospital, Beijing, China
  • ³Thorough Lab, Thorough Future, Beijing, China

The final, formatted version of the article will be published soon.

Background: Hormone receptor (HR)-positive, human epidermal growth factor receptor 2 (HER2)-negative breast cancer is the most prevalent breast cancer subtype in women, yet it responds only modestly to neoadjuvant chemotherapy (NAC). Accurately predicting NAC efficacy and recurrence risk remains challenging because conventional clinical and molecular markers have limited predictive power. Advances in digital pathology and artificial intelligence now enable quantitative pathomics analysis, opening new opportunities for precise response prediction and prognostic assessment.

Methods: This retrospective study included 162 patients with HR-positive, HER2-negative breast cancer who received NAC between 2014 and 2021. Pretreatment hematoxylin and eosin (H&E)-stained biopsy slides were digitized and analyzed with Vision Transformer (ViT) and Unified Network for Image (UNI) deep learning models to extract pathomic features, and thirteen clinical variables were collected. After feature selection with the least absolute shrinkage and selection operator (LASSO), multiple machine learning models were developed both to predict NAC response and to evaluate recurrence prognosis. Performance was assessed with receiver operating characteristic (ROC) curves, area under the curve (AUC), sensitivity, specificity, confusion matrices, calibration curves, and decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) was used to rank feature importance for each model.

Results: The CatBoost model combining clinical and pathomics-derived variables achieved the best predictive performance (AUC = 0.900 in the training set and 0.848 in the validation set). Key predictive factors included Ki-67 expression, age, histological grade, progesterone receptor (PR) status, and high-ranking pathomic features. Kaplan–Meier analysis showed that, whether patients were stratified by Miller–Payne (MP) grade or by pathological complete response (pCR) status, recurrence and survival outcomes did not differ significantly between the two groups in this cohort. In addition, the recurrence models, built mainly on pathomic features, performed well in predicting 1-year recurrence (AUC = 0.907 in the training set and 0.769 in the validation set).

Conclusions: Integrating pathomic features with clinical variables through machine learning enables robust pretreatment prediction of NAC efficacy and short-term recurrence in HR-positive, HER2-negative breast cancer. This approach could provide a clinically practical tool for optimizing individualized therapy and patient management, underscoring the translational value of AI-powered digital pathology in breast cancer care.
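The abstract names LASSO as the feature-selection step but does not state the solver, the penalty value, or the outcome model. Purely as an illustration of how LASSO discards features, the minimal Python sketch below fits a least-squares LASSO by cyclic coordinate descent and keeps only the features whose coefficients stay non-zero. The helper names (`soft_threshold`, `lasso_coordinate_descent`, `selected_features`), the Gaussian loss, the fixed iteration count, and the toy data are all assumptions, not the authors' pipeline; for a binary endpoint such as NAC response, an L1-penalized logistic regression with the penalty tuned by cross-validation is the more usual choice.

```python
import math
from typing import List, Sequence


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(
    X: Sequence[Sequence[float]], y: Sequence[float], lam: float, n_iter: int = 200
) -> List[float]:
    """Hypothetical helper: minimise (1 / 2n) * ||y - X b||^2 + lam * ||b||_1
    by cyclic coordinate descent (least-squares LASSO, not the study's model).

    Columns of X are standardised (mean 0, population variance 1) and y is
    centred, so no intercept is fitted; coefficients are on the standardised scale.
    """
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        # Guard against constant columns; such a column becomes all zeros and keeps b_j = 0.
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n) or 1.0
        cols.append([(v - mu) / sd for v in col])
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual y_c - X b, starting from b = 0
    beta = [0.0] * p
    for _ in range(n_iter):  # fixed number of sweeps; no convergence check in this sketch
        for j in range(p):
            xj = cols[j]
            # Correlation of feature j with the partial residual (feature j's own fit added back).
            rho = sum(xj[i] * (resid[i] + xj[i] * beta[j]) for i in range(n)) / n
            # Standardised columns satisfy (1/n) x_j^T x_j = 1, so the update is S(rho, lam).
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xj[i] * delta
                beta[j] = new_bj
    return beta


def selected_features(beta: Sequence[float], names: Sequence[str], tol: float = 1e-10) -> List[str]:
    """Features with a non-zero LASSO coefficient survive selection."""
    return [name for name, b in zip(names, beta) if abs(b) > tol]


if __name__ == "__main__":
    # Toy data (not study data): y depends only on x0; x1 and x2 are orthogonal to x0.
    names = ["x0", "x1", "x2"]
    X = [[0, 5, -1], [0, 5, 1], [0, 7, -1], [0, 7, 1],
         [2, 5, -1], [2, 5, 1], [2, 7, -1], [2, 7, 1]]
    y = [7, 7, 7, 7, 13, 13, 13, 13]
    beta = lasso_coordinate_descent(X, y, lam=0.5)
    print(beta)                            # [2.5, 0.0, 0.0]
    print(selected_features(beta, names))  # ['x0']
```

Raising `lam` shrinks more coefficients to exactly zero (here, `lam=4.0` removes every feature), which is why the penalty strength controls how many pathomic and clinical variables reach the downstream models.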

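Several of the evaluation measures listed in the Methods can be computed directly from a model's predicted probabilities. The minimal Python sketch below, using hypothetical helper names and toy data, shows AUC in its Mann–Whitney (pairwise ranking) form, the confusion matrix with sensitivity and specificity at an assumed 0.5 cut-off, and the net benefit that decision curve analysis plots against the threshold probability pt, NB = TP/n - (FP/n) * pt / (1 - pt). It is an illustration under these assumptions only; it does not reproduce the authors' thresholds, software, or calibration analysis.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher than a
    random negative case (Mann-Whitney form; tied pairs count as 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_matrix(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Tuple[int, int, int, int]:
    """Return (TN, FP, FN, TP), calling a case positive when prob >= threshold."""
    tn = fp = fn = tp = 0
    for y, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if y == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def sensitivity_specificity(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_prob, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], pt: float) -> float:
    """Decision-curve net benefit of treating cases with prob >= pt:
    NB = TP/n - (FP/n) * pt / (1 - pt)."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    _, fp, _, tp = confusion_matrix(y_true, y_prob, pt)
    n = len(y_true)
    return tp / n - (fp / n) * pt / (1.0 - pt)


def net_benefit_treat_all(y_true: Sequence[int], pt: float) -> float:
    """Reference 'treat everyone' strategy: prevalence - (1 - prevalence) * pt / (1 - pt)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


if __name__ == "__main__":
    # Toy labels (1 = responder) and predicted probabilities; not study data.
    y = [1, 1, 1, 0, 0, 0]
    p = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
    print(round(roc_auc(y, p), 3))            # 0.889
    print(confusion_matrix(y, p, 0.5))        # (2, 1, 1, 2)
    print(sensitivity_specificity(y, p, 0.5))
    print(round(net_benefit(y, p, 0.25), 3))  # 0.389
```

A full decision curve evaluates `net_benefit` over a grid of threshold probabilities and compares it with `net_benefit_treat_all` and the treat-none line (net benefit 0); a model is clinically useful over the range of pt where its curve lies above both references.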
Keywords: Breast cancer, HR-positive/HER2-negative, Neoadjuvant chemotherapy, Pathomics, Artificial intelligence, Machine learning, Prediction

Received: 17 Dec 2025; Accepted: 09 Feb 2026.

Copyright: © 2026 Yue, Liu, Kang, Yuan, Wang, Wang, Shang, Shang, Li, Dong, Wang, Yang, Wang, Yang, Ying and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Xin Wang

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.