ORIGINAL RESEARCH article
Front. Med.
Sec. Precision Medicine
An interpretable machine learning model for predicting brain metastasis in breast cancer
1. Tangshan Fengrun People's Hospital, Tangshan, China
2. Department of Breast Surgery, First Ward, Tangshan People’s Hospital, Tangshan, Hebei, 063001, China
3. Department of Breast Surgery, First Ward, Tangshan People’s Hospital, Tangshan, Hebei, 063001, China
4. Breast Center, West China Hospital, Sichuan University, Chengdu, China
5. The Fourth Hospital of Shijiazhuang, Shijiazhuang, China
6. Radiotherapy Department, The Fourth Hospital of Hebei Medical University, Shijiazhuang, China
Abstract
Background: Breast cancer is the most common malignancy worldwide, and brain metastasis severely worsens its prognosis. This study aimed to develop a machine learning model that predicts the risk of brain metastasis in breast cancer patients to assist clinical management.
Methods: Univariate and multivariate logistic regression analyses were used to select the final variables, and eight machine learning algorithms were used for model construction. Model performance was evaluated using receiver operating characteristic curves (area under the curve, AUC), precision-recall curves (area under the precision-recall curve, AUPRC), decision curve analysis (DCA), and calibration curves, and the optimal model was selected based on these metrics. The model was trained on a cohort of 154,193 patients, internally validated on 66,084 patients, and externally validated on 765 real-world cases. SHAP analysis was applied to enhance interpretability, and a web-based calculator was developed from the optimal model to facilitate clinical application.
Results: Univariate logistic regression identified higher tumor grade, advanced T/N stage, advanced clinical stage, and PR positivity as risk factors, whereas radiotherapy, chemotherapy, surgery, the HR+/HER2− subtype, and unilateral tumors were protective factors (P < 0.001). Multivariate analysis confirmed independent risk factors, including poorer pathological grade, N3 lymph node status, later stage, and PR positivity, and independent protective factors, including radiotherapy, chemotherapy, surgery, non-HR−/HER2− subtypes, and HER2 positivity.
The XGBoost model achieved an AUC of 0.98 in 10-fold cross-validation, 0.99 in the internal test set, and 0.97 in the external validation set; the corresponding AUPRC values were 0.933, 0.864, and 0.648. Decision curve analysis showed a higher net benefit than the alternative models across threshold probabilities of 0.1 to 0.8, and calibration curves showed high concordance between predicted and observed event rates. SHAP analysis highlighted surgery as the primary protective factor, followed by stage and T
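The net benefit reported by decision curve analysis can be illustrated with a minimal sketch. The abstract does not state how net benefit was computed, so the code below assumes the standard definition, NB = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The function names and the toy labels and risks are hypothetical and are not taken from the study data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold.

    Assumes the standard decision-curve definition; not the authors' code.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Net benefit of the reference strategy that flags every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy labels (1 = brain metastasis) and predicted risks, invented for illustration.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.7, 0.2, 0.1, 0.1, 0.05, 0.05, 0.02, 0.01]

# Scan a few thresholds inside the 0.1 to 0.8 range mentioned in the abstract.
for pt in (0.1, 0.3, 0.5, 0.8):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = treat_all_net_benefit(y_true, pt)
    print(f"pt={pt:.1f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero by definition.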
Summary
Keywords
brain, breast cancer, machine learning, metastasis, model
Received
27 August 2025
Accepted
02 February 2026
Copyright
© 2026 Wang, Zhang, Chang, Chen, Binxu, Yang and Gao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Chao Yang; Chao Gao
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.