Machine learning algorithms for individualized prediction of prognosis in breast cancer liver metastases and the prognostic impact of primary tumor surgery: a multicenter study

Chen, Chunmei; Wu, Jundong; Xu, Bo; Li, Weiwen; Zhong, Chengming; Yan, Zhibing; Zhong, Qipeng; Li, Ronggang; Shao, Mingtao; Dong, Yan; Fang, Yutong; Li, Yong; Zhang, Qunchen

doi:10.3389/fendo.2025.1656191

ORIGINAL RESEARCH article

Front. Endocrinol., 13 October 2025

Sec. Cancer Endocrinology

Volume 16 - 2025 | https://doi.org/10.3389/fendo.2025.1656191

This article is part of the Research TopicAI-Powered Insights: Predicting Treatment Response and Prognosis in Breast CancerView all 17 articles

Machine learning algorithms for individualized prediction of prognosis in breast cancer liver metastases and the prognostic impact of primary tumor surgery: a multicenter study

Chunmei Chen¹

Jundong Wu²

Bo Xu^3,4

Weiwen Li¹

Chengming Zhong⁵

Zhibing Yan⁶

Qipeng Zhong⁶

Ronggang Li⁷

Mingtao Shao¹

Yan Dong¹

Yutong Fang²

Yong Li^1*†

Qunchen Zhang^1*†

¹Department of Breast, Jiangmen Central Hospital, Jiangmen, Guangdong, China
²The Breast Center, Cancer Hospital of Shantou University Medical College, Shantou, Guangdong, China
³Department of General Surgery, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong, China
⁴Department of General Surgery, Guangzhou First People’s Hospital, School of Medicine, South China University of Technology, Guangzhou, Guangdong, China
⁵Department of Equipment, Jiangmen Xinhui Maternal and Child Health Hospital, Jiangmen, Guangdong, China
⁶Department of Anesthesiology, Jiangmen Central Hospital, Jiangmen, Guangdong, China
⁷Department of Pathology, Jiangmen Central Hospital, Jiangmen, Guangdong, China

Background: The prognosis of patients with breast cancer liver metastasis (BCLM) is generally poor, and there are no specific treatment guidelines. Accurate prognostic tools are needed to estimate survival and support individualized management.

Methods: The study cohort consisted of 4,817 patients diagnosed with BCLM from the SEER database spanning 2010 to 2020. Candidate predictors were screened using univariate and multivariate Cox regression. Five machine-learning algorithms—Random Forest (RF), Logistic Regression, XGBoost, Decision Tree, and Gradient Boosting—were trained to predict 6-month, 1-, 3-, and 5-year overall survival (OS) and breast cancer–specific survival (BCSS). Labels at each horizon were handled with inverse probability-of-censoring weighting, and performance was assessed with IPCW-AUC, accuracy, F1 score, calibration, and decision curve analysis. External validation included 124 BCLM patients from two Chinese hospitals. To evaluate the effect of primary tumor surgery (PTS), we modeled PTS as a time-dependent exposure and performed time-dependent Cox analyses with time-varying effects, Simon–Makuch curves, piecewise Cox modeling, 2-month landmark analysis, and E-value calculations.

Results: RF was the top performer for both OS and BCSS across horizons (training AUCs = 0.840–0.899; internal test AUCs = 0.763–0.787), with good calibration and net benefit. External validation showed consistent discrimination (AUCs 0.779–0.815). SHAP analyses highlighted chemotherapy, age, subtype, and surgery as dominant contributors. In time-dependent analyses, PTS was associated with reduced risks of death (OS: HR 0.80, 95% CI 0.72–0.88) and breast cancer–specific mortality (BCSS: HR 0.77, 95% CI 0.69–0.86); findings were directionally consistent in piecewise and landmark analyses, and E-values (≥1.81) supported robustness to moderate unmeasured confounding.

Conclusion: We developed and externally validated robust RF-based models for predicting OS and BCSS in BCLM. Our results indicate that PTS is associated with longer survival and lower breast cancer-specific mortality in carefully selected patients, supporting consideration within individualized, multidisciplinary decision-making.

Introduction

Breast cancer (BC) is increasingly becoming the most common cancer worldwide and is the primary cause of cancer-related mortality among women (1). Common sites of metastasis in advanced BC include the bones, lungs, liver, and brain (2), with liver metastases occurring in 50–61% of cases (3). Although the liver is not the organ with the highest probability of metastasis, involvement of the liver is an independent predictor of worse overall survival (OS) in BC patients, compared to factors such as bone metastasis (2). Once liver metastasis occurs, the prognosis is generally poor, with a median survival of only 4–8 months without treatment. For patients who respond to systemic treatment, the median survival from the date of diagnosis is only 18–24 months, with most patients showing disease progression after approximately 1–2 years of stable treatment (4). The 5-year and 10-year survival rates are as low as 27% and 13%, respectively (5, 6). Currently, there are no specific treatment guidelines for patients with breast cancer liver metastases (BCLM) (7). Furthermore, various clinical characteristics significantly influence the prognosis of BCLM patients (8). Therefore, there is an urgent need for precise prognostic models to assess the survival of BCLM patients and to assist in optimizing their individualized management.

Machine learning (ML) algorithms have emerged as pivotal tools in the field of medicine, notably in the construction of models that predict patient outcomes with high accuracy (9–11). Ranging from conventional regression models to advanced deep learning architectures, ML excels in processing extensive, complex datasets by identifying non-linear relationships between inputs and outcomes (12). The capacity of ML to consolidate and analyze various forms of clinical data enables the provision of more accurate and personalized care for patients (13, 14). Despite its extensive application in cancer prognostics, the potential of ML to forecast outcomes for BCLM patients remains largely unexplored. The individualized identification of these high-risk BCLM patients could significantly influence our clinical decisions and lead to the creation of tailored therapeutic strategies.

The role of primary tumor surgery (PTS) and its approaches remains controversial in the management of patients initially diagnosed with stage IV BC. Several retrospective studies have highlighted that appropriately selected subsets of patients—such as those with smaller metastatic burdens, positive hormone receptors, and younger ages—may benefit in terms of OS from PTS (15–20). The MF07–01 trial further demonstrated significant survival benefits for patients who underwent PTS followed by comprehensive systemic treatments (21). However, other prospective randomized controlled trials have shown that primary tumor surgery does not confer an OS advantage in stage IV patients, except in carefully selected subgroups (22–24). Typically, the local treatment of BCLM patients focuses more on the management of liver metastases, and the benefits of PTS based on comprehensive systemic therapy remains uncertain and under debate.

In this study, we utilized the Surveillance, Epidemiology, and End Results (SEER) database to develop models predicting OS and breast cancer-specific survival (BCSS) in BCLM patients, using five different ML algorithms. We also retrospectively collected data from two Chinese hospitals on BCLM patients to evaluate the applicability of these models. To further investigate the prognostic impact of PTS in BCLM patients, we applied time-dependent Cox regression models and conducted multiple sensitivity analyses. The aim of this study is to provide robust, individualized survival predictions and to explore the potential survival benefits of PTS to inform clinical decision-making in BCLM management.

Materials and methods

Patients and study design

This study adhered to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD + AI) checklist (25). The flowchart illustrates our study design (Figure 1). This study encompassed three patient cohorts. The SEER database, a publicly accessible repository, is established and maintained by the National Cancer Institute. Initially, data from the SEER 17 registries research data [(2000–2020); version 8.4.3] were utilized. Given that Human epidermal growth factor receptor 2 (HER2) status and specific metastatic organ sites were incorporated from 2010 onwards, the inclusion criteria were as follows: (1) female sex; (2) diagnosis year falling within 2010 to 2020; (3) anatomical site and morphological coding aligned with the International Classification of Diseases Oncology Third Edition (ICD-O-3); (4) concurrent liver metastases at breast cancer diagnosis; and (5) survival time >0 months. Patients with multiple primary tumors were excluded. Furthermore, retrospective data were gathered from BCLM patients treated at Jiangmen Central Hospital (JCH) and Cancer Hospital of Shantou University Medical College (CHSU) between January 2010 and June 2023. This study was approved by the Ethics Committees of JCH (No. 2024312) and CHSU (No. 2024216). Our Ethics Committees, in accordance with the Guidelines for Ethical Review Committees of Clinical Research Involving Humans (2023 edition), determined that informed consent was not required and thus approved the waiver of informed consent.

Figure 1

Flowchart detailing the selection and analysis process for female patients with breast cancer liver metastases in the SEER database (2010-2020). Out of 9024 patients, exclusions were made for multiple primary cancers, unknown survival or classification, race, subtype, TN stage, surgery. 4817 patients were analyzed and stratified into training and test groups. Cox regression and model evaluations were conducted, including prognostic models and time-dependent analyses. Further exclusions were applied for unknown surgery dates, leading to various analyses like Simon-Makuch curve, time-varying hazard ratio, and landmark analysis.

Figure 1. Flow chart of this study. SEER, surveillance, epidemiology, and end results; OS, overall survival; BCSS, breast cancer-specific survival; RF, random forest; LR, logistic regression; XGBoost, extreme gradient boosting; DT, decision tree; GBDT, gradient boosting decision tree; ROC, receiver operating characteristic; DCA, decision curve analysis; SHAP, SHapley Additive exPlanations; PTS, primary tumor surgery.

Data collection

The following patient characteristics were obtained: age, race, marital status, median household income (inflation adjusted), histological type, tumor location, histologic grade, molecular subtype, T stage, N stage, surgery, radiotherapy, chemotherapy, bone metastases, brain metastases, and lung metastases. Tumor-related variables comprised histological type, coded according to the ICD-O-3; primary tumor location, based on ICD-O-3 topography codes; and histological grade. Subtype was determined using available molecular or clinicopathological surrogates (hormone receptor and HER2 status). Tumor stage was assessed using the American Joint Committee on Cancer (AJCC) TNM staging system (8th edition). The primary endpoint was OS and the secondary endpoint was BCSS. The median follow-up time was 56 months (IQR: 28.0–86.0) for patients from the SEER database and 35 months (20.0–58.0) for patients from two hospitals in China. Survival rates and event counts for each prediction interval are presented in Supplementary Table S1.

Feature selection, model construction and evaluation

In order to reduce redundant variables and model overfitting, we employed univariate and multivariate Cox regression analyses within each training fold to select independent factors associated with prognosis in the training group. Statistically significant features were utilized for subsequent ML model construction. Five commonly used ML algorithms, including Random Forest (RF), Logistic Regression (LR), XGBoost, Decision Tree (DT), and Gradient Boosting Decision Tree (GBDT), were applied to predict 6-month, 1-year, 3-year, and 5-year Overall Survival (OS) and Breast Cancer-Specific Survival (BCSS). For each prediction horizon $t$ ∈{0.5,1,3,5} year, we defined a binary outcome $Y_{t}$ = 1 if the event occurred by $t$ and $Y_{t}$ = 0 otherwise. Patients censored prior to $t$ had unknown labels and were handled using inverse probability-of-censoring weighting (IPCW). The censoring survival function $\hat{G} (t)$ was estimated via the Kaplan–Meier method, with right-censoring defined as alive at last follow-up. Each labeled observation received weight:

w_{i} (t) = \frac{1}{\hat{G} (\min {(T_{i}, t)}^{-})}

, i.e., at the time the label became known. Classifiers were then trained with these weights and evaluated using IPCW-weighted Area Under the Curve (AUC) and Brier score at each prediction horizon (26). RF is a non-parametric machine learning algorithm based on ensemble learning, which predicts by constructing multiple decision trees and integrating them through voting or averaging. Each decision tree is generated by bootstrapping the training set, with only a random subset of features considered for splitting at each node. This randomness effectively reduces model variance and enhances generalization performance. LR is a generalized linear model that estimates probabilities by mapping the output of a linear function to a sigmoid function and estimates model parameters by maximizing the likelihood function. XGBoost is an efficient gradient boosting decision tree algorithm that iteratively trains decision tree models by optimizing the negative gradient of the loss function and utilizes regularization techniques to control model complexity. DT is a classification and regression method based on tree structure, which partitions the dataset into different categories or values through a series of decision nodes. GBDT is an algorithm that enhances model performance by iteratively training decision trees and optimizing the loss function. It constructs decision trees sequentially and trains each subsequent tree using the residual of the previous tree, effectively reducing model bias.

To enhance model robustness, hyperparameters were optimized via grid search with 10-fold cross-validation. Patients from the SEER database were randomly divided into training and internal testing groups at an 8:2 ratio. Patients from two Chinese hospitals were used as an external testing group to further validate the generalizability of the optimal model. The performance of ML models was evaluated using AUC, accuracy, and F1 score of the training and testing groups. Calibration curves were used to assess the accuracy and reliability of model predictions. Decision Curve Analysis (DCA) was employed to determine the clinical utility of the models. Confusion matrices intuitively displayed the classification performance of the models. SHAPley Additive exPlanations (SHAP) values were computed using the “shap” library to explain the contribution or importance of each feature to the model.

PTS

To minimize immortal-time bias and avoid exposure misclassification around surgery, PTS was modeled as a time-dependent covariate. For patients undergoing surgery, follow-up time before surgery was classified as “non-PTS,” and time after surgery as “PTS.” Patients with a surgical history but no recorded surgery date were excluded from the main analysis, as their exposure periods could not be reliably defined. We performed multivariable analyses using a time-dependent Cox regression model based on PTS, incorporating an interaction term with log(time) to examine time-varying effects, and generated hazard ratio (HR) curves over time. For descriptive visualization, Simon–Makuch curves were additionally plotted. A segmented Cox model was applied with the follow-up time divided at 24 months, and a 2-month landmark analysis was conducted among patients who survived beyond this time point. To evaluate robustness against unmeasured confounding, the E-value was calculated using the following formula:

E = {HR}^{*} + \sqrt{{HR}^{*} {\times (HR}^{*} - 1)},if HR < 1, {HR}^{*} = \frac{1}{HR}

Statistical analysis

Categorical data were presented as counts and percentages and were compared using either the Chi-squared test or Fisher’s exact test depending on the sample size. Kaplan-Meier survival analysis was conducted using the log-rank test. Univariate and multivariate Cox regression analyses were employed to identify modeling features. The generalized variance inflation factor (GVIF) was utilized to detect multicollinearity in the Cox regression model. An adjusted GVIF value less than 2 was considered acceptable. Statistical analyses were performed using R software version 4.2.1 (r-project.org/) or Python version 3.8 (Python Software Foundation). A significance level of P< 0.05 indicated statistical significance.

Results

Clinicopathologic characteristics

Finally, we collected data on 4817 BCLM patients who met the eligibility criteria from the SEER database. As shown in Table 1, 1545 patients (32.07%) were aged 50 years or younger, 1945 patients (40.38%) were between the ages of 51 and 65, and 1327 patients (27.55%) were 66 years or older. The majority of patients were Caucasian (72.33%). Approximately 45.84% of the patients were married, and 24.60% were single or homosexual. In terms of household income, 54.35% of the patients had incomes exceeding $70,000. About 80% of the patients were diagnosed with invasive ductal carcinoma (IDC), and 25.76% had tumors located in the upper outer quadrant. Patients classified as G2 and G3 accounted for 22.69% and 36.64%, respectively, while only 2.47% were G1. The molecular subtype HR+/HER2- was present in 44.2% of patients, with dual-positive and dual-negative subtypes accounting for 24.02% and 15.11%, respectively. The distribution of T stages from T1 to T4 was 11.21%, 34.57%, 18.21%, and 36.02%, and for N stages from N0 to N3 was 19.27%, 53.35%, 12.21%, and 15.18%, respectively. Bone metastases or lung metastases were present in 59.54% and 32.74% of patients, respectively, with only 8.95% having brain metastases. Regarding treatment, 23.92% of patients underwent primary tumor surgery, 27.40% received radiotherapy, and the majority (75.77%) received chemotherapy. Additionally, 124 BCLM patients from the JCH and CHSU cohorts were included as the external group. In contrast to the SEER cohort, all patients in the external cohorts were Asian, and only 8.06% underwent surgery.

Table 1

Table 1. Baseline characteristics of patients with breast cancer liver metastases.

Feature selection

We analyzed the correlation between variables and generated a heatmap, which indicated no multicollinearity among the variables (Supplementary Figure S1). For clarity, the Cox regression results presented in Table 2 were performed on the overall training cohort to illustrate variable associations, while per-fold selections were used internally during cross-validation. Univariate Cox regression analysis revealed that age, race, marital status, median household income, histological type, tumor location, subtype, T stage, N stage, surgery, chemotherapy, bone metastases, brain metastases, and lung metastases significantly affected OS. Meanwhile, significantly affecting BCSS were age, race, marital status, median household income, histological type, subtype, tumor location, T stage, N stage, surgery, chemotherapy, bone metastases, brain metastases, and lung metastases.

Table 2

Table 2. Univariate and multivariate Cox analyses of patients with breast cancer liver metastases in the SEER database.

Multivariate Cox regression analysis was employed to further explore independent risk factors affecting prognosis. GVIF suggested that there was no multicollinearity between the variables in the regression model (Supplementary Table S2). Older age, other pathological type and tumor location, HR-/HER2-, T4 stage, and the presence of other distant metastases were associated with poorer OS. Socially, Black and divorced patients had worse OS, whereas patients with household income exceeding $70,000 exhibited better OS. Similarly, patients who underwent surgery and radiotherapy demonstrated improved OS. Moreover, factors indicating worse BCSS included older age, being black, being divorced, other pathological types, HR-/HER2-, T4 stage, and the presence of other distant metastases. In contrast, household income over $70,000, HR+/HER2+, N1 stage, and having undergone surgery and chemotherapy were associated with better BCSS.

Establishment and evaluation of prognostic models

We incorporated significant features into subsequent ML model construction to predict the 6-month, 1-year, 3-year, and 5-year OS and BCSS for BCLM patients. The performance of five ML models in the training and internal test groups was depicted in Figure 2. The RF models demonstrated excellent performance in predicting 6-month OS (Training: AUC=0.850; Internal Test: AUC=0.765), 1-year OS (Training: AUC=0.840; Internal Test: AUC=0.774), 3-year OS (Training: AUC=0.871; Internal Test: AUC=0.774), and 5-year OS (Training: AUC=0.889; Internal Test: AUC=0.786). Additionally, the RF models also performed well in predicting BCSS at 6 months (Training: AUC=0.849; Internal Test: AUC=0.787), 1 year (Training: AUC=0.864; Internal Test: AUC=0.765), 3 years (Training: AUC=0.867; Internal Test: AUC=0.775), and 5 years (Training: AUC=0.862; Internal Test: AUC=0.774). Other ML models such as LR (6-month OS: AUC=0.779; 1-year OS: AUC = 0.748; 3-year OS: AUC = 0.735; 5-year OS: AUC = 0.768; 6-month BCSS: AUC=0.774; 1-year BCSS: AUC = 0.758; 3-year BCSS: AUC = 0.731; 5-year BCSS: AUC = 0.766), XGBoost (6-month OS: AUC=0.797; 1-year OS: AUC = 0.822; 3-year OS: AUC = 0.836; 5-year OS: AUC = 0.864; 6-month BCSS: AUC=0.786; 1-year BCSS: AUC = 0.798; 3-year BCSS: AUC = 0.822; 5-year BCSS: AUC = 0.855), DT (6-month OS: AUC=0.796; 1-year OS: AUC = 0.779; 3-year OS: AUC = 0.787; 5-year OS: AUC = 0.810; 6-month BCSS: AUC=0.793; 1-year BCSS: AUC = 0.777; 3-year BCSS: AUC = 0.826; 5-year BCSS: AUC = 0.857), and GBDT (6-month OS: AUC=0.800; 1-year OS: AUC = 0.813; 3-year OS: AUC = 0.761; 5-year OS: AUC = 0.865; 6-month BCSS: AUC=0.823; 1-year BCSS: AUC = 0.758; 3-year BCSS: AUC = 0.784; 5-year BCSS: AUC = 0.810) showed slightly inferior performance compared to RF in the training group. Moreover, the RF models’ accuracy (6-month OS: 0.776 and 0.745; 1-year OS: 0.761 and 0.714; 3-year OS: 0.764 and 0.711; 5-year OS: 0.807 and 0.732; 6-month BCSS: 0.793 and 0.759; 1-year BCSS: 0.782 and 0.708; 3-year BCSS: 0.784 and 0.714; 5-year BCSS: 0.759 and 0.786) and F1-scores (6-month OS: 0.615 and 0.498; 1-year OS: 0.686 and 0.650; 3-year OS: 0.828 and 0.768; 5-year OS: 0.863 and 0.823; 6-month BCSS: 0.600 and 0.526; 1-year BCSS: 0.703 and 605; 3-year BCSS:0.832 and 0.736; 5-year BCSS: 0.841 and 0.865) in both the training and internal testing groups were satisfactory (Table 3).

Figure 2

Sixteen ROC curves showing model performance for survival outcomes. Panels A-D and I-L illustrate overall survival (OS) for 6-month, 1-year, 3-year, and 5-year forecasts in training and internal test groups. Panels E-H and M-P depict breast cancer-specific survival (BCSS) for the same intervals and groups. Each curve represents different models: Random Forest (RF), Logistic Regression (LR), XGBoost (XGB), Decision Tree (DT). AUC values measure model accuracy, with RF often showing the highest AUC. Sensitivity and specificity are plotted on the y and x axes, respectively.

Figure 2. Receiver operating characteristic curves of five machine learning models in the training (A-H) and internal test (I-P) groups. OS, overall survival; BCSS, breast cancer-specific survival; AUC, area under the curve; RF, random forest; LR, logistic regression; XGBoost, extreme gradient boosting; DT, decision tree; GBDT, gradient boosting decision tree.

Table 3

Table 3. Performance of machine learning prognostic models in the training and internal test groups.

DCA was used to evaluate the clinical utility of the models. The results indicated that all five models provided good net benefits in predicting survival rates at different time points, with the RF models showing a slightly higher net benefit in the training group (Figure 3). Based on 1,000 bootstrap resamples, the RF models also exhibited consistent net benefits across a range of threshold probabilities (Supplementary Table S3). Calibration curves further demonstrated the good agreement between the predicted survival outcomes by the RF model and actual outcomes (Figure 4). The confusion matrix illustrated the performance of the RF classifier in the training and internal test groups (Supplementary Figure S2). Therefore, the RF models were identified as the optimal models for predicting prognosis in BCLM patients.

Figure 3

Sixteen line graphs illustrate net benefit versus risk threshold for different models, including RF, LR, XGBoost, GBDT, ABIB, and None. Panels A-P cover 6 months to 5 years for overall survival (OS) and breast cancer-specific survival (BCSS) in training and internal test groups.

Figure 3. Decision curves for the five machine learning models in the training (A-H) and internal test (I-P) groups. OS, overall survival; BCSS, breast cancer-specific survival; RF, random forest; LR, logistic regression; XGBoost, extreme gradient boosting; DT, decision tree; GBDT, gradient boosting decision tree.

Figure 4

Grid of calibration plots showing predicted versus actual probabilities for overall survival (OS) and breast cancer-specific survival (BCSS) at 6-month, 1-year, 3-year, and 5-year intervals. The charts are divided into training and internal test groups, labeled A to H for training and I to P for testing. Each plot includes curves for apparent, bias-corrected, and ideal predictions, with slopes and intercepts noted for test group plots.

Figure 4. Calibration curves for the random forest models in the training (A-H) and internal test (I-P) groups. OS, overall survival; BCSS, breast cancer-specific survival.

To further validate the robustness and applicability of the RF models, we included a total of 124 BCLM patients from the JCH and CHSU cohorts. The RF models demonstrated excellent performance in the external test cohort, as evidenced by their high discriminatory ability (Figures 5A–H). Specifically, the AUC values were as follows: 6-month OS, 0.779 (95% CI: 0.688–0.871); 1-year OS, 0.803 (95% CI:0.720–0.887); 3-year OS, 0.812 (95% CI: 0.714–0.911); 5-year OS, 0.792 (95% CI: 0.536–0.995); 6-month BCSS, 0.802 (95% CI: 0.714–0.890); 1-year BCSS, 0.801 (95% CI: 0.713–0.889); 3-year BCSS, 0.784 (95% CI: 0.672–0.897); and 5-year BCSS, 0.815 (95% CI: 0.637–0.993). Moreover, calibration curves further indicated good agreement between the predicted and actual outcomes (Figures 5I–P). DCA revealed that the RF models also exhibited good net benefit in the external cohort (Figure 6). Therefore, the RF models were considered the optimal tools for predicting the prognosis of BCLM patients.

Figure 5

Eight graphs displaying ROC curves (A-H) and calibration plots (I-P) for overall survival (OS) and breast cancer-specific survival (BCSS) at different time intervals (6-month, 1-year, 3-year, and 5-year) for an external test group. ROC curves show sensitivity versus 1-specificity with AUC values noted. Calibration plots depict actual versus predicted probabilities with slope and intercept values. Each graph includes a reference line for comparison.

Figure 5. Receiver operating characteristic (A-H) and calibration curves (I-P) of the random forest models in the external test group. OS, overall survival; BCSS, breast cancer-specific survival; AUC, area under the curve.

Figure 6

Eight line graphs display net benefit versus risk threshold. Graphs A to D show outcomes at six months, one year, three years, and five years for overall survival (OS). Graphs E to H illustrate the same time points for breast cancer-specific survival (BCSS). Each graph compares a predictive model, depicted by blue lines, against baseline and no-treatment strategies shown in red and green.

Figure 6. Decision curves for the random forest models in the external test group. OS, overall survival; BCSS, breast cancer-specific survival.

Sensitivity analysis

Since 38.20% of patients had unknown grade, we excluded these cases to validate the robustness of the RF models. The RF models were retrained with the same hyperparameters as the original model (Supplementary Table S4) and evaluated on the identical internal and external test groups. The retrained RF models achieved AUCs ranging from 0.858 to 0.899 in the training group, 0.741 to 0.831 in the internal test group, and 0.772 to 0.848 in the external test group (Supplementary Figure S3). Given that performance remained comparable, these findings suggested that including patients with unknown grade did not materially compromise the robustness of our RF models.

Interpretability of the RF models

The SHAP analysis provided an explanation of how the RF models predicted survival outcomes and calculated the importance of features. Figures 7A–H showed the SHAP values for each feature at different levels. As the feature values increased, the redder the color became, and vice versa, the bluer the color was. Additionally, we ranked the features of the models (Figures 7I–P). The higher feature ranking indicated that the feature was more important, which meant that the feature contributed more to the model. Overall, chemotherapy, age, subtype, and surgery were the most valued by the RF models for predicting 6-month, 1-year, 3-year, and 5-year OS and BCSS. Additionally, two representative samples were provided for each model to elucidate the composition of the model outputs. Each feature’s weight was depicted in blue or red, indicative of its contribution toward a positive outcome. The value of each feature denoted its proportionate influence on the final score. Supplementary Figures S4A–H presented the detailed scores of surviving patients predicted to be alive, whereas Supplementary Figures S4I–P displayed the scores of dying patients predicted to be dead.

Figure 7

Graphs display statistical data for overall survival (OS) and breast cancer-specific survival (BCSS) across multiple time intervals: 6 months, 1 year, 3 years, and 5 years. Panels A to H show hazard ratios with forest plots highlighting various factors like chemotherapy, age, and subtype. Panels I to P present bar charts of their respective contributions with mean absolute SHAP (SHapley Additive exPlanations) values. Red and blue colors distinguish influence strengths. Each graph compares numerous factors affecting survival outcomes in breast cancer patients over time.

Figure 7. SHAP interprets the random forest models. SHAP values for each feature at different levels in the random forest models (A-H); Importance of features in the random forest models (I-P). OS, overall survival; BCSS, breast cancer-specific survival.

Prognostic impact of PTS in BCLM patients

Baseline characteristics of the PTS and non-PTS groups were shown in Supplementary Table S5. The adjusted time-dependent Cox model demonstrated a reduced risk of OS events in the PTS group (HR=0.80, 95% CI: 0.72–0.88; P< 0.001; Table 4). Similarly, BCSS risk was also lower (HR=0.77, 95% CI: 0.69–0.86; P< 0.001). The Simon–Makuch curves illustrated survival differences over time between the PTS and non-PTS groups (Figures 8A, B). As shown in Figures 8C, D, even after accounting for time-varying effects, the instantaneous HRs for both OS and BCSS remained consistently< 1 across the majority of the follow-up period.

Table 4

Table 4. Time-dependent Cox analyses of patients with breast cancer liver metastases in the SEER database.

Figure 8

Two graphs illustrate survival rates among PTS and non-PTS groups over 125 months post-diagnosis. Chart A shows overall survival with a significant PTS effect (p = 1.60e-05). Chart B presents breast cancer-specific survival with similar results (p = 2.95e-06). Charts C and D display the time-varying hazard ratio of PTS for overall survival (OS) and breast cancer-specific survival (BCSS), respectively, with curves trending upward, indicating a changing risk over time.

Figure 8. Simon–Makuch plots with primary tumor surgery as a time-dependent variable for overall survival (A) and breast cancer–specific survival (B). Time-varying hazard ratio curves for the effect of primary tumor surgery on overall survival (C) and breast cancer–specific survival (D). PTS, primary tumor surgery; OS, overall survival; BCSS, breast cancer-specific survival.

The piecewise Cox model further supported these findings: for OS, the HR was 0.83 (95% CI: 0.75–0.92) within the first 24 months and 0.73 (95% CI: 0.61–0.88) thereafter; for BCSS, the HR was 0.80 (95% CI: 0.72–0.89) within 0–24 months and 0.68 (95% CI: 0.56–0.83) beyond 24 months (Supplementary Table S6). The 2-month landmark analysis similarly showed that patients receiving PTS after systemic therapy had improved OS (HR=0.60, 95% CI: 0.37–0.97; p = 0.038) and BCSS (HR=0.60, 95% CI: 0.36–0.98; p = 0.043), consistent with the primary model (Supplementary Table S7). We further quantified the robustness of our findings using the E-value. For OS and BCSS, the E-values were 1.81 and 1.92, respectively (lower bound E-values: OS=1.53; BCSS=1.60). This suggested that to nullify the observed survival benefits of PTS, an unmeasured confounder would need to be associated with both PTS treatment and the outcomes with an HR ≥ 1.81 or 1.92. Similarly, in the 2-month landmark analysis, the E-value for OS (HR=0.60) was 2.72 (lower bound: 1.21), and for BCSS (HR=0.60) was 2.72 (lower bound: 1.16). These results indicated that only an unmeasured confounder with at least moderate strength could substantially alter our conclusions. In summary, our findings confirm the survival benefit of PTS in improving both OS and BCSS.

Discussion

Although the prevalence of BC screening has led to a year-by-year decrease in the proportion of newly diagnosed patients with advanced-stage BC, approximately 6%–7% of newly diagnosed BC patients still present with metastatic lesions at the initial diagnosis (27). Patients with liver-only metastasis are relatively rare, accounting for 5%–12% of cases (3). Individuals with BCLM generally exhibit weaker physical and nutritional conditions. This vulnerability heightens the risk of encountering complications during treatment, including liver failure, jaundice, continuous ascites, portal vein thrombosis, and cachexia, potentially leading to premature mortality (28). Furthermore, most chemotherapeutic and endocrine agents, such as taxanes, fluoropyrimidines, aromatase inhibitors, and CDK4/6 inhibitors, are metabolized by the liver, thereby increasing its workload. Consequently, it is imperative to comprehensively consider the status of metastatic organs, overall nutritional condition, anticipated survival, as well as the efficacy and safety of the treatment in the treatment decision-making for BCLM patients. This study aimed to develop novel predictive models for OS and BCSS in BCLM patients using ML algorithms, facilitating the formulation of effective treatment strategies. To our knowledge, this was the largest study to date analyzing prognosis and PTS in BCLM patients. For the first time, we had developed OS and BCSS prediction models based on five different ML algorithms. Our RF models demonstrated outstanding accuracy in predicting 6-month, 1-year, 3-year, and 5-year OS and BCSS for BCLM patients.

Our study identified several independent risk factors significantly associated with OS and BCSS, including older age, being African American, being divorced, having other pathological types, HR-/HER2- status, stage T4, and the presence of additional distant metastases. Conversely, protective factors included a household income exceeding $60,000, HR+/HER2+ status, stage N1, and the administration of surgery and chemotherapy. Recent studies have corroborated that older age correlates with poorer OS and BCSS (8, 29, 30). Our findings revealed that HR+/HER2+ patients exhibited the most favorable prognosis, whereas those with HR-/HER2- had poorer outcomes. Historically, luminal-type BC was deemed to have a relatively favorable prognosis. However, with the widespread use of HER2-targeted therapies, similar survival rates have been observed among patients with advanced BC in the HR+/HER2- and HR-/HER2+ subtypes. For HR+ patients, the combination of CDK4/6 inhibitors and aromatase inhibitors as first-line treatment significantly enhanced progression-free survival (PFS) in advanced cases (31, 32). Data from the PHOEBE trial demonstrated that for HER2+ patients with visceral metastases, pyrrolitinib combined with capecitabine extended PFS to 12.5 months, compared to lapatinib combined with capecitabine in those previously treated with trastuzumab and paclitaxel (33). Furthermore, consistent with other studies findings, chemotherapy has been shown to significantly improve patient prognosis (8, 34). The T4 stage indicates locally advanced disease, reflecting the extent of primary tumor invasion and burden, and is often associated with a poorer prognosis. Unlike oligometastatic disease, the presence of concurrent metastases to other organs in BCLM patients is another critical factor leading to adverse outcomes (35, 36).

In our comparative analysis of five ML models, the RF algorithm emerged as the best-performing model. It outperformed all other models in the large American cohort and was consistently validated in cross-national cohorts. Chen et al. constructed a nomogram to predict OS in BCLM patients, which achieved an internal concordance index (C-index) of 0.685 (29). A recent study built a traditional nomogram prediction model for young BCLM patients using a single SEER dataset (34), which showed AUCs of 0.753, 0.773, and 0.761 for predicting 1-year, 2-year, and 5-year OS, respectively, and 0.755, 0.781, and 0.787 for 1-year, 2-year, and 5-year BCSS, respectively. In contrast, our RF model demonstrated superior predictive performance, with AUCs ranging from 0.840 to 0.889 for predicting OS at different time points in the training group, and 0.763 to 0.775 for the internal test group. Furthermore, the RF model showed AUCs ranging from 0.849 to 0.867 for predicting BCSS at different time points in the training group, and 0.765 to 0.787 in the internal test group. Importantly, our model was validated using independent cohorts from two hospitals in China, with AUC values ranging from 0.779 to 0.815. To probe robustness to missingness in key clinicopathologic variables, we performed a sensitivity analysis that excluded patients with unknown grade and retrained the RF models with the same hyperparameters; the models maintained comparable discrimination across training, internal, and external cohorts (AUCs ranged 0.858–0.899, 0.741–0.831, and 0.772–0.848, respectively), indicating that our findings are not materially drivenxby grade-related missingness. Collectively, these results underscore the stability and generalizability of the RF framework across populations and analytic specifications. The enhanced predictive performance provided more reliable evidence for clinicians in managing and stratifying BCLM patients. Meanwhile, DCA confirmed that our RF model demonstrated excellent clinical utility.

Studies have shown that PTS in advanced BC can improve local symptoms, significantly prolong progression free survival (PFS), and enhance quality of life (37–39). However, there remains controversy over whether PTS can extend OS. The TATA study in India found that in patients with stage IV BC who responded to first-line therapy, PTS did not provide an OS benefit (23). Similarly, the EA2108 study, after 4–8 months of systemic treatment, performed PTS in patients with effective systemic therapy but did not extend OS, although it did improve PFS (39). However, no definitive conclusion has been reached on whether PTS should be performed in BCLM patients. In our study, to reduce selection bias and, critically, to minimize immortal-time bias, we modeled PTS as a time-dependent exposure and applied a suite of complementary analyses, including time-dependent Cox models with time-varying effects, Simon-Makuch curves, a segmented Cox model, and a 2-month landmark analysis. Across these approaches, PTS was consistently associated with lower risks of death and BC–specific mortality (OS: HR=0.80, 95% CI 0.72–0.88; BCSS: HR=0.77, 95% CI 0.69–0.86), with hazard ratios remaining<1 for most of follow-up. Effect sizes were comparable in piecewise analyses (e.g., OS HR=0.83 within 24 months and 0.73 beyond 24 months; BCSS HR=0.80 and 0.68, respectively) and persisted in the landmark cohort (OS HR=0.60; BCSS HR=0.60). The SEER database lacks information on endocrine and targeted therapies, both of which are established prognostic factors in BCLM; their absence may introduce residual confounding. E-values (OS 1.81; BCSS 1.92; landmark OS/BCSS 2.72) indicate that only an unmeasured confounder of at least moderate strength could negate these associations, supporting the robustness of our findings. While causality cannot be definitively established in an observational setting, our results are consistent with longer survival being associated with PTS among carefully selected BCLM patients—particularly those managed with contemporary systemic therapy—supporting careful consideration of individualized, multidisciplinary evaluation rather than a blanket proscription or endorsement.

Despite the promising results, there are several limitations to our study. First, the SEER database lacks granular treatment details (e.g., endocrine and targeted therapies) and progression/recurrence data, which may limit predictive precision. Although time-dependent Cox modeling, landmark analyses, and E-value calculations were used to mitigate immortal-time bias and test robustness, unmeasured factors may still affect the observed PTS associations. Additionally, missing grade information—despite supportive sensitivity analyses—could introduce residual bias. Furthermore, limited sample sizes in certain subgroups precluded stratified analyses, thereby restricting assessment of model stability across subsets. Finally, our external validation cohorts were relatively small and regionally restricted, underscoring the need for larger multicenter studies.

Conclusion

In conclusion, we developed and externally validated robust RF-based models for predicting OS and BCSS in BCLM patients. Our results indicate that PTS is associated with longer survival and lower breast cancer-specific mortality in carefully selected patients, supporting consideration within individualized, multidisciplinary decision-making.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by the Ethics Committees of Jiangmen Central Hospital (approval no. 2024312) and Cancer Hospital of Shantou University Medical College (approval no. 2024216). The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants' legal guardians/next of kin in accordance with the national legislation and institutional requirements.

Author contributions

CC: Writing – original draft, Formal analysis, Data curation. JW: Validation, Data curation, Writing – review & editing. BX: Conceptualization, Writing – review & editing. WL: Conceptualization, Validation, Writing – review & editing. CZ: Writing – review & editing. ZY: Writing – review & editing. QpZ: Writing – review & editing. RL: Writing – review & editing. MS: Writing – review & editing. YD: Writing – review & editing. YF: Writing – review & editing. YL: Writing – review & editing. QcZ: Writing – original draft, Funding acquisition, Conceptualization, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research and/or publication of this article. This work was supported by the Guangdong Provincial Medical Science and Technology Research Fund Project (B2025738), and Youth Science Foundation of Jiangmen Central Hospital (No. J202404). The name of the first fund was omitted.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fendo.2025.1656191/full#supplementary-material

Glossary

AUC: area under the curve

BC: breast cancer

BCSS: breast cancer-specific survival

BCLM: breast cancer liver metastases

CHSU: Cancer Hospital of Shantou University Medical College

CI: confidence interval

DCA: decision curve analysis

DT: decision tree

HER2: human epidermal growth factor receptor 2

GBDT: gradient boosting decision tree

GVIF: generalized variance inflation factor

HR: hazard ratio

ICD-O-3: International Classification of Diseases for Oncology, 3rd Edition

IDC: invasive ductal carcinoma

IPCW: inverse probability-of-censoring weighting

JCH: Jiangmen Central Hospital

LR: logistic regression

ML: machine learning

OS: overall survival

PTS: primary tumor surgery

RF: Random Forest

SEER: Surveillance, Epidemiology, and End Results

SHAP: SHAPley Additive exPlanations

XGBoost: extreme gradient boosting

References

1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. (2024) 74:229–63. doi: 10.3322/caac.21834

PubMed Abstract | Crossref Full Text | Google Scholar

2. Gerratana L, Fanotto V, Bonotto M, Bolzonello S, Minisini AM, Fasola G, et al. Pattern of metastasis and outcome in patients with breast cancer. Clin Exp Metastasis. (2015) 32:125–33. doi: 10.1007/s10585-015-9697-2

PubMed Abstract | Crossref Full Text | Google Scholar

3. Igci A and Özkurt E. (2019), 597–606.

Google Scholar

4. Cristofanilli M and Hortobagyi GN. New horizons in treating metastatic disease. Clin Breast Cancer. (2001) 1:276–87. doi: 10.3816/CBC.2001.n.002

PubMed Abstract | Crossref Full Text | Google Scholar

5. Adam R, Aloia T, Krissat J, Bralet MP, Paule B, Giacchetti S, et al. Is liver resection justified for patients with hepatic metastases from breast cancer. Ann Surg. (2006) 244:897–907; discussion 907-8. doi: 10.1097/01.sla.0000246847.02058.1b

PubMed Abstract | Crossref Full Text | Google Scholar

6. Eng LG, Dawood S, Sopik V, Haaland B, Tan PS, Bhoo-Pathy N, et al. Ten-year survival in women with primary stage IV breast cancer. Breast Cancer Res Treat. (2016) 160:145–52. doi: 10.1007/s10549-016-3974-x

PubMed Abstract | Crossref Full Text | Google Scholar

7. Rashid NS, Grible JM, Clevenger CV, and Harrell JC. Breast cancer liver metastasis: current and future treatment approaches. Clin Exp Metastasis. (2021) 38:263–77. doi: 10.1007/s10585-021-10080-4

PubMed Abstract | Crossref Full Text | Google Scholar

8. Ji L, Cheng L, Zhu X, Gao Y, Fan L, and Wang Z. Risk and prognostic factors of breast cancer with liver metastases. BMC Cancer. (2021) 21:238. doi: 10.1186/s12885-021-07968-5

PubMed Abstract | Crossref Full Text | Google Scholar

9. Giordano C, Brennan M, Mohamed B, Rashidi P, Modave F, and Tighe P. Accessing artificial intelligence for clinical decision-making. Front Digit Health. (2021) 3:645232. doi: 10.3389/fdgth.2021.645232

PubMed Abstract | Crossref Full Text | Google Scholar

10. Jayatilake S and Ganegoda GU. Involvement of machine learning tools in healthcare decision making. J Healthc Eng. (2021) 2021:6679512. doi: 10.1155/2021/6679512

PubMed Abstract | Crossref Full Text | Google Scholar

11. Haug CJ and Drazen JM. Artificial intelligence and machine learning in clinical medicine, 2023. N Engl J Med. (2023) 388:1201–8. doi: 10.1056/NEJMra2302038

PubMed Abstract | Crossref Full Text | Google Scholar

12. Rajula H, Verlato G, Manchia M, Antonucci N, and Fanos V. Comparison of conventional statistical methods with machine learning in medicine: diagnosis, drug development, and treatment. Med (Kaunas). (2020) 56:455. doi: 10.3390/medicina56090455

PubMed Abstract | Crossref Full Text | Google Scholar

13. Ngiam KY and Khor IW. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. (2019) 20:e262–73. doi: 10.1016/S1470-2045(19)30149-4

PubMed Abstract | Crossref Full Text | Google Scholar

14. Ley C, Martin RK, Pareek A, Groll A, Seil R, and Tischer T. Machine learning and conventional statistics: making sense of the differences. Knee Surg Sports Traumatol Arthrosc. (2022) 30:753–7. doi: 10.1007/s00167-022-06896-6

PubMed Abstract | Crossref Full Text | Google Scholar

15. Petrelli F and Barni S. Surgery of primary tumors in stage IV breast cancer: an updated meta-analysis of published studies with meta-regression. Med Oncol. (2012) 29:3282–90. doi: 10.1007/s12032-012-0310-0

PubMed Abstract | Crossref Full Text | Google Scholar

16. Harris E, Barry M, and Kell MR. Meta-analysis to determine if surgical resection of the primary tumour in the setting of stage IV breast cancer impacts on survival. Ann Surg Oncol. (2013) 20:2828–34. doi: 10.1245/s10434-013-2998-2

PubMed Abstract | Crossref Full Text | Google Scholar

17. Quinn EM, Kealy R, O’Meara S, Whelan M, Ennis R, Malone C, et al. Is there a role for locoregional surgery in stage IV breast cancer. Breast. (2015) 24:32–7. doi: 10.1016/j.breast.2014.10.009

PubMed Abstract | Crossref Full Text | Google Scholar

18. Tan Y, Li X, Chen H, Hu Y, Jiang M, Fu J, et al. Hormone receptor status may impact the survival benefit of surgery in stage iv breast cancer: a population-based study. Oncotarget. (2016) 7:70991–1000. doi: 10.18632/oncotarget.11235

PubMed Abstract | Crossref Full Text | Google Scholar

19. Thomas A, Khan SA, Chrischilles EA, and Schroeder MC. Initial surgery and survival in stage IV breast cancer in the United States, 1988-2011. JAMA Surg. (2016) 151:424–31. doi: 10.1001/jamasurg.2015.4539

PubMed Abstract | Crossref Full Text | Google Scholar

20. Vohra NA, Brinkley J, Kachare S, and Muzaffar M. Primary tumor resection in metastatic breast cancer: A propensity-matched analysis, 1988–2011 SEER data base. Breast J. (2018) 24:549–54. doi: 10.1111/tbj.13005

PubMed Abstract | Crossref Full Text | Google Scholar

21. Soran A, Ozmen V, Ozbas S, Karanlik H, Muslumanoglu M, Igci A, et al. Randomized trial comparing resection of primary tumor with no surgery in stage IV breast cancer at presentation: protocol MF07-01. Ann Surg Oncol. (2018) 25:3141–9. doi: 10.1245/s10434-018-6494-6

PubMed Abstract | Crossref Full Text | Google Scholar

22. Khan SA, Zhao F, Solin LJ, Goldstein LJ, Cella D, Basik M, et al. A randomized phase III trial of systemic therapy plus early local therapy versus systemic therapy alone in women with de novo stage IV breast cancer: A trial of the ECOG-ACRIN Research Group (E2108). J Clin Oncol. (2020) 38:LBA2–2. doi: 10.1200/JCO.2020.38.18_suppl.LBA2

Crossref Full Text | Google Scholar

23. Badwe R, Hawaldar R, Nair N, Kaushik R, Parmar V, Siddique S, et al. Locoregional treatment versus no treatment of the primary tumour in metastatic breast cancer: an open-label randomised controlled trial. Lancet Oncol. (2015) 16:1380–8. doi: 10.1016/S1470-2045(15)00135-7

PubMed Abstract | Crossref Full Text | Google Scholar

24. Shien T, Hara F, Aogi K, Yanagita Y, Tsuneizumi M, Yamamoto N, et al. A randomized controlled trial comparing primary tumor resection plus systemic therapy with systemic therapy alone in metastatic breast cancer (PRIM-BC): Japan Clinical Oncology Group study JCOG1017. J Clin Oncol. (2023) 41:523–3. doi: 10.1200/JCO.2023.41.16_suppl.523

PubMed Abstract | Crossref Full Text | Google Scholar

25. Collins GS, Dhiman P, Andaur Navarro CL, Ma J, Hooft L, Reitsma JB, et al. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open. (2021) 11:e048008. doi: 10.1136/bmjopen-2020-048008

PubMed Abstract | Crossref Full Text | Google Scholar

26. Uno H, Cai T, Pencina MJ, D’Agostino RB, and Wei LJ. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med. (2011) 30:1105–17. doi: 10.1002/sim.4154

PubMed Abstract | Crossref Full Text | Google Scholar

27. Khodari W, Sedrati A, Naisse I, Bosc R, and Belkacemi Y. Impact of loco-regional treatment on metastatic breast cancer outcome: a review. Crit Rev Oncol Hematol. (2013) 87:69–79. doi: 10.1016/j.critrevonc.2012.12.005

PubMed Abstract | Crossref Full Text | Google Scholar

28. Howlader M, Heaton N, and Rela M. Resection of liver metastases from breast cancer: towards a management guideline. Int J Surg. (2011) 9:285–91. doi: 10.1016/j.ijsu.2011.01.009

PubMed Abstract | Crossref Full Text | Google Scholar

29. Chen QF, Huang T, Shen L, Wu P, Huang ZL, and Li W. Prognostic factors and survival according to tumor subtype in newly diagnosed breast cancer with liver metastases: A competing risk analysis. Mol Clin Oncol. (2019) 11:259–69. doi: 10.3892/mco.2019.1890

PubMed Abstract | Crossref Full Text | Google Scholar

30. Liu S, Jia Y, Chai J, Ge H, Huang R, Li A, et al. A predictive model for the early death of breast cancer with synchronous liver metastases: A population-based study. Cancer Control. (2023) 30:10732748231202851. doi: 10.1177/10732748231202851

PubMed Abstract | Crossref Full Text | Google Scholar

31. Goetz MP, Toi M, Campone M, Sohn J, Paluch-Shimon S, Huober J, et al. MONARCH 3: abemaciclib as initial therapy for advanced breast cancer. J Clin Oncol. (2017) 35:3638–46. doi: 10.1200/JCO.2017.75.6155

PubMed Abstract | Crossref Full Text | Google Scholar

32. Bialas M, Pankowska V, Bondos J, and Roszkowski K. Comparative assessment of hypofractionated radiation therapy and standard treatment in patients with prostate cancer. J Clin Oncol. (2016) 34:e16616–6. doi: 10.1200/JCO.2016.34.15_suppl.e16616

Crossref Full Text | Google Scholar

33. Xu B, Yan M, Ma F, Hu X, Feng J, Ouyang Q, et al. Pyrotinib plus capecitabine versus lapatinib plus capecitabine for the treatment of HER2-positive metastatic breast cancer (PHOEBE): a multicentre, open-label, randomised, controlled, phase 3 trial. Lancet Oncol. (2021) 22:351–60. doi: 10.1016/S1470-2045(20)30702-6

PubMed Abstract | Crossref Full Text | Google Scholar

34. Pu CC, Yin L, and Yan JM. Risk factors and survival prediction of young breast cancer patients with liver metastases: a population-based study. Front Endocrinol (Lausanne). (2023) 14:1158759. doi: 10.3389/fendo.2023.1158759

PubMed Abstract | Crossref Full Text | Google Scholar

35. Golse N and Adam R. Liver metastases from breast cancer: what role for surgery? Indications and results. Clin Breast Cancer. (2017) 17:256–65. doi: 10.1016/j.clbc.2016.12.012

PubMed Abstract | Crossref Full Text | Google Scholar

36. Wang M, Zhang J, Ji S, Shao G, Zhao K, Wang Z, et al. Transarterial chemoembolisation for breast cancer with liver metastasis: A systematic review. Breast. (2017) 36:25–30. doi: 10.1016/j.breast.2017.09.001

PubMed Abstract | Crossref Full Text | Google Scholar

37. Si Y, Yuan P, Hu N, Wang X, Ju J, Wang J, et al. Primary tumor surgery for patients with de novo stage IV breast cancer can decrease local symptoms and improve quality of life. Ann Surg Oncol. (2020) 27:1025–33. doi: 10.1245/s10434-019-08092-2

PubMed Abstract | Crossref Full Text | Google Scholar

38. Yu Y, Hong H, Wang Y, Fu T, Chen Y, Zhao J, et al. Clinical evidence for locoregional surgery of the primary tumor in patients with de novo stage IV breast cancer. Ann Surg Oncol. (2021) 28:5059–70. doi: 10.1245/s10434-021-09650-3

PubMed Abstract | Crossref Full Text | Google Scholar

39. Khan SA, Zhao F, Goldstein LJ, Cella D, Basik M, Golshan M, et al. Early local therapy for the primary site in de novo stage IV breast cancer: results of a randomized clinical trial (EA2108). J Clin Oncol. (2022) 40:978–87. doi: 10.1200/JCO.21.02006

PubMed Abstract | Crossref Full Text | Google Scholar

Keywords: breast cancer liver metastases, machine learning, prognosis, surgery, random forest

Citation: Chen C, Wu J, Xu B, Li W, Zhong C, Yan Z, Zhong Q, Li R, Shao M, Dong Y, Fang Y, Li Y and Zhang Q (2025) Machine learning algorithms for individualized prediction of prognosis in breast cancer liver metastases and the prognostic impact of primary tumor surgery: a multicenter study. Front. Endocrinol. 16:1656191. doi: 10.3389/fendo.2025.1656191

Received: 29 June 2025; Accepted: 23 September 2025;
Published: 13 October 2025.

Edited by:

Mostafa Elhosseini, Mansoura University, Egypt

Reviewed by:

Shaochun Liu, Second Hospital of Anhui Medical University, China
Mohammed Qaraad, Hormel Foundation, United States

Copyright © 2025 Chen, Wu, Xu, Li, Zhong, Yan, Zhong, Li, Shao, Dong, Fang, Li and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Qunchen Zhang, cWN6aGFuZzIwMTRAMTYzLmNvbQ==; Yong Li, ZG9jbGVvMTk4NUBzaW5hLmNvbQ==

^†These authors have contributed equally to this work

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.