- 1 The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, China
- 2 Department of Urology, Second Affiliated Hospital of Nanchang University, Nanchang, China
Background and purpose: The occurrence of bone metastasis (BM) in advanced bladder cancer (BC) often signifies a poor prognosis. Currently, the accurate prediction of BM in BC remains a challenge. This study develops predictive models using machine learning algorithms to predict bladder cancer bone metastasis (BCBM) and aid in personalized clinical decisions.
Patients and methods: We reviewed and analyzed data from patients diagnosed with BC between 2010 and 2015 in the Surveillance, Epidemiology, and End Results (SEER) database. In addition, we included 327 patients treated at the Second Affiliated Hospital of Nanchang University and Jiangxi Cancer Hospital as an external validation cohort. Independent risk factors for BM in patients with BC were identified through univariate and multivariate logistic regression analyses. These features were then integrated into seven machine learning algorithms to build predictive models: logistic regression (LR), support vector machine (SVM), gradient boosting machine (GBM), neural network (NN), random forest (RF), extreme gradient boosting (XGB), and k-nearest neighbors (KNN). The performance of these models was evaluated using the area under the receiver operating characteristic curve (AUC), along with accuracy, sensitivity (recall), and specificity.
Results: A total of 22,114 patients diagnosed with BC were included in this study, with 537 (2.4%) patients developing BM. The identified independent risk factors for BCBM included age, race, tumor histology, tumor grade, T stage, N stage, the presence of brain metastasis, liver metastasis, and lung metastasis, and history of radiotherapy. Among the seven developed machine learning models, the tree-based GBM model exhibited the best performance in the test set, achieving AUC, accuracy, sensitivity, and specificity values of 0.855, 0.813, 0.733, and 0.815, respectively. The GBM model also demonstrated robust performance in the external validation set, achieving an AUC of 0.766 and accuracy of 0.945. According to Shapley additive explanations (SHAP), the most significant feature in the GBM prediction model is the T stage, followed by the N stage and radiotherapy.
Conclusion: The GBM model offers a precise and personalized approach to predicting BCBM, potentially enhancing clinical decision-making and the efficiency of BM screening in patients with BC.
Introduction
Bladder cancer (BC) is the second most common urogenital cancer (1). Worldwide, it ranks as the ninth most prevalent cancer, with approximately 614,000 new cases and 220,000 deaths reported in 2022 (2). BC is characterized by a high rate of recurrence and metastasis (3). Metastatic bladder cancer (mBC) primarily spreads to the lymph nodes, the bones, the lungs, and the liver (4). Approximately 10%–15% of patients with BC are diagnosed with metastasis at the initial presentation (5), with the bone being the most common site of metastasis (6, 7). Bone metastasis (BM) can lead to skeletal-related events (SREs), which often result in complications such as pain, hypercalcemia, spinal cord compression, pathological fractures, and neurological deficits. These complications significantly diminish the patient’s quality of life (8) and adversely affect survival rates (9), with the 1-year survival rate for patients with bladder cancer bone metastasis (BCBM) as low as 21% (10). The TNM staging system established by the American Joint Committee on Cancer (AJCC) is widely used to predict the metastasis risk and prognosis of patients with various cancers (11). However, the TNM system does not account for additional risk factors such as age, gender, and previous treatment history, which have been shown to be valuable in predicting BC metastasis (12, 13). Consequently, the predictive accuracy of the TNM staging system for patients with BM may be limited. Many patients with BC may not receive a timely diagnosis of BM, potentially missing optimal treatment windows and leading to poorer prognosis. Therefore, accurately predicting the occurrence of BM in patients with BC is of great significance.
In recent years, artificial intelligence (AI) models based on machine learning (ML) algorithms have been increasingly integrated into clinical practice (14, 15). As a key branch of AI, ML has been utilized to independently extract features from large datasets and construct high-precision prediction models, continuously optimizing the performance of these algorithms.
In medical research, the construction and validation of models based on ML can uncover potential patterns in large clinical datasets, providing valuable tools for early diagnosis and prognosis assessment. ML has been widely applied in the prognostic evaluation of prostate cancer, kidney cancer, and gastrointestinal cancer, as well as in studies of organ metastasis (16, 17). The rapid advancement of health big data in biomedical science has revealed the significant potential of ML applications in understanding disease and in health management (18).
Currently, there are limited studies exploring ML models for the prediction of BCBM. In this study, we evaluated seven ML algorithms and observed that, among them, the gradient boosting machine (GBM) model showed relatively better performance. This study extracted data on patients with BC, as well as their clinical and pathological characteristics, from the Surveillance, Epidemiology, and End Results (SEER) database for the years 2010–2015. Accurate and reliable ML models to predict BCBM were constructed, which could assist clinicians in promptly identifying patients with BM. This approach aims to provide personalized clinical strategies for patients and promote the rational allocation of medical resources.
Methods
Ethics statement
The SEER database is a publicly available, anonymized cancer registry where all patient data have been de-identified. Therefore, this study was exempt from ethics review and patient consent requirements.
Patient selection and variables
All data were extracted from the SEER database using SEER*Stat software (version 8.4.4). This database covers approximately 28% of the US population and includes 17 population-based cancer registries, providing clinicopathological, demographic, and survival outcome information. The case listing was based on the dataset Incidence - SEER Research Data, 17 Registries, Nov 2023 Sub (2000–2021). Subjects with BC were identified using site codes C67.0–C67.9. Patients with malignant BC confirmed by positive histology and diagnosed between 2010 and 2015 were selected. The exclusion criteria were as follows: 1) patients under the age of 18 years; 2) patients with unknown AJCC T or N staging; 3) patients with unknown race or histological grade; 4) patients with unknown bone, brain, liver, or lung metastasis status; 5) patients with unknown radiotherapy or chemotherapy information; and 6) patients with two or more primary tumors. The flowchart for the case screening is shown in Figure 1.
The external validation cohort comprised 327 patients with pathologically confirmed BC diagnosed between 2016 and 2023, among whom 11 developed BM. The final follow-up was completed in November 2024. This study was approved by the Institutional Review Boards of the Second Affiliated Hospital of Nanchang University and Jiangxi Cancer Hospital, with a waiver of informed consent granted.
A total of 13 variables related to patient demographics and clinicopathological characteristics were extracted for analysis. The demographic variables included age, sex, and race, while the clinicopathological variables included tumor histology type, tumor grade, T stage, N stage, radiotherapy, chemotherapy, brain metastasis, BM, lung metastasis, and liver metastasis. Patient age was categorized into three subgroups (<60 years, 60–80 years, and >80 years) and tumor grade into two subgroups. The histological types were classified into transitional cell carcinoma, squamous cell carcinoma, adenocarcinoma, and other types. All cancer patients exhibited histopathological and morphological evidence consistent with the International Classification of Diseases for Oncology, Third Edition (ICD-O-3), and all BC patients were staged according to the AJCC 7th Edition guidelines and the SEER staging information.
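The selection rules above can be sketched as a simple record filter. This is only an illustration: the field names (`age`, `t_stage`, `n_primaries`, and so on) are hypothetical and do not correspond to actual SEER*Stat export columns, and treating an unknown age as ineligible is an assumption rather than a listed criterion.

```python
# Hypothetical field names; real SEER*Stat exports use different column labels.
REQUIRED_FIELDS = (
    "t_stage", "n_stage", "race", "grade",
    "bone_met", "brain_met", "liver_met", "lung_met",
    "radiotherapy", "chemotherapy",
)


def age_group(age: int) -> str:
    """Map age in years to the three subgroups used in the study."""
    if age < 60:
        return "<60"
    if age <= 80:
        return "60-80"
    return ">80"


def is_eligible(record: dict) -> bool:
    """Apply exclusion criteria 1-6; None marks an unknown value."""
    # Criterion 1 (unknown age is also rejected, which is an assumption).
    if record.get("age") is None or record["age"] < 18:
        return False
    # Criteria 2-5: any unknown staging, race, grade, metastasis, or treatment field.
    if any(record.get(field) is None for field in REQUIRED_FIELDS):
        return False
    # Criterion 6: exclude patients with two or more primary tumors.
    return record.get("n_primaries") == 1
```

Expressing the criteria as a single predicate keeps each exclusion step auditable, which matters when the counts have to match a screening flowchart.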
Data processing and feature engineering
All statistical analyses and data descriptions were conducted using R version 4.4.1 and SPSS version 27. The continuous variable age was converted into a categorical variable, which was then label encoded. Logistic regression analysis was performed in R on the variables collected from the SEER database to identify features suitable for the ML models. Variables significantly associated with BCBM were first identified through univariate logistic regression analysis (p < 0.05). These variables were subsequently entered into a multivariate logistic regression analysis, and the ML models were built using the variables that remained statistically significant (p < 0.05). Correlation analysis was conducted to examine the relationships between the selected variables. To compare the importance of each feature, feature importance was extracted from each ML model based on the principle of permutation importance. Finally, Shapley additive explanations (SHAP) were used to quantify and rank the contribution of each feature to the model predictions, providing a transparent and interpretable analysis that helps decision-makers understand the impact of each feature on the predicted outcome. Because the dataset is imbalanced, which may affect model performance, the synthetic minority oversampling technique (SMOTE) was applied to the training set to mitigate the impact of class imbalance on the evaluation results.
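To make the oversampling step concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbors. It is not the implementation used in the study (which was carried out in R), and the function name and parameters are illustrative. Plain SMOTE interpolates in continuous space, so label-encoded categorical features would receive non-integer values; how this was handled is not stated in the text.

```python
import random


def smote(minority: list[list[float]], n_new: int, k: int = 5, seed: int = 0) -> list[list[float]]:
    """Create n_new synthetic samples by interpolating between a random
    minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbors of `base` (excluding itself).
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        neighbor = minority[rng.choice(neighbors)]
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic
```

Oversampling is applied to the training set only, as in the text; applying it before the train/test split would leak synthetic copies of test-like samples into training.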
Model construction and evaluation
The data from the SEER database were randomly divided into a training set and a test set at a ratio of 7:3. Seven ML algorithms were selected: three tree-based models [random forest (RF), GBM, and extreme gradient boosting (XGB)]; a linear model (logistic regression, LR); a kernel-based model (support vector machine, SVM); a distance-based model (k-nearest neighbors, KNN); and a neural network (NN). External validation was subsequently conducted to further evaluate the generalizability of the model. The evaluation indicators for the ML algorithms included the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity. The ML models were developed using the caret framework in R software. The relevant parameters of the models can be found in Supplementary Table S1.
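The four evaluation indicators can be written out directly. The sketch below is illustrative rather than the study's code: it computes accuracy, sensitivity, and specificity from binary predictions, and the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count as one half). It assumes both classes are present.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), and specificity for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the sample size and is meant for clarity; rank-based implementations give the same value more efficiently. Unlike the other three metrics, the AUC does not depend on a classification threshold.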
Results
Patient characteristics and metastasis
A total of 22,114 patients with BC were included in this study. At the time of initial diagnosis, 21,577 patients (97.6%) had no BM, while 537 patients (2.4%) had BM. The patients were randomly divided into a training set (n = 15,480) and a test set (n = 6,634) at a 7:3 ratio. In the external validation cohort, 316 patients (96.6%) showed no evidence of BM, while 11 patients (3.4%) developed BM. The characteristics of all cohorts are presented in Tables 1 and 2.
Feature selection
A total of 10 independent risk factors related to BM were identified through univariate and multivariate logistic regression analyses. These included age, race, tumor histology, tumor grade, T stage, N stage, radiotherapy, brain metastasis, lung metastasis, and liver metastasis (p < 0.05) (Table 3). Among these, the three most significant risk factors were brain metastasis (OR = 5.98, 95%CI = 2.37–15.14), liver metastasis (OR = 5.89, 95%CI = 4.05–8.56), and lung metastasis (OR = 5.87, 95%CI = 4.25–8.09). Based on these features, seven different models were developed in this study using ML algorithms.
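For readers less familiar with how these odds ratios arise, a logistic regression coefficient β with standard error SE is conventionally reported as OR = exp(β) with a 95% Wald interval exp(β ± 1.96·SE). The sketch below assumes the reported intervals are Wald intervals (the text does not say) and back-calculates β and SE from the brain metastasis estimate as a consistency check.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def odds_ratio_ci(beta: float, se: float) -> tuple[float, float, float]:
    """Return (OR, lower, upper) for a logistic regression coefficient."""
    return math.exp(beta), math.exp(beta - Z_95 * se), math.exp(beta + Z_95 * se)


def coef_from_reported(or_value: float, lower: float, upper: float) -> tuple[float, float]:
    """Recover (beta, SE) from a reported OR and 95% CI, assuming a Wald interval."""
    beta = math.log(or_value)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return beta, se


# Brain metastasis as reported in the text: OR = 5.98, 95% CI 2.37-15.14.
beta, se = coef_from_reported(5.98, 2.37, 15.14)
or_value, lower, upper = odds_ratio_ci(beta, se)
```

The reconstructed bounds agree with the reported interval to within rounding, which is what a Wald interval symmetric on the log-odds scale would produce. The wide interval for brain metastasis reflects a large standard error, consistent with few patients having brain metastasis.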
Correlation analysis and feature importance
Spearman’s correlation analysis was used to evaluate the correlation between factors and examine the independence of the data characteristics. As shown in Figure 2, the correlation heatmap illustrates no significant correlation among the 10 variables filtered using logistic regression. Figure 3 displays the importance of the features extracted from the different ML algorithms. Notably, in the majority of the predictive models, T stage consistently emerged as the most influential feature, underscoring its critical role in predicting BM in BC. In contrast, tumor histology, tumor grade, race, and brain metastasis contributed relatively little to the model across most algorithms, with no significant differences in their importance. In the GBM model, the features ranked from the highest to the lowest importance were: T stage, N stage, lung metastasis, radiotherapy, liver metastasis, age, race, tumor histology, tumor grade, and brain metastasis. The SHAP values were then calculated for each variable in the GBM model, with the SHAP bar graph (Figure 4A) illustrating the importance of each feature. The results indicated that T stage, N stage, and radiotherapy are the most significant contributors to the GBM model. Both methods were consistent in identifying T stage and N stage as the top two characteristics, while the bottom four—race, tumor histology, tumor grade, and brain metastasis—were also nearly identical. A summary plot of the SHAP values is presented in Figure 4B, which explains the impact of each feature on the model predictions.
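The permutation principle behind the model-based rankings can be sketched as follows: a feature's importance is the drop in a performance score when that feature's column is shuffled, which breaks its relationship with the outcome. This is a generic illustration, not the caret implementation used in the study; the choice of accuracy as the score and the function names are assumptions.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling each feature column.

    `predict` maps one row (a list of feature values) to a 0/1 label.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the data with only this column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores receives an importance of exactly zero, while SHAP attributes each individual prediction to features; the two approaches answer related but different questions, so partial disagreement in the middle ranks is not unexpected.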

Figure 4. Interpretability of the gradient boosting machine (GBM) model assessed using the SHAP method. (A) SHAP bar chart showing the importance of each feature based on the mean SHAP values. (B) SHAP summary plot showing the impact of each feature on the model predictions. Individual dots symbolize patients, and different colors represent different levels of influence on the model output. SHAP, Shapley additive explanations.
Model performance and subgroup analysis
Figure 5 and Table 4 present the performance of the seven prediction models. The training set, balanced using SMOTE, was employed to train the models, while the test set was used to evaluate the accuracy and generalization ability of the models. To further validate the generalizability of the GBM model, external validation was performed using an independent cohort. Seven ML models were developed using the identified risk factors. After a comprehensive comparison, the GBM model demonstrated the best predictive value, achieving the highest AUC value of 0.855, along with accuracy, sensitivity (recall), and specificity values of 0.813, 0.733, and 0.815, respectively. The GBM model demonstrated favorable performance in the external validation cohort, achieving an AUC of 0.766 and an accuracy of 0.945 (Supplementary Table S2). The discrepancy between the model accuracy and AUC may be attributed to sample imbalance. Given that only 11 cases of BM were available, the model likely exhibited bias toward the majority class. This results in superficially high accuracy while limiting the model’s ability to identify minority class samples, consequently compromising the AUC performance. The confusion matrices for the GBM model in both the training and test sets are displayed in Figure 6. The predictive performance of the GBM model was compared with that of TNM staging to evaluate whether the model could provide more accurate and clinically meaningful predictions. As shown in Figure 7, the GBM model demonstrated superior performance to TNM staging alone, achieving an AUC of 0.855 compared with the lower AUC of TNM staging. This suggests that the GBM model may better capture features associated with the risk of BM. Stratified analyses of the model predictions were conducted to evaluate its fairness across demographic subgroups (Figure 8). Patients were stratified by gender, race, and age, with the model performance metrics calculated separately for each subgroup. 
The results showed comparable predictive performance between genders (AUC of 0.865 for male vs. 0.831 for female patients). Racial subgroup analysis revealed AUCs of 0.859 (white), 0.781 (black), and 0.847 (other). Age-stratified performance demonstrated AUCs of 0.920 (<60 years), 0.840 (60–80 years), and 0.788 (>80 years). While some inter-subgroup variability was observed, the model maintained clinically acceptable performance across all demographic strata.
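The effect of class imbalance on accuracy can be checked with simple arithmetic on the figures reported above. In the sketch below, the assumption that the test-set prevalence equals the overall prevalence (537/22,114) follows from the random split but is not stated explicitly in the text.

```python
def majority_baseline_accuracy(n_total: int, n_positive: int) -> float:
    """Accuracy of a trivial classifier that always predicts the majority class."""
    return max(n_positive, n_total - n_positive) / n_total


def accuracy_from_rates(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Accuracy implied by sensitivity, specificity, and outcome prevalence."""
    return prevalence * sensitivity + (1 - prevalence) * specificity


# External cohort: 11 BM cases among 327 patients.
external_baseline = majority_baseline_accuracy(327, 11)  # 316 / 327, about 0.966

# SEER test set: reported sensitivity 0.733 and specificity 0.815.
test_accuracy = accuracy_from_rates(537 / 22114, 0.733, 0.815)  # about 0.813
```

Two observations follow. The implied test-set accuracy reproduces the reported 0.813, so the three reported metrics are mutually consistent. In the external cohort, a classifier that never predicts BM would reach about 0.966 accuracy, which is higher than the reported 0.945; accuracy therefore says little about discrimination at this prevalence, and the AUC of 0.766 is the more informative figure.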

Figure 5. Receiver operating characteristic (ROC) curves of the prognostic models based on machine learning in the training set (A), test set (B), and the external validation set (C).

Figure 6. Confusion matrices of the gradient boosting machine (GBM) model in the training set (A) and the test set (B).

Figure 7. Performance comparison between the gradient boosting machine (GBM) model and TNM staging alone in both the training set (A) and the test set (B).

Figure 8. Stratified analysis of the gradient boosting machine (GBM) model performance by gender (A, B), race (C–E), and age (F–H) subgroups in the test set.
Discussion
BC is a fatal urinary tumor that can be classified into non-muscle-invasive bladder cancer (NMIBC), muscle-invasive bladder cancer (MIBC), or clinical metastatic disease (19). The 5-year survival rate for mBC is only 5% (20). Patients with BCBM have the worst prognosis compared to other BM patients with urogenital cancers (21). The early identification of BM in BC could help improve the clinical outcomes. The available prediction methods have certain limitations. In this study, a GBM model was developed to assess the risk of BM in patients with BC. The model provides individualized risk stratification based on patient-specific characteristics (e.g., age, tumor stage, and histologic subtype), thereby informing personalized clinical decision-making. For patients across different risk categories, therapeutic strategies may be judiciously tailored—individuals at high risk might benefit from intensified multimodal regimens combining chemotherapy, immunotherapy, and targeted agents, while patients at low risk could potentially undergo reduced-frequency bone imaging surveillance—measures that may help alleviate financial burden, enhance quality of life, and mitigate metastasis-related complications.
Currently, the treatment strategies for BC are rapidly evolving. Immunotherapies and targeted therapies have transformed the treatment paradigm, offering broader and more effective therapeutic options for patients. Particularly noteworthy are the latest antibody–drug conjugates (ADCs), which have demonstrated significant benefits in BC (22, 23). The BM prediction model (GBM) developed in this study can provide decision-making support for ADC-based treatment strategies. For patients predicted to be at high risk of BM, we recommend direct adoption of combination therapy with ADCs and immune checkpoint inhibitors (ICIs). Studies have indicated that patients with metastatic predisposition who receive ADC+ICI combination therapy achieve a remarkable 1-year disease-free survival (DFS) rate of 97.4%, while the overall pathological downstaging rate reaches 75.5% (24), fully demonstrating the substantial advantage of this combined approach.
AI is a research field that uses computers to simulate human intelligence and has been successfully applied in various domains, including autonomous driving, facial recognition, and music creation (25–27). ML, as a subset of AI, can assist clinicians in making better clinical decisions, thereby improving patient care and overall health (28). Tsai et al. (29) conducted a diagnostic study involving 1,336 patients with cystitis, BC, renal cancer, uterine cancer, and prostate cancer. The authors innovatively combined clinical laboratory data with ML methods to establish a diagnostic model for BC. Key indicators included calcium, alkaline phosphatase (ALP), albumin, urinary ketones, urethral occult blood, creatinine, alanine aminotransferase (ALT), and diabetes. Of the five models constructed in the study, LightGBM exhibited the best predictive performance, achieving an AUC value of 0.923 and an accuracy of 87.6%, demonstrating the potential of using clinical laboratory data for cancer detection. Xiong et al. (30) conducted a retrospective study involving 105 patients with BC. By comparing the performance of clinical models, radiomic models, and clinical–radiomic fusion models, the authors found that ML models combining radiomic features with clinical variables could more accurately predict the clinical staging of BC. Liosis et al. (31) developed an elastic net ML prediction model that successfully identified gene markers related to BC treatment response and disease progression, effectively predicting patients’ treatment responses and disease progression. Zheng et al. (32) created an ML algorithm based on pathological sections of MIBC to accurately quantify the tumor–stroma ratio (TSR) in patients. Their study showed a significant correlation between a low TSR and poorer overall survival, providing an automated TSR quantification method that reduces the subjectivity and inter-observer variability associated with traditional visual assessment methods.
Despite significant progress in the construction and utilization of various models for the diagnosis, staging, treatment, and prediction of the prognosis of BC, there remains considerable room for improvement in the development of models that predict BCBM. For instance, Fan et al. (33) constructed a nomogram based on traditional logistic models to predict BCBM, identifying age, lung metastasis, liver metastasis, brain metastasis, N stage, T stage, histological type, pathological grading, primary tumor sites, and race as independent risk factors for BM in patients with BC. This study did not include patients’ previous treatment information, which could be considered in future model refinements. Zhang et al. (10) identified risk factors for BM in patients with BC, including age, race, marital status, T stage, N stage, tumor grading, lung metastasis, liver metastasis, and brain metastasis, but did not construct a corresponding predictive model.
In summary, while previous studies have developed nomogram models based on LR for predicting BM in patients with BC, these traditional approaches may have limitations in handling complex datasets. Our ML-based method offers an alternative approach that could potentially provide additional insights for clinical decision-making (34, 35). The existing prediction models for BCBM have shown varying performance. Identifying the risk factors for BCBM remains important for risk stratification and clinical management. In this study, ML algorithms were applied to analyze potential associations between clinical factors and BCBM risk, with the aim of developing an improved predictive approach (36).
Based on a big data analysis of the SEER database, this study identified independent risk factors related to BM through logistic regression analysis. A total of 12 clinically relevant variables associated with BCBM were included, namely, age, gender, race, tumor histology, tumor grade, T stage, N stage, radiotherapy, chemotherapy, brain metastasis, liver metastasis, and lung metastasis. Using multiple logistic regression analysis, 10 independent risk factors related to BM were identified: brain metastasis, lung metastasis, liver metastasis, radiotherapy, tumor grade, tumor histology, T stage, N stage, race, and age. BC exhibits diverse histological subtypes, including transitional cell carcinoma, squamous cell carcinoma, adenocarcinoma, and other subtypes. These variants demonstrated significant differences in biological behavior and prognostic outcomes (37). In this study, the limited number of BM-positive cases may have precluded comprehensive stratification to fully capture the heterogeneous impact of the histological subtypes on metastatic risk. Nevertheless, SHAP analysis confirmed their non-negligible contribution to the predictive model. Notably, chemotherapy was not identified as an independent risk factor for BM. This may be attributed to its predominant use in the advanced stage or in patients with mBC, who inherently exhibit a higher baseline risk of BM. Consequently, while chemotherapy appeared associated with BM in the univariable analysis, its effect became non-significant in the multivariable analysis after adjusting for T stage, N stage, and the presence of other metastases (e.g., liver/lung). These variables were incorporated into the model, enabling the development of an ML-based predictive tool. Model performance was assessed using standard metrics such as AUC, accuracy, sensitivity, and specificity on the test set. 
The GBM model demonstrated an AUC of 0.855, with a sensitivity of 0.733 and a specificity of 0.815, showing improved predictive capability compared with the other models developed in the study. These results suggest that this model may help identify patients with BC at an increased risk for BM. Furthermore, the subgroup analysis revealed diminished predictive performance of the model in two specific populations: black patients and those aged over 80 years. This observed reduction in accuracy may be attributable to data limitations and potential selection biases inherent in the study design. The GBM model, an ensemble learning algorithm, iteratively builds decision trees to correct prediction errors. Its ability to capture complex nonlinear relationships makes it highly effective for disease prognosis and risk stratification (38). Using the SHAP method, we determined that the T stage, the N stage, radiotherapy, age, lung metastasis, and liver metastasis are important predictors of BCBM. Comparing the feature rankings from the ML model with the SHAP analysis results showed that the T stage and the N stage consistently ranked as the top two features, indicating their significant contribution to model predictions. In addition, four variables (radiotherapy, age, liver metastasis, and lung metastasis) ranked among the top six in importance across both methods, highlighting their value in predicting BCBM. It is also noteworthy that radiotherapy emerged as a significant risk factor in the multivariate logistic regression analysis, with its importance ranking third in the SHAP graph, following T stage and N stage. This result may be related to the potential of radiotherapy to alter the tumor microenvironment and disrupt the normal synthesis and folding processes of the endoplasmic reticulum (ER) proteins, thereby promoting tumor aggressiveness and metastatic potential (39).
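The description of GBM as iteratively building trees to correct prediction errors can be illustrated with a deliberately small sketch. It assumes a single numeric feature, depth-1 trees (stumps), and squared-error loss, under which each new tree is fitted to the current residuals. The study's model was a classifier trained through caret, so this is a simplification of the principle rather than the authors' configuration.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature under squared error.

    Requires at least two distinct values in xs.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def fit_gbm(xs, ys, n_trees=50, learning_rate=0.1):
    """Gradient boosting with stumps: each stump fits the current residuals."""
    init = sum(ys) / len(ys)
    preds = [init] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, preds)
        ]
    return {"init": init, "learning_rate": learning_rate, "stumps": stumps}


def predict_gbm(model, x):
    """Sum the initial prediction and the shrunken contribution of every stump."""
    total = model["init"]
    for threshold, left_value, right_value in model["stumps"]:
        total += model["learning_rate"] * (left_value if x <= threshold else right_value)
    return total
```

The learning rate shrinks each tree's correction, so many small steps are taken instead of a few large ones; this is the main lever trading training fit against overfitting in boosted models.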
This study has several advantages. Firstly, an ML-based prediction model that can accurately predict BCBM was established, offering a more reliable alternative to traditional nomogram prediction models. Secondly, this research further explored the relationships among different independent high-risk factors, providing new directions for future clinical studies. Thirdly, for interpretability, SHAP values were used to show how each feature affected the predictions, helping to explain the model’s behavior. Finally, the generalizability of the model was independently evaluated using an external validation cohort, thereby mitigating potential performance overestimation due to data-splitting bias or overfitting.
However, this study has certain limitations. Firstly, as a large retrospective SEER-based study, it may be subject to selection bias. In particular, patients excluded because of missing data might have a higher BM risk or unique clinical characteristics that the model could not learn, potentially compromising the prediction accuracy for these subgroups in clinical practice. Secondly, SEER lacks detailed treatment variables such as chemotherapy regimens and dosages, which reduces the clinical credibility of the predictions and precludes treatment effect analysis. Future studies should integrate electronic health records (EHRs) with chemotherapy/radiotherapy planning systems. Thirdly, established BC risk factors (e.g., smoking and occupational/environmental exposures) are unavailable in SEER and were thus excluded from the model, limiting the prediction accuracy. Fourthly, SEER relies on hospital-reported diagnoses, which carries a risk of misclassification: BM may be underreported in asymptomatic patients without confirmatory imaging, and discrepancies between clinical and pathological T/N staging may exist. Fifthly, SEER does not track post-metastasis survival or SREs, which hinders assessment of whether early prediction improves outcomes. Sixthly, although the reliability of the model was evaluated using AUC, accuracy, sensitivity, and specificity metrics and its generalizability was assessed through external validation, its predictive capability remains limited and requires validation in prospective clinical trials. Finally, the external validation dataset exhibits both class imbalance and geographic homogeneity (originating from a single region), which may result in performance fluctuations and predictive bias in the external cohort. Furthermore, disproportionate representation across subgroups may contribute to diminished predictive accuracy for specific demographic strata.
Today, with the rapid development of AI technology, the combination of AI with radiomics plays a significant role in precision medicine (40) and is widely applied in the diagnosis, risk stratification, and treatment of various tumors, including BC, liver cancer, lung cancer, and parotid cancer (41–45). Radiomics contributes to the diagnosis, treatment, and prognostic assessment of patients with BC, enabling timely interventions that can improve patients’ quality of life (46, 47). Future research plans include applying ML in conjunction with radiomics to predict BCBM. We believe that, with the continued advancement of AI technology, ML will become increasingly prevalent in biomedical science, with substantial potential for clinical translation and for changing future medical practice (48–50).
Clinical implementation and challenges
The GBM model developed in this study demonstrated good performance in predicting BM in patients with BC. We plan to implement this model as an interactive risk calculator in clinical practice, where patients’ clinical characteristics can be input after BC diagnosis to obtain a preliminary BM risk score (represented as a 0–1 value, e.g., 0.30 indicating 30% risk). Patients at high risk would be prioritized for imaging examinations to assist clinical decision-making (see the model card in the Supplementary Material for details). However, several potential barriers exist for clinical integration: Firstly, clinical data integration poses challenges due to fragmented data across different information systems with inconsistent formats and missing values, potentially compromising input data quality. Secondly, establishing a multidisciplinary team that involves clinicians, data scientists, and other experts is crucial to develop implementation strategies, determine risk thresholds, create clinical guidelines and workflows, and obtain regulatory approvals and ethical clearance. Thirdly, considering the severe consequences of BM and the healthcare cost-effectiveness, action thresholds should be established through cost–benefit analysis to minimize the expected costs based on model-predicted probabilities. In addition, clinicians accustomed to traditional approaches might exhibit skepticism toward the new model, questioning its reliability and perceiving it as interfering with clinical autonomy, while the complex algorithm and multiple input features may hinder interpretability and clinician trust. To enable developers, clinicians, regulatory agencies, and other stakeholders to quickly understand the model’s applicable scope and potential risks, we have created a model card (see Supplementary Table S3).
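The cost–benefit reasoning behind an action threshold can be made explicit. If a missed BM (false negative) costs C_FN and an unnecessary imaging examination (false positive) costs C_FP, expected cost is minimized by acting whenever the predicted risk p satisfies p·C_FN ≥ (1 − p)·C_FP, that is, p ≥ C_FP / (C_FP + C_FN). The sketch below uses hypothetical cost values and assumes the model outputs calibrated probabilities; since the model was trained on SMOTE-balanced data, its raw scores would typically need recalibration before such a threshold is applied.

```python
def action_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Risk threshold that minimizes expected cost for calibrated probabilities."""
    if cost_false_positive <= 0 or cost_false_negative <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def should_image(risk: float, threshold: float) -> bool:
    """Prioritize a patient for bone imaging when predicted risk reaches the threshold."""
    return risk >= threshold


# Hypothetical costs: a missed BM is taken to be nine times as costly as an
# unnecessary scan. These numbers are illustrative, not derived from the study.
threshold = action_threshold(cost_false_positive=1.0, cost_false_negative=9.0)
```

With these illustrative costs the threshold is 0.1, so a patient with a risk score of 0.30 would be prioritized for imaging. The actual cost ratio would need to be set by the multidisciplinary team described above.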
Conclusion
In this study, we developed an ML model to predict BM in BC using 10 routinely available clinical features. Among the tested models, the GBM algorithm showed the highest predictive performance, including in the external validation cohort. These results suggest that the GBM model may aid in the clinical assessment of metastasis risk and inform treatment decisions.
Data availability statement
Publicly available datasets were analyzed in this study. This data can be found here: https://seer.cancer.gov/data-software/.
Author contributions
ZY: Conceptualization, Data curation, Methodology, Validation, Writing – original draft. XX: Conceptualization, Data curation, Formal Analysis, Validation, Writing – original draft. XZ: Conceptualization, Formal Analysis, Methodology, Project administration, Supervision, Validation, Writing – review & editing. PS: Conceptualization, Investigation, Software, Supervision, Writing – review & editing. HC: Formal Analysis, Investigation, Project administration, Software, Supervision, Validation, Writing – review & editing. TZ: Funding acquisition, Project administration, Resources, Validation, Visualization, Writing – review & editing.
Funding
The author(s) declare financial support was received for the research and/or publication of this article. This research was funded by the National Natural Science Foundation of China (no. 82260598) and the Jiangxi Provincial Academic and Technical Leader Training Program in Major Disciplines (no. 20225BCJ22009).
Acknowledgments
We are extremely grateful to Dr. Huang Jianbiao from Jiangxi Cancer Hospital for providing the clinicopathological data on bladder cancer and for his valuable insights and critical scientific discussions on the research. We are grateful to Xiao Pang for the technical support he provided for this research.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The reviewer LC declared a shared parent affiliation with the authors to the handling editor at the time of review.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2025.1653506/full#supplementary-material
References
1. Siegel RL, Giaquinto AN, and Jemal A. Cancer statistics, 2024. CA: A Cancer J Clin. (2024) 74:12–49. doi: 10.3322/caac.21820
2. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer J Clin. (2024) 74:229–63. doi: 10.3322/caac.21834
3. Stellato M, Santini D, Cursano MC, Foderaro S, Tonini G, Procopio G, et al. Bone metastases from urothelial carcinoma. The dark side of the moon. J Bone Oncol. (2021) 31:100405. doi: 10.1016/j.jbo.2021.100405
4. Tran L, Xiao JF, Agarwal N, Duex JE, and Theodorescu D. Advances in bladder cancer biology and therapy. Nat Rev Cancer. (2021) 21:104–21. doi: 10.1038/s41568-020-00313-1
5. Luzzago S, Palumbo C, Rosiello G, Pecoraro A, Deuker M, Tian Z, et al. The effect of radical cystectomy on survival in patients with metastatic urothelial carcinoma of the urinary bladder. J Surg Oncol. (2019) 120:1266–75. doi: 10.1002/jso.25717
6. Tao L, Pan X, Zhang L, Wang J, Zhang Z, Zhang L, et al. Marital status and prognostic nomogram for bladder cancer with distant metastasis: A SEER-based study. Front Oncol. (2020) 10:586458. doi: 10.3389/fonc.2020.586458
7. Shinagare AB, Ramaiya NH, Jagannathan JP, Fennessy FM, Taplin ME, and Van den Abbeele AD. Metastatic pattern of bladder cancer: correlation with the characteristics of the primary tumor. AJR Am J Roentgenol. (2011) 196:117–22. doi: 10.2214/AJR.10.5036
8. Fornetti J, Welm AL, and Stewart SA. Understanding the bone in cancer metastasis. J Bone Miner Res: Off J Am Soc Bone Miner Res. (2018) 33:2099–113. doi: 10.1002/jbmr.3618
9. Selvaggi G and Scagliotti GV. Management of bone metastases in cancer: a review. Crit Rev Oncol Hematol. (2005) 56:365–78. doi: 10.1016/j.critrevonc.2005.03.011
10. Zhang C, Liu L, Tao F, Guo X, Feng G, Chen F, et al. Bone metastases pattern in newly diagnosed metastatic bladder cancer: A population-based study. J Cancer. (2018) 9:4706–11. doi: 10.7150/jca.28706
11. Burke HB. Outcome prediction and the future of the TNM staging system. J Natl Cancer Inst. (2004) 96:1408–9. doi: 10.1093/jnci/djh293
12. Zou XC, Rao XP, Huang JB, Zhou J, Chao HC, and Zeng T. Predicting distant metastasis of bladder cancer using multiple machine learning models: a study based on the SEER database with external validation. Front Oncol. (2024) 14:1477166. doi: 10.3389/fonc.2024.1477166
13. Shi S, Peng G, Luo L, and Li D. Predictive nomograms for risk and prognostic factors in metastatic bladder cancer: a population-based study. Trans Cancer Res. (2023) 12:3284–302. doi: 10.21037/tcr-23-1229
14. Jones OT, Calanzani N, Saji S, Duffy SW, Emery J, Hamilton W, et al. Artificial intelligence techniques that may be applied to primary care data to facilitate earlier diagnosis of cancer: systematic review. J Med Internet Res. (2021) 23:e23483. doi: 10.2196/23483
15. Yin J, Ngiam KY, and Teo HH. Role of artificial intelligence applications in real-life clinical practice: systematic review. J Med Internet Res. (2021) 23:e25759. doi: 10.2196/25759
16. Peng ZH, Tian JH, Chen BH, Zhou HB, Bi H, He MX, et al. Development of machine learning prognostic models for overall survival of prostate cancer patients with lymph node-positive. Sci Rep. (2023) 13:18424. doi: 10.1038/s41598-023-45804-x
17. Wang Z, Xu C, Liu W, Zhang M, Zou J, Shao M, et al. A clinical prediction model for predicting the risk of liver metastasis from renal cell carcinoma based on machine learning. Front Endocrinol. (2023) 13:1083569. doi: 10.3389/fendo.2022.1083569
18. Zhuang Y, Chen YW, Shae ZY, and Shyu C-R. Generalizable layered blockchain architecture for health care applications: development, case studies, and evaluation. J Med Internet Res. (2020) 22:e19029. doi: 10.2196/19029
19. Compérat E, Amin MB, Cathomas R, Choudhury A, De Santis M, Kamat A, et al. Current best practice for bladder cancer: a narrative review of diagnostics and treatments. Lancet. (2022) 400:1712–21. doi: 10.1016/S0140-6736(22)01188-6
20. Patel VG, Oh WK, and Galsky MD. Treatment of muscle-invasive and advanced bladder cancer in 2020. CA: A Cancer J Clin. (2020) 70:404–23. doi: 10.3322/caac.21631
21. Owari T, Miyake M, Nakai Y, Morizawa Y, Itami Y, Hori S, et al. Clinical features and risk factors of skeletal-related events in genitourinary cancer patients with bone metastasis: A retrospective analysis of prostate cancer, renal cell carcinoma, and urothelial carcinoma. Oncology. (2018) 95:170–8. doi: 10.1159/000489218
22. Hu J, Chen J, Ou Z, Chen H, Liu Z, Chen M, et al. Neoadjuvant immunotherapy, chemotherapy, and combination therapy in muscle-invasive bladder cancer: A multi-center real-world retrospective study. Cell Rep Med. (2022) 3:100785. doi: 10.1016/j.xcrm.2022.100785
23. Hu J, Yan L, Liu J, Chen M, He Y, Fan B, et al. Neoadjuvant immunotherapy driven bladder preservation for muscle invasive bladder cancer. iMeta. (2025) 4:e70063. doi: 10.1002/imt2.70063
24. Hu J, Yan L, Liu J, Chen M, Liu P, Deng D, et al. Efficacy and biomarker analysis of neoadjuvant disitamab vedotin (RC48-ADC) combined immunotherapy in patients with muscle-invasive bladder cancer: A multi-center real-world study. iMeta. (2025) 4:e70033. doi: 10.1002/imt2.70033
25. Khatua A, Khatua A, Chi X, and Cambria E. Artificial intelligence, social media and supply chain management: the way forward. Electronics. (2021) 10:2348. doi: 10.3390/electronics10192348
26. Molas G and Nowak E. Advances in emerging memory technologies: from data storage to artificial intelligence. Appl Sci. (2021) 11:11254. doi: 10.3390/app112311254
27. Kikon A and Deka PC. Artificial intelligence application in drought assessment, monitoring and forecasting: a review. Stoch Environ Res Risk Assess. (2022) 36:1197–214. doi: 10.1007/s00477-021-02129-3
28. Bhavsar KA, Singla J, Al-Otaibi YD, Song OY, Zikria YB, and Bashir AK. Medical diagnosis using machine learning: A statistical review. Computers Mater Continua. (2021) 67:107–25. doi: 10.32604/cmc.2021.014604
29. Tsai IJ, Shen WC, Lee CL, Wang DR, and Lin CY. Machine learning in prediction of bladder cancer on clinical laboratory data. Diagn (Basel Switzerland). (2022) 12:203. doi: 10.3390/diagnostics12010203
30. Xiong S, Fu Z, Deng Z, Li S, Zhan X, Zheng F, et al. Machine learning-based CT radiomics enhances bladder cancer staging predictions: A comparative study of clinical, radiomics, and combined models. Med Phys. (2024) 51:5965–77. doi: 10.1002/mp.17288
31. Liosis KC, Marouf AA, Rokne JG, Ghosh S, Bismar TA, and Alhajj R. Genomic biomarker discovery in disease progression and therapy response in bladder cancer utilizing machine learning. Cancers. (2023) 15:4801. doi: 10.3390/cancers15194801
32. Zheng Q, Jiang Z, Ni X, Yang S, Jiao P, Wu J, et al. Machine learning quantified tumor-stroma ratio is an independent prognosticator in muscle-invasive bladder cancer. Int J Mol Sci. (2023) 24:2746. doi: 10.3390/ijms24032746
33. Fan Z, Huang Z, Hu C, Tong Y, and Zhao C. Risk factors and nomogram for newly diagnosis of bone metastasis in bladder cancer. Medicine. (2020) 99:e22675. doi: 10.1097/MD.0000000000022675
34. Naik K, Goyal RK, Foschini L, Chak CW, Thielscher C, Zhu H, et al. Current status and future directions: the application of artificial intelligence/machine learning for precision medicine. Clin Pharmacol Ther. (2024) 115:673–86. doi: 10.1002/cpt.3152
35. Goecks J, Jalili V, Heiser LM, and Gray JW. How machine learning will transform biomedicine. Cell. (2020) 181:92–101. doi: 10.1016/j.cell.2020.03.022
36. Buch VH, Ahmed I, and Maruthappu M. Artificial intelligence in medicine: current trends and future possibilities. Br J Gen Pract: J R Coll Gen Pract. (2018) 68:143–4. doi: 10.3399/bjgp18X695213
37. Claps F, Biasatti A, Di GianFrancesco L, Ongaro L, Giannarini G, Pavan N, et al. The prognostic significance of histological subtypes in patients with muscle-invasive bladder cancer: an overview of the current literature. J Clin Med. (2024) 13:4349. doi: 10.3390/jcm13154349
38. Dash TK, Chakraborty C, Mahapatra S, and Panda G. Gradient boosting machine and efficient combination of features for speech-based detection of COVID-19. IEEE J Biomed Health Inf. (2022) 26:5364–71. doi: 10.1109/JBHI.2022.3197910
39. Nie Z, Chen M, Wen X, Gao Y, Huang D, Cao H, et al. Endoplasmic reticulum stress and tumor microenvironment in bladder cancer: the missing link. Front Cell Dev Biol. (2021) 9:683940. doi: 10.3389/fcell.2021.683940
40. Hosny A, Parmar C, Quackenbush J, Schwartz LH, and Aerts JHWL. Artificial intelligence in radiology. Nat Rev Cancer. (2018) 18:500–10. doi: 10.1038/s41568-018-0016-5
41. Zheng Y, Zhou D, Liu H, and Wen M. CT-based radiomics analysis of different machine learning models for differentiating benign and malignant parotid tumors. Eur Radiol. (2022) 32:6953–64. doi: 10.1007/s00330-022-08830-3
42. Mao B, Zhang L, Ning P, Ding F, Wu F, Lu G, et al. Preoperative prediction for pathological grade of hepatocellular carcinoma via machine learning–based radiomics. Eur Radiol. (2020) 30:6924–32. doi: 10.1007/s00330-020-07056-5
43. Jiang C, Luo Y, Yuan J, You S, Chen Z, Wu M, et al. CT-based radiomics and machine learning to predict spread through air space in lung adenocarcinoma. Eur Radiol. (2020) 30:4050–7. doi: 10.1007/s00330-020-06694-z
44. Wei Z, Xv Y, Liu H, Li Y, Yin S, Xie Y, et al. A CT-based deep learning model predicts overall survival in patients with muscle invasive bladder cancer after radical cystectomy: a multicenter retrospective cohort study. Int J Surg. (2024) 110:2922–32. doi: 10.1097/JS9.0000000000001194
45. Bizzarri FP, Nelson AW, Colquhoun AJ, and Lobo N. Utility of fluorodeoxyglucose positron emission tomography/computed tomography in detecting lymph node involvement in comparison to conventional imaging in patients with bladder cancer with variant histology. Eur Urol Oncol. (2025):S2588931125000975. doi: 10.1016/j.euo.2025.03.019
46. Gavi F, Foschi N, Fettucciari D, Russo P, Giannarelli D, Ragonese M, et al. Assessing trifecta and pentafecta success rates between robot-assisted vs. open radical cystectomy: A propensity score-matched analysis. Cancers. (2024) 16:1270. doi: 10.3390/cancers16071270
47. Palermo G, Bizzarri FP, Scarciglia E, Sacco E, Moosavi Seyed K, Russo P, et al. The mental and emotional status after radical cystectomy and different urinary diversion orthotopic bladder substitution versus external urinary diversion after radical cystectomy: A propensity score-matched study. Int J Urol. (2024) 31:1423–8. doi: 10.1111/iju.15586
48. Hofer IS, Burns M, Kendale S, and Wanderer J. Realistically integrating machine learning into clinical practice: A road map of opportunities, challenges, and a potential future. Anesth Analg. (2020) 130:1115–8. doi: 10.1213/ANE.0000000000004575
49. Liu L, Zhang R, Shi Y, et al. Automated machine learning for predicting liver metastasis in patients with gastrointestinal stromal tumor: a SEER-based analysis. Sci Rep. (2024) 14:12415. doi: 10.1038/s41598-024-62311-9
50. Lee C, Light A, Alaa A, Thurtle D, Van Der Schaar M, and Gnanapragasam VJ. Application of a novel machine learning framework for predicting non-metastatic prostate cancer-specific mortality in men using the Surveillance, Epidemiology, and End Results (SEER) database. Lancet Digit Health. (2021) 3:e158–65. doi: 10.1016/S2589-7500(20)30314-9
Keywords: machine learning, bladder cancer, SEER database, bone metastasis, predictive value
Citation: Yu ZJ, Xu XD, Zou XC, Su PD, Chao HC and Zeng T (2025) Ensemble machine learning models for predicting bone metastasis in bladder cancer. Front. Oncol. 15:1653506. doi: 10.3389/fonc.2025.1653506
Received: 25 June 2025; Accepted: 02 September 2025;
Published: 25 September 2025.
Edited by:
Sanja Stifter-Vretenar, Skejby Sygehus, Denmark
Reviewed by:
Jure Murgic, Sisters of Charity Hospital, Croatia
Jiao Hu, Central South University, China
Luca Ongaro, Royal Free Hospital, United Kingdom
Luyao Chen, The First Affiliated Hospital of Nanchang University, China
Copyright © 2025 Yu, Xu, Zou, Su, Chao and Zeng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Tao Zeng, dGFvemVuZzQwNzA5QHNpbmEuY29t
†These authors have contributed equally to this work