ORIGINAL RESEARCH article

Front. Med., 29 July 2025

Sec. Precision Medicine

Volume 12 - 2025 | https://doi.org/10.3389/fmed.2025.1624198

Explainable machine learning for predicting distant metastases in renal cell carcinoma patients: a population-based retrospective study


Zhao Hou1,2, Peipei Wang1,3, Dingyang Lv1,4, Huiyu Zhou1,4, Zhiwei Guo1,4, Jinshuai Li1,4, Mohan Jia1,4, Hongyang Du1,4, Weibing Shuang1,4*
  • 1Department of Urology, The First Hospital of Shanxi Medical University, Taiyuan, Shanxi, China
  • 2Academy of Medical Sciences, Shanxi Medical University, Taiyuan, Shanxi, China
  • 3School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
  • 4Department of First Clinical Medical College, Shanxi Medical University, Taiyuan, Shanxi, China

Background: Distant metastasis is a key factor contributing to poor prognosis in renal cell carcinoma (RCC). Early prediction of metastasis is crucial for developing personalized treatment plans and improving patient outcomes. This study aimed to establish and validate a clinical prediction model for distant metastasis in RCC patients.

Methods: Ten machine learning algorithms were employed to develop a predictive model for distant metastasis in RCC. Data from 51,566 RCC patients in The Surveillance, Epidemiology, and End Results (SEER) database (2010–2018) were used for model development, while 726 RCC patients from the First Hospital of Shanxi Medical University were selected for external validation. Hyperparameters were optimized using grid search and tenfold cross-validation. Model performance was assessed using metrics such as the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPRC), decision curve analysis, calibration curves, precision, and accuracy. Shapley additive explanations (SHAP) were used for model interpretation. The best-performing model was then used to create a web-based calculator to predict metastasis risk in RCC patients.

Results: The study included 51,566 RCC patients, with 3,667 showing distant metastases. Logistic regression identified tumor size, grade, T-stage, N-stage, radiotherapy, chemotherapy, and surgery as independent risk factors. The Extreme Gradient Boosting (XGB) model demonstrated superior performance (AUC: 0.957, Accuracy: 0.898) in the training set and was validated externally (AUC: 0.742, Accuracy: 0.904). A web-based calculator was developed using the XGB model.

Conclusion: This study designed and validated an XGB model using clinicopathologic data to predict the risk of distant metastasis in RCC patients, potentially aiding clinical decision-making.

1 Introduction

Renal cell carcinoma (RCC) is the 14th most common malignancy worldwide, with over 430,000 new cases reported in 2020, and is the most common histopathological type of renal cancer, constituting approximately 90% of all renal malignancies (1). According to epidemiological evidence, renal cancer is the ninth most diagnosed cancer in female patients and the sixth most diagnosed in male patients, accounting for 3% and 5% of all malignant tumor diagnoses, respectively (2). Despite the increase in its incidence, overall mortality from RCC has been decreasing (3, 4). Advances in therapeutic strategies such as targeted therapies and immune checkpoint inhibitors (ICIs) have improved the prognosis of patients with RCC (5), but prognosis still differs markedly between localized and metastatic renal cancer. The 5-year survival rate is nearly 93% for localized renal cancer and only 17% for patients with distant metastases (2, 6). Previous studies have shown that 18%–30% of RCC patients present with systemic metastases at initial diagnosis, and an additional one-third develop metastatic disease following nephrectomy during long-term follow-up (7, 8). Among metastatic RCC (mRCC) cases, approximately 75% exhibit three or more metastatic locations (9). Therefore, identifying risk factors for RCC metastasis and developing metastasis prediction models are essential for improving the survival prospects of patients with RCC.

Artificial Intelligence (AI) is a branch of computer science focused on creating systems capable of performing tasks that typically require human intelligence (10). Machine learning (ML), a core component of AI, uses algorithms that learn patterns from large amounts of empirical data and iteratively refine predictive models without explicit programming (11). Traditional statistical methods emphasize hypothesis testing and causal inference under rigid parametric assumptions, which inherently constrains their capacity to enhance predictive accuracy and generalizability in complex, real-world clinical scenarios. Furthermore, their reliance on manual feature engineering and linearity assumptions fundamentally limits scalability when analyzing high-dimensional biomedical data or unstructured clinical data streams (12). ML integrates computer science and statistics with medical problems, and its complex algorithms, run on large-scale, heterogeneous datasets, can be used to discover useful models. As summarized by Suarez-Ibarrola et al. (13), ML and Deep Learning (DL) were found to outperform traditional statistical methods in diagnosis, prediction of response to treatment, prediction of pathology grading, and patient survival in urological disorders such as urolithiasis, renal cancer, prostate cancer, and bladder cancer.

In this study, ten ML predictive models were developed based on conventional clinicopathological parameters to identify the key factors influencing distant metastasis in RCC. The performance of these models was comprehensively evaluated using multiple metrics, and the interpretability of their key features was thoroughly addressed. Ultimately, the optimal models were integrated into a clinical practice framework to assist in the screening of high-risk patients, thereby improving the accuracy of the diagnosis of distant metastasis of RCC and providing an evidence-based basis for the development of therapeutic guidelines and standards of care.

2 Materials and methods

2.1 Data source and patient cohorts

This study used a retrospective cohort design with data from the Surveillance, Epidemiology, and End Results (SEER) database (2010–2018), established by the National Cancer Institute, and an independent validation cohort from the First Hospital of Shanxi Medical University (2013–2021). The SEER database covers 28% of the United States population and provides a large amount of data for cancer-related research; information on metastatic tumors has been systematically collected since 2010 (14). Patients with RCC meeting the following criteria were extracted using SEER*STAT 8.4.4 software. Inclusion criteria: histologically confirmed primary RCC (International Classification of Diseases for Oncology, ICD-O-3, codes: 8120/3 for transitional cell carcinoma, 8130/3 for papillary transitional cell carcinoma, 8260/3 for papillary adenocarcinoma, 8310/3 for clear cell adenocarcinoma, 8312/3 for renal cell carcinoma, 8317/3 for chromophobe cell carcinoma). Exclusion criteria: (1) missing demographic/tumor characteristics (age, sex, tumor size, TNM stage, etc.); (2) autopsy-confirmed diagnosis; and (3) unknown survival time or cause of death. A total of 51,566 patients from the SEER cohort were ultimately included. The data were divided into training and testing sets in a 7:3 ratio, with 726 patients from a single-center cohort in China used for external validation. The study variables covered three major dimensions: demographic characteristics (age, gender, race, and marital status), tumor characteristics (size, histological subtype, laterality, grading, and T/N/M staging), and treatment modalities (surgery, radiotherapy, and chemotherapy). This study followed the Declaration of Helsinki. SEER data were granted an ethical exemption because they are de-identified (open access number), and the external validation cohort received approval from the Ethics Committee of the First Hospital of Shanxi Medical University (approval number: 2018 K006). Specific information about the SEER cohort and the external validation cohort is shown in Table 1. The study flow is shown in Figure 1.
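
The 7:3 division described above can be sketched as follows. This is a minimal illustration in Python (the study itself reports using R), and it assumes a split stratified by metastasis status; the text does not say whether stratification was used. The helper name `stratified_split` and the seed are hypothetical.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split indices into train/test, shuffling within each class separately.

    Stratification is an assumption made for this sketch; the paper only
    states that SEER patients were divided in a 7:3 ratio.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)

# Cohort sizes reported in the paper: 51,566 patients, 3,667 with metastasis.
n_total, n_dm = 51_566, 3_667
labels = [1] * n_dm + [0] * (n_total - n_dm)
train_idx, test_idx = stratified_split(labels)
train_prevalence = sum(labels[i] for i in train_idx) / len(train_idx)
print(len(train_idx), len(test_idx), round(train_prevalence, 4))
```

With these cohort sizes the stratified split gives set sizes that agree with the 36,096 and 15,470 patients reported for the training and test sets, while keeping the metastasis rate nearly identical in both.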

Table 1. Characterization of clinical and pathological data in the training, test, and validation cohort.

FIGURE 1
Flowchart detailing the selection and processing of renal cell carcinoma patient data from the SEER database (2010–2018). Out of 347,786 patients, 51,566 were included; 296,220 were excluded due to inadequate data. The dataset was split into a training set of 36,096 and a testing set of 15,470 in a 7:3 ratio. Various machine learning models were applied, including LR, DT, RF, NBC, KNN, SVM, Enet, MLP, XGB, and LightGBM, using k-fold cross-validation with stochastic optimization. The best model was validated with 726 external patients from Shanxi Medical University for constructing a web calculator.

Figure 1. Study design and patient screening workflow diagram.

2.2 Feature screening

This study employed LASSO regression for feature dimensionality reduction to select candidate variables. Subsequently, univariate logistic regression was performed to preliminarily identify potential predictors associated with metastasis. Multivariate logistic regression was then used to determine the independent risk factors for distant metastasis in RCC (P < 0.05). These key variables were subsequently incorporated into the machine-learning modeling process.

2.3 Model development and evaluation

This study employed ten machine learning algorithms: Logistic Regression (LR) (15), Decision Tree (DT) (16), Random Forest (RF) (17), Naive Bayes (NBC) (18), K Nearest Neighbors (KNN) (19), Support Vector Machine (SVM) (20), Elastic Net (Enet) (21), Multilayer Perceptron (MLP) (22), Extreme Gradient Boosting (XGB) (23), and Light Gradient Boosting Machine (LightGBM) (24).

The models were developed using a training dataset. Notably, Logistic Regression was evaluated using 10-fold cross-validation but did not require hyperparameter tuning due to its straightforward nature. Conversely, grid search hyperparameter tuning was conducted for the remaining nine machine learning algorithms, building upon the results from the 10-fold cross-validation, to ensure optimal performance and mitigate the risk of overfitting. The specifics of the hyperparameter optimizations are as follows:

Decision Tree (DT): Parameters optimized included the Cost-Complexity Parameter (cost_complexity), maximum depth of the tree (tree_depth), and the minimum number of data points in a node (min_n).

Random Forest (RF): Optimization focused on the number of features randomly selected for splitting (mtry), the number of trees (trees), and the minimum number of data points in a node (min_n).

Naive Bayes (NBC): The relative smoothness of the class boundary (smoothness) and the Laplace correction parameter (Laplace) were optimized.

K Nearest Neighbors (KNN): A single integer representing the number of neighbors to consider (neighbors) was optimized.

Support Vector Machines (SVM): Parameters optimized included the cost of predicting samples within or beyond the margin (cost) and the sigma value for the Radial Basis Function (rbf_sigma).

Elastic Net (Enet): Optimized parameters included the amount of regularization (penalty) and the proportion of the Lasso penalty (mixture).

Multilayer Perceptrons (MLP): Key parameters optimized were the number of units in the hidden layer (hidden_units), the amount of regularization (penalty), and the number of training iterations (epochs).

Extreme Gradient Boosting (XGB): The optimization process included the number of predictors randomly sampled at each split (mtry), maximum depth of the tree (tree_depth), minimum number of data points in a node (min_n), learning rate (learn_rate), loss reduction required for additional splits (loss_reduction), and size of the dataset exposed during fitting (sample_size).

Light Gradient Boosting Machine (LightGBM): The number of predictors sampled at each split (mtry), the number of trees in model training (trees), the minimum number of data points in a node (min_n), the maximum depth of the tree (tree_depth), the learning rate (learn_rate), and the minimum loss reduction (loss_reduction) were optimized. The specific hyperparameter values for each model are provided in Supplementary Table 1.
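
The tuning procedure described above, a grid search scored by 10-fold cross-validation, can be sketched in a few lines. The example below is an illustration only: it uses Python rather than the R software reported in the paper, a toy one-dimensional nearest-neighbour classifier standing in for the real models, and synthetic data. The function names and the candidate values for `neighbors` are hypothetical.

```python
import itertools
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_x, train_y, x, neighbors):
    """Majority vote among the nearest training points (1-D toy model)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:neighbors]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def cv_accuracy(xs, ys, params, folds):
    """Mean held-out accuracy of the toy model over the given folds."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        tr_x = [xs[i] for i in train]
        tr_y = [ys[i] for i in train]
        hits = sum(knn_predict(tr_x, tr_y, xs[i], **params) == ys[i]
                   for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / len(scores)

def grid_search(xs, ys, grid, k=10):
    """Score every combination in the grid with k-fold CV; keep the best."""
    folds = k_fold_indices(len(xs), k)
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_accuracy(xs, ys, params, folds)
        if best is None or score > best[1]:
            best = (params, score)
    return best

# Synthetic data: the label depends on x plus noise (not patient data).
rng = random.Random(1)
xs = [rng.random() for _ in range(100)]
ys = [1 if x + rng.gauss(0, 0.15) > 0.5 else 0 for x in xs]
best_params, best_score = grid_search(xs, ys, {"neighbors": [1, 3, 5, 9, 15]})
print(best_params, round(best_score, 3))
```

Every candidate is scored on the same folds, so differences in score reflect the hyperparameters rather than the partition. With several hyperparameters, as for XGB, the grid grows multiplicatively, which is why `itertools.product` is used.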

To assess the generalization ability of the models, the ten developed models were applied to both the internal test set and external validation set. The performance was comprehensively evaluated using receiver operating characteristic (ROC) curves, Precision-Recall (PR) curves, calibration curves, and confusion matrix results on the training set, internal test set, and external validation set. The model with the best performance was selected based on the relevant metrics.
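
As a worked illustration of the two discrimination metrics named above, the sketch below computes AUC as the probability that a randomly chosen metastatic case receives a higher score than a randomly chosen non-metastatic case, and estimates AUPRC by average precision. Average precision is one common estimator; the paper does not state which estimator it used. The data are toy values, and the function names are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is quadratic in
    the sample size, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive case."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy example: four patients, two with the outcome.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores), average_precision(y_true, scores))
```

Because only about 7% of the SEER cohort has distant metastasis, the precision-recall view is the more demanding of the two: a model can rank well overall and still produce many false positives among its highest-scoring patients.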

Additionally, Shapley Additive Explanations (SHAP), a model-agnostic interpretability technique based on cooperative game theory, was employed to explain the predictions made by the best-performing ensemble machine learning model (25). The SHAP method was used to calculate the importance of each variable in the optimal model. Finally, we constructed a web-based calculator to facilitate the application of the model in clinical settings.
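
The game-theoretic idea behind SHAP can be shown with an exact Shapley computation on a tiny model. The sketch below is not the estimator used in the study (the paper does not say which SHAP algorithm was applied); it enumerates every feature ordering, which is feasible only for a handful of features, and it replaces "absent" features with a single baseline value rather than averaging over a background dataset. The risk function and its coefficients are invented for illustration.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = predict(current)
        for j in order:
            current[j] = x[j]          # switch feature j from baseline to x
            updated = predict(current)
            phi[j] += updated - previous
            previous = updated
    return [total / len(orders) for total in phi]

def toy_risk(features):
    """Hypothetical risk score with an interaction term (not the XGB model)."""
    chemo, radio, large_tumor = features
    return 0.05 + 0.30 * chemo + 0.20 * radio + 0.10 * large_tumor \
        + 0.10 * chemo * radio

x = [1, 1, 1]          # patient with all three binary features present
baseline = [0, 0, 0]   # reference patient with none of them
phi = shapley_values(toy_risk, x, baseline)
print([round(v, 3) for v in phi])
```

The attributions add up to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that gives SHAP its name. The interaction term is shared equally between the two features involved.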

2.4 Statistical analysis

Data analysis was performed using R software (version 4.2.2). Due to the marked imbalance in the number of RCC patients with distant metastases compared to those without, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to increase the number of patients with distant metastases, thereby mitigating the impact of class imbalance on model performance. SMOTE generates synthetic samples and incorporates them into the minority class to address the imbalance in the original dataset, ultimately improving model accuracy (26). Chi-square tests and Fisher’s exact tests were used to compare categorical variables between different groups, with categorical variables reported as frequency (percentage, %). A P-value less than 0.05 was considered statistically significant.
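
The core step of SMOTE, creating a synthetic minority sample on the line segment between a minority sample and one of its nearest minority neighbours, can be sketched as below. This is a simplified illustration on continuous toy features. The study's predictors are categorical, and the paper does not describe how they were encoded for SMOTE or at which stage of the pipeline oversampling was applied, so those details are left out. The function name and parameters are hypothetical.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between neighbours.

    minority: list of equal-length numeric tuples (at least two points).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (squared distance).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(base, neighbour)))
    return synthetic

# Toy minority class in two dimensions (not patient data).
minority = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 3.0), (1.8, 2.8)]
new_points = smote(minority, n_new=10, k=3)
print(len(new_points))
```

Synthetic points always fall between existing minority samples, so SMOTE densifies the minority region rather than extending it. Oversampling is commonly restricted to the training data so that test and validation sets keep the real class distribution.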

3 Results

3.1 Baseline characteristics of the study cohort

A total of 51,566 RCC patients from the SEER database were included in the study. Of these, 3,667 (7.11%) developed distant metastases and 47,899 (92.89%) did not. Table 2 presents the demographic and clinicopathological characteristics of all the patients included in the study. Patients from the SEER database were randomly assigned to a training set (n = 36,096) and an internal test set (n = 15,470) in a 7:3 ratio. External validation was conducted using data from 726 RCC patients at the First Hospital of Shanxi Medical University (Table 3). Detailed information on the training, testing, and validation cohorts is provided in Table 1.

Table 2. Overview of clinical and pathological characteristics of the Surveillance, Epidemiology, and End Results (SEER) database cohort.

Table 3. Clinical and pathological characteristics of the Chinese cohort study population.

We compared the characteristics of patients in the metastatic and non-metastatic groups from the SEER database. Thirteen clinicopathological factors were included in our study: age, sex, race, marital status, tumor size, tumor laterality, histological type, tumor grade, T-stage, N-stage, radiotherapy, chemotherapy, and surgery. Patients from the SEER database were categorized into two subgroups: DM (−) (47,899 patients without distant metastases, 92.89%) and DM (+) (3,667 patients with distant metastases, 7.11%). The DM (+) subgroup contained a higher proportion of patients aged ≥ 50 years than the DM (−) subgroup (P < 0.0001). Males exhibited significantly higher metastatic rates than females (P < 0.0001). The proportion of White, Asian, or Pacific Islander patients was higher in the DM (+) subgroup than in the DM (−) subgroup (P < 0.0001). Additionally, married patients (2,393/33,974, 7.05%) showed a higher incidence of distant metastasis than single patients (557/8,358, 6.66%; P = 0.0131). Regarding renal cancer progression, a greater proportion of patients with tumor sizes larger than 5 cm was observed in the DM (+) group (87.37%) than in the DM (−) group (34.35%, P < 0.0001). The DM (+) subgroup also exhibited significantly higher proportions of certain histological subtypes than the DM (−) subgroup: 8120/3 (1.91% vs. 0.32%), 8130/3 (0.38% vs. 0.28%), 8310/3 (73.22% vs. 70.50%), and 8312/3 (17.59% vs. 10.81%) (P < 0.0001). The DM (+) subgroup exhibited a significantly higher prevalence of Grade III–IV disease (histopathological grading), T2–T4 category (tumor extent), and N1–N2 category (regional lymph node involvement) than the DM (−) subgroup (P < 0.0001). Treatment administration (radiotherapy, chemotherapy, and surgery) also differed significantly between the subgroups (P < 0.0001).

3.2 Feature variable selection

As shown in Figure 2, LASSO regression with 10-fold cross-validation yielded two candidate values of the regularization parameter (λ): λ.min (0.000252) and λ.1se (0.004947). To balance model complexity and generalization, the more parsimonious λ.1se, the largest λ whose cross-validated error lies within one standard error of the minimum, was selected as the optimal parameter. Seven significant predictors were identified in the training set: maximum tumor diameter, histological grade, T-stage, N-stage, radiotherapy, chemotherapy, and surgical intervention. Univariate and multivariate logistic regression analyses were conducted on these predictors, with the results summarized in Table 4. Tumor size, grade, T-stage, N-stage, radiotherapy, and chemotherapy were ultimately identified as independent risk factors for distant metastasis in RCC patients (P < 0.001). Additionally, surgery (OR = 0.14, 95% CI = 0.13–0.16, P < 0.001) was found to be an independent protective factor against distant metastasis. Variables with P < 0.05 in the multivariate logistic regression analysis were subsequently included in the machine learning models.
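
The one-standard-error rule behind the choice of λ can be written down compactly. The sketch below takes a path of candidate λ values with their cross-validated mean deviance and standard error and returns both the minimizing λ and the largest λ whose mean deviance stays within one standard error of the minimum. The numbers in the example are made up for illustration and are not the cross-validation results of the study; the function name is hypothetical.

```python
def select_lambda(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min minimizes the mean CV error; lambda_1se is the largest
    lambda whose mean CV error is within one standard error of that minimum.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean)
                     if err <= threshold)
    return lambdas[i_min], lambda_1se

# Illustrative path (hypothetical values, not taken from the paper).
lambdas = [0.02, 0.01, 0.005, 0.0025, 0.001, 0.00025]
cv_mean = [0.400, 0.330, 0.304, 0.302, 0.300, 0.299]
cv_se = [0.006] * len(lambdas)
lambda_min, lambda_1se = select_lambda(lambdas, cv_mean, cv_se)
print(lambda_min, lambda_1se)
```

A larger λ shrinks more coefficients to zero, so the one-standard-error choice trades a statistically negligible loss in fit for a smaller set of predictors.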

FIGURE 2
Chart (A) shows a line plot of coefficients vs. log lambda values ranging from -8 to 1, with colored lines representing different coefficients. Chart (B) displays a scatter plot of binomial deviance vs. log lambda, with red dots indicating data points showing an upward trend from log(lambda) = -6 to -2.

Figure 2. Risk factors for distant metastasis of renal cancer identified by LASSO regression. (A) Based on the logarithmic (lambda) sequence, a coefficient profile was created, yielding non-zero coefficients corresponding to the optimal lambda value. (B) The process of selecting the optimal value for the parameter λ in the Lasso regression model was performed using cross-validation. The dotted vertical lines indicate the optimal predictors based on the minimum criteria and the 1 standard error of the minimum criteria (1-SE criteria).

Table 4. Univariate and multivariate logistic regression in patients with distant metastases from renal cancer.

3.3 Model performance evaluation

To build a predictive model for distant metastasis of RCC using ML algorithms, we used the seven features (tumor size, tumor grade, T-stage, N-stage, radiotherapy, chemotherapy, and surgery) identified through screening as independent factors. The algorithms used included LR, DT, RF, NBC, KNN, SVM, Enet, MLP, XGB, and LightGBM. To reduce overfitting and select the best model, we conducted 10-fold cross-validation on the training set, evaluating accuracy, precision, recall, F1 score, and AUC for the ten ML models (Table 5) and plotting calibration curves (Figure 3).
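
The threshold-based metrics listed above follow directly from the confusion matrix. The sketch below defines them and then applies them to a deliberately trivial classifier that labels every SEER patient as non-metastatic, using the class counts reported in the paper (3,667 of 51,566). The trivial classifier is a hypothetical device for illustration, not one of the ten models, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary labels (1 = metastasis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Undefined ratios (zero denominator) are reported as 0.0 here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Class counts from the SEER cohort; predictions from an "always negative" rule.
y_true = [1] * 3_667 + [0] * 47_899
y_pred = [0] * len(y_true)
metrics = classification_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

The always-negative rule reaches an accuracy of about 0.93 while detecting no metastatic patient at all, which is why accuracy is read alongside precision, recall, F1, and AUC in an imbalanced cohort like this one.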

Table 5. Model performance evaluation metrics for ten machine learning models.

FIGURE 3
Graphs compare ten models (LR, ENet, DT, RF, XGB, SVM, MLP, LightGBM, KNN, NBC) across three panels (A, B, C) plotting percent predictions against midpoints. Each model’s performance varies, showing red lines against a benchmark diagonal.

Figure 3. Calibration curves of 10 machine learning methods in the training set (A), test set (B), and external validation set (C). The black diagonal line represents the ideal calibration curve. A calibration curve closer to this line indicates better model calibration.

The results demonstrated that the XGB model exhibited the most stable performance and superior discriminative ability in the validation set. Figures 4–6 present the ROC curves, PR curves, DCA curves, and calibration curves for all 10 models across the training, test, and external validation sets. The XGB model consistently delivered strong and stable performance across all datasets, outperforming the other models in terms of discriminative power. Furthermore, the heatmap analysis (Figure 7) offers a multidimensional assessment, providing a clearer and more detailed overview of the models’ performance. Following a thorough evaluation of the models across the three datasets, we conclude that the XGB model demonstrates balanced and robust performance in predicting distant metastasis in RCC patients, making it the optimal model.
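
The quantity plotted in a decision curve is the net benefit, which at threshold probability pt equals TP/n − (FP/n) × pt/(1 − pt). The sketch below computes it for a model and for the "treat all" reference strategy ("treat none" is zero by definition). It uses toy predictions rather than the study's data, and the function names are hypothetical.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: every patient is treated regardless of risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)

# Toy example: four patients, two with the outcome.
y_true = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.7]
print(net_benefit(y_true, probs, 0.5), net_benefit_treat_all(y_true, 0.5))
```

A model is clinically useful at a given threshold only if its net benefit exceeds both reference strategies, which is what the DCA panels in Figures 4–6 allow the reader to check across a range of thresholds.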

FIGURE 4
Four-panel image with different machine learning model evaluations. (A) ROC curve showing sensitivity vs. 1-specificity for multiple models: DT, ENet, KNN, LightGBM, LR, MLP, NBC, RF, SVM, XGB. (B) Precision-recall curve for the same models. (C) Decision curve analysis with net benefit vs. threshold probability for various strategies, including Treat All and Treat None. (D) Calibration plot for the XGB model showing predicted probability vs. observed outcomes across midpoints.

Figure 4. The receiver operating characteristic (ROC) curves (A), Precision-Recall (PR) curves (B), Decision Curve Analysis (DCA) curves (C), and calibration curves (D) of the 10 machine learning models in the training set, with calibration curves based on the best model.

FIGURE 5
Panel (A) displays an ROC curve comparing model sensitivity and 1-specificity for different models including DT, ENet, KNN, and others. Panel (B) shows a precision-recall curve for the same models. Panel (C) presents a decision curve analysis plotting net benefit against threshold probability. Panel (D) is a calibration curve for the XGB model, displaying the percent against the midpoint. Each panel includes detailed legends for model identification.

Figure 5. The receiver operating characteristic (ROC) curves (A), Precision-Recall (PR) curves (B), Decision Curve Analysis (DCA) curves (C), and calibration curves (D) of the 10 machine learning models in the test set, with calibration curves based on the best model.

FIGURE 6
Four plots are displayed: (A) Receiver Operating Characteristic (ROC) curves comparing models like DT, KNN, and XGB; (B) Precision-Recall curves for the same models; (C) Net Benefit against Threshold Probability for various models; (D) Calibration plot for the XGB model with percent on the y-axis and midpoint on the x-axis.

Figure 6. The receiver operating characteristic (ROC) curves (A), Precision-Recall (PR) curves (B), Decision Curve Analysis (DCA) curves (C), and calibration curves (D) of the 10 machine learning models in the external validation set, with calibration curves based on the best model.

FIGURE 7
Three heatmaps labeled A, B, and C compare model performance across five metrics: Accuracy, AUC, F2-Score, Precision, and Recall. Each heatmap shows different models like XGB, SVM, and RF. Darker shades indicate higher values, while lighter shades represent lower values. Heatmap A shows generally high values across metrics, B has moderate values, and C presents the lowest metrics, particularly in Recall.

Figure 7. Predictive performance of 10 models in the training set (A), test set (B), and external validation set (C).

3.4 Interpretability of the model

SHAP values were employed to interpret the XGB model. Generally, a higher SHAP value for a feature corresponds to an increased predicted probability of the target event. The results indicated that undergoing chemotherapy was the most important variable, followed by receiving radiotherapy, stage T3 disease, tumor size greater than 5 cm, undergoing surgery, Grade IV, Grade III, stage T4, stage N1, stage T2, stage N2, and Grade II (Figure 8).

FIGURE 8
Panel A shows a violin plot of SHAP values for different features. Higher values are in purple and lower values in yellow, indicating feature impact intensity. Panel B displays a bar chart ranking the same features by importance score, with Chemotherapy_X1 having the highest score.

Figure 8. Relative importance of variables based on SHAP for the XGB prediction model. (A) SHAP value distribution of features; (B) feature importance scores visualized as a bar plot. Chemotherapy_X1 indicates receipt of chemotherapy, Radiation_X1 indicates receipt of radiotherapy, T_X3 represents tumor stage T3, size_X2 indicates a tumor size greater than 5 cm, RX_X1 indicates receipt of surgical treatment, Grade_X4 represents tumor grade IV, Grade_X3 represents tumor grade III, T_X4 represents tumor stage T4, N_X1 represents tumor stage N1, T_X2 represents tumor stage T2, N_X2 represents tumor stage N2, and Grade_X2 represents tumor grade II. SHAP, Shapley additive explanations; RX, RX Summ-Surg (surgery).

3.5 Online calculator for predicting distant metastasis in RCC

Although the XGB model outperformed other machine learning models, its complexity and limitations in interpretability pose challenges for clinical application. To enhance its clinical utility, we developed an interactive web-based calculator based on the XGB model. This tool allows clinicians to input variables via interactive fields to estimate the probability of distant metastasis in RCC patients. Figure 9 displays a screenshot of the online web calculator. The web calculator can be accessed via the following link: https://houzhao11.shinyapps.io/DM_Predictor/.

FIGURE 9
Distant Metastasis Predictor interface for Renal Cell Carcinoma showing input fields and a prediction result. Inputs: Tumor size ≤5 cm, Grade I, T_stage T1, N_stage N0, Radiotherapy No, Chemotherapy Yes, Surgery Yes. Prediction result indicates a 54.66% probability of distant metastasis, categorized as high risk.

Figure 9. An online web-based calculator for predicting distant metastasis of renal cell carcinoma.

4 Discussion

The majority of RCC patients are diagnosed with localized disease, while a small proportion present with metastases at onset. However, up to 30% experience distant metastasis following radical resection (27, 28), and the exact molecular mechanisms remain poorly understood. Patients with mRCC usually have poor survival: despite the introduction of numerous new targeted and immunotherapeutic agents, they inevitably develop resistance to these treatments (29, 30). A study by Pereira et al. (31) found that positron emission tomography-computed tomography (PET/CT) offers significantly higher specificity and negative predictive value than CT scans in detecting metastasis and recurrence in RCC patients. However, due to its high cost and the potential risk of radiation exposure, PET/CT is not commonly used for routine screening of distant metastases (32). Consequently, developing a clinical prediction model to identify high-risk RCC patients is crucial.

This study developed and validated an interpretable machine learning model (XGBoost) for predicting the risk of distant metastasis in RCC patients, using data from the SEER database and a single-center cohort in China. The results demonstrate that the XGBoost model exhibits balanced and stable predictive performance in both the internal test set and the external validation set and outperforms traditional statistical methods. SHAP analysis further revealed the non-linear associations of key drivers such as chemotherapy, radiotherapy, and T-stage. This result provides a new tool for early detection and individualized intervention of RCC metastasis, as well as a data-driven perspective for the exploration of tumor biological mechanisms.

The innovation of the present model, compared to previous studies, is evident in three key aspects. First, by integrating data from the SEER cohort (n = 51,566) and a Chinese validation cohort (n = 726), cross-regional and multi-center validation of the model’s generalization ability was achieved. External validation performance (AUC = 0.742) declined compared to internal testing (AUC = 0.949); this discrepancy may stem from marked skews in both tumor histologic subtype distribution (93.4% being ICD-O-3 type 8310/3) and treatment patterns (100% surgical intervention) within the validation cohort, suggesting that the model requires further optimization for heterogeneous populations. Additionally, the relatively small sample size of the Chinese cohort may exacerbate the impact of class imbalance on the model’s generalizability. Second, the class imbalance in distant metastasis samples was alleviated by the SMOTE technique. Third, the web-based calculator transformed the XGBoost model into a visualization tool, providing clinicians with a dynamic risk assessment interface, which is in line with the requirement of “algorithmic transparency” in the International Guidelines for the Application of Artificial Intelligence in Medicine (33).

It is crucial to emphasize that the performance of external validation (AUC = 0.742) was lower than that of internal testing (AUC = 0.949). This inconsistency can primarily be attributed to differences in surgical management and histological composition between the cohorts. Specifically, three factors stand out. Surgical intervention bias: the Chinese validation cohort exclusively included patients who underwent surgical treatment, resulting in a surgery rate of 100%. In contrast, the SEER dataset encompasses a heterogeneous real-world population that includes non-surgical management of advanced cases. This bias arises from variations in clinical practices and inherently limits the generalizability of the model. Dominance of histological subtypes: the validation cohort exhibited a significantly higher proportion of clear cell carcinoma, while the SEER dataset comprised a diverse range of histological types. Different subtypes may demonstrate distinct treatment responses and prognostic trajectories, contributing to instability in the model’s performance. T-stage distribution: the proportion of early-stage tumors is notably higher in the Chinese cohort (87.1% T1) than in the SEER dataset (68.4%). This discrepancy may reflect a referral pattern that is biased toward localized disease.

External validation is crucial for accurately assessing the reliability of risk prediction models; failure to conduct appropriate external validation can lead to misinterpretation of model performance (34, 35). In this study, we acknowledge the presence of selection bias within the validation cohort, which demonstrates certain differences when compared to the SEER dataset. The lack of adequate external validation may limit the model’s applicability in specific populations. Consequently, to mitigate the selection bias present in the current analysis, we plan to strengthen our external validation efforts in future studies to enhance the model’s reliability. We will consider incorporating a broader patient sample, including data from multiple centers across China and European databases, such as the European RECUR database, to achieve more comprehensive external validation.

Shapley additive explanations (SHAP) analysis revealed two levels of predictive mechanisms. The first level comprises treatment-related factors: chemotherapy and radiotherapy emerged as the most significant contributors to predicted metastatic risk, a finding that aligns closely with the immunomodulatory dynamics of the RCC microenvironment. Mechanistically, chemotherapy may potentiate immunogenicity by inducing tumor cell release of neoantigenic epitopes (36), while radiotherapy could paradoxically activate pro-metastatic inflammatory cytokine cascades (37), underscoring the double-edged role of conventional therapies in modulating metastatic propensity. The second level comprises markers of tumor heterogeneity: tumor size > 5 cm and high-grade pathological classification (Grade III–IV) are significant factors in the distant metastasis of renal cancer, suggesting that large-volume tumors may induce epithelial-mesenchymal transition (EMT) through mechanical stress, whereas high grade is associated with vasculogenic mimicry (38, 39). Notably, surgical intervention was identified as a protective factor; however, its protective efficacy was markedly attenuated in the high-risk metastasis subgroup. This phenomenon may be attributed to the preoperative dissemination of occult micrometastases and requires further validation through dynamic monitoring of circulating tumor DNA (ctDNA) (40).
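As an illustration of what a SHAP attribution represents, the sketch below computes exact Shapley values for a small hypothetical risk function by averaging each feature's marginal contribution over all feature subsets. The function `toy_risk`, its feature names, and its coefficients are assumptions invented for this example and do not reproduce the study's XGBoost model; SHAP implementations for tree models estimate the same quantity far more efficiently (42).

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values: weighted average marginal contribution per feature."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of subsets of size k that exclude feature f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_risk(present):
    """Hypothetical risk score; coefficients are illustrative only."""
    score = 0.0
    if "tumor_size_gt_5cm" in present:
        score += 0.20
    if "high_grade" in present:
        score += 0.10
    if "surgery" in present:
        score -= 0.15  # protective factor lowers the score
    if {"tumor_size_gt_5cm", "high_grade"} <= present:
        score += 0.10  # interaction term, split equally by Shapley values
    return score


features = ["tumor_size_gt_5cm", "high_grade", "surgery"]
phi = shapley_values(toy_risk, features)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the score with all features present and the score with none, which is the additivity property that lets SHAP decompose an individual prediction into per-feature contributions.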

Despite the rigorous study design, this study has the following limitations.

Data level: First, the SEER database lacks radiogenomic, genomic, and immunotherapy data, which limits the integration of multimodal information. Second, the absence of comprehensive details on radiotherapy and chemotherapy protocols restricts our understanding of treatment impacts. Third, the ethnic homogeneity (100% Asian) of the validation cohort may affect the applicability of the model to other populations. To address these gaps, future studies should integrate multi-source datasets (e.g., National Cancer Database, institutional electronic health records) to enrich the SEER dataset with multimodal information and specific treatment regimens. This integration will facilitate a more nuanced assessment of how distinct treatment modalities and regimens influence distant metastasis in renal cell carcinoma, ultimately enhancing the model’s applicability across population groups.

Methodological level: While SMOTE alleviates class imbalance, it may induce synthetic sample bias (41), and SHAP interpretation provides only static feature importance and fails to reveal time-dependent mechanisms driving metastasis (42). Additionally, this study primarily focused on traditional machine learning models. Future research should explore the potential of deep learning architectures such as TabNet, particularly on larger datasets; such models could provide valuable insights into improving predictive accuracy while maintaining interpretability. To address the static nature of SHAP interpretation, future research plans to implement a time-based interpretability framework designed to capture the spatiotemporal effects of treatment characteristics on the mechanisms of metastasis. Specifically, for dynamic time series modeling, we will integrate Long Short-Term Memory (LSTM) networks with XGBoost to analyze longitudinal data, including sequential tumor markers and treatment regimens. We will also collaborate with multiple medical centers to obtain comprehensive treatment timelines that include specific start dates for adjuvant therapies and detailed drug regimens. This integration will enable the model to identify and learn time-dependent patterns associated with risk factors, thereby providing deeper insight into how the timing of treatments influences metastatic outcomes.

Clinical translational level: The current model relies mainly on traditional clinical metrics (such as tumor size and staging) and does not incorporate more recent markers (such as PD-L1 protein level and VHL mutation status) (43). These markers can help determine a patient’s response to precision drug therapies (such as targeted agents and immunotherapies), and omitting them may leave the model unable to accurately predict metastatic risk in patients receiving novel therapies. In the future, single-cell sequencing could be used to analyze the genetic changes of individual cancer cells at different stages and to draw a dynamic “genetic map” of tumors from early stage to metastasis. In parallel, we plan to build an intelligent risk warning tool by embedding the model into the hospital’s electronic medical record system, automatically integrating the patient’s latest examination data (such as serum biomarkers and radiographic findings) to enable real-time metastasis risk stratification and assist doctors in adjusting the treatment plan.
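The synthetic sample bias noted for SMOTE among the methodological limitations can be illustrated with a short sketch of its interpolation step. The code below is a simplified, SMOTE-like sampler written for illustration, not the implementation used in this study; the helper names and the toy minority cases (tumor size in cm, age in years) are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_sample(minority, k=2, rng=None):
    """Create one synthetic point on the segment between a minority case
    and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority cases to interpolate")
    if rng is None:
        rng = random.Random(0)
    base_idx = rng.randrange(len(minority))
    base = minority[base_idx]
    others = [p for i, p in enumerate(minority) if i != base_idx]
    neighbors = sorted(others, key=lambda p: squared_distance(base, p))[:k]
    neighbor = rng.choice(neighbors)
    gap = rng.random()  # interpolation fraction in [0, 1)
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbor))


# Hypothetical minority (metastatic) cases: (tumor size in cm, age in years).
minority_cases = [(6.0, 58.0), (7.5, 64.0), (9.0, 61.0), (8.0, 70.0)]

rng = random.Random(42)
synthetic = [smote_like_sample(minority_cases, k=2, rng=rng) for _ in range(50)]
print(synthetic[:3])
```

Every synthetic case lies on a line segment between two observed minority cases, so oversampling rebalances the classes without adding information from outside the region already covered by real metastatic patients.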

This study confirms the practical value of interpretable ML in predicting RCC metastasis. The XGB model developed in this study not only surpasses the limitations of conventional prognostic tools but also provides clinical interpretability through the SHAP framework. The XGB model provides a new tool for individualized prediction of the risk of distant metastasis in renal cancer patients, and by identifying high-risk patients, clinicians can formulate more active follow-up and treatment strategies to improve patients’ prognosis. Future directions include prospective multicenter validation to assess the model’s dynamic predictive ability, further optimization of its performance, and integration into clinical decision support systems to enhance precision oncology strategies for RCC.

5 Conclusion

This study confirms the practical value of interpretable machine learning in RCC metastasis prediction. The XGBoost model developed here not only overcomes limitations of conventional prognostic tools but also provides clinical interpretability through the SHAP framework. It offers a new tool for individualized prediction of the risk of distant metastasis in renal cancer patients; by identifying high-risk patients, clinicians can formulate more active follow-up and treatment strategies to improve patients’ prognosis. Future directions include prospective multicenter validation to verify the dynamic predictive ability of the model, continued optimization of model performance, and integration into clinical decision support systems to support precision oncology strategies for RCC.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by the Ethics Committee of the First Hospital of Shanxi Medical University. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

ZH: Conceptualization, Writing – original draft, Investigation, Funding acquisition, Writing – review and editing, Formal Analysis, Data curation. PW: Formal Analysis, Data curation, Conceptualization, Writing – review and editing, Investigation. DL: Investigation, Data curation, Conceptualization, Writing – review and editing, Formal Analysis. HZ: Writing – review and editing, Data curation, Investigation, Conceptualization. ZG: Investigation, Writing – review and editing, Conceptualization, Data curation. JL: Writing – review and editing, Investigation, Conceptualization, Data curation. MJ: Writing – review and editing, Conceptualization, Data curation, Investigation. HD: Data curation, Investigation, Conceptualization, Writing – review and editing. WS: Resources, Project administration, Writing – review and editing, Methodology, Supervision.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Acknowledgments

We thank all individuals who took part in this research.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2025.1624198/full#supplementary-material

Abbreviations

RCC, renal cell carcinoma; SEER, Surveillance, Epidemiology, and End Results; LR, Logistic Regression; DT, Decision Trees; RF, Random Forests; NBC, Naive Bayes; KNN, K Nearest Neighbors; SVM, Support Vector Machines; Enet, Elastic Networks; MLP, Multilayer Perceptrons; XGB, Extreme Gradient Boosting; LightGBM, Lightweight Gradient Boosting Machine; ROC, receiver operating characteristic; PR, Precision-Recall; DCA, Decision Curve Analysis; SHAP, Shapley additive explanations.

References

1. Powles T, Albiges L, Bex A, Comperat E, Grünwald V, Kanesvaran R, et al. Renal cell carcinoma: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up. Ann Oncol. (2024) 35:692–706. doi: 10.1016/j.annonc.2024.05.537

2. Siegel R, Giaquinto A, Jemal A. Cancer statistics, 2024. CA Cancer J Clin. (2024) 74:12–49. doi: 10.3322/caac.21820

3. Bukavina L, Bensalah K, Bray F, Carlo M, Challacombe B, Karam J, et al. Epidemiology of renal cell carcinoma: 2022 Update. Eur Urol. (2022) 82:529–42. doi: 10.1016/j.eururo.2022.08.019

4. Escudier B, Porta C, Schmidinger M, Rioux-Leclercq N, Bex A, Khoo V, et al. Renal cell carcinoma: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol. (2019) 30:706–20. doi: 10.1093/annonc/mdz056

5. Chen Y, Wang L, Panian J, Dhanji S, Derweesh I, Rose B, et al. Treatment landscape of renal cell carcinoma. Curr Treat Options Oncol. (2023) 24:1889–916. doi: 10.1007/s11864-023-01161-5

6. Tannir N, Pal S, Atkins M. Second-line treatment landscape for renal cell carcinoma: a comprehensive review. Oncologist. (2018) 23:540–55. doi: 10.1634/theoncologist.2017-0534

7. Procházková K, Vodička J, Fichtl J, Krákorová G, Šebek J, Roušarová M, et al. Outcomes for patients after resection of pulmonary metastases from clear cell renal cell carcinoma: 18 years of experience. Urol Int. (2019) 103:297–302. doi: 10.1159/000502493

8. Tadayoni A, Paschall A, Malayeri A. Assessing lymph node status in patients with kidney cancer. Transl Androl Urol. (2018) 7:766–73. doi: 10.21037/tau.2018.07.19

9. Alt A, Boorjian S, Lohse C, Costello B, Leibovich B, Blute M. Survival after complete surgical resection of multiple metastases from renal cell carcinoma. Cancer. (2011) 117:2873–82. doi: 10.1002/cncr.25836

10. Ting Sim J, Fong Q, Huang W, Tan C. Machine learning in medicine: what clinicians should know. Singapore Med J. (2023) 64:91–7. doi: 10.11622/smedj.2021054

11. Ngiam K, Khor I. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. (2019) 20:e262–73. doi: 10.1016/S1470-2045(19)30149-4

12. Handelman G, Kok H, Chandra R, Razavi A, Lee M, Asadi H. eDoctor: machine learning and the future of medicine. J Intern Med. (2018) 284:603–19. doi: 10.1111/joim.12822

13. Suarez-Ibarrola R, Hein S, Reis G, Gratzke C, Miernik A. Current and future applications of machine and deep learning in urology: a review of the literature on urolithiasis, renal cell carcinoma, and bladder and prostate cancer. World J Urol. (2019) 38:2329–47. doi: 10.1007/s00345-019-03000-5

14. Mao W, Deng F, Wang D, Gao L, Shi X. Treatment of advanced gallbladder cancer: a SEER-based study. Cancer Med. (2020) 9:141–50. doi: 10.1002/cam4.2679

15. Schober P, Vetter T. Logistic regression in medical research. Anesth Analg. (2021) 132:365–6. doi: 10.1213/ANE.0000000000005247

16. Ghiasi M, Zendehboudi S, Mohsenipour A. Decision tree-based diagnosis of coronary artery disease: CART model. Comput Methods Programs Biomed. (2020) 192:105400. doi: 10.1016/j.cmpb.2020.105400

17. Hu J, Szymczak S. A review on longitudinal data analysis with random forest. Brief Bioinform. (2023) 24:bbad002. doi: 10.1093/bib/bbad002

18. Minnier J, Yuan M, Liu J, Cai T. Risk classification with an adaptive Naive Bayes Kernel machine model. J Am Stat Assoc. (2015) 110:393–404. doi: 10.1080/01621459.2014.908778

19. Zhang Z. Introduction to machine learning: k-nearest neighbors. Ann Transl Med. (2016) 4:218. doi: 10.21037/atm.2016.03.37

20. Valkenborg D, Rousseau A, Geubbelmans M, Burzykowski T. Support vector machines. Am J Orthod Dentofacial Orthop. (2023) 164:754–7. doi: 10.1016/j.ajodo.2023.08.003

21. Togashi Y, Flechsig H. Coarse-grained protein dynamics studies using elastic network models. Int J Mol Sci. (2018) 19:3899. doi: 10.3390/ijms19123899

22. Castro W, Oblitas J, Santa-Cruz R, Avila-George H. Multilayer perceptron architecture optimization using parallel computing techniques. PLoS One. (2017) 12:e0189369. doi: 10.1371/journal.pone.0189369

23. Babajide Mustapha I, Saeed F. Bioactive molecule prediction using extreme gradient boosting. Molecules. (2016) 21:983. doi: 10.3390/molecules21080983

24. Zhao C, Wu D, Huang J, Yuan Y, Zhang H, Peng R, et al. BoostTree and BoostForest for ensemble learning. IEEE Trans Pattern Anal Mach Intell. (2023) 45:8110–26. doi: 10.1109/TPAMI.2022.3227370

25. Siemers F, Bajorath J. Differences in learning characteristics between support vector machine and random forest models for compound classification revealed by Shapley value analysis. Sci Rep. (2023) 13:5983. doi: 10.1038/s41598-023-33215-x

26. Fotouhi S, Asadi S, Kattan MW. A comprehensive data level analysis for cancer diagnosis on imbalanced data. J Biomed Inform. (2019) 90:103089. doi: 10.1016/j.jbi.2018.12.003

27. Gulati S, Barata P, Elliott A, Bilen M, Burgess E, Choueiri T, et al. Molecular analysis of primary and metastatic sites in patients with renal cell carcinoma. J Clin Invest. (2024) 134:e176230. doi: 10.1172/JCI176230

28. Gong J, Maia M, Dizman N, Govindarajan A, Pal S. Metastasis in renal cell carcinoma: biology and implications for therapy. Asian J Urol. (2016) 3:286–92. doi: 10.1016/j.ajur.2016.08.006

29. Yue Y, Hui K, Wu S, Zhang M, Que T, Gu Y, et al. MUC15 inhibits cancer metastasis via PI3K/AKT signaling in renal cell carcinoma. Cell Death Dis. (2020) 11:336. doi: 10.1038/s41419-020-2518-9

30. Barata P, Rini B. Treatment of renal cell carcinoma: current status and future directions. CA Cancer J Clin. (2017) 67:507–24. doi: 10.3322/caac.21411

31. Pereira M, Punatar C, Singh N, Sagade S. Role of 18F-FDG PET/CT for detection of recurrence and metastases in renal cell carcinoma-are we underusing PET/CT? Diagn Intervent Radiol. (2022) 28:498–502. doi: 10.5152/dir.2022.21096

32. Rodríguez-Fraile M, Cózar-Santiago M, Sabaté-Llobera A, Caresia-Aróztegui A, Delgado Bolton R, Orcajo-Rincon J, et al. FDG PET/CT in colorectal cancer. Rev Esp Med Nucl Imagen Mol. (2020) 39:57–66. doi: 10.1016/j.remn.2019.09.009

33. Wang Y, Li N, Chen L, Wu M, Meng S, Dai Z, et al. Guidelines, consensus statements, and standards for the use of artificial intelligence in medicine: systematic review. J Med Internet Res. (2023) 25:e46089. doi: 10.2196/46089

34. Ramspek C, Jager K, Dekker F, Zoccali C, van Diepen M. External validation of prognostic models: what, why, how, when and where? Clin Kidney J. (2021) 14:49–58. doi: 10.1093/ckj/sfaa188

35. Siontis G, Tzoulaki I, Castaldi P, Ioannidis J. External validation of new risk prediction models is infrequent and reveals worse prognostic discrimination. J Clin Epidemiol. (2015) 68:25–34. doi: 10.1016/j.jclinepi.2014.09.007

36. Rini B, Plimack E, Stus V, Gafanov R, Hawkins R, Nosov D, et al. Pembrolizumab plus axitinib versus sunitinib for advanced renal-cell carcinoma. N Engl J Med. (2019) 380:1116–27. doi: 10.1056/NEJMoa1816714

37. Vanpouille-Box C, Alard A, Aryankalayil M, Sarfraz Y, Diamond J, Schneider R, et al. DNA exonuclease Trex1 regulates radiotherapy-induced tumour immunogenicity. Nat Commun. (2017) 8:15618. doi: 10.1038/ncomms15618

38. Pastushenko I, Mauri F, Song Y, de Cock F, Meeusen B, Swedlund B, et al. Fat1 deletion promotes hybrid EMT state, tumour stemness and metastasis. Nature. (2021) 589:448–55. doi: 10.1038/s41586-020-03046-1

39. Williamson S, Metcalf R, Trapani F, Mohan S, Antonello J, Abbott B, et al. Vasculogenic mimicry in small cell lung cancer. Nat Commun. (2016) 7:13322. doi: 10.1038/ncomms13322

40. Choueiri T, Tomczak P, Park S, Venugopal B, Ferguson T, Chang Y, et al. Adjuvant pembrolizumab after nephrectomy in renal-cell carcinoma. N Engl J Med. (2021) 385:683–94. doi: 10.1056/NEJMoa2106391

41. Blagus R, Lusa L. SMOTE for high-dimensional class-imbalanced data. BMC Bioinformatics. (2013) 14:106. doi: 10.1186/1471-2105-14-106

42. Lundberg S, Erion G, Chen H, DeGrave A, Prutkin J, Nair B, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. (2020) 2:56–67. doi: 10.1038/s42256-019-0138-9

43. Motzer R, Banchereau R, Hamidi H, Powles T, McDermott D, Atkins M, et al. Molecular subsets in renal cancer determine outcome to checkpoint and angiogenesis blockade. Cancer Cell. (2020) 38:803–17.e4. doi: 10.1016/j.ccell.2020.10.011

Keywords: renal cell carcinoma, distant metastasis, machine learning, predictive modeling, external validation, web-based calculator

Citation: Hou Z, Wang P, Lv D, Zhou H, Guo Z, Li J, Jia M, Du H and Shuang W (2025) Explainable machine learning for predicting distant metastases in renal cell carcinoma patients: a population-based retrospective study. Front. Med. 12:1624198. doi: 10.3389/fmed.2025.1624198

Received: 08 May 2025; Accepted: 14 July 2025;
Published: 29 July 2025.

Edited by:

Wenle Li, Xiamen University, China

Reviewed by:

Jian Liu, Fudan University, China
Weixing Jiang, Capital Medical University, China
Yi Chen, Xinjiang Medical University, China
Keyue Yan, University of Macau, China

Copyright © 2025 Hou, Wang, Lv, Zhou, Guo, Li, Jia, Du and Shuang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Weibing Shuang, shuangweibing@126.com
