
ORIGINAL RESEARCH article

Front. Oncol., 19 September 2025

Sec. Genitourinary Oncology

Volume 15 - 2025 | https://doi.org/10.3389/fonc.2025.1636358

This article is part of the Research Topic: Bladder Cancer Awareness Month 2025: Current Developments and Insights in the Treatment of Bladder Cancer

Multimodal prognostic models for bladder urothelial carcinoma: uroplakin III combined with serum and demographic data

Runlin Feng1†, Jian Hou2†, Yanping Tao3†, Yumin Wang2†, Songzhou Li2, Xingyuan Dong2, Wenlin Tai4*
  • 1Department of Pathology, The Second Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
  • 2Department of Urology, The First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
  • 3Department of Emergency Medicine, Kunming Third People’s Hospital, Kunming, Yunnan, China
  • 4Department of Laboratory Medicine, The Second Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China

Background: Bladder urothelial carcinoma (BUC) remains a highly recurrent and heterogeneous malignancy. Accurate postoperative risk stratification is crucial to guide adjuvant therapy decisions. We hypothesized that integrating Uroplakin III (UPK3A) protein expression with systemic inflammation markers and demographic factors could improve prognostic prediction through advanced machine learning (ML) models.

Methods: This retrospective study analyzed 1,032 BUC patients who underwent radical cystectomy. Clinical, pathological, and serological data, including immunohistochemical UPK3A protein expression, were collected. Least Absolute Shrinkage and Selection Operator (LASSO) regression with λ=0.009 (determined via 10-fold cross-validation) was used for feature selection. Nine ML models were trained and validated. Model performance was assessed using Area Under the Receiver Operating Characteristic Curve (AUC-ROC), calibration curves, decision curve analysis (DCA), and clinical impact curves (CIC). Model interpretability was evaluated with SHapley Additive exPlanations (SHAP).

Results: Light Gradient Boosting Machine (LightGBM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost) models demonstrated superior performance (AUCs: 0.894/0.754 for RF in training/test sets). SHAP analysis highlighted vascular invasion, tumor necrosis, and UPK3A protein as key predictors. CIC demonstrated strong clinical utility. Integrating UPK3A protein with inflammatory and demographic variables outperformed traditional models.

Conclusions: The combination of UPK3A protein expression with multimodal features significantly enhances prognostic modeling in BUC. This approach offers a promising clinical decision support tool to stratify risk and guide postoperative management. Future studies should incorporate transcriptomic/proteomic data to further validate these findings.

1 Introduction

BUC is one of the most prevalent malignancies of the urinary system, with high recurrence and progression rates and significant variability in prognosis (1). While radical cystectomy remains the standard treatment for muscle-invasive bladder cancer (MIBC) and high-risk non-muscle invasive bladder cancer (NMIBC), a substantial proportion of patients experience recurrence or metastasis after surgery, leading to marked differences in outcomes (2). Hence, precise postoperative risk prediction is critical for tailoring patient management and informing adjuvant therapy decisions.

Traditional prognostic assessments primarily rely on tumor staging, pathological grading, vascular invasion, and lymph node metastasis. However, such models often overlook the combined impact of tumor molecular characteristics and host-specific factors (3). Recently, UPK3A, a structural protein specifically expressed on the membrane of urothelial cells, has gained widespread use in bladder cancer diagnostic research. Increasing evidence suggests that UPK3A protein expression not only holds significant diagnostic value but may also be closely associated with the invasiveness, progression, and prognosis of bladder cancer (4, 5). Additionally, serum markers (such as white blood cell count, albumin levels, and urine analysis parameters) reflect the host’s systemic inflammatory state and immune response, which are also recognized to play critical roles in bladder cancer prognosis (6–8). Demographic factors, including age, gender, and smoking history, as basic clinical information, also influence tumor development and patient survival outcomes (9, 10).

In recent years, ML techniques have demonstrated powerful capabilities in constructing medical prediction models by integrating multidimensional, complex data features and capturing underlying patterns that traditional statistical methods may miss (11, 12). Tree-based algorithms, such as XGBoost, random forest (RF), and LightGBM, have shown superior performance in cancer prognosis prediction (13–15). However, there is a lack of comprehensive prognostic models based on the integration of UPK3A protein, serum markers, and demographic features, and systematic studies on their clinical application value and interpretability remain limited.

Therefore, this study was conducted using a large cohort of bladder cancer patients from two affiliated hospitals of Kunming Medical University. We systematically collected data on UPK3A protein expression levels, serum markers, and demographic features, and employed LASSO regression for feature selection. A personalized prognostic prediction model was developed using various ML algorithms, including XGBoost, RF, and LightGBM. Model performance was evaluated through ROC curves, calibration plots, DCA, and CIC, with SHAP analysis enhancing model interpretability. The aim of this study was to explore the potential value of UPK3A protein combined with multiple parameters for predicting prognosis in BUC, facilitating precise postoperative risk assessment and the development of individualized management strategies.

2 Materials and methods

The methodology consists of three components: data preprocessing and feature extraction (Section 2.1), model construction and validation (Section 2.2), and reproducibility documentation and implementation environment (Section 2.3).

2.1 Data collection and processing

We collected inpatient data from 1,764 patients diagnosed with bladder cancer and undergoing radical cystectomy at the First Affiliated Hospital and the Second Affiliated Hospital of Kunming Medical University between 2014 and 2024. The dataset included demographic characteristics (gender, age, ethnicity, weight, smoking, alcohol consumption, etc.), medical history (hypertension, diabetes, hematuria, frequency, urgency, dysuria, difficulty in urination, and previous surgeries), tumor morphological features (tumor shape, diameter, location, number, presence of a base, boundary clarity, color, texture, presence of necrosis, bleeding, and cystic lesions), as well as pathological features assessed by immunohistochemistry, including UPK3A, GATA Binding Protein 3 (GATA3), Cytokeratin 20 (CK20), Cytokeratin 7 (CK7), Cytokeratin 5/6 (CK5/6), Tumor Protein 63 (P63), Tumor Protein 53 (P53), Androgen Receptor (AR), Programmed Death-Ligand 1 (PD-L1), microsatellite stability, Human Epidermal Growth Factor Receptor 2 (HER2), nerve invasion, vascular invasion, pathological staging, grading, positive surgical margins, and histological types such as squamous, glandular, neuroendocrine, and sarcomatoid.

Inclusion criteria for participants were as follows: (1) patients aged over 18 years; (2) diagnosis of bladder cancer according to the WHO Classification of Tumors of the Urinary and Male Genital Systems (4th Edition), and receipt of radical cystectomy; (3) complete clinical data, including blood count, biochemical tests, pathological parameters, and immunohistochemistry; (4) detailed treatment history with complete follow-up data and results; (5) no prior radiotherapy, chemotherapy, or immunotherapy.

Patients who met any of the following criteria were excluded from the study: (1) patients who underwent partial bladder tumor resection; (2) post-operative pathology confirmed non-urothelial carcinoma; (3) incomplete clinical data or lost to follow-up with no available prognostic data; (4) preoperative radiotherapy, chemotherapy, or immunotherapy; (5) other malignancies metastasized to the bladder; (6) patients under 18 years of age; (7) patients with survival time less than 1 month. The study was approved by the Ethics Committees of the First Affiliated Hospital and the Second Affiliated Hospital of Kunming Medical University, with informed consent obtained from all patients.

To minimize the impact of missing data on model construction, variables with more than 20% missing values were excluded, and the remaining missing values were imputed using `KNNImputer` (scikit-learn v0.24.2) with the number of neighbors set to 5. The primary endpoint was the response to postoperative adjuvant therapy, as recorded in the patients’ follow-up treatment records. Continuous variables were standardized using z-score normalization, and categorical variables were encoded using one-hot encoding prior to model input.
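The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the study's actual pipeline: the variable names (`age`, `wbc`, `sparse_lab`, `texture`) and the integer coding of the categorical column are assumptions made for the example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 50
age = rng.normal(65, 10, n)
wbc = rng.normal(7, 2, n)
sparse_lab = rng.normal(1, 0.1, n)
texture = (np.arange(n) % 3).astype(float)  # categorical codes 0/1/2

age[:5] = np.nan          # 10% missing: kept and imputed
sparse_lab[:20] = np.nan  # 40% missing: excluded

X = np.column_stack([age, wbc, sparse_lab, texture])
names = ["age", "wbc", "sparse_lab", "texture"]
categorical = {"texture"}

# Step 1: exclude variables with more than 20% missing values.
keep = np.isnan(X).mean(axis=0) <= 0.20
X = X[:, keep]
names = [nm for nm, k in zip(names, keep) if k]

num_idx = [i for i, nm in enumerate(names) if nm not in categorical]
cat_idx = [i for i, nm in enumerate(names) if nm in categorical]

# Step 2: KNN imputation (k=5) and z-scoring for continuous variables,
# one-hot encoding for categorical variables.
numeric_steps = Pipeline([
    ("impute", KNNImputer(n_neighbors=5)),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, num_idx),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_idx),
    ],
    sparse_threshold=0,  # always return a dense array
)
X_ready = preprocess.fit_transform(X)
print(names, X_ready.shape)
```

In a real analysis the transformer would normally be fitted on the training portion only and then applied to the validation portion, so that imputation and scaling statistics do not leak across the split.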

2.2 Statistical analysis and model construction and validation

Categorical variables are presented as percentages (%) and compared between groups using Pearson’s chi-square test. Due to the imbalance in the dependent variable categories, an undersampling method was applied to resample the data and balance the distribution. A five-fold cross-validation was used to split the dataset into training and internal validation sets. In the case of high-dimensional features, LASSO regression was employed for feature selection. This method applies L1 regularization to shrink regression coefficients, reducing dimensionality, selecting the most informative variables, and eliminating redundant features.
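The text does not state which undersampling variant was applied. As one plausible reading, random undersampling of the majority class can be sketched as below; the function name `random_undersample` and the 30/70 class split are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

def random_undersample(labels, seed=0):
    """Return sorted row indices that keep every class at the minority-class size."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(rows) for rows in by_class.values())
    kept = []
    for rows in by_class.values():
        # Sample without replacement so no row is duplicated.
        kept.extend(rng.sample(rows, n_min))
    return sorted(kept)

# Hypothetical imbalanced outcome: 30 responders vs. 70 non-responders.
labels = [1] * 30 + [0] * 70
kept = random_undersample(labels)
balanced = Counter(labels[i] for i in kept)
print(balanced)
```

Undersampling is usually restricted to the training data, so that validation metrics reflect the original class distribution.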

Nine ML algorithms were used for predictive modeling: XGBoost, support vector machine (SVM), multilayer perceptron (MLP), KNN, logistic regression, LASSO, decision tree (DT), LightGBM, and RF. All models incorporated the features selected by LASSO. A single cross-validation was performed to ensure model stability. Grid search optimization was applied to fine-tune hyperparameters, and the model with the highest AUC-ROC was selected as the optimal model. The final model was constructed on the training set and validated on both internal and external validation sets. Model performance was evaluated using AUC-ROC, sensitivity, specificity, recall, F1 score, and accuracy. All machine learning models were implemented using Python 3.8 with `scikit-learn` (v0.24.2), `xgboost` (v1.5.0), `lightgbm` (v3.3.1), and `shap` (v0.41.0). LASSO regression for feature selection was performed using `LassoCV` from `scikit-learn`, with 10-fold cross-validation to determine the optimal lambda value (λ = 0.009), minimizing binomial deviance. Model hyperparameters (e.g., learning_rate, n_estimators, max_depth) were tuned using `GridSearchCV` with five-fold cross-validation. The hyperparameter configurations for each model are provided in Supplementary Table S1.
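The tuning procedure described above can be illustrated with a small `GridSearchCV` sketch. The data are synthetic, the random forest stands in for any of the nine algorithms, and the parameter grid is a placeholder: the study's actual grids are in Supplementary Table S1 and are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary-outcome data standing in for the LASSO-selected features.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, random_state=0
)
# A 4:6 training/validation split, as in the study design.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.6, stratify=y, random_state=0
)

# Hypothetical grid, for illustration only.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # select the configuration with the highest AUC-ROC
    cv=5,               # five-fold cross-validation within the training set
)
search.fit(X_train, y_train)

# Evaluate the refitted best model once on the held-out validation set.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(search.best_params_, round(search.best_score_, 3), round(test_auc, 3))
```

Keeping the validation set out of the grid search is what makes the final AUC an estimate of out-of-sample performance rather than of tuning success.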

Additionally, to assess the clinical utility of the model, DCA and calibration curves were plotted. A clinical impact curve (CIC) was constructed to visually identify the most effective clinical decision threshold. The threshold was derived using the “surv_cutpoint” function in the survminer R package to maximize survival difference. SHAP analysis was used to analyze the impact of the selected features on the model predictions: SHAP summary plots were generated to show the contribution of each feature to the prediction results, and specific cases were evaluated to illustrate the degree of impact of selected features on individual predictions. All other statistical analyses were conducted in Python, with two-sided p-values < 0.05 considered statistically significant.
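For readers unfamiliar with DCA, the quantity plotted on its vertical axis is the net benefit, conventionally defined at a threshold probability p_t as TP/n − (FP/n) × p_t/(1 − p_t). The sketch below applies this standard definition to toy data; the function names and the six-patient example are illustrative assumptions, not the study's code.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy outcomes and predicted probabilities (hypothetical).
y_true = [1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.6, 0.4, 0.1, 0.3, 0.7]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(nb_model, nb_all)
```

At a threshold of 0.5 the toy model flags three patients (two true positives, one false positive), giving a net benefit of 1/6, whereas treating everyone gives 0. A decision curve repeats this comparison across a range of thresholds.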

(The flowchart of this research is shown in Figure 1)

Figure 1
Flowchart illustrating the workflow of a multi-center study on radical bladder cancer surgery in China from 2014 to 2024. The figure shows patient enrollment, exclusion criteria, and clinical feature analysis. Patients (N=1784) were randomly divided into training and validation cohorts (4:6). Statistical filtering (p < 0.05) was applied to select relevant features, which were then used in multiple machine learning algorithms, including XGBoost, Logistic Regression, Lasso, SVM, KNN, RF, LightGBM, and MLP. The process ultimately produced an optimum predictive model.

Figure 1. Workflow of model development and validation.

2.3 Model reproducibility and technical implementation

All computational procedures were implemented using Python 3.8. Machine learning models, including LightGBM, XGBoost, Random Forest, SVM, and others, were built using the scikit-learn (v0.24.2), xgboost (v1.5.0), and lightgbm (v3.3.1) packages. Data imputation was performed using KNNImputer with default parameters (n_neighbors=5), and variables with more than 20% missing data were excluded. Continuous variables were standardized using z-score normalization, and categorical variables were transformed by one-hot encoding.

LASSO regression for feature selection was conducted using LassoCV from scikit-learn with 10-fold cross-validation to determine the optimal penalty parameter (λ = 0.009), based on minimum binomial deviance. Model hyperparameters were optimized via GridSearchCV with five-fold internal cross-validation. The detailed hyperparameter settings for each model are provided in Supplementary Table S1.
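A minimal sketch of cross-validated LASSO feature selection with `LassoCV` is given below on synthetic data. Two points are worth flagging: scikit-learn calls the penalty parameter `alpha` rather than λ, and `LassoCV` fits a linear model by squared error, so this sketch does not reproduce a binomial-deviance criterion. The data-generating process here is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic data: only features 0 and 1 carry signal (hypothetical setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
signal = 2.0 * X[:, 0] - 2.0 * X[:, 1]
y = (signal + 0.5 * rng.normal(size=300) > 0).astype(float)

# Standardize so the L1 penalty treats all features on the same scale.
X_std = StandardScaler().fit_transform(X)

# 10-fold cross-validation over an automatically generated penalty path.
lasso = LassoCV(cv=10, random_state=0).fit(X_std, y)

# Features whose coefficients were not shrunk to exactly zero are retained.
selected = [j for j, coef in enumerate(lasso.coef_) if coef != 0.0]
print("chosen penalty (alpha_):", lasso.alpha_)
print("selected feature indices:", selected)
```

The retained indices would then define the feature set passed to the downstream classifiers.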

Model performance was evaluated using metrics such as accuracy, sensitivity, specificity, precision, negative predictive value (NPV), F1-score, Youden index, and AUC-ROC. All metrics were computed using functions from the scikit-learn metrics module (sklearn.metrics). To interpret feature contributions, SHAP values were calculated using TreeExplainer from the shap Python package (v0.41.0). Both global summary plots and individual force plots were generated to visualize model decision logic and feature importance.
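All of the threshold-based metrics listed above follow directly from the four cells of a confusion matrix. The sketch below uses the standard definitions with hypothetical counts; it is written in plain Python for transparency rather than with the library calls used in the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)    # positive predictive value
    npv = tn / (tn + fn)          # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "youden": sensitivity + specificity - 1,
    }

# Hypothetical counts, not taken from Table 3.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 0.85, sensitivity 0.80, specificity 0.90, and the Youden index 0.70.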

All analytical pipelines, including data preprocessing, model training, evaluation, and SHAP analysis, were version-controlled and archived. The complete source code and training-validation splits are available from the corresponding author upon reasonable request, ensuring full reproducibility.

3 Results

3.1 Lasso regression for key variable selection and optimization of BUC prognostic prediction model

The complete analytical process of this study is illustrated in Figure 1. In this study, based on data from 1,674 bladder cancer patients at two affiliated hospitals of Kunming Medical University, a final cohort of 1,032 eligible cases was included. These cases were randomly divided into a training cohort (N=412) and a validation cohort (N=620) in a 4:6 ratio. Univariate analysis identified clinical features associated with patient outcomes, including age, urinary urgency, dysuria, tumor necrosis, perineural invasion, vascular invasion, tumor diameter, tumor location, tissue texture, as well as several blood and urine markers (such as creatinine, neutrophil count, and leukocyte esterase), which showed significant differences between outcome groups (P<0.05) (Table 1).

Table 1

Table 1. Baseline uroplakin III (UPK3A), serum, and demographic data in bladder cancer (please refer to the attached file (Table 1. DOCX) for details).

Subsequently, clinical variables with statistical significance were selected through univariate analysis, and LASSO regression was applied for further feature selection. The LASSO path plot (Figure 2A) shows that as the regularization parameter λ increases, the regression coefficients of some features gradually converge to zero. The cross-validation curve (Figure 2B) determined that the optimal λ value was 0.009, which minimized the binomial deviance of the model. The final selected features included age, smoking history, positive urine bacterial culture, perineural invasion, vascular invasion, muscle layer invasion (M stage), UPK3A expression, tumor number, tumor boundary characteristics, and necrosis (Figure 2C). These features provided an essential foundation for subsequent model construction.

Figure 2
Graphs illustrating feature selection and model construction using LASSO regression. Panel A: LASSO coefficient profiles showing how the coefficients of 26 features shrink with increasing penalty, identifying key predictors. Panel B: Cross-validation curve with binomial deviance, indicating the optimal lambda value selected to minimize deviance. Panel C: Nomogram model represented as a bar chart, displaying the contribution of selected clinical, pathological, and molecular features for individualized prediction.

Figure 2. Identification of predictive features using LASSO regression and construction of the nomogram model. (A) LASSO coefficient profiles: Displays how the coefficients of 26 features shrink with increasing penalty, identifying key predictors associated with treatment response. (B) Cross-validation plot: The optimal lambda (λ = 0.009) was selected using 10-fold cross-validation to minimize binomial deviance. (C) Nomogram model: A predictive nomogram was developed based on selected clinical, pathological, and molecular features to estimate individual response probabilities.

Among these factors, smoking history, urine leukocytes, and vascular invasion were considered key prognostic factors for BUC, as these features are likely closely related to the mechanisms of cancer development and progression. For example, smoking, a known risk factor for bladder cancer, may contribute to the malignant transformation of urothelial cells through the accumulation of carcinogens, while an increase in urine leukocytes may suggest the role of inflammatory responses in tumor progression.

3.2 Construction and performance evaluation of a machine learning-based prognostic prediction model for BUC

This study constructed multiple ML models (KNN, RF, XGBoost (XGB), SVM, Logistic Regression (LR), MLP, LightGBM, LASSO, and DT) to predict the prognosis of BUC. The models’ performance was systematically evaluated using ROC curves, calibration curves, and DCA to identify the optimal predictive model.

Our results indicate that in both the training and validation sets, LightGBM, RF, and XGB models demonstrated excellent predictive performance. The training set AUCs were 0.894, 0.894, and 0.872, respectively (Figure 3A, Table 2), and the validation set AUCs were 0.741, 0.754, and 0.751, respectively (Figure 3B, Table 2). LightGBM and RF also outperformed other models in terms of Accuracy, Recall, and F1-Score. The confusion matrix (Table 3) further validated the stability of the models in classifying true positives (TP) and true negatives (TN).

Figure 3
Six panels showing evaluation of machine learning models. Panel A: ROC curves for the training set with multiple algorithms, demonstrating varying AUC values. Panel B: ROC curves for the test set, showing external model performance. Panel C: Calibration plot for the training set, comparing predicted versus actual probabilities. Panel D: Calibration plot for the test set, highlighting variability in agreement. Panel E: Decision curve analysis for the training set, displaying net benefit across threshold probabilities. Panel F: Decision curve analysis for the test set, illustrating clinical utility of different models.

Figure 3. (A, B) ROC curves: RF, LightGBM, and XGBoost models achieved superior AUCs, indicating excellent classification performance. (C, D) Calibration curves: Good agreement was observed in the training set, while the test set showed greater variability. (E, F) DCA: RF and LightGBM consistently provided the highest net benefit across decision thresholds.

Table 2

Table 2. Comparison of prediction performance of nine ML models.

Table 3

Table 3. Comparison of confusion matrix outputs for nine ML models in the training and testing sets.

The calibration performance of the models was assessed through Calibration Curves, and most models, particularly LightGBM and RF, showed good agreement between predicted probabilities and actual observations in both the training and validation sets (Figures 3C, D). DCA (Figures 3E, F) further demonstrated that LightGBM, RF, and XGB models provided higher net clinical benefits at various probability thresholds, suggesting that these models have substantial potential for practical clinical application.

In summary, this study constructed and validated a series of machine learning-based prognostic prediction models for BUC. XGBoost and LightGBM exhibited superior performance in classification (AUC), calibration (accuracy of predicted probabilities), and clinical DCA, making them suitable for prognostic prediction and personalized risk assessment in BUC patients. These results provide clinicians with effective risk stratification tools, helping to more accurately identify high-risk patients and formulate individualized treatment strategies.

3.3 Clinical application evaluation of the prognostic model for BUC based on CIC

The clinical application value of different ML models in predicting the prognosis of BUC was further evaluated using CIC. CIC primarily illustrates the number of patients predicted as high-risk at various risk thresholds and the number of those who actually experience the target event (e.g., disease recurrence or progression).
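The two curves of a CIC can be computed as in the sketch below: for each risk threshold, the number of patients flagged as high-risk and the number of those who actually had the event, scaled to a notional population of 1,000. The function name, the scaling, and the ten-patient example are illustrative assumptions.

```python
def clinical_impact(y_true, y_prob, thresholds, population=1000):
    """Per-threshold counts (scaled to `population`) of patients flagged as
    high-risk and of flagged patients who actually experienced the event."""
    n = len(y_true)
    rows = []
    for t in thresholds:
        flagged = [y for y, p in zip(y_true, y_prob) if p >= t]
        rows.append({
            "threshold": t,
            "high_risk": round(population * len(flagged) / n),
            "high_risk_with_event": round(population * sum(flagged) / n),
        })
    return rows

# Toy outcomes and predicted risks (hypothetical).
y_true = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.2, 0.6, 0.4, 0.3, 0.1, 0.5, 0.05]

cic = clinical_impact(y_true, y_prob, thresholds=[0.25, 0.5, 0.75])
for row in cic:
    print(row)
```

In this toy example the two counts converge as the threshold rises (700 vs. 400 at 0.25, 200 vs. 200 at 0.75), which is the pattern a CIC is read for: the closer the curves, the fewer patients are flagged unnecessarily.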

The analysis in this study shows that the LightGBM, RF, and XGBoost models were able to accurately identify a higher number of high-risk patients across various risk thresholds. Furthermore, the proportion of actual events (e.g., disease recurrence or progression) occurring among those predicted as high-risk was higher, with the curve trends closely mirroring the actual event occurrence curve, indicating their higher clinical application value. In contrast, the KNN and DT models showed considerable deviation from the actual results at medium and low-risk thresholds, with lower accuracy.

Overall, both RF and LightGBM models maintained a good balance between sensitivity and specificity across all risk thresholds, demonstrating the optimal clinical net benefit. These findings support the use of RF and LightGBM as the preferred models for prognostic risk stratification in BUC patients (Figure 4).

Figure 4
Clinical Impact Curves (CIC) showing model performance on training and test datasets for nine algorithms (KNN, DT, Lasso, LightGBM, Logistic, MLP, RF, SVM, and XGBoost). Each plot displays the number of patients predicted as high-risk (red curve) versus the number of actual events (blue curve) across varying risk thresholds. The comparison illustrates clinical applicability and highlights models such as RF, LightGBM, and XGBoost as having the best net accuracy and clinical benefit.

Figure 4. CIC analysis showed that RF, LightGBM, and XGBoost models consistently identified high-risk patients with better accuracy and clinical utility across both training and testing cohorts.

3.4 Feature contribution assessment of the prognostic model for BUC based on SHAP values

To enhance the interpretability of the model, SHAP analysis was applied in this study. This method quantifies the contribution of each feature to the model’s decision-making process, further exploring its clinical significance. SHAP values reflect the direction and magnitude of each variable’s impact on the model’s output, where positive SHAP values indicate that the variable increases the probability of predicting a high-risk outcome, and negative SHAP values reduce this probability.
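The sign and additivity properties of SHAP values described above can be made concrete with an exact Shapley computation on a toy model. The sketch below is a brute-force enumeration over feature coalitions with a single baseline patient; it is not the TreeExplainer algorithm from the shap package, and the risk function and its coefficients are invented for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; features absent from a coalition take baseline values."""
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight of a coalition of size k in the Shapley average.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = {m: x[m] if (m in subset or m == name) else baseline[m]
                          for m in names}
                without_f = {m: x[m] if m in subset else baseline[m]
                             for m in names}
                total += weight * (model(with_f) - model(without_f))
        phi[name] = total
    return phi

def toy_risk(f):
    """Hypothetical risk score with one interaction term (not the study's model)."""
    return (0.1 + 0.3 * f["vascular_invasion"] + 0.2 * f["necrosis"]
            + 0.1 * f["vascular_invasion"] * f["necrosis"]
            + 0.05 * f["upk3a_high"])

patient = {"vascular_invasion": 1, "necrosis": 1, "upk3a_high": 1}
reference = {"vascular_invasion": 0, "necrosis": 0, "upk3a_high": 0}

phi = shapley_values(toy_risk, patient, reference)
print(phi)
print(sum(phi.values()), toy_risk(patient) - toy_risk(reference))
```

All three contributions are positive, meaning each feature pushes this patient's prediction toward higher risk, and they sum to the difference between the patient's predicted risk and the reference prediction. The interaction term is split equally between the two features involved.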

The analysis revealed that, in the RF model, features such as vascular invasion, perineural invasion, muscle layer infiltration (M stage), tumor necrosis, tumor boundary clarity, urine leukocyte esterase positivity, and white blood cell count had the greatest impact on the model’s predictions. The direction of change in feature values was strongly correlated with the predicted risk levels (Figure 5). These findings not only enhance the biological interpretability of the model but also provide a theoretical basis for the postoperative management of bladder cancer patients in the future.

Figure 5
SHAP summary plot illustrating the contribution of each feature to model output. Each dot represents a SHAP value for an individual patient, with color indicating the original feature value (red = high, blue = low). Features such as Vascular Invasion, Perineural Invasion, and Necrosis have the highest impact with greater SHAP values, while features like Blood Pressure and Smoking contribute less overall. The horizontal axis shows SHAP values ranging from negative to positive, reflecting both direction and magnitude of feature effects. A vertical color bar on the right indicates the gradient from low to high feature values.

Figure 5. SHAP summary plot illustrating the contribution of each feature to model output. Each dot represents a SHAP value for an individual patient, with color indicating the original feature value (red: high, blue: low). Features are ranked by their mean absolute SHAP values, reflecting their relative importance in predicting prognosis. UPK3A expression, key serum biomarkers (e.g., White Blood Cell), and demographic variables (e.g., Age, Smoking) were among the top contributors. The direction and magnitude of each feature’s impact are visualized, providing insight into how individual predictors influence model decisions.

4 Discussion

In this study, we developed and validated an interpretable ML–based prognostic model for BUC, integrating UPK3A protein expression with systemic inflammatory markers and demographic data. Our findings support the utility of a multimodal approach to enhance the predictive power of postoperative survival and guide personalized treatment decisions.

In this study, UPK3A expression was evaluated at the protein level via immunohistochemical analysis, reflecting its established application in pathological diagnosis rather than gene expression profiling. UPK3A is a transmembrane glycoprotein that plays a pivotal role in maintaining urothelial barrier integrity. Traditionally recognized as a diagnostic marker of urothelial differentiation, recent evidence increasingly supports its role in cancer progression. Several studies have demonstrated that elevated expression of UPK3A protein is associated with advanced tumor stages, aggressive phenotypes, and shorter survival in BUC patients (16–18). In our study, UPK3A protein overexpression, assessed via immunohistochemistry, was independently associated with poor overall survival. These results suggest a potential oncogenic role of UPK3A, possibly via dysregulation of p53 signaling, enhanced proliferation, or immune escape mechanisms.

The prognostic implications of UPK3A may be subtype-specific. In luminal bladder cancer subtypes, UPK3A protein overexpression has been correlated with distinct transcriptional programs and increased resistance to chemotherapy or immunotherapy (19). While UPK3A protein is not yet an established therapeutic target, its cell surface localization renders it an attractive candidate for antibody–drug conjugates (ADCs). Although the current ADC landscape in BUC predominantly focuses on HER2, Nectin-4, and Trop-2, the concept is extensible to other surface proteins such as UPK3A. Recent multicenter real-world studies, including those by Zeng et al. and Zhao et al., have demonstrated promising outcomes using neoadjuvant ADCs (e.g., RC48) in combination with immunotherapy for MIBC (20, 21). These findings highlight the translational potential of surface glycoproteins in guiding targeted therapies.

Beyond molecular markers, we incorporated systemic inflammatory indices such as neutrophil count and leukocyturia. These parameters reflect the host’s systemic immune state and are often indicative of a protumor inflammatory microenvironment. Elevated neutrophil-to-lymphocyte ratio and leukocyturia have been linked to poor prognosis in BUC and may represent surrogates for tumor-promoting inflammation or suppressed antitumor immunity (22–24). Our inclusion of these routinely available laboratory indices adds pragmatic value to the model and facilitates integration into real-world clinical settings.

Model development followed a rigorous pipeline. LASSO regression was used for feature selection with λ = 0.009, followed by ensemble learning using LightGBM, XGBoost, and RF classifiers. All models achieved robust performance with AUCs above 0.74 in validation cohorts. DCA and CIC confirmed clinical utility, and SHAP values revealed that UPK3A protein expression, vascular invasion, and perineural infiltration contributed most to outcome prediction. This interpretability enhances clinical acceptability and transparency, aligning with the growing emphasis on explainable AI in medicine (25, 26).

Importantly, our model adheres to current ESMO recommendations advocating for multi-dimensional risk assessment in urothelial carcinoma (27–29). By integrating tumor biology, host immunity, and clinicopathological variables, our approach surpasses conventional staging systems in granularity and predictive accuracy. Moreover, it supports the vision of precision medicine and data-driven oncology.

Nonetheless, certain limitations must be acknowledged. First, the retrospective design and single-region cohort may limit generalizability. Second, immunohistochemistry does not capture post-translational modifications or alternative splicing variants of UPK3A, which may influence functional outcomes. Third, although UPK3A expression was assessed via immunohistochemistry, the manuscript did not explicitly define the scoring criteria used. In our study, UPK3A staining was semi-quantitatively evaluated using the H-score system, which considers both staining intensity and the percentage of positive cells. However, inter-observer variability remains an inherent limitation of immunohistochemistry-based assessment. Although all immunohistochemistry slides were reviewed independently by two experienced pathologists, no inter-rater concordance coefficient (e.g., kappa statistic) was calculated. Future studies should standardize UPK3A scoring protocols and incorporate digital image analysis or automated quantification to reduce observer-related measurement bias. Fourth, while the model is robust, external multicenter validation in larger cohorts is warranted to confirm reproducibility.

Future work should incorporate spatial transcriptomics, single-cell RNA sequencing, and proteogenomic profiling to unravel UPK3A-driven oncogenic networks and treatment resistance (30–32). Additionally, functional studies investigating UPK3A silencing or antibody-mediated blockade may provide critical insights into its therapeutic potential.

Finally, our study complements and extends prior real-world research on neoadjuvant therapies in BUC. Zeng et al. and Zhao et al. provided compelling clinical evidence supporting the integration of ADCs and immune checkpoint inhibitors in MIBC (20, 21). Our findings suggest that UPK3A, a luminal lineage marker, may serve as a future therapeutic candidate, particularly in cases unresponsive to conventional therapies. Our comprehensive analysis highlights the multifaceted role of UPK3A in bladder cancer pathogenesis and underscores the importance of incorporating molecular, inflammatory, and clinical data to refine prognostic modeling.

In summary, this study proposes a clinically interpretable and biologically informed prognostic model that underscores the prognostic significance of UPK3A overexpression in BUC. By integrating systemic inflammation, pathological features, and molecular markers, our findings extend previous real-world evidence and offer a foundation for stratified patient management and therapeutic innovation. These results support the future incorporation of UPK3A-guided algorithms into routine prognostic assessment, pending further prospective validation.

5 Conclusion

This study presents an interpretable, multimodal prognostic model for postoperative BUC by integrating UPK3A protein expression, systemic inflammatory markers, and clinicopathological features. The model demonstrated favorable predictive accuracy and clinical utility across both internal training and validation cohorts, with LightGBM, Random Forest, and XGBoost achieving optimal performance. Evaluation via AUC-ROC, calibration, DCA, and CIC confirmed its robustness and applicability in clinical decision-making. Notably, the independent prognostic relevance of UPK3A overexpression highlights its potential role as both a biomarker and therapeutic target. While derived from a single-center retrospective cohort, the model offers a pragmatic framework for individualized risk stratification in BUC. Future validation in multi-center, prospective cohorts and incorporation of dynamic and molecular data streams will be essential to further refine and clinically implement this approach.
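For readers less familiar with decision curve analysis, the following minimal sketch shows the standard net-benefit quantity that a decision curve plots against the threshold probability. It is a generic illustration under stated assumptions, not the code used in this study: the function names and the toy labels and probabilities are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold.

    NB = TP/n - FP/n * pt / (1 - pt), where pt is the threshold probability.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: every patient is treated regardless of predicted risk."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data for illustration only (1 = event, 0 = no event).
toy_outcomes = [1, 0, 1, 0]
toy_risks = [0.9, 0.6, 0.4, 0.1]
model_nb = net_benefit(toy_outcomes, toy_risks, threshold=0.2)
treat_all_nb = net_benefit_treat_all(toy_outcomes, threshold=0.2)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all reference and the treat-none reference, whose net benefit is zero by definition.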

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The First Affiliated Hospital of Kunming Medical University granted ethical approval to conduct research in its facilities ((2021) Lun Shen L No. 33); the Second Affiliated Hospital of Kunming Medical University granted ethical approval to conduct research in its facilities (Shen-PJ-Ke-2024-199). The studies were conducted in accordance with the local legislation and institutional requirements. The ethics committee/institutional review board waived the requirement of written informed consent for participation from the participants or the participants’ legal guardians/next of kin because of the retrospective nature of the study. All data were obtained from existing clinical records and laboratory databases and were fully de-identified prior to analysis. The study posed minimal risk to participants, involved no direct contact with patients, and did not affect patient care in any way.

Author contributions

RF: Conceptualization, Funding acquisition, Writing – review & editing, Methodology, Validation, Software. JH: Visualization, Methodology, Validation, Software, Conceptualization, Writing – review & editing. YT: Formal Analysis, Data curation, Writing – review & editing, Writing – original draft, Investigation. YW: Software, Writing – original draft, Writing – review & editing, Visualization, Data curation. SL: Data curation, Writing – review & editing. XD: Writing – review & editing, Data curation. WT: Resources, Supervision, Writing – review & editing, Conceptualization.

Funding

The author(s) declare financial support was received for the research and/or publication of this article. This manuscript was supported by the Yunnan health training project of high level talents (Approval Number: H-2024017), Yunnan Fundamental Research Projects (Grant No. 202501AT070489), Yunnan Fundamental Research Kunming Medical University Joint Projects (Grant No. 202401AY070001-080), the Talent Echelon Cultivation Project of the Second Affiliated Hospital of Kunming Medical University and Yunnan Province (RCTDHB-202305), and the 2025 Graduate Education Innovation Fund of Kunming Medical University (Grant No. 2025B029).

Acknowledgments

We gratefully acknowledge the First and Second Affiliated Hospitals of Kunming Medical University, which provided valuable data and resources for this research.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2025.1636358/full#supplementary-material

Abbreviations

ADC, Antibody–Drug Conjugate; AUC, Area Under the Curve; AR, Androgen Receptor; BUC, Bladder Urothelial Carcinoma; CIC, Clinical Impact Curve; CK5/6, Cytokeratin 5/6; CK7, Cytokeratin 7; CK20, Cytokeratin 20; DCA, Decision Curve Analysis; DT, Decision Tree; GBM, Gradient Boosting Machine; GATA3, GATA Binding Protein 3; HER2, Human Epidermal Growth Factor Receptor 2; KNN, K-Nearest Neighbors; LASSO, Least Absolute Shrinkage and Selection Operator; LightGBM, Light Gradient Boosting Machine; LR, Logistic Regression; ML, Machine Learning; MLP, Multilayer Perceptron; MIBC, Muscle-Invasive Bladder Cancer; NMIBC, Non-Muscle-Invasive Bladder Cancer; P63, Tumor Protein 63; P53, Tumor Protein 53; PD-L1, Programmed Death Ligand-1; RF, Random Forest; ROC, Receiver Operating Characteristic; SVM, Support Vector Machine; SHAP, SHapley Additive exPlanations; TN, True Negative; TP, True Positive; UPK3A, Uroplakin III; XGBoost (XGB), Extreme Gradient Boosting.

References

1. Powles T, Bellmunt J, Comperat E, De Santis M, Huddart R, Loriot Y, et al. ESMO Clinical Practice Guideline interim update on first-line therapy in advanced urothelial carcinoma. Ann Oncol. (2024) 35:485–90. doi: 10.1016/j.annonc.2024.03.001

2. Schuettfort VM, Pradere B, Trinh QD, D'Andrea D, Quhal F, Mostafaei H, et al. Impact of preoperative plasma levels of interleukin 6 and interleukin 6 soluble receptor on disease outcomes after radical cystectomy for bladder cancer. Cancer Immunol Immunother. (2022) 71:85–95. doi: 10.1007/s00262-021-02953-0

3. Mir MC, Marchioni M, Zargar H, Zargar-Shoshtari K, Fairey AS, Mertens LS, et al. Nomogram predicting bladder cancer-specific mortality after neoadjuvant chemotherapy and radical cystectomy for muscle-invasive bladder cancer: results of an international consortium. Eur Urol Focus. (2021) 7:1347–54. doi: 10.1016/j.euf.2020.07.002

4. Koga F, Kawakami S, Fujii Y, et al. Impaired p63 expression associates with poor prognosis and uroplakin III expression in invasive urothelial carcinoma of the bladder. Clin Cancer Res. (2003) 9:5501–7.

5. Olsburgh J, Harnden P, Weeks R, Smith B, Joyce A, Hall G, et al. Uroplakin gene expression in normal human tissues and locally advanced bladder cancer. J Pathol. (2003) 199:41–9. doi: 10.1002/path.1252

6. Eissa S, Kassim SK, Labib RA, El-Khouly IM, Ghaffer TM, Sadek M, et al. Detection of bladder carcinoma by combined testing of urine for hyaluronidase and cytokeratin 20 RNAs. Cancer. (2005) 103:1356–62. doi: 10.1002/cncr.20902

7. Oh DY, Kwek SS, Raju SS, Li T, McCarthy E, Chow E, et al. Intratumoral CD4(+) T cells mediate anti-tumor cytotoxicity in human bladder cancer. Cell. (2020) 181:1612–25.e13. doi: 10.1016/j.cell.2020.05.017

8. Debatin NF, Bady E, Mandelkow T, Huang Z, Lurati MCJ, Raedler JB, et al. Prognostic impact and spatial interplay of immune cells in urothelial cancer. Eur Urol. (2024) 86:42–51. doi: 10.1016/j.eururo.2024.01.023

9. Lenis AT, Lec PM, and Chamie K. Bladder cancer: A review. JAMA. (2020) 324:1980–91. doi: 10.1001/jama.2020.17598

10. van Hoogstraten LMC, Vrieling A, van der Heijden AG, Kogevinas M, Richters A, and Kiemeney LA. Global trends in the epidemiology of bladder cancer: challenges for public health and clinical practice. Nat Rev Clin Oncol. (2023) 20:287–304. doi: 10.1038/s41571-023-00744-3

11. Shkolyar E, Jia X, Chang TC, Trivedi D, Mach KE, Meng MQ, et al. Augmented bladder tumor detection using deep learning. Eur Urol. (2019) 76:714–8. doi: 10.1016/j.eururo.2019.08.032

12. Li J, Kong Z, Qi Y, Wang W, Su Q, Huang W, et al. Single-cell and bulk RNA-sequence identified fibroblasts signature and CD8+ T-cell-fibroblast subtype predicting prognosis and immune therapeutic response of bladder cancer, based on machine learning: bioinformatics multi-omics study. Int J Surg. (2024) 110:4911–31. doi: 10.1097/js9.0000000000001516

13. Feng JW, Ye J, Qi GF, Hong LZ, Wang F, Liu SY, et al. A comparative analysis of eight machine learning models for the prediction of lateral lymph node metastasis in patients with papillary thyroid carcinoma. Front Endocrinol (Lausanne). (2022) 13:1004913. doi: 10.3389/fendo.2022.1004913

14. Lei Y, Tang R, Xu J, Zhang B, Liu J, Liang C, et al. Construction of a novel risk model based on the random forest algorithm to distinguish pancreatic cancers with different prognoses and immune microenvironment features. Bioengineered. (2021) 12:3593–602. doi: 10.1080/21655979.2021.1951527

15. Sun Q, Bai L, Zhu S, Cheng L, Xu Y, Cai YD, et al. Analysis of lymphoma-related genes with gene ontology and kyoto encyclopedia of genes and genomes enrichment. BioMed Res Int. (2022) 2022:8503511. doi: 10.1155/2022/8503511

16. Matsumoto K, Satoh T, Irie A, Ishii J, Kuwao S, Iwamura M, et al. Loss expression of uroplakin III is associated with clinicopathologic features of aggressive bladder cancer. Urology. (2008) 72:444–9. doi: 10.1016/j.urology.2007.11.128

17. Lai Y, Ye J, Chen J, Zhang L, Wasi L, He Z, et al. UPK3A: a promising novel urinary marker for the detection of bladder cancer. Urology. (2010) 76:514.e6–11. doi: 10.1016/j.urology.2009.11.045

18. Tsumura H, Matsumoto K, Ikeda M, Yanagita K, Hirano S, Hagiwara M, et al. High expression level of preoperative serum Uroplakin III is associated with biologically aggressive bladder cancer. Asian Pac J Cancer Prev. (2015) 16:1539–43. doi: 10.7314/apjcp.2015.16.4.1539

19. Reis LO, Ferrari K, Zamuner M, Rocha GZ, Billis A, and Fávaro WJ. Urothelial carcinogen resistance driven by stronger Toll-like receptor 2 (TLR2) and Uroplakin III (UP III) defense mechanisms: a new model. World J Urol. (2015) 33:413–9. doi: 10.1007/s00345-014-1329-y

20. Hu J, Chen J, Ou Z, Chen H, Liu Z, Chen M, et al. Neoadjuvant immunotherapy, chemotherapy, and combination therapy in muscle-invasive bladder cancer: A multi-center real-world retrospective study. Cell Rep Med. (2022) 3:100785. doi: 10.1016/j.xcrm.2022.100785

21. Hu J, Yan L, Liu J, Chen M, Liu P, Deng D, et al. Efficacy and biomarker analysis of neoadjuvant disitamab vedotin (RC48-ADC) combined immunotherapy in patients with muscle-invasive bladder cancer: A multi-center real-world study. iMeta. (2025) 4:e70033. doi: 10.1002/imt2.70033

22. Tan YG, Eu EWC, Huang HH, and Lau WKO. High neutrophil-to-lymphocyte ratio predicts worse overall survival in patients with advanced/metastatic urothelial bladder cancer. Int J Urol. (2018) 25:232–8. doi: 10.1111/iju.13480

23. Taguchi S, Nakagawa T, Matsumoto A, Nagase Y, Kawai T, Tanaka Y, et al. Pretreatment neutrophil-to-lymphocyte ratio as an independent predictor of survival in patients with metastatic urothelial carcinoma: A multi-institutional study. Int J Urol. (2015) 22:638–43. doi: 10.1111/iju.12766

24. Prijovic N, Acimovic M, Santric V, Stankovic B, Nikic P, Vukovic I, et al. Predictive value of inflammatory and nutritional indexes in the pathology of bladder cancer patients treated with radical cystectomy. Curr Oncol. (2023) 30:2582–97. doi: 10.3390/curroncol30030197

25. Pak S, Park SG, Park J, Cho ST, Lee YG, and Ahn H. Applications of artificial intelligence in urologic oncology. Investig Clin Urol. (2024) 65:202–16. doi: 10.4111/icu.20230435

26. Brodie A, Dai N, Teoh JY, Decaestecker K, Dasgupta P, and Vasdev N. Artificial intelligence in urological oncology: An update and future applications. Urol Oncol. (2021) 39:379–99. doi: 10.1016/j.urolonc.2021.03.012

27. Witjes JA, Babjuk M, Bellmunt J, Bruins HM, De Reijke TM, De Santis M, et al. EAU-ESMO consensus statements on the management of advanced and variant bladder cancer-an international collaborative multistakeholder effort: under the auspices of the EAU-ESMO guidelines committees. Eur Urol. (2020) 77:223–50. doi: 10.1016/j.eururo.2019.09.035

28. So A. Bladder cancer, ESMO 2016. Can Urol Assoc J. (2016) 10:S224–6. doi: 10.5489/cuaj.4281

29. Bellmunt J, Orsola A, Leow JJ, Wiegel T, De Santis M, and Horwich A. Bladder cancer: ESMO Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol. (2014) 25 Suppl 3:iii40–8. doi: 10.1093/annonc/mdu223

30. Shi ZD, Sun Z, Zhu ZB, Liu X, Chen JZ, Hao L, et al. Integrated single-cell and spatial transcriptomic profiling reveals higher intratumor heterogeneity and epithelial-fibroblast interactions in recurrent bladder cancer. Clin Transl Med. (2023) 13:e1338. doi: 10.1002/ctm2.1338

31. Guo Y, Lin Z, Zhang W, Chen H, Chen Y, Liu Y, et al. Comprehensive multi-omics analysis of nucleotide metabolism: elucidating the role and prognostic significance of UCK2 in bladder cancer. Funct Integr Genomics. (2025) 25:133. doi: 10.1007/s10142-025-01642-w

32. Xu Y, Sun X, Liu G, Li H, Yu M, and Zhu Y. Integration of multi-omics and clinical treatment data reveals bladder cancer therapeutic vulnerability gene combinations and prognostic risks. Front Immunol. (2023) 14:1301157. doi: 10.3389/fimmu.2023.1301157

Keywords: BUC, UPK3A protein, prognostic prediction, ML, serum inflammatory markers, risk stratification, SHAP, multimodal data

Citation: Feng R, Hou J, Tao Y, Wang Y, Li S, Dong X and Tai W (2025) Multimodal prognostic models for bladder urothelial carcinoma: uroplakin III combined with serum and demographic data. Front. Oncol. 15:1636358. doi: 10.3389/fonc.2025.1636358

Received: 27 May 2025; Accepted: 11 August 2025;
Published: 19 September 2025.

Edited by:

Doug Ward, University of Birmingham, United Kingdom

Reviewed by:

Jiao Hu, Central South University, China
Paula Dobosz, Poznan University of Medical Sciences, Poland

Copyright © 2025 Feng, Hou, Tao, Wang, Li, Dong and Tai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Wenlin Tai, taiwenlin@kmmu.edu.cn

†These authors have contributed equally to this work
