Identification of a Novel Tumor Microenvironment Prognostic Signature for Bladder Urothelial Carcinoma

Background The tumor microenvironment (TME) regulates the proliferation and metastasis of solid tumors and the effectiveness of immunotherapy against them. We investigated the prognostic role of TME-related genes based on transcriptomic data of bladder urothelial carcinoma (BLCA) and formulated a prediction model of TME-related signatures. Methods Molecular subtypes were identified using the non-negative matrix factorization (NMF) algorithm based on TME-related genes from the TCGA database. TME-related genes with prognostic significance were screened with univariate Cox regression analysis and lasso regression. Nomogram was developed based on risk genes. Receiver operating characteristic (ROC) curve and decision curve analysis (DCA) were used for inner and outer validation of the model. Risk scores (RS) of patients were calculated and divided into high-risk group (HRG) and low-risk group (LRG) to compare the differences in clinical characteristics and PD-L1 treatment responsiveness between HRG and LRG. Results We identified two molecular subtypes (C1 and C2) according to the NMF algorithm. There were significant differences in overall survival (OS) (p<0.05), progression-free survival (PFS) (p<0.05), and immune cell infiltration between the two subtypes. A total of eight TME-associated genes (CABP4, ZNF432, BLOC1S3, CXCL11, ANO9, OAS1, FBN2, CEMIP) with independent prognostic significance were screened to build prognostic risk models. Age (p<0.001), grade (p<0.001), and RS (p<0.001) were independent predictors of survival in BLCA patients. The developed RS nomogram was able to predict the prognosis of BLCA patients at 1, 3, and 5 years more potentially than the models of other investigators according to ROC and DCA. RS showed significantly higher values (p = 0.047) in patients with stable disease (SD)/progressive disease (PD) compared to patients with complete response (CR)/partial response (PR). Conclusions We successfully clustered and constructed predictive models for TME-associated genes and helped guide immunotherapy strategies.


INTRODUCTION
Bladder urothelial carcinoma (BLCA) is the pathological type of bladder cancer with the highest percentage, about 90% (1). There were globally approximately 550,000 new patients and 200,000 deaths in 2018 (2). Despite significant advances in the treatment of BLCA with radiotherapy, surgery, and targeted therapy, the prognosis of BCLA patients remains poor, with 30% of them having muscle-invasive bladder cancer (MIBC) at initial consultation (3). However, patients with MIBC are characterized by rapid disease progression and low survival, with a 5-year tumor-specific mortality rate of >50% (4). Therefore, a validated prognostic risk model can help guide individualized treatment of BLCA patients to improve their prognosis.
At present, the 8th edition of the TNM staging method published by the Union International Center of Cancer (UICC) is one of the most valuable indicators to determine the prognosis of patients with BLCA (5). However, due to the heterogeneity of BLCA, the prognosis of patients with the same TNM stage may vary considerably (6)(7)(8). In addition, multiple comprehensive BLCA molecular typing based on genetic analysis can forecast the overall survival (OS) of individuals, such as ferroptosis-associated gene signature (9), autophagy-associated gene signature (10), and RNA binding protein-associated gene signature (11). Therefore, we considered the use of gene signatures as biomarkers that could predict individual prognosis and drug responsiveness, thus improving clinical outcomes in BLCA patients.
The tumor microenvironment (TME) includes solid tumor cells, vascular network, extracellular matrix, secreted factors (cytokines, chemokines), and distantly recruited cells such as activated B cells and macrophages (12). Overall, this homeostatic system supports the progression and recurrence of malignancies and has important implications in chemoresistance and immunotherapy (13,14). Mesenchymal cells such as fibroblasts within TME are associated with the T-cell efflux phenotype in bladder cancer (15). In addition, the non-immune cellular components of TME also influence the therapeutic response, for example, the secretion of TGF-b by fibroblasts can lead to efflux of immune cells or resistance to chemotherapeutic agents, and therefore the therapeutic effect of the tumor varies with the degree of stromal cell infiltration (16). Therefore, tumor tissue gene expression profiles can reflect the relationship between TME and patient prognosis.
In summary, TME-associated gene signature can enhance the reliability of forecasting patient prognosis. Therefore, we aimed to design a prediction model combining TME-related gene signatures and patient clinical characteristics and develop a nomogram to forecast the prognosis of BLCA patients at 1, 3, and 5 years.

Data Download and Preprocessing
We downloaded transcriptome data and clinical annotations from the Genomic Data Commons. The TCGA-BLCA (The Cancer Genome Atlas-Bladder Cancer) cohort consisted of 433 RNA sequencing samples, including 19 normal profiles and 414 tumor profiles. We removed samples without clinical follow-up information and microdissection from the TCGA-BLCA cohort, resulting in the inclusion of a total of 359 samples. We also downloaded the simple nucleotide variation data in the TCGA database for further analysis of copy number variation (CNV).
Moreover, for external validation, data for the cohort of GSE31684 were obtained from the GEO database. The microarray data of GSE31684 from Affymetrix Human Genome U133 Plus 2.0 Array, we downloaded the normalized matrix file directly. All dates contain survival information. TME-related genes were obtained from published studies (17)(18)(19)(20)(21)(22)(23), and a total of 4061 genes were included (Table S1). We used the UCSC Xena (https://xenabrowser.net/) to download the TCGA-GDC pan-cancer data.

Screening for TME-Related Differentially Expressed Genes
The differentially expressed TME-associated genes in BLCA tissues and adjacent tissues were screened, and the screened differential genes and their expression were organized into a gene expression matrix with a corrected p<0.05 and the absolute value of differential expression multiplicity >1 (FDR<0.05 and | log2Fold Change|>1) was set as the threshold value.

Identification of Molecular Subtypes Using Non-Negative Matrix Factorization (NMF) Algorithm
Fifty iterations of the sample were performed using NMF for extracting biological correlation coefficients and predicting the inner feature structure in gene expression matrices (24). We observed performance for the number of clusters k between two and ten.

Comparison of Immune Scores Between Clusters
Microenvironment Cell Populations (MCP) counters allow quantification of the absolute abundance of eight immune cells and two stromal cells from transcriptomic data (24). We evaluated infiltrating cell scores between clusters, which included neutrophils, NK cells, and myeloid dendritic cells, among others. Then we evaluated the infiltrating cells between clusters.

Univariate Cox Regression Analysis and Lasso Regression Analysis
At the ratio of 7:3, 359 samples were divided into the training set and validation set with no one as the control. TME-related differentially expressed genes were subjected to univariate Cox analysis to get the prognostic genes. The lasso method prevents model overfitting by building a penalized feature to build a more refined model. We then applied lasso Cox regression to minimize the amounts of genes in the prognostic modeling. The results of the lasso regression analysis on the variables affecting the outcome of individuals with BLCA were incorporated into multivariate Cox regression analysis.

Construction of a Nomogram Combined With Risk Score (RS) and Clinical Features
The TCGA cohort was used to build a nomogram to forecast the prognosis of individuals with BLCA, with variables including RS and clinical characteristics. RS was calculated according to the expression of differential genes and regression analysis coefficient values. The formula is shown below: We classify the cases into high-risk group (HRG) and low-risk group (LRG) according to the median RS. The receiver operating characteristic (ROC) curve was utilized to assess the predictive value of RS for prognosis. Kaplan-Meier (K-M) curves were drawn and the Log-rank method was used to evaluate OS.

Prediction Model Evaluation
ROC, calibration curve (bootstrap 1000), and decision curve analysis (DCA) were applied to assess the confidence validity of the model. Our RS nomogram was also compared with those of other established models. RMS time was used to assess the prognostic accuracy of the models beyond 60 months of patient survival.

Gene Set Enrichment Analysis (GSEA)
The c2.cp.kegg.v7.4.symbols and c5.go.v7.4.symbols collection were used to explore the function annotation in HRG and LRG by GSEA software. Gene sets with FDR < 0.05 were was considered statistically significant.

Immunotherapy Prediction
We selected patients with urologic tumors in the IMvigor210 cohort who had received programmed death-ligand 1 (PD-L1) blockade treatment to predict response to immunotherapy. This cohort had a total of 348 cases, containing 232 death samples and 116 censored, all of which contained survival data. BLCA patients who received anti-PD-L1 therapy could be classified into the following categories according to the patient's response: complete response (CR), partial response (PR), stable disease (SD), and progressive disease (PD). Among them, CR and PR are recognized as patients who respond to immunotherapy. SD and PD are recognized as patients who do not respond to immunotherapy. We computed RS for each case and classified them as HRG and LRG based on the median value of RS.

Statistical Analysis
Statistical analysis and graphical work were finished in the R environment (version 4.1.1). Volcano maps were drawn using the "ggplot2" package. Violin plots were drawn with "ggpubr" package. Cox regression analyses were performed by the "survival" package. We used the Chi-square test or Fisher's exact test to measure the difference between training and validating sets and the relationship between clinical data and RS. K-M survival curves with log-rank tests were plotted by the "survminer" package. The ROC curves were depicted by the "timeROC" package. Calibration curves were derived from the "rms" package. The restricted mean survival (RMS) package was for computing the C-index for each of the models. p < 0.05 is considered to have statistical difference.

Patient Characteristics and Analysis of Differentially Expressed Genes Associated With TME in BLAC Patients
The TCGA-BLCA cohort consisted of 433 RNA sequencing samples, including 19 normal samples and 414 tumor samples. The characteristics of the cases enrolled in this study were shown in Table 1 after pre-processing. In total, there were 1014 TMEassociated genes (FDR<0.05 and |log 2 FC|>1) differentiated in expression between BLCA patients and regular bladder tissue. The top 50 up-regulated and down-regulated genes with differential expression were plotted as volcanoes ( Figure 1A).

Molecular Subtypes of TME-Related Genes According to the NMF Algorithm
We clustered genes based on the differential expression TMErelated genes by the NMF algorithm. When k=2, C1 and C2 were formed based on covariance and RSS ( Figures 1B-D). Survival results showed that cases in the C1 cluster had better OS and PFS than those in the C2 cluster ( Figures 1E, F). There were differences in immune scores between C1 and C2 for nine cell types, including CD8+ T cells (p<0.05), cytotoxic lymphocytes (p<0.05), B cells (p<0.05), neutrophils (p = 0.023), monocytes (p<0.05), endothelial cells (p<0.05), fibroblasts (p<0.05), NK cells (p<0.05), and myeloid dendritic cells (p = 0.014) (Figures 2A-I). The international transcriptomic immune typology of solid tumors has established six immune subtypes, including wound healing (Immune C1), IFN-gamma dominant (Immune C2), inflammatory (Immune C3), lymphocyte depleted (Immune C4), immunologically quiet (Immune C5), and TGF-beta dominant (Immune C6). Our molecular subtyping results were compared with the former and the results are shown in Figure 1G.

Developing an RS Prediction Model for TME-Related Genes
At the ratio of 7:3, 359 samples were randomly divided into a training set (n = 252) and validation set (n = 107) ( Table 2).
Baseline features of the patients demonstrated no obvious distinction between them in terms of gender, age, pathological grade, treatment history, and TNM stage (p>0.05). Univariate Cox analysis was performed on the training cohort with 173 differentially TME-associated genes (p < 0.05) ( Table S2).
Lasso regression was applied to achieve reduction in the number of genes while remaining highly accurate. The traces of independent variables showed a gradual increase in the number of independent coefficients that converge to zero as l decreases ( Figure 3A). We selected 11 genes as candidate genes according to the best value of l ( Figure 3B). Then TME-related genes with significant differences were screened according to a multifactorial Cox proportional risk model. Finally, eight genes were obtained, constructing the following equation: ROC was utilized to evaluate the accuracy of the RS model in our training set. As shown in Figure 3C, the area under curve (AUC) for the model were 0.818, 0.776, and 0.771 for 1-, 3-, and 5-years, respectively. We used the median value of RS as the boundary to classify the samples into HRG and LRG. K-M survival analysis showed that LRG had a better prognosis than HRG ( Figure 4C).
We performed survival analyses in the TCGA test set, all TCGA set, and the GEO cohort. HRG had a significantly worse prognosis than LRG ( Figure 4D-F). In the three validation sets, the 1-year AUC was 0.761, 0.664, and 0.589, respectively ( Figures 3D-F). In the GEO cohort, the low AUC values at 1, 3, and 5 years were partly due to the short median follow-up time of patients.

Construction of a Nomogram Containing RS for TME-Related Genes
We used univariate and multivariate Cox regression to examine the relationship between potential variables and OS, which included RS, TNM stage, pathological grade, age, and sex (  Figure 4A).    our model with the real results ( Figure 4B). The AUC of the model was higher than the other impact factors in the 3-year and 5-year ROC curve ( Figures 5B, C). However, the RS was higher than the AUC of the nomogram at 1-year ROC curve, which was the highest of all factors ( Figure 5A). Finally, DCA to assess the clinical utility of nomograms. Both nomogram and RS showed good consistency in forecasting OS at 1, 3, and 5 years compared with a singular prognostic factor (Figures 5D-F).

Comparison of the Eight-Gene Signature Risk Model With Other Models
We compared the prediction models of our eight-gene signature with other models to demonstrate the predictive performance of our model, including models of the identified 3-gene (25), 5-gene (26), and 7-gene signature (27). We also computed RS for every sample by multivariate Cox regression analysis. We evaluated the ROC of the four models according to the corresponding genes. The cases were then classified into HRG and LRG based on the median RS value. In all four models, there was a significant difference in survival time between patients in the HRG and LRG groups, with patients in LRH having a better prognosis than those in HRG ( Figures 6E-H). However, the ROC of the previous models showed lower AUCs and, therefore, the other three models have worse predictions compared to our nomogram. (Figures 6A-D).
We calculated the C-index of the four models and found that our model has the highest C-index with 0.695 ( Figure 6I). RMS time was used to estimate the forecasting effectiveness of the model at different time points. Our model was found to outperform the other two gene signatures at > 60 months except for Zhou signature. This demonstrates that our nomogram has an advantage over the other models in predicting both patient survival up to 5 years and patient survival beyond 60 months ( Figure 6J).

Functional Analysis of Genes Between HRG and LRG
We ran GSEA on samples within the HRG and LRG. We identified the following pathways with associated normalized enrichment score (NES) and the adjusted p-value (q-value) enriched in the HRG (Figures 7A, D Classical pathways in tumors include immune checkpoints, DNA duplication, mismatch repair, and epithelial-mesenchymal transition (EMT). The correlation of RS with target genes was analyzed by extracting the expression of the relevant pathway target genes from the samples. The results showed that RS was positively correlated with EMT-related genes (F-AP, TAGLN, and LOXL2) ( Figure 7C).
We computed the immune cell fraction for every sample of the TCGA-BLCA set using an MCP counter and then compared their correlation with RS. As shown in Figure 7F, RS was positively correlated with CD8+ T cells (p<0.05), endothelial cells (p<0.05), and fibroblasts (p<0.05). In addition, we evaluated the correlation between RS, TMB, and infiltrating cells by MCP counter with the results shown in Figure 7B.
Our analysis showed that there were also differences in RS between clinical subgroups of BLCA patients. More specifically, patients older than 65 years, with no history of radiotherapy, female, and with high TNM stage had higher RS (Figures 8A-E). The K-M survival curves of patients in both stage I-II and III-IV subgroups also proved our results (Figures 8F-H). In addition, the K-M survival curves of patients in both MIBC and NMIBC subgroups also proved our results (Figures 8H, I).

Prediction of Response to Immunotherapy Based on RS Model
We selected patients with urologic tumors in the IMvigor210 cohort who had received PD-L1 blockade therapy to observe the effectiveness against immunotherapy. We calculated the RS for each sample and classified them into HRG and LRG based on the median RS value. the RS showed significantly higher values (p = 0.047) in patients with SD/PD compared to patients with CR/PR ( Figure 7E). This suggests that our RS model may be useful in predicting patient response to PD-L1 blocking therapy.

DISCUSSION
The major discovery of our work was a prediction model according to the TME -associated gene signature of BLCA. TME heterogeneity has an essential role in patient prognosis and in predicting the effectiveness of targeted therapies (28)(29)(30). 4061 TME-related genes and 359 BLCA samples of TCGA were evaluated in this work. We applied the NMF method to distinguish two molecular clusters, which is a promising novel clustering method. There were differences in immune scores for TME-infiltrating cell types between C1 and C2. In addition, correlation analysis with prognostic indicators OS and PFS also showed differences between C1 and C2. These results demonstrate the heterogeneity of TME. We found that our prognostic model based on eight TMErelated genes (CABP4, ZNF432, BLOC1S3, CXCL11, ANO9, OAS1, FBN2, CEMIP) was able to accurately forecast the probability of survival in individuals with BLCA. Prior researches have examined several genes included in our signature under a variety of tumors. For instance, CXCL11 has an important role in the production of chemokines. First, these mediators can trigger the accumulation of CD8+ T cells that can contribute to the elimination of the tumor. Secondly, the production of these chemokines by tumor tissue may trigger the migration and activation of immune cells including myeloidderived suppressor cells and regulatory T cells, which act in favor of the tumor and its progress (31). FBN2 is highly expressed in lung cancer tissues, and as an oncogene, it affects the pathogenesis of lung cancer (32). In addition, Chen Y (33) found that CEMIP can promote a variety of tumor processes by affecting tumor proliferation, dedifferentiation, and the tumor microenvironment. In terms of molecular mechanisms, existing   research has shown that CEMIP mainly affects the WNT and EGFR signaling pathways. Multivariate Cox regression analysis revealed RS (HR =1.045, 95% CI: 1.019-1.071) as an independent risk factor, which was validated in both the internal and external sets. We then defined HRG and LRG in the TCGA cohort based on the median RS value. For example, we found that RS was positively correlated with EMT-related genes. In general, a greater degree of immune infiltration could explain a more immune defense in LRG, and thus a more favorable prognosis. Conversely, positively correlated EMT genes may lead to a higher propensity for metastasis and a poorer outcome for BLCA patients in HRG. In addition, RS differs between clinical subgroups of BLCA patients, with a worse prognosis in the elderly, women, and patients with high TNM staging. Thus, our results suggest that RS can provide resilient risk stratification for BLCA patients.
Nomograms are a well-recognized statistical method for visualizing and predicting the probability of survival of patients (34)(35)(36). We innovatively constructed an advanced nomogram for the prognosis of BLCA patients according to TME-RS. The ROC and DCA proved that they can exactly assess the outcome of patients. When we compared our nomogram with three other predictive genes signatures by Wang et al. (27), Zhou et al. (25), and Li et al. (26). Our nomogram showed the highest C-index. These findings suggest that the clinical utility of our constructed nomogram is outperformed by other models. Next, our team will transfer this RS model to other urological tumors such as renal clear cell carcinoma to demonstrate the potential pan-cancer usability of the model.
Of course, there were still some deficiencies in our research. First of all, all our data come from the TCGA database and GEO database, and the sample size was not large enough, which may lead to the bias of the results. Secondly, our conclusions were also lack experimental verification. Therefore, it was necessary to conduct multicenter, large sample, prospective double-blind trials for further verification in the future.

CONCLUSIONS
In conclusion, our eight TME-associated gene signatures combined with clinical features may more accurately predict patient survival at 1, 3, and 5 years. Both external and internal validation were able to have the ability to verify the forecasting capability of the model. RS can be promising to predict the response to immunotherapy of BLCA individuals with PD-L1 blocking therapy.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

AUTHOR CONTRIBUTIONS
CX, ZK, and DP designed this work. YL analyzed the data. CX and DP wrote this manuscript. ZK edited and revised the manuscript. JG, NL and YY did a lot of work in revising the paper, including rescreening the data, analyzing the data, plotting the figures and the layout of the figures. YY embellished the language of the paper. All authors approved this manuscript.