An Exploration of the Tumor Microenvironment Identified a Novel Five-Gene Model for Predicting Outcomes in Bladder Cancer

Bladder cancer (BC) is one of the top ten most common cancer types globally, accounting for approximately 7% of all male malignancies. In the last few decades, cancer research has focused on identifying oncogenes and tumor suppressors. Recent studies have revealed that the interplay between tumor cells and the tumor microenvironment (TME) plays an important role in the initiation and development of cancer. However, the current knowledge regarding its effect on BC is scarce. This study aims to explore how the TME influences the development of BC. We focused on immune and stromal components, which represent the major components of TME. We found that the proportion of immune and stromal components within the TME was associated with the prognosis of BC. Furthermore, based on the scores of immune and stromal components, 811 TME-related differentially expressed genes were identified. Three subclasses with distinct biological features were divided based on these TME-genes. Finally, five prognostic genes were identified and used to develop a prognostic prediction model for BC patients based on TME-related genes. Additionally, we validated the prognostic value of the five-gene model using three independent cohorts. By further analyzing features based on the five-gene signature, higher CD8+ T cells, higher tumor mutational burden, and higher chemosensitivity were found in the low-risk group, which presented a better prognosis. In conclusion, our exploration comprehensively analyzed the TME and identified TME-related prognostic genes for BC, providing new insights into potential therapeutic targets.


INTRODUCTION
Bladder cancer (BC) is a highly prevalent disease with an incidence of approximately 7% among all male malignancies and is the eighth most common cause of mortality (1). Although the incidence rate of BC has been decreasing, its death rate has not substantially reduced (2). The lack of understanding regarding the molecular mechanism of BC development results in the shortage of effective therapy for BC. Therefore, there is an urgent need to identify novel biomarkers for the early diagnosis of BC and to find effective methods to guide clinical treatments.
Tumor progression is a complicated process in which complex interactions occur between the tumor and the surrounding microenvironment. The surrounding microenvironment of tumor cells is referred to as the tumor microenvironment (TME), which includes immune cells, fibroblasts, and nearby stromal tissues (3,4). The two major components of the TME are resident stromal cells and recruited immune cells (5,6). Various studies have indicated the crucial role of stromal components and immune components in the TME on vascular invasion, adjacent tissue invasion, and drug resistance (7,8). Thus, studying the heterogeneous components of the TME and their complex interactions is necessary to identify novel therapeutic targets.
Previous studies have screened some TME-related genes associated with survival outcomes, and few studies have systematically delineated the TME in BC and developed an accurate prognosis prediction model (9)(10)(11)(12). Thus, it is imperative to fully use the clinical and biological information to conduct a more detailed analysis and to develop a robust and accurate prognostic prediction model of BC from the perspective of TME.
In this study, we calculated the scores of immune and stromal components of 406 BC patients from TCGA dataset and found significant correlations between these scores and prognosis. Patients were classified into three subtypes with distinct biological features based on the TME-related genes. Subsequently, a five-gene model based on TME-related genes was established and validated. The TME landscape of low-and high-risk BC groups predicted by our model was depicted, and immunotherapy factors such as tumor mutation burden (TMB), immune checkpoint, and chemosensitivity were explored. Based on this five-gene signature, we could predict the disease outcome and chemosensitivity that would be beneficial for further immunotherapy in BC.

Raw Data
Level 3 TCGA RNA-seq data of bladder cancer (including 19 normal and 411 tumor samples) and the corresponding clinical data were downloaded from TCGA dataset. By filtering out the normal paracancer tissue, the expression data for 411 cancer tissues were kept for downstream analysis, and 406 samples of which have available survival information. In addition, three separate bladder cancer cohort used in this study as validation datasets (GSE13507, GSE31684 and GSE32894) were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo).
Generation of ImmuneScore, StromalScore and ESTIMATEScore R package 'estimate' was used to calculate the ImmuneScore and StromalScore of tumor samples. Count matrix was normalized and log2 transformed. The higher ImmuneScore or StromalScore is, the larger amount of the immune or stromal components exits in TME. ESTIMATEScore is the sum of ImmuneScore and StromalScore denoting the overall proportion of immune and stromal components in TME.

Consensus Clustering
Consensus clustering was introduced for classifying the BC patients into different subgroup. The K-means algorithm with the Spearman distance was used for clustering. The cluster number was set to a range (1-10).

Survival Curves
The relationship between kinds of scores and survival was explored by plotting the Kaplan-Meier curve using R package 'survival' and 'survminer'. Log Rank test was used to test the differences of OS between defined high and low groups.

Differential Gene Expression Analysis
Differential expression analysis was conducted using the R package 'DESeq2'. The screening conditions for the differential genes were: |log2FoldChange| >1.5, padj <0.05. Heatmaps of differential genes were drawn using the R-package 'pheatmap'. For subclass-specific genes, only genes with significant differences in expression (|log2FC| >1.5, padj <0.05) in all three possible comparisons were considered subclassspecific genes.

GO and KEGG Enrichment Analysis and ssGSEA Analysis
GO and KEGG enrichment analyses were performed with the aid of R packages 'clusterProfiler', 'enrichplot', and 'ggplot2'. Only terms with both p-and q-value of <0.05 were considered significantly enriched. ssGSEA analysis was conducted using R package 'GSVA'. Hallmark geneset was downloaded from the Molecular Signatures Database (MSigDB).

Tumor Infiltration Immune Cells (TICs) Profile
In combination with the LM22 signature matrix, normalized gene expression data (FPKM) were used to calculate the relative proportions of 22 types of infiltrating immune cells (encompass B cells, T cells, natural killer cells, dendritic cells, eosinophils, macrophages and neutrophils, among others.) via CIBERSORT algorithm. ImmuneAI (Immune Cell Abundance Identifier) computational method (http://bioinfolifehusteducn/ ImmuCellAI#!/) was applied for predicting the immune checkpoint blockade response. Immune associated biomarkers for HLA, inflammation-promoting, MHC class I, Type I IFN Response were obtained from published paper (13). The enrichment scores were calculated using R package 'GSVA'.

Establishment of the Five-Gene Risk Prediction Model
First, we assessed the relationships between the expression levels of the selected TME-related genes and overall survival of patients by univariate Cox regression analysis. The significant genes with p <0.05 were screened out for further analysis. We randomly divided the data from TCGA cohort as internal train set and internal validation set (1:1), the entire cohort as test cohort (n = 406). Subsequently, using R package 'glmnet', we performed the Least Absolute Shrinkage and Selector Operation (LASSO) analysis that could reduce the estimation variance while providing an explicable final model. We selected the l with the least cross-validation error and identified the key genes affecting patients' prognosis. Finally, multivariate Cox regression analysis was conducted to construct the TME-based signature for predicting the prognosis in BC patients. The time-dependent ROC curve performed by R package 'survivalROC' and Kaplan-Meier survival curve analysis performed by R package 'survival' were employed to verify the accuracy of the prognostic value of the five-gene signature.

Prediction of Chemotherapy Responses and Candidate Small Molecules
The chemotherapy response for Methotrexate, Vinblastine, Doxorubicin, Cisplatin of each BC patient was calculated using R package 'pRRophetic' based on the Genomics of Drug Sensitivity in Cancer (GDSC) (https://www.cancerrxgene. org). To present potential drugs for five-gene signature risk group, the Connectivity map (CMap) (15) was performed to predict highly significant small molecule drugs for high-risk group compared to low-risk group.

Statistical Analysis
All the statistical analyses were performed with R software and packages from the Bioconductor project. Wilcox-test was used to compare two groups with non-normally distributed variables. For comparisons of three groups, Kruskal-Wallis test and cross Chi-Square-test were used as nonparametric methods. Pairwise comparisons were adjusted using 'Holm's method'. Univariate, multivariate Cox regression and Lasso regression were used for the selection of genes for the predictive gene signature. R package 'glmnet' was used for Lasso regression R packages 'survivalROC', 'plotROC' and 'timeROC' was used for the ROC curve analysis Area under the ROC curve (AUC) was used as an accuracy index to identify the predictive capacity.

Immune and Stromal Scores Were Associated With the Prognosis and Progress of BC
To explore the correlation between the estimated proportion of immune and stromal components and the survival rate of BC patients, Kaplan-Meier survival analysis was performed for ImmuneScore, StromalScore, and ESTIMATEScore, respectively (16). Although ImmuneScore was not significantly correlated with the overall survival rate, StromalScore and ESTIMATEScore displayed a significant negative correlation with the overall survival (Log Rank test, p <0.05) ( Figures 1A-C). These results suggest that the proportion of immune and stromal components could affect the survival of patients with BC.
Furthermore, clinical information was taken into account. The tumor node metastasis (TNM) staging system is the most widely used cancer staging system. As shown in Figure 1D, ImmuneScore was negatively correlated with M-stage (Wilcoxon signed-rank test, p <0.05). Patients in the low stages showed lower StromalScore and ESTIMATEScore scores (Kruskal-Wallis test, p <0.05) ( Figures 1E, F). These results indicate that the proportion of immune and stromal components is associated with the prognosis and progression of BC.

Identification of Differentially Expressed Genes (DEGs) Shared by ImmuneScore and StromalScore
By analyzing expression data of all BC cases downloaded from TCGA database, we obtained the DEGs through the consolidation and analysis of different gene expression profiles between groups of high and low ImmuneScore and StromalScore. 1,210 DEGs were identified through the comparison between groups of high and low ImmuneScore. Among them, 950 genes were up-regulated and 260 genes were down-regulated compared to ImmuneScore group (|log2FC| >1.5, padj <0.05) (Figure 2A). Similarly, 1320 DEGs were identified between groups with high and low StromalScore, consisting of 1,137 up-regulated and 183 down-regulated genes (|log2FC| >1.5, padj <0.05) ( Figure 2B). There are 689 upregulated genes sharing by high score both in ImmuneScore and StromalScore ( Figure 2C) and 122 down-regulated genes sharing by low scores ( Figure 2D). These co-upregulated/ downregulated 811 DEGs reflected the dynamic modulation of the immune and stromal components in TME and thus were possibly potential key factors for the status of TME. Functional enrichment analysis was performed based on the DEGs and results indicated that the DEGs are mainly mapped to the immune-related gene ontology (GO) terms, such as T-cell activation, leukocyte proliferation and lymphocyte proliferation ( Figure 2E). Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis showed the enrichment of cytokine-cytokine receptor interaction, chemokine signaling pathway, hematopoietic cell lineage and cell adhesion molecules ( Figure 2F). These finds suggested that those DEGs were closely related to TME.

Subtypes Divided Based on the TME-Related Genes
Samples were then classified based on the 811 TME-related genes to explore the characteristics of the different subgroups in detail. Consensus clustering of the gene expression profiles yielded three subclasses (Classes 1-3), with 178, 110, and 118 cases in each subclass, which had significantly different expression patterns ( Figures S1A, B and Figure 3A). The subclass assignments were concordant with the principal component analysis (PCA) results ( Figure 3B). A significant prognostic difference was observed in these three subclasses (Log Rank test, p <0.0001), with the poorest overall survival for Class 1 patients and the best overall survival for Class 3 patients ( Figure  3C). To better characterize the three subclasses, differential analyses of clinical features and biological features were performed. From the perspective of clinical data, patients assigned to the Class 1 subtype were relatively older and had more advanced disease, whereas the Class 3 subtype comprised patients with younger age and less advanced disease stages (Kruskal-Wallis test, p <0.05; Chi-Squared test, p <0.05) ( Table 1). We observed that Class 1 and Class 2 both showed high scores in StromalScore and Class 2 also had significantly higher scores in ImmuneScore compared to the other two subclasses, whereas Class 3 showed significantly lower scores in both ImmuneScore and StromalScore compared to the other two subclasses (Kruskal-Wallis test, p <0.05; pairwise Wilcoxon signed-rank test, padj <0.05) ( Figure 4A). The features of hallmarks and other typical biomarkers were compared among these subtypes, and there were remarkable differences among the three subtypes. Notably, Class 2 showed the highest enrichment scores in many immune signatures such as HLA, inflammation-promoting, MHC class I, and Type I IFN responses (Kruskal-Wallis test, p <0.05; pairwise Wilcoxon signed-rank test, padj <0.05) ( Figure 4B). Class 1 and Class 2 both showed a high enrichment scores in stromal signatures such as myogenesis, endothelial cells, fibroblasts and EMT process, while class3 showed lowest enrichment scores in these signatures (Kruskal-Wallis test, p <0.05; pairwise Wilcoxon signed-rank test, padj <0.05) ( Figure 4C). Other features which are not related to immune and stromal such as hypoxia, DNA repair, G2M checkpoint and MYC were significantly different among subclasses (Kruskal-Wallis test, p <0.05; pairwise Wilcoxon signed-rank test, padj <0.05) ( Figure S1C). Expression of immune active markers like PRF1, GZMA, GZMB and immune suppressive markers like FOXP3, IL10, TGFB1 were screened in three subclasses ( Figure S1D). Class 2 showed higher expression of these markers compared to the other two subclasses (Kruskal-Wallis test, p <0.05; pairwise Wilcoxon signed-rank test, padj <0.05). 22 kinds of immune cell fractions were then calculated using CIBERSORT algorithm, and significant differences were found among subclasses in CD8+ T cells, CD4+ activated memory cells, mast cells, M1 macrophages, Tregs and resting NK cells (Kruskal-Wallis test, p <0.05) ( Figure  4D). We also checked the expression of the immune checkpoint genes and found that Class 2 showed higher expression of these  genes compared to the other two subclasses (Kruskal-Wallis test, p <0.05) ( Figure S1E). The response of patients to immune checkpoint blockade (ICB) therapy predicted by ImmuCellAI showed Class2 had the highest response compared to the other two subclasses (Chi-Squared test, p <0.05) ( Figure S1F). Furthermore, we compared our classification with that of a previous study (Figures 5A, B). The previous classification identified five subtypes: luminal-papillary, luminal-infiltrated, luminal, basalsquamous, and neuronal (14). The neuronal subtype had the poorest overall survival time, whereas the luminal-papillary subtype had the best overall survival time. However, the basal-squamous subtypes had moderate overall survival time. In our study, Class 3, with the highest luminal marker expression and lowest basal and neuronal marker expression, demonstrated better survival times, which was similar to luminal-papillary subtype. Class 2 was similar to basal-squamous subtype. Class 1, the subclass with the poorest prognosis, showed a farraginous expression of markers and contained most of the neuronal samples. We also explored the genomic features of the three subtypes and found that Class 3 had a significantly lower mutation frequency of TP53 (32%) than that of Class 1 (49%) and Class 2 (61%), whereas Class 3 had a higher mutation frequency of FGFR3 (32%) and Class 2 had a higher mutation frequency of RB1(36%) ( Figure 5C and Figure S1G). Thus, our TME-based classification identified three subtypes with significantly different clinical and biological features.

Development and Validation of TME-Based Prognostic Model
Our above results suggested that the proportion of immune and stromal components can significantly influence the tumor progression and prognosis of BC, and TME-related genes could well classify the BC samples. Thus, to further explore the prognostic significance of the TME-related genes, univariate cox analysis was conducted to reduce the noise of gene without prognostic value (p <0.05 was considered to have prognostic value). The TCGA cohort was randomly divided into one train set and one validation set (1:1). Then Lasso regression was  employed to present key genes for the establishment of a prognosis prediction model in internal train set (Tables S1, S2 and Figures 6A, B). The five genes fitting into the model were FPR1, TNFAIP6, GFPT2, IL-10 and ZNF683 ( Figure 6C). The K-M plot demonstrated that overexpression of FPR1, TNFAIP6, GFPT2, IL-10 and low expression of ZNF683 were associated with the poor overall survival of BC patients ( Figure 6D). The prognostic value of five-gene signature was validated in the internal validation set and the entire TCGA cohort ( Figures S2A, B and Figure 7A). Furthermore, we used another three independent cohorts (GSE13507, GSE31684, GSE32894) to verify the prognostic value of our five-gene signature. The risk score for each sample was calculated using the model, and patients were divided into high-and low-risk groups according to the median of risk score. Log Rank test was used to test the differences of overall survival between the two groups. To evaluate the power of the prognostic model, time-dependent Receiver Operating Characteristic (ROC) curves were employed to assess the sensitivity and specificity. The results indicated that the signature could accurately predict the overall survival of BC patients in all three cohorts (Figures 7B, D). The patients with higher risk scores had a significantly worse prognosis.
Since risk score has such a distinctive distinction, we further wonder whether the signature could become an independent predictor. Considering the age, stage information and risk score together in a multivariate model, the risk score based on fivegene model was still independent prognostic factor ( Figures  S3A, B). Moreover, this five-gene model had a higher concordance index score compared to the clinical stage model and higher area under curve (AUC) values for one-and threeyear survival (Figures S3C-F). These results indicate that the five-gene signature is an accurate, robust and independent predictive tool for the prognostic prediction of BC patients. To obtain a comprehensive understanding of the relationship between risk score and the biology of BC, hallmark functional enrichment was performed using GSEA based on the DEGs between high-and low-risk groups. The high-risk groups were enriched in the process of the EMT, angiogenesis, myogenesis and hypoxia, and the low-risk groups were enriched in the process of oxidative phosphorylation and interferon alpha response ( Figure  8A). The landscape of TME was depicted using enrichment scores calculated based on immune and stromal biomarkers ( Figure 8B).

Low-risk groups had higher enrichment scores in activated CD8+
T cell and CD56 bright natural killer cell, while high-risk groups showed higher enrichment scores in many other immune cells like activated dendritic cell, natural killer T cell and immune suppressive cells such as MDSC and Treg cells (Wilcoxon signed-rank test, p <0.05). The high-risk group also had higher enrichment scores in activated stroma, which demonstrated to be associated with poor prognosis (Wilcoxon signed-rank test, p <0.05) (17). We then compared the differential infiltration of 22 immune cells between high-and low-risk groups. The results showed that relative to high-risk groups, higher proportion of CD8+ T cells, follicular helper T cells and activated CD4+ memory T cells could be detected in low-risk groups. While resting CD4+   Figure 8C). The high-risk groups had higher PDL1 expressions compared to low-risk groups (Wilcoxon signed-rank test, p <0.05) ( Figure 8D). Gene mutation is an important factor of tumorigenesis and development. We profiled the mutation patterns and found that the high-risk groups had higher frequency of TP53 mutation and low-risk groups had higher frequency of FGFR3 mutations (Figures S4A, B). For the CNA status, there was no statistical significance between two groups ( Figure S4C). Notably, TMB is viewed as a potential biomarker for immunotherapy response and chemotherapy. Compared with high-risk groups, TMB was higher in low-risk groups (Wilcoxon signed-rank test, p <0.05) ( Figure 8E). The different responses for Methotrexate, Vinblastine, Doxorubicin, Cisplatin (MVAC), a combination of drugs often used for BC, between high-risk groups and low-risk groups were calculated based on the GDSC database. The estimated IC50 values showed that low-risk groups had a better response for Methotrexate (Wilcoxon signed-rank test, p <0.05) ( Figure 8F). In summary, the above findings indicated that the five-gene model is associated with different immune cell infiltration, TMB status and drug responses, suggesting that the model maybe helpful in medical decision-making and therapy.

Screening of the Potential Related Small Candidate Drugs
The CMap database was employed to seek for potential drugs that capable of using for the treatment of the patients in the high-risk groups. Based on the DEGs between high-risk and low-risk groups, CMap mode-of-action (MoA) analysis indicated mechanisms of action shared by the above inhibitors ( Figure 9) and small molecule  drugs with highly significant correlations ( Table 2). Positive connectivity scores indicate that drugs can induce biological phenomena in human cell lines. In contrast, negative connectivity scores indicate that the drug reverses the desired biological properties and thus has potential therapeutic value. Docetaxel, Levonorgestrel and Noscapine had been put into clinic and Amonafide was developed in phase 3 trials. These results might provide guidance for future clinical treatment of BC patients.

DISCUSSION
Increasing evidence shows that diverse components of TME, such as immune cells and extracellular matrix, contribute to cancer proliferation, invasion and immune evasion. As the most common malignancy of the urinary system, BC has been frequently studied, but association between TME-related genes and cancer prognosis has not been fully elucidated. In this study, we calculated the immune and stromal scores of BC samples using the ESTIMATE algorithm and found that the estimated scores were associated with BC progression and prognosis. We characterized the features of the three subclasses divided based on the selected TME-related genes. We found that Class 2, which had a higher fraction of anti-tumor cells such as CD8+ T cells and activated CD4+ memory T cells, had a better prognosis compared to Class 1, which had comparable stromal scores and fractions of immunosuppressive cells, such as Treg cells and M2 macrophages.
This result may provide evidence that immune components are associated with BC prognosis. However, the concrete mechanism of how they affect the progression of BC and the interactions between different immune cells need further exploration.
A number of studies have already presented several gene signatures for predicting the prognosis of BC (18)(19)(20)(21); however, few studies comprehensively delineated the TME of BC, and there has not been one robust prognosis model based on TME. As we highlighted in the entire study, the TME plays an important role in FIGURE 9 | CMap database analysis identifies potential cancidate small molecular drugs targeting the DEGs between high-and low-groups. the development of cancer and affects the growth, invasion, and metastasis of cancer (7,(22)(23)(24). It is of great value to explore the prognostic and therapeutic benefits based on TME. In this study, we established a five-gene model based on TME-related genes, and its prognostic value was validated in several independent cohorts. In addition, this model is also a potential predictor of chemosensitivity. It might be useful for clinical medication practice.
Among the members of the five-gene signature, evidence suggests that some of them are associated with the development of BC or other cancers. FPR1, a member of the chemotactic GPCR-7TM formyl peptide receptor family, is expressed on a variety of leukocytes and is primarily involved in leukocyte migration to the sites of bacterial infections. More recently, investigations have revealed that FPR1 plays an important role in cancer. Although the expression of FPR1 in healthy, non-immune cells is low, a number of tumors have been shown to express significant levels of FPR1 (25)(26)(27)(28)(29)(30). Importantly, a previous study indicated that targeting FPR1 by ICT12035, a selective small molecule antagonist, can provide a new avenue for the therapy of cancers (31). Therefore, further investigation of FPR1 antagonists can provide opportunities for more efficient treatment of cancers, including bladder cancer. IL10 is a cytokine that has pleiotropic effects on immunoregulation and inflammation. Previous studies reported that IL10 plays an important role in regulating BC immunosurveillance and immunotherapy (19,32,33). TNFAIP6 encoded proteins are involved in the EMT process, cell migration, and extracellular matrix stability (34,35). It is reported to be an inflammation-associated protein regulated by proinflammatory cytokines such as TNFa, interferon-g, and interleukin-1 (35, 36). It had been demonstrated that high TNFAIP6 expression is significantly associated with aggressive pathological features in urothelial cancer and gastric cancer (37,38). Incorporating TNFAIP6 immunostaining in current pathological examinations may be useful for identifying high-risk patients to assist in personalized medical decision-making and therapy. GFPT2 encodes for glutamine-fructose-6-phosphate aminotransferase 2, which is responsible for glycosylation. As a rate-limiting enzyme of the hexosamine biosynthesis pathway (HBP), it is involved in human breast and colon tumorigenesis (3,39). A study confirmed that in lung cancer, normal fibroblasts transformed to cancer-associated fibroblast (CAF)-like cells under TGF-b treatment and upregulated HBP genes which include GFPT2 (40). ZNF683 is known as a transcription factor that mediates a transcriptional program in various innate and adaptive immune tissue-resident lymphocyte Tcell types such as natural killer T cells (41). These studies offered evidences that our selected five-gene signature was indeed related to immune and stromal functions, and some of these genes were demonstrated to be associated with the development of BC and other types of cancers. Thus, our prognostic prediction model composed of evidence-supported genes would be more explainable. We believe that the combination of immunotherapy and FPR1 inhibitors such as Nedocromil and ICT12035, other potential small molecular drugs will be an available approach for clinical trials to further improve the therapeutic efficacy. However, the molecular mechanism by which the five-gene signature affects the prognosis of BC patients requires further experimental work.
Overall, our exploration based on the TME of BC will be beneficial for understanding how the TME components affects the prognosis of BC. The TME-based model may provide useful information for prognostic prediction and guidance for the clinical therapy of BC.

AUTHOR CONTRIBUTIONS
The design and data analysis were conducted by XJL. The immumohistochemical staining and drug search results were analyzed by JF. The draft was edited by XL and YS. The project administration was conducted by XL and YS. Review and supervise manuscripts were conducted by XL and YS. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
Everybody who made a contribution to the research has been listed as an author.

SUPPLEMENTARY MATERIAL
The