Development and External Validation of a Novel 12-Gene Signature for Prediction of Overall Survival in Muscle-Invasive Bladder Cancer

Purpose: We aimed to develop and validate a novel gene signature from published data and improve the prediction of survival in muscle-invasive bladder cancer (MIBC). Methods: We searched the published gene signatures associated with the overall survival (OS) of MIBC and compiled all 274 genes to develop a novel gene signature. RNAseq data of TCGA (the Cancer Genome Atlas) bladder cohort were downloaded. All genes were included in a univariate Cox hazard ratio model. We then used a reduced multivariate Cox regression model, which included only genes achieving P < 0.05 in the univariate model. A total of 172 patients at Fudan University Shanghai Cancer Center (FUSCC) and 61 patients from GEO datasets were used as an external validation set. Results: A total of 327 patients in the TCGA cohort were enrolled. We identified 274 genes from eight published papers on the OS of MIBC. Using the TCGA database, we identified 12 genes that correlated with OS (P < 0.05 in both univariate and multivariate analyses). By integrating these genes with the RT-qPCR data in our validation dataset and GEO datasets, we confirmed that the power for predicting OS of the 12-gene panel (AUC of 0.741 and 0.727, respectively) was higher than just clinical data (including gender, age, T stage, grade, and N stage) alone in the TCGA and FUSCC cohort (AUC of 0.667 and 0.631, respectively). Additionally, upon combining the clinical data and 12-gene panel together, the AUC increased to 0.768, 0.757, and 0.88 in the TCGA, FUSCC and GSE13507 cohorts, respectively. Conclusions: Applying published gene signatures and TCGA data, we successfully built and externally validated a novel 12-gene signature for the survival of MIBC. Brief Explanation We systemically reviewed all published prognostic gene signatures of muscle-invasive bladder cancer (MIBC) and integrated the genes in the TCGA MIBC cohort. This new gene panel was validated in a newly established MIBC cohort in GEO and FUSCC. This method can help update the previous established panels in a new way.


INTRODUCTION
Bladder cancer is the fourth most common cancer, with an incidence of ∼7% among all male malignancies, and the eighth most common cause of mortality in men (1). In 2015, a total of 80,500 new bladder cancer cases were expected in China, with 32,900 estimated cancer-related deaths (2). Urothelial carcinoma is the dominant histological subtype of bladder cancer, except in certain parts of Africa and the Middle East (3). Despite the considerable progress in the treatment of bladder cancer, the prognosis of patients with muscle-invasive bladder cancer (MIBC) remains poor, which is partly attributable to the heterogeneity of disease characteristics (4). This indicates the need for an accurate prognostic assessment after radical cystectomy that is essential for treatment decision-making, patient counseling, and most importantly for defining the indication of adjuvant chemotherapy (5).
The American Joint Committee on Cancer (AJCC) TNM staging system, which has been appropriately validated, is the most widely used prognostic model to predict outcome in patients treated with radical cystectomy (6). Although these staging systems provide useful estimates of clinical outcome, their major limitation is the difficulty of incorporating novel clinical information, such as molecular markers or more complex bioinformatics. Furthermore, current staging systems have been shown to be less accurate than some prediction models that incorporate several sets of clinical data in the era of personalized medicine (7). A recently reported, comprehensive molecular analysis of urothelial bladder cancer from the Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov/) has provided novel insights into molecular subgroups and potential therapeutic targets for this disease (8). A few studies on gene signatures associated with tumor characteristics and outcomes for MIBC had already been reported before the release of the TCGA database. We searched the PubMed database and found eight papers that summarize the gene signature regarding the prediction of overall survival (OS). However, as indicated in the paper by Riester et al. (9), the performance of these eight gene signatures regarding the OS of MIBC was not so robust, as most of their C-indexes were <0.70. Additionally, as the management of MIBC and chemotherapy has changed in recent decades, all the gene signatures need to be updated. Therefore, we planned a study to integrate all of the published genes with TCGA RNAseq data to develop a novel gene panel and to validate the panel in our own cohort by qRT-PCR.
We established a novel 12-gene signature using TCGA data that was well-validated in our cohort and shown to be superior to TNM staging. This signature improved the prediction of survival of MIBC patients when combined with conventional clinical data including gender, age, tumor T and N stages, and tumor grade. Our study has refined the gene signature of MIBC integrated with the RNAseq data of TCGA. These results might reveal new therapeutic targets for bladder cancer and may be helpful during consultations with patients to predict prognosis.

Selection of Published Studies
We searched EMBASE (www.embase.com) and MEDLINE (www.ncbi.nlm.nih.gov/pubmed) from their inception to December 2017 and systematically identified gene signature studies predicting the OS of MIBC. No language restrictions were applied. The search terms used were as follows: ("bladder cancer" OR "bladder neoplasm" OR "bladder tumor" OR "bladder urothelial carcinoma") AND ("gene signature" OR "gene profile" OR "gene model" OR "molecular profile" OR "genomic profile" OR "gene expression"). Irrelevant studies were identified and excluded by scanning their titles and abstracts. The full text of the remaining articles was carefully reviewed to determine whether the articles contained information on the topic of interest. We also scanned the cited references of the retrieved articles and reviews to identify any additional relevant studies. Finally, we retrieved all of the gene panels relevant to MIBC and OS (Figure 1).

Patient Cohorts
Level 3 TCGA RNAseq data from bladder urothelial carcinoma (BLCA) samples were obtained from the TCGA data portal (https://genome-cancer.ucsc.edu/proj/site/hgHeatmap/). Tumor transcriptomic profiles of 20,534 genes were measured in 436 primary bladder cancer patients. Only the 327 patients with intact clinical information, especially follow-up data, were included in this study. The clinical information was retrieved from the "Clinical Biotab" section of the data matrix based on the Biospecimen Core Resource (BCR) identification numbers of the patients. Extended demographic parameters for these patients, characterized by TCGA consortium, are shown in Table 1.
The Fudan University Shanghai Cancer Center (FUSCC) validation cohort consisted of 172 patients with urothelial bladder cancer that was histologically confirmed by an experienced pathologist and treated by radical cystectomy without any pretreatment. These patients were consecutively enrolled from 2008 to 2015 (shown in Table 1). Once resected, tumor tissues were frozen and stored at −80 • C. Written informed consent was obtained from all participants of this study. The study protocol was approved by the institutional review board of FUSCC and was carried out in accordance with the approved guidelines (approval ID: 050432-4-1805C).
The GEO dataset GSE13507 was downloaded from the website: www.ncbi.nlm.nih.gov/geo. RNA expression data and metadata of 61 patients were used for external validation of the gene signature. OS data were used for prognosis prediction.

RNA Preparation, cDNA Synthesis, and qRT-PCR Validation
Total RNA from frozen tissue specimens was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA), in accordance with the manufacturer's instructions. RNA quantity and quality were determined using a NanoDrop ND-1000 Spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). mRNA levels were measured using a RevertAid First Strand cDNA Synthesis Kit (K1622; Thermo Fisher Scientific,  Table 1. qRT-PCR was performed on the Applied Biosystems 7,900 Real-Time PCR system using SYBR Green dye (Applied Biosystems, Foster City, CA, USA), as described by the manufacturer. All determinations were performed in triplicate and in at least three independent experiments. The mean Ct value of each gene minus the mean Ct value of ACTB was calculated as Ct. The -Ct value of each gene was applied for binary logistic regression and model construction. The details of this experiment are shown in our previous paper (10).

Statistical Analysis
All of the statistical analyses, including gene selection, classification model construction, and independent testing, were performed with R software and packages from the RMS and Bioconductor project (11,12). For the data obtained by qRT-PCR, univariate and multivariate Cox regression models were used for the selection of genes for the predictive gene signature. All significance tests were two-sided, and a P < 0.05 was considered significant. Area under the ROC curve (AUC) was used as an accuracy index to identify the best combination of multiple markers.

Acquiring Gene Signatures Associated With the OS of MIBC From Literature Analysis
Two reviewers (H.X. and F.W.) conducted the literature search independently. This resulted in the identification of eight studies (9,(13)(14)(15)(16)(17)(18)(19)) that met our requirements; all genes included in these studies are featured in our candidate signature gene list after dereplication. All eight studies and the 274 genes are shown in Supplementary Tables 2, 3, respectively.

Gene List Discovery Using TCGA Database
In the TCGA cohort, we included 239 male and 88 female patients with a median age of 69 years, ranging from 38 to 90 years. Twothirds of patients were AJCC stage III and IV patients. Since the gene names in the gene models from the different reports were not consistent, we unified the IDs of all 274 genes into their official gene names to find the corresponding genes in the full TCGA gene list. OS data were retrieved from the TCGA cohort and univariate Cox regression was performed to identify the prognostic value of the 274 genes (Supplementary Table 4).

Validation of the Integrated Gene Signature in the FUSCC Cohort
The FUSCC cohort included 147 male and 25 female patients who underwent cystectomy. The patient age ranged from 31 to 87 years old, with a median of 63 years. We used qRT-PCR to validate all 12 genes in the FUSCC cohort using fresh frozen tissues obtained from cystectomy. The Ct value of each gene was normalized by the β-actin Ct value. All 12 genes were significant in the univariate model (all P < 0.05, Table 4). In

The 12-gene Signature in MIBC Improved the Predictive Value of the Clinical Model
To further assess the prognostic power of the 12-gene signature, we compared this model with a clinical model including gender, age, T stage, tumor grade, and N stage. We used "rms" package in R project to calculate the C-index values of the multivariate cox regression models. The results are shown in Table 5. In TCGA and FUSCC, the 12-gene signature was more accurate than the clinical model with a higher C-index. We then enrolled all the clinical and gene parameters together in a multivariate cox regression model for a combining model. The C-index reached 0.768, 0.757, and 0.88 in the TCGA, FUSCC and GSE13507 cohort, respectively. These results are shown in Table 5.

DISCUSSION
Once diagnosed, the survival of MIBC patients can range from 1 week to a few years. The disease progression is dependent on risk factors such as tobacco smoking history, exposure to chemicals, radiotherapy, chronic urinary infection, gender, and genetic differences. These clinical criteria may not reflect the entire biology of the disease. In this study, we investigated the efficacy of the 12-gene panel to predict the survival of MIBC patients. Although, previous studies have already developed several gene signatures for predicting OS of MIBC, the management of bladder cancer has improved over decades and the models need to be updated as well. An effective gene signature will improve patient counseling after cystectomy and can better identify candidates who need more aggressive management. In our study, we performed a meta-analysis to systematically review the literature on gene models and attempted to integrate them together with a relatively newly established public cohort. The updated and integrated novel model should be more applicable to recently treated patients. This approach is relatively novel and has not yet been widely used. In this pilot study, gene expression data from the public TCGA cohort of patients with MIBC were analyzed and external validations were performed using the cohort at our center and GEO datasets. Additionally, we randomly selected 5 genes from the 274 genes, and found the predictive superiority of our 12-gene panel (Supplementary Table 5). Leliveld et al. reported that pathological TNM stage and age were independent prognosis factors for patients with MIBC who underwent radical cystectomy (20). Jin S et al. analyzed lymph node-associated variables [pathological N stage (pN), lymph node ratio (LNR) and log odds (LODDDS)] in the patients with MIBC and found c-index that predicted the survival was 0.6769 (pN), 0.6794 (pN+ LNR), and 0.6855 (pN+LNR+LODDDS) (21). Mitra et al. established clinical and genomic classifiers and the c-index was up to 0.73 (18). However, the c-index of the combination model in our study was 0.88. Thus, our update improved the survival prediction. Compared with conventional clinical data, the genomic-clinicopathologic combination in our study had higher clinical benefit by decision curve analysis (Figure 2). These findings are particularly striking given the relative homogeneity of the population analyzed, as our cohort, in which all patients undergone the most aggressive surgical therapy, was strongly selected for being at high risk of death from disease.
Among the members of the novel gene panel, C6orf62 may participate in suppressing proliferation and inducing differentiation through regulating the cell cycle (22). YIF1A is involved in the pathway of the transport of proteins to the Golgi and their subsequent modification, as well as the unfolded protein response. Discrete sites in Yif1A that are necessary for the regulation of endoplasmic reticulum structure have also been identified (23). Moreover, in samples from clinical squamous cell cancer, six genes (GAL, GSTP1, MRPL11, MRPL21, SF3B2, and YIF1A) at 11q13.1-13.3 and one gene (GALR1) at 18q23 showed significant differences in expression between normal and tumor samples (24). Another member of the gene panel is ADRBK2, which encodes the β-adrenergic receptor kinase, a direct target of CREB activation that regulates the neuroendocrine differentiation of prostate cancer cells (25). This kinase is essential for cell metastasis, promotes prostate tumor progression (26), and regulates breast cancer migration, invasion, and metastasis (27). Another gene panel member is CYFIP2, which is involved in Tcell adhesion and p53/TP53-dependent induction of apoptosis. IMP-1 displays cross-talk with K-Ras and modulates colon cancer cell survival through this novel proapoptotic protein (28).
Among the other members of the gene panel, ATIC promotes insulin receptor/INSR autophosphorylation and is involved in INSR internalization (29). The small-molecule inhibitor of ATIC has been shown to suppress the proliferation of breast cancer cells (30). QPRT, which encodes a key enzyme in the catabolism of quinolinate, an intermediate in the tryptophan-nicotinamide adenine dinucleotide pathway, is a potential marker for follicular thyroid carcinoma including the minimally invasive variant (31). EHBP1 encodes a protein that may play a role in endocytic trafficking. The single nucleotide polymorphism rs721048(A>G) in EHBP1 is associated with an aggressive form of prostate cancer (32). EHBP1 is also essential for the anti-invasive effect of atorvastatin in prostate cancer (33).
The panel also includes MARCH7, which is a member of the MARCH family of membrane-bound E3 ubiquitin ligases. E3 ubiquitin ligases accept ubiquitin from an E2 ubiquitinconjugating enzyme in the form of a thioester and then directly transfer the ubiquitin to targeted substrates. MARCH7 promotes ovarian tumor growth and its expression is correlated with poor prognosis in epithelial ovarian cancer (34,35). CPA4 is also included in the panel and may be involved in the histone hyperacetylation pathway. CPA4 is imprinted and may be a strong candidate gene for the aggressiveness of prostate cancer (36) as well as a promising diagnostic serum biomarker for both pancreatic cancer and non-small cell lung cancer (37,38) and an adverse prognostic marker for gastric cancer, NSCLC, and colorectal cancer (37,39,40). The gene panel member SARDH encodes an enzyme localized to the mitochondrial matrix that catalyzes the oxidative demethylation of sarcosine. TMEFF2 and SARDH cooperate to modulate one-carbon metabolism and the invasion of prostate cancer cells (41). Another gene in this list is SUZ12, which is associated with diseases including endometrial stromal sarcoma and endometrial stromal nodules. Among its related pathways are cellular senescence and chromatin organization. SUZ12 promotes proliferation and metastasis in many cancers, including gastric cancer (42), colorectal cancer (43), ovarian cancer (44), bladder cancer (45,46), and NSCLC (47).
The gene encoding epidermal growth factor receptor (EGFR) is also included in the gene panel. EGFR is a receptor tyrosine kinase of the ErbB family. Several studies have shown that the EGFR family of RTKs is involved in urothelial carcinoma progression and chemoresistance. Many clinical trials using inhibitors of EGFR family RTKs have also been performed or are underway (48).
Although it is lack of novelty and function work in our study and our results require further investigation of the efficacy of the 12-gene signature panel in patients, this panel could be extremely beneficial to identify patients at elevated risk of death that may require adjuvant therapy.

CONCLUSION
By applying published gene signatures and TCGA data, we successfully built and externally validated a novel 12-gene signature for the survival of MIBC. This model was generated by integration and updating of the existing model. The model improved the prediction of disease progression or survival and may help facilitate doctor-patient consultations and eventually benefit patients.

DATA AVAILABILITY
All datasets generated for this study are included in the manuscript and/or the Supplementary Files.

ETHICS STATEMENT
This study protocol was approved by the Institutional Review Board of FUSCC and was carried out in accordance with the approved guidelines (approval ID: 050432-4-1805C). As the data (TCGA and GEO datasets) are publicly available, no ethical approval is required.

AUTHOR CONTRIBUTIONS
FW, YS, and DY had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis, study concept, design, and supervision. MA and HX acquisition of data and drafting of the manuscript. ZJ, YaZ, and YiZ analysis and interpretation of data. GS, HZ, and BD critical revision of the manuscript for important intellectual content. FW and MA statistical analysis. FW and DY obtaining funding. YaZ and FW administrative, technical, or material support.