OSblca: A Web Server for Investigating Prognostic Biomarkers of Bladder Cancer Patients

Bladder cancer (BC) is one of the most common malignant tumors of the urinary system. The discovery of prognostic biomarkers remains one of the major challenges in improving the clinical treatment of BC patients. To help biologists and clinicians easily evaluate the prognostic potency of genes in BC patients, we developed a user-friendly Online consensus Survival tool for bladder cancer (OSblca) to analyze the prognostic value of genes. OSblca includes gene expression profiles of 1,075 BC patients and their corresponding clinical follow-up information, which covers overall survival (OS), disease specific survival (DSS), disease free interval (DFI), and progression free interval (PFI). To analyze the prognostic value of a gene, users only need to enter the official gene symbol and click the “Kaplan-Meier plot” button; a Kaplan-Meier curve with the hazard ratio, 95% confidence interval, and log-rank P-value is then generated and displayed on the website using default options. For advanced analysis, users can restrict the analysis by confounding factors, including data source, survival type, TNM stage, histological type, smoking history, gender, lymph invasion, and race, which are provided as optional parameters to meet the specific needs of different researchers. To assess the reliability of the web server, we tested previously reported prognostic biomarkers, such as KPNA2, TP53, and MYC, whose prognostic value was confirmed in OSblca. In conclusion, OSblca is a useful tool for evaluating and discovering novel prognostic biomarkers in BC. The web server can be accessed at http://bioinfo.henu.edu.cn/BLCA/BLCAList.jsp.


INTRODUCTION
As one of the most common malignant tumors of the urinary system, bladder cancer (BC) was estimated to account for about 549,393 new cases and 199,922 deaths worldwide in 2018 (1). Based on clinicopathological features, BC can be classified into two types: non-muscle-invasive bladder cancer (NMIBC, 70-80% of BC patients) and muscle-invasive bladder cancer (MIBC, 20-30% of BC patients) (2,3). Owing to the relatively high rates of local recurrence and metastasis in MIBC patients, treatment outcomes remain poor, and survival is lower than in NMIBC patients. Although NMIBC patients have better survival rates, 30-50% of them experience cancer recurrence (4). One of the major challenges in improving clinical outcomes for BC patients is to identify novel biomarkers for diagnosis and prognosis (5).
In recent years, a large number of prognostic biomarkers, including DNA markers and protein markers, have been reported (6-8). Some of these biomarkers, especially those involved in biological processes, are useful for identifying high-risk patients and could be used to predict prognosis and treatment response. However, few biomarkers have been translated into clinical practice because of a lack of independent validation (5,9,10). With the advance of high-throughput technologies, more and more studies have profiled gene expression in cancer samples and deposited the data in public databases such as The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/) and Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/). These data offer opportunities for biomarker discovery, validation, and clinical application (11,12). Unfortunately, no convenient online tool has yet been available for clinicians and biologists to evaluate and verify the prognostic value of genes of interest across different BC datasets.
To solve this problem, we developed an online web server named OSblca, which contains gene expression profiles and corresponding clinical information of 1,075 bladder cancer patients from seven independent cohorts collected from the TCGA and GEO databases. This web server enables researchers and clinicians to analyze the prognostic value of a gene of interest and can accelerate the development of prognostic biomarkers.

Dataset Collection
Gene expression profiles and clinical follow-up information of bladder cancer patients were collected from the TCGA and GEO databases. For the TCGA dataset, level-3 gene expression profiling data (HiSeqV2) and clinical information of BC samples were downloaded in April 2018. To collect relevant datasets from GEO, the database was searched with the keywords "bladder cancer", "prognosis", "survival", and "gene expression". Each candidate dataset was then checked manually and retained only if it provided mRNA expression data, clinical survival information, and at least 50 patients.
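The manual inclusion check above amounts to three conditions that every GEO candidate must satisfy. The Python sketch below expresses them as a filter; the `CandidateDataset` record and the placeholder accessions are hypothetical illustrations, not part of OSblca, and in practice each flag was determined by manual inspection.

```python
from dataclasses import dataclass

# Hypothetical summary of a GEO series found by the keyword search.
@dataclass
class CandidateDataset:
    accession: str
    has_mrna_expression: bool
    has_survival_info: bool
    n_patients: int

MIN_PATIENTS = 50  # "at least 50 patients" inclusion criterion

def passes_inclusion(ds: CandidateDataset) -> bool:
    """True if the dataset meets all three inclusion criteria."""
    return (
        ds.has_mrna_expression
        and ds.has_survival_info
        and ds.n_patients >= MIN_PATIENTS
    )

def select_datasets(candidates):
    """Keep only the candidates that pass the inclusion criteria."""
    return [ds for ds in candidates if passes_inclusion(ds)]

if __name__ == "__main__":
    candidates = [
        CandidateDataset("GSE_A", True, True, 120),   # placeholder accession
        CandidateDataset("GSE_B", True, False, 200),  # lacks survival data
        CandidateDataset("GSE_C", True, True, 30),    # too few patients
    ]
    print([ds.accession for ds in select_datasets(candidates)])
```

Running the example keeps only `GSE_A`, the one placeholder dataset that satisfies every criterion.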

Development of OSblca
The OSblca web server was developed with Java script and is hosted by Tomcat 7.0 on Windows Server 2008. The gene expression and clinical data are stored in and managed by SQL Server 2008, and the R package "RODBC" serves as middleware connecting R to the SQL database. The input to OSblca must be an official gene symbol from NCBI (https://www.ncbi.nlm.nih.gov/). The outputs include Kaplan-Meier (KM) survival curves, the hazard ratio (HR) with its 95% confidence interval, and the log-rank P-value, all produced by the R package "survival" (https://CRAN.R-project.org/package=survival). A gene can be regarded as a potential prognostic biomarker for BC patients when its log-rank P-value is < 0.05. OSblca can be accessed at http://bioinfo.henu.edu.cn/BLCA/BLCAList.jsp. The architecture of the web server is presented in Figure 1A, and screenshots of the interface and an example result are shown in Figure 1B.
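OSblca computes its statistics with the R package "survival". To show what the KM curve and the log-rank P-value measure, the Python sketch below implements both from their textbook definitions. It is not the server's code: it assumes right-censored data given as parallel lists of follow-up times and 0/1 event flags, and it does not compute the hazard ratio or its 95% confidence interval.

```python
import math
from collections import Counter

def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times:  follow-up time for each patient (e.g. months)
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (event_time, survival_probability) steps.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # patients whose follow-up ends at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone leaving at t (event or censored) was at risk at t.
        at_risk -= leaving[t]
    return curve

def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 df."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    o_minus_e = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)
        n = n_a + n_b
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * n_a * n_b * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    chi_square = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

if __name__ == "__main__":
    print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
    early, late = [1, 2, 3, 4, 5], [10, 11, 12, 13, 14]
    chi2, p = log_rank_test(early, [1] * 5, late, [1] * 5)
    print(f"chi2={chi2:.2f}, p={p:.4f}")
```

In the toy example the group with early events differs clearly from the group with late events, so the log-rank P-value falls well below the 0.05 threshold used in OSblca.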

Validation of Previously Published Prognostic Biomarkers in OSblca
To validate the performance of the prognostic analysis in our web server, prognostic biomarkers for BC were searched in PubMed using the keywords "bladder cancer", "survival", "gene expression", "biomarker", and "prognosis". The prognostic capabilities of these genes were evaluated in all cohorts, and every cutoff value available under "Split patients" was tested in each cohort to identify the best cutoff value.
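The cutoff search described above can be sketched in Python as follows. The candidate percentiles and the (expression, time, event) record layout are assumptions for illustration, not OSblca's actual "Split patients" options; a compact log-rank helper is included so that the block runs on its own.

```python
import math

def log_rank_p(group_a, group_b):
    """Two-group log-rank p-value (1 df); each group is a list of (time, event)."""
    event_times = sorted({t for t, e in group_a + group_b if e == 1})
    o_minus_e, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for ti, _ in group_a if ti >= t)
        n_b = sum(1 for ti, _ in group_b if ti >= t)
        d_a = sum(1 for ti, e in group_a if ti == t and e == 1)
        d = d_a + sum(1 for ti, e in group_b if ti == t and e == 1)
        n = n_a + n_b
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * n_a * n_b * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 1.0
    return math.erfc(math.sqrt(o_minus_e ** 2 / variance / 2))

def split_at_percentile(patients, pct):
    """Split (expression, time, event) records into low and high groups.

    The lowest pct percent of patients by expression form the low group.
    """
    ranked = sorted(patients, key=lambda rec: rec[0])
    k = round(len(ranked) * pct / 100)
    low = [(t, e) for _, t, e in ranked[:k]]
    high = [(t, e) for _, t, e in ranked[k:]]
    return low, high

def best_cutoff(patients, percentiles):
    """Return (percentile, p_value) for the cutoff with the smallest p-value."""
    results = []
    for pct in percentiles:
        low, high = split_at_percentile(patients, pct)
        if low and high:  # skip cutoffs that leave a group empty
            results.append((pct, log_rank_p(low, high)))
    return min(results, key=lambda r: r[1])

if __name__ == "__main__":
    # Toy cohort: high expression goes with early events.
    cohort = [(x, 25 - x, 1) for x in range(1, 6)]
    cohort += [(x, x - 5, 1) for x in range(6, 11)]
    print(best_cutoff(cohort, [25, 50, 75]))
```

Because the reported P-value is the minimum over several cutoffs, it is optimistic relative to a single pre-specified split; this sketch applies no multiple-testing correction.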

Clinical Characteristics of the Patients in OSblca
According to our criteria, a total of 1,075 unique bladder cancer patients were collected from seven datasets, comprising one TCGA cohort and six GEO cohorts. Survival information, including overall survival (OS), disease specific survival (DSS), disease free interval (DFI), and progression free interval (PFI), was gathered. No patient was lost to follow-up. Of these patients, 935 have overall survival information, with a median overall survival time of 25.03 months. We also collected age, TNM stage,

Survival Analysis of BC Patients Based on Clinical Characteristics
The Kaplan-Meier plots for the bladder cancer patients in OSblca stratified by TNM stage, histological type, gender, smoking history, lymph invasion, and race are presented in Figure 2.

Usage of OSblca
The main function of OSblca is to evaluate and verify the prognostic value of a given gene. "Gene symbol", "Data source", "Survival", and "Split patients" are the four main parameters. The input box for "Gene symbol" is at the upper left of the OSblca page (Figure 3A), and a red prompt appears when the input is not an official gene symbol. "Data source" provides eight options: independent analysis in any one of the seven cohorts, or analysis in a combined cohort consisting of all BC patients from the seven cohorts. Users can thus evaluate the prognostic value of a given gene in an individual cohort or in the combined cohort according to their needs. Under the "Survival" option, four prognostic endpoints are provided: OS, DSS, DFI, and PFI. In the "Split patients" box, users can select different expression thresholds for the input gene to divide patients into two subgroups. After the "Kaplan-Meier plot" button is clicked, the OSblca server processes the request and returns the analysis results, which are displayed graphically together with the HR, 95% CI, and log-rank P-value (Figure 3B).
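The four main parameters, and the red prompt for an unofficial gene symbol, can be modeled as a simple request check. The Python sketch below is hypothetical: the labels `cohort_1` to `cohort_7` and `combined`, the `KMRequest` type, and expressing the split threshold as a percentile are assumptions rather than OSblca's actual identifiers. A real check would use the full NCBI official symbol list instead of a small set.

```python
from dataclasses import dataclass

# The four survival endpoints offered under "Survival".
SURVIVAL_TYPES = {"OS", "DSS", "DFI", "PFI"}

# Hypothetical labels for the eight "Data source" options:
# seven individual cohorts plus the combined cohort.
DATA_SOURCES = {f"cohort_{i}" for i in range(1, 8)} | {"combined"}

@dataclass
class KMRequest:
    gene_symbol: str
    data_source: str
    survival: str
    split_percentile: float  # expression cutoff as a percentile (assumption)

def validate(request, official_symbols):
    """Return a list of error messages; an empty list means the request is valid."""
    errors = []
    if request.gene_symbol not in official_symbols:
        errors.append(f"'{request.gene_symbol}' is not an official gene symbol")
    if request.data_source not in DATA_SOURCES:
        errors.append(f"unknown data source '{request.data_source}'")
    if request.survival not in SURVIVAL_TYPES:
        errors.append(f"unknown survival type '{request.survival}'")
    if not 0 < request.split_percentile < 100:
        errors.append("split percentile must lie strictly between 0 and 100")
    return errors

if __name__ == "__main__":
    symbols = {"KPNA2", "TP53", "MYC"}  # tiny illustrative subset
    print(validate(KMRequest("TP53", "combined", "OS", 50), symbols))
    print(validate(KMRequest("p53", "combined", "OS", 50), symbols))
```

The first request passes every check; the second is rejected only because "p53" is not the official symbol (TP53).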
To meet specific research needs, six confounding clinical factors, namely TNM stage, smoking history, gender, lymph invasion, histological type, and race, are provided as optional filters in the prognostic analysis. As shown in Figure 3A, each factor has 2-5 options to choose from.
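Applying the optional filters amounts to keeping only the patients that match every factor the user has set, before the high/low split and survival analysis. The Python sketch below assumes a hypothetical `Patient` record and factor keys such as `gender` and `smoking_history`; OSblca's actual field names and option values are not given in the text.

```python
from dataclasses import dataclass, field

@dataclass
class Patient:
    patient_id: str
    expression: float
    time: float   # follow-up time, e.g. months
    event: int    # 1 = event observed, 0 = censored
    clinical: dict = field(default_factory=dict)  # e.g. {"gender": "male"}

def apply_filters(patients, filters):
    """Keep patients that match every selected clinical factor.

    filters: mapping from factor name to the chosen option; a factor set to
    None (left unselected) imposes no restriction.
    """
    active = {k: v for k, v in filters.items() if v is not None}
    return [
        p for p in patients
        if all(p.clinical.get(k) == v for k, v in active.items())
    ]

if __name__ == "__main__":
    cohort = [
        Patient("p1", 1.0, 10, 1, {"gender": "male", "smoking_history": "yes"}),
        Patient("p2", 2.0, 5, 0, {"gender": "female", "smoking_history": "yes"}),
        Patient("p3", 3.0, 7, 1, {"gender": "male", "smoking_history": "no"}),
    ]
    kept = apply_filters(cohort, {"gender": "male", "smoking_history": None})
    print([p.patient_id for p in kept])
```

Leaving a factor unset keeps the full cohort for that factor, which mirrors the filters being optional.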

Validation of Previously Published BC Biomarkers
To test the reliability of prognosis prediction in our web server, we evaluated in OSblca 21 prognostic biomarkers from 16 previous publications, including KPNA2, TP53, and MYC (17-32). As shown in Table 2, 17 of the 21 (81%) previously reported prognostic biomarkers showed significant prognostic potency in OSblca, while the remaining four did not reach significance. Among the 17 validated prognostic biomarkers, 11 genes showed significant prognostic ability in the combined cohort.

DISCUSSION
The discovery of prognostic biomarkers is a hot topic in translational research. In the current study, we present a convenient web server that helps researchers and clinicians quickly screen and evaluate the prognostic value of genes in different BC cohorts. With its straightforward web interface, users without much bioinformatics experience can easily navigate OSblca to investigate genes of interest. In addition, users can perform survival analysis filtered by one or several factors according to their specific research purposes. The validation of previously reported prognostic biomarkers in OSblca showed that our web tool is reliable and can be used for prognostic analysis of BC patients. Notably, 11 genes, such as KPNA2 and TP53, were confirmed as prognostic biomarkers in the combined cohort, indicating that these genes may be more widely applicable as prognostic candidates for BC patients.
In summary, OSblca is a free online survival analysis web server that allows clinicians and researchers to rapidly analyze the prognostic value of a given gene in BC. We will keep updating OSblca to make it more powerful for the users.

AUTHOR CONTRIBUTIONS
GZ, QW, MY, and XG collected data, developed the server, and drafted the paper. QY, YD, XS, YA, and HD set up the server and performed the analyses. LX, WZ, and YW contributed to data analysis and paper writing. All authors edited and approved the final manuscript.