OSgbm: An Online Consensus Survival Analysis Web Server for Glioblastoma

Glioblastoma (GBM) is the most common malignant tumor of the central nervous system. GBM is associated with poor clinical outcomes and a high mortality rate, mainly due to the lack of effective targeted therapies and prognostic biomarkers. Here, we developed a user-friendly Online Survival analysis web server for GlioBlastoMa, abbreviated OSgbm, to assess the prognostic value of candidate genes. Currently, OSgbm contains 684 samples with transcriptome profiles and clinical information from The Cancer Genome Atlas (TCGA), Gene Expression Omnibus (GEO) and Chinese Glioma Genome Atlas (CGGA). Survival analysis results are presented graphically as Kaplan-Meier (KM) plots with the hazard ratio (HR) and log-rank p value. As a demonstration, the prognostic value of 51 previously reported survival-associated biomarkers, such as PROM1 (HR = 2.4120, p = 0.0071) and CXCR4 (HR = 1.5578, p < 0.001), was confirmed in OSgbm. In summary, OSgbm allows users to evaluate and develop prognostic biomarkers for GBM. The OSgbm web server is available at http://bioinfo.henu.edu.cn/GBM/GBMList.jsp.


INTRODUCTION
Glioblastoma (GBM) is the most common malignant tumor of the central nervous system (CNS) and has a high mortality rate (Nikiforova and Hamilton, 2011; Stoyanov et al., 2018). Although new therapies have improved clinical outcomes and many clinical trials have demonstrated efficacy in treating GBM, the survival rate of GBM patients remains low. GBM is difficult to treat, with a median survival of approximately 14 months and a 5-year survival rate of 5% (Stupp et al., 2005; Johnson and O'Neill, 2012; Polivka et al., 2017). Prognostic biomarkers play important roles in cancer patient management and may guide targeted therapies. Therefore, there is a pressing need to investigate prognostic biomarkers in GBM.
Previous studies have reported several prognostic biomarkers in GBM, such as IDH and PTEN mutations and altered CD133 expression (Yang et al., 2016; Cai and Sughrue, 2017; Nguyen et al., 2018). However, these biomarkers have not been translated into clinical applications owing to the lack of independent validation. In addition, because of the molecular heterogeneity among GBMs and limited patient samples (Nathanson et al., 2014; Aldape et al., 2015; Brown et al., 2017), the prognostic behavior of a given biomarker may be inconsistent or even contradictory between reports. Cross-population validation in a larger patient cohort is therefore critical for evaluating a prognostic biomarker.
In the current work, we collected gene expression profiles and clinical information for 684 GBM patients from seven independent cohorts obtained from TCGA, GEO and CGGA. We developed a user-friendly web server, OSgbm, to analyze the prognostic value of genes of interest. This web server should help researchers and clinicians screen, develop and validate new prognostic biomarkers in GBM.

Dataset Collection
GBM datasets were obtained from three major data sources. First, level-3 gene expression profiling data (HiSeqV2) and clinical information of GBM samples were downloaded from TCGA in April 2018 (https://portal.gdc.cancer.gov/). Second, four cohorts (each with ≥30 cases) with available gene expression profiles and clinical survival information were collected from the GEO database (http://www.ncbi.nlm.nih.gov/geo/). Third, two GBM cohorts were gathered from CGGA (http://www.cgga.org.cn/). After initial filtering and quality checks (requiring both gene expression profiling data and clinical survival information), 153 samples from TCGA, 276 samples from GEO, and 255 samples from CGGA were included for database and web server construction. Recurrent GBM (rGBM) samples are included in the GSE7696 (10 samples), GSE42669 (11 samples), CGGAarray (9 samples) and CGGAseq (22 samples) datasets. The two CGGA datasets also include 20 samples of secondary GBM (sGBM).
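The inclusion rule above (keep a sample only when it has both an expression profile and survival information) can be sketched as a simple filter. This is a minimal Python illustration, not OSgbm's actual pipeline: the `Sample` record and its field names are hypothetical, and the final assertion only checks that the per-source counts reported in the text add up to the 684 samples in OSgbm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    # Hypothetical record layout, not the OSgbm database schema.
    sample_id: str
    source: str                  # "TCGA", "GEO" or "CGGA"
    has_expression: bool         # expression profile available
    os_time: Optional[float]     # overall survival time; None if missing
    os_event: Optional[int]      # 1 = death observed, 0 = censored; None if missing


def passes_qc(s: Sample) -> bool:
    """Keep a sample only if it has expression data AND complete survival data."""
    return s.has_expression and s.os_time is not None and s.os_event is not None


def count_by_source(samples):
    """Count the samples that pass QC, per data source."""
    counts = {}
    for s in samples:
        if passes_qc(s):
            counts[s.source] = counts.get(s.source, 0) + 1
    return counts


# Toy input (made-up values) exercising both exclusion reasons.
toy = [
    Sample("s1", "TCGA", True, 14.2, 1),
    Sample("s2", "TCGA", True, None, None),  # no survival data -> dropped
    Sample("s3", "GEO", False, 20.0, 0),     # no expression data -> dropped
    Sample("s4", "CGGA", True, 8.5, 1),
]
print(count_by_source(toy))  # {'TCGA': 1, 'CGGA': 1}

# Sanity check on the per-source totals reported in the text.
reported = {"TCGA": 153, "GEO": 276, "CGGA": 255}
assert sum(reported.values()) == 684
```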

System Implementation and Server Set-Up
OSgbm is a web-based tool built on the J2EE (Java 2 Platform Enterprise Edition) architecture, as we previously described (Xie et al., 2019a; Zhang et al., 2019). The gene expression and clinical data are integrated in a back-end database managed by a MySQL server. Dynamic web interfaces were written in HTML 5.0 and hosted by Tomcat on Windows Server. Using OSgbm requires an HTML 5.0-compliant browser with JavaScript enabled, but no particular visualization plug-in. Because the web server is designed for users without specialized bioinformatics skills, we provide 'out-of-the-box' data. The input to the OSgbm web server is an official gene symbol. All datasets used in OSgbm have already been published, processed and normalized. For the "Data Source: Combined" option, to avoid batch effects and platform biases among these datasets, we first stratify the patients in each dataset into high- and low-expression groups for the input gene, and then merge the corresponding groups across datasets into a combined high-expression group (Upper group in the Kaplan-Meier plot) and a combined low-expression group (Lower group in the Kaplan-Meier plot) for Kaplan-Meier analysis and the log-rank test. Statistical analyses were performed in R: KM curves, the hazard ratio (HR, with 95% confidence interval) and the log-rank p value were calculated with the R package 'survival'. OSgbm is available at http://bioinfo.henu.edu.cn/GBM/GBMList.jsp.
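The "Combined" stratification can be illustrated with a short Python sketch (OSgbm itself runs on Java and R). It assumes a median cutoff within each dataset, since the text does not state which cutoff the combined option applies, and uses a hypothetical dictionary input format. The key point is that raw expression values are compared only within a dataset; across datasets, only the resulting high/low labels are merged.

```python
from statistics import median


def split_high_low(expr):
    """Median split inside ONE dataset.

    expr: {sample_id: expression value}. Values equal to the median go to 'low'
    (an assumed tie rule).
    """
    cut = median(expr.values())
    return {sid: ("high" if v > cut else "low") for sid, v in expr.items()}


def pool_datasets(datasets):
    """datasets: {dataset_name: {sample_id: expression value}} (hypothetical format).

    Each dataset is split at its OWN median, so raw values are never compared
    across platforms; only the high/low labels are merged.
    """
    pooled = {}
    for name, expr in datasets.items():
        for sid, label in split_high_low(expr).items():
            pooled[(name, sid)] = label  # key by (dataset, sample) to keep IDs unique
    return pooled


# Toy example: dataset "B" is on a much larger numeric scale than "A".
groups = pool_datasets({
    "A": {"a1": 1.0, "a2": 2.0, "a3": 3.0, "a4": 4.0},
    "B": {"b1": 100.0, "b2": 200.0},
})
upper = sorted(k for k, g in groups.items() if g == "high")
print(upper)  # [('A', 'a3'), ('A', 'a4'), ('B', 'b2')]
```

Sample `b1` falls in the Lower group even though its raw value exceeds every value in dataset A, which is exactly the cross-platform comparison this design avoids. How OSgbm itself handles ties at the cutoff is not stated.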

Validation of Previously Reported Prognostic Biomarkers
A PubMed search using the keywords 'glioblastoma', 'survival' and 'biomarker' was performed to identify previously reported GBM prognostic biomarkers. In total, 53 prognostic biomarkers were identified from 2013 publications. The flow chart of biomarker collection is shown in Figure S1. The prognostic value of each published biomarker was analyzed either in the combined cohort of all GBM patients or in a single cohort in our database.
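The source does not state the exact rule used to decide that a published biomarker is "confirmed". One plausible criterion, assumed here for illustration only, is that the OSgbm log-rank p value is below 0.05 and the HR points in the same direction as the literature (HR > 1 for a reported risk factor). The `Check` record and `confirmed` function are hypothetical; PROM1's HR and p value are those given in the abstract, its reported direction (high CD133/PROM1 expression as adverse) is assumed, and GENE_X is a made-up example.

```python
from dataclasses import dataclass


@dataclass
class Check:
    gene: str
    reported: str  # direction reported in the literature: "risk" or "protective"
    hr: float      # HR (high vs low expression) observed in OSgbm
    p: float       # log-rank p value observed in OSgbm


def confirmed(c: Check, alpha: float = 0.05) -> bool:
    """ASSUMED criterion: significant AND same direction as reported."""
    observed = "risk" if c.hr > 1 else "protective"
    return c.p < alpha and observed == c.reported


print(confirmed(Check("PROM1", "risk", 2.4120, 0.0071)))  # True
print(confirmed(Check("GENE_X", "risk", 1.8, 0.30)))      # False: not significant
print(f"{51 / 53:.0%}")  # 96%, the confirmation rate quoted in the Discussion
```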

The Clinical Characteristics of GBM Datasets Used in OSgbm
In OSgbm, we included a total of 684 unique GBM samples from seven datasets: one TCGA cohort, four GEO cohorts and two CGGA cohorts. The survival information includes overall survival (OS), disease-specific survival (DSS), disease-free interval (DFI) and progression-free interval (PFI) (Liu et al., 2018). Confounding clinical factors, such as age, grade, gender, histology and treatment regimens, are included as well. The clinical characteristics of these datasets are presented in Table 1. All 684 patients have OS data.

Set-Up of OSgbm Web Server
The main function of the OSgbm web server is to evaluate the prognostic value of queried genes. Users start by typing a gene symbol and choosing either one dataset of interest or the combined dataset that pools all datasets together. To measure the association between a queried gene and survival, GBM samples are categorized according to the median expression of the selected gene (or another appropriate cutoff, such as Trichotomy or Quartile), and KM analysis is used to compare outcomes between groups (Xie et al., 2019b). Users can restrict the analysis to a subgroup of patients by setting the age range, grade, gender and other clinical factors. Once the gene symbol is entered and the clinical characteristics are chosen, the OS, DSS, DFI or PFI of each stratified group is estimated and the results are shown on the output web page. The prognostic value of each gene is assessed by the HR (95% CI) and log-rank p value.
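The per-gene analysis described above (optional subgroup filter, cutoff-based stratification, KM curves, log-rank test) can be sketched in plain Python. OSgbm itself computes these with the R package 'survival'; the code below is an independent illustration under stated assumptions: the patient-record keys are hypothetical, the Trichotomy/Quartile options are assumed to compare the top and bottom groups while dropping the middle, and the HR is the simple log-rank (O/E) approximation rather than a Cox-model estimate, so its value will differ from what the server reports.

```python
import math
from statistics import quantiles


def stratify(values, n_groups=2):
    """Label expression values 'high' / 'low'.

    n_groups=2 -> median split, every sample labelled.
    n_groups=3 or 4 -> upper vs lower tertile / quartile; middle samples get None
    (an ASSUMED reading of the Trichotomy / Quartile options).
    """
    cuts = quantiles(values, n=n_groups)
    return ["low" if v <= cuts[0] else "high" if v > cuts[-1] else None
            for v in values]


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored cases leave the risk set
    return curve


def logrank(t1, e1, t2, e2):
    """Two-group log-rank test. Returns (chi2, p, O1, E1, O2, E2)."""
    event_times = sorted({t for t, e in zip(t1 + t2, e1 + e2) if e})
    o1 = ex1 = o2 = ex2 = var = 0.0
    for t in event_times:
        n1 = sum(x >= t for x in t1)  # at risk in group 1
        n2 = sum(x >= t for x in t2)  # at risk in group 2
        d1 = sum(1 for x, e in zip(t1, e1) if x == t and e)
        d2 = sum(1 for x, e in zip(t2, e2) if x == t and e)
        n, d = n1 + n2, d1 + d2
        exp1 = d * n1 / n  # expected deaths in group 1 under H0
        o1, ex1 = o1 + d1, ex1 + exp1
        o2, ex2 = o2 + d2, ex2 + (d - exp1)
        if n > 1:
            var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    chi2 = (o1 - ex1) ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p, o1, ex1, o2, ex2


def analyse(patients, n_groups=2, min_age=None, max_age=None):
    """patients: list of dicts with HYPOTHETICAL keys 'age', 'expr', 'time', 'event'.
    Returns (approximate HR high vs low, log-rank p value)."""
    sub = [pt for pt in patients
           if (min_age is None or pt["age"] >= min_age)
           and (max_age is None or pt["age"] <= max_age)]
    labels = stratify([pt["expr"] for pt in sub], n_groups)
    hi = [pt for pt, g in zip(sub, labels) if g == "high"]
    lo = [pt for pt, g in zip(sub, labels) if g == "low"]
    _, pvalue, o1, ex1, o2, ex2 = logrank(
        [pt["time"] for pt in hi], [pt["event"] for pt in hi],
        [pt["time"] for pt in lo], [pt["event"] for pt in lo])
    return (o1 / ex1) / (o2 / ex2), pvalue


# Toy cohort (made-up values): higher expression, shorter survival.
cohort = [
    {"age": 50, "expr": 1.0, "time": 30, "event": 1},
    {"age": 55, "expr": 2.0, "time": 24, "event": 1},
    {"age": 60, "expr": 3.0, "time": 12, "event": 1},
    {"age": 65, "expr": 4.0, "time": 6, "event": 1},
    {"age": 80, "expr": 5.0, "time": 3, "event": 1},  # excluded by max_age=70
]
hr, pval = analyse(cohort, max_age=70)
print(round(hr, 2))  # 3.8
```

With `n_groups=4` the same call would compare the top and bottom expression quartiles. The p value uses the fact that the upper tail of a chi-square variable with one degree of freedom equals `erfc(sqrt(chi2 / 2))`, which avoids any dependency beyond the standard library.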

DISCUSSION
The development of prognostic biomarkers is important for guiding treatment, especially for therapy-resistant GBM patients. In this work, we developed a new web server, OSgbm, to help researchers evaluate the prognostic value of a given gene in GBM patients. OSgbm is easy to use and requires no special skills (such as bioinformatics training). By filtering on one or more clinical confounding factors provided in OSgbm, users can also evaluate the prognostic value of genes of interest according to their specific needs. Function and performance tests of the OSgbm web server showed that 96% (51 out of 53) of previously reported prognostic biomarkers could be confirmed in OSgbm. This indicates that these biomarkers, having been validated in independent cohorts, have the potential to be translated into clinical applications, and it also supports the good performance of OSgbm. Nevertheless, two genes, IGF1R and PCBP2, showed prognostic values different from those reported in the literature. This discrepancy between OSgbm and the literature may be caused by differences in race, cohort size, or analysis level and methods (mRNA vs. protein, gene microarray vs. immunohistochemistry) (Maris et al., 2015; Luo and Zhuang, 2017). For example, the patients reported in the literature for PCBP2 are Asian, whereas those in the validation cohort of OSgbm are mostly White. For IGF1R, OSgbm analyzed mRNA levels, whereas the literature determined IGF1R by immunohistochemistry. In addition, the patients analyzed in OSgbm for IGF1R are Asian (Korean for GSE42669 and Chinese for CGGA), whereas those reported in the literature for IGF1R are European. Therefore, it will be necessary to validate the prognostic performance of IGF1R and PCBP2 in a larger independent glioblastoma cohort.
In conclusion, OSgbm is a user-friendly web server that helps researchers and clinicians identify suitable prognostic biomarkers in GBM. We will continue to update the OSgbm database with new GBM datasets as they become available, and we will implement the multivariate Cox proportional hazards model in OSgbm to adjust for confounding clinical factors. We also encourage users to contact us to upload their own data into OSgbm.

DATA AVAILABILITY STATEMENT
All datasets for this study are included in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
XG conceived and directed the project. HD and QW collected data and developed the web server. HD, NL, JL, LG, MY, GZ, YA, FW, LX, and YL performed data analysis. WZ, HZ, and MZ contributed to data analysis and paper writing. XG and HD wrote the manuscript with the assistance and approval of all authors.