OSlgg: An Online Prognostic Biomarker Analysis Tool for Low-Grade Glioma

Glioma is the most frequent primary brain tumor and causes high mortality and morbidity with poor prognosis. Gliomas are classified into four grades, I to IV, of which grades II and III are termed low-grade glioma (LGG). Although less aggressive, LGG almost universally progresses to high-grade glioma and eventually causes death without intervention. Current LGG treatment mainly relies on surgical resection followed by radiotherapy and chemotherapy, but the survival rates of LGG patients remain low. Prognostic biomarkers are therefore needed to classify patients into subgroups with different risks and to guide clinical management. Using gene expression profiling and long-term follow-up data, we established an Online consensus Survival analysis tool for LGG named OSlgg. OSlgg comprises 720 LGG cases from two independent cohorts. To evaluate the prognostic potency of genes of interest, OSlgg generates Kaplan-Meier plots with hazard ratio and p value. The reliability of OSlgg was verified by analyzing 86 previously published prognostic biomarkers of LGG. Using OSlgg, we discovered two novel potential prognostic biomarkers of LGG (CD302 and FABP5); patients with elevated expression of either CD302 or FABP5 had unfavorable survival outcomes. These two genes may serve as novel risk predictors for LGG patients after further validation. OSlgg is public and free to users at http://bioinfo.henu.edu.cn/LGG/LGGList.jsp.


INTRODUCTION
Glioma is the most frequent primary brain tumor and is classified into four grades, I to IV. Grade IV glioma is also known as glioblastoma, while grade II and III gliomas are designated low-grade glioma (LGG) by the World Health Organization (WHO) (1)(2)(3)(4).
LGG includes three histological types: astrocytoma, oligodendroglioma, and oligoastrocytoma (4)(5)(6). Oligoastrocytoma is no longer considered a separate entity, however, because the current WHO classification uses molecular markers (including IDH1 mutation and 1p/19q codeletion) to identify astrocytoma and oligodendroglioma, not oligoastrocytoma (3,7). Although less aggressive than high-grade glioma, LGG eventually advances to high-grade glioma without therapeutic intervention (5,8). For most LGG patients, treatment consists of surgical excision followed by radiotherapy and/or chemotherapy, including temozolomide (TMZ) and PCV (a combination of procarbazine, lomustine, and vincristine) (5,9). However, some patients are tolerant or resistant to such uniform treatment and progress to relapse and eventually death faster than others (5,8), possibly because of the molecular heterogeneity of LGG (10)(11)(12). The optimum timing of the therapeutic schedule therefore needs to be determined case by case (13).
With the availability of public gene expression profiling data, a growing number of molecular predictive and prognostic indicators have been identified in LGG to guide personalized therapy by indicating which patients require early intervention and by predicting prognosis (6,14). However, performing prognosis analysis on these gene expression profiling data requires specific bioinformatics skills. A convenient and easy-to-use bioinformatics tool is therefore desirable so that users with limited bioinformatics skills can assess prognostic biomarkers for LGG. In the present study, we developed an easy-to-use web server named OSlgg, which provides a platform to evaluate the prognostic value of a gene of interest by using Kaplan-Meier plots to present the association between the candidate gene and survival. OSlgg may thereby contribute to the clinical translation of potential prognostic biomarkers and targeted therapies for LGG patients.

Data Collection
Gene expression profiling and related long-term follow-up data of low-grade gliomas were collected from the GEO (Gene Expression Omnibus) and TCGA (The Cancer Genome Atlas) databases. For dataset searching, the keywords "low-grade glioma," "gene expression," and "survival" were used in the GEO database. The criteria for dataset inclusion were as follows: (1) gene expression profiling data are available; (2) long-term follow-up data of patients are included; (3) the dataset contains more than 50 LGG cases to enable valid survival analysis. On this basis, one GEO dataset (GSE107850) with 195 LGG cases was collected (Table 1). For the TCGA dataset, gene expression profiling (RNAseq, level-3, HiSeqV2) and follow-up data of 525 LGG cases were downloaded in 2019 (Table 1). The survival terms in the follow-up data include OS (overall survival), RFS (relapse-free survival), and PFS (progression-free survival) (Table 1). The clinicopathologic characteristics of LGG patients are summarized in Table S1.
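The three inclusion criteria can be read as a simple filter over candidate datasets. The following is a minimal sketch in Python, not the authors' procedure: the `Dataset` record and its field names are hypothetical, the two `HYPOTHETICAL_*` entries are invented only to show a rejection, and applying the same check to TCGA is an assumption made for illustration. Only the case counts 195 and 525 come from the text.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    has_expression: bool  # criterion 1: gene expression profiling data
    has_follow_up: bool   # criterion 2: long-term follow-up data
    n_lgg_cases: int      # criterion 3: number of LGG cases


MIN_CASES = 50  # "more than 50 LGG cases", so the comparison is strict


def eligible(ds: Dataset) -> bool:
    """Return True when a dataset meets all three inclusion criteria."""
    return ds.has_expression and ds.has_follow_up and ds.n_lgg_cases > MIN_CASES


candidates = [
    Dataset("GSE107850", True, True, 195),
    Dataset("TCGA", True, True, 525),
    # The next two entries are made up to illustrate exclusions.
    Dataset("HYPOTHETICAL_SMALL", True, True, 30),
    Dataset("HYPOTHETICAL_NO_FOLLOWUP", True, False, 120),
]

selected = [d for d in candidates if eligible(d)]
total_cases = sum(d.n_lgg_cases for d in selected)
print([d.name for d in selected], total_cases)
```

The two retained cohorts sum to the 720 cases reported for OSlgg.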

Development of OSlgg
OSlgg was developed with an object-oriented programming approach, with each function module built on a B/S (Browser/Server) architecture. The server side is implemented in Java and R. The web server functions are divided into three parts: UI (user interface), data analysis, and data access. Java and R are used for data analysis and data access, respectively. The UI is developed with HTML5, jQuery, and JSP, and real-time communication between the web server and clients is handled by Servlets. Gene expression profiling and clinical data are stored in relational tables in a SQL Server database. The system architecture flow diagram is presented in Figure 1 and follows the design previously described (15)(16)(17)(18). OSlgg can be accessed at bioinfo.henu.edu.cn/LGG/LGGList.jsp.

Verification of Prognostic Biomarkers in OSlgg
To assess the reliability of prognostic analysis in the OSlgg web server, previously published prognostic biomarkers of LGG were searched in PubMed using the keywords "low-grade glioma," "survival," "prognosis," and "biomarker." As a result, we collected 93 papers reporting 86 prognostic biomarkers. The prognostic abilities of these genes were then assessed in OSlgg.

Discovery of Novel Prognostic Biomarkers in OSlgg
To identify novel prognostic biomarkers for LGG, we analyzed the prognostic values of human genes genome-wide using Cox regression analysis. Genes significantly related to prognosis were selected (Cox p value < 0.05), including CD302 and FABP5. Because these two genes exhibited a highly significant correlation with prognosis (p value < 0.000001) in Cox regression analysis, we further evaluated the prognostic values of CD302 and FABP5 in OSlgg. In addition, correlation analysis and GSEA (Gene Set Enrichment Analysis) were performed to investigate the functions of CD302 and FABP5. Correlations between the expression levels of CD302 or FABP5 and the 86 previously reported LGG prognostic biomarkers were assessed in the TCGA data using Spearman's rank correlation test, which does not assume normally distributed data, with expression treated as a continuous measure. For GSEA, patients from the TCGA cohort were split into two subgroups according to CD302 or FABP5 expression, named the Upper 25% expression and Lower 75% expression groups. GSEA was then run to investigate the gene sets enriched in each subgroup.
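The correlation step can be illustrated with a small standard-library sketch of Spearman's rank correlation: values are replaced by ranks (ties share the average rank) and the Pearson correlation of the ranks is returned. The function names `average_ranks` and `spearman_rho` are hypothetical, and the example expression values are invented; the source does not state which software computed the correlations.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Invented expression values for two genes across six samples.
gene_a = [2.1, 3.4, 1.0, 5.6, 4.2, 3.9]
gene_b = [1.8, 2.9, 0.7, 6.1, 3.0, 4.4]
print(round(spearman_rho(gene_a, gene_b), 3))
```

Because only ranks enter the calculation, any monotone transformation of the expression values (for example a log transform) leaves rho unchanged.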

Statistical Analysis
Statistical evaluation was performed with SPSS 19.0 (SPSS Inc., Chicago, IL, USA) and GraphPad Prism 7.0 (GraphPad Inc., La Jolla, CA, USA). The association between CD302/FABP5 expression and clinicopathological characteristics was measured using the Chi-square test. Student's t-test and one-way ANOVA (analysis of variance) were employed to determine the significance of differences in CD302/FABP5 expression across distinct histologic grades and primary therapy outcomes, respectively. Univariate and multivariate Cox regression analyses of CD302/FABP5 expression and clinical factors associated with survival of LGG patients were conducted using SPSS. A value of p < 0.05 was considered statistically significant.
A summary of the clinical features of each cohort is shown in Table S1. Kaplan-Meier plots for LGG patients in OSlgg, grouped by histological type, histologic grade, IDH status, and primary and follow-up therapy outcome, are presented in Figure 2. Each of these clinical features was significantly associated with survival (OS or PFS) (Figure 2).
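For readers unfamiliar with univariate Cox regression, the sketch below fits a single-covariate proportional hazards model by Newton-Raphson on the partial likelihood and reports the hazard ratio, a Wald 95% confidence interval, and a two-sided Wald p value. It is an illustration only: the authors used SPSS, the function name `cox_univariate` and the toy data are hypothetical, and tied event times are handled here with the simple Breslow-style risk set, which may differ from the SPSS defaults.

```python
from math import erfc, exp, sqrt


def cox_univariate(times, events, x, max_iter=50, tol=1e-10):
    """Fit h(t|x) = h0(t) * exp(beta * x) for one covariate x.

    events[i] is 1 for an observed event and 0 for a censored observation.
    """
    n = len(times)
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue
            s0 = s1 = s2 = 0.0
            for j in range(n):  # risk set: subjects still under observation
                if times[j] >= times[i]:
                    w = exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            mean = s1 / s0
            score += x[i] - mean              # first derivative of log-likelihood
            info += s2 / s0 - mean * mean     # negative second derivative
        if info <= 0:
            raise ValueError("covariate does not vary within any risk set")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1 / sqrt(info)
    z = beta / se
    return {
        "hr": exp(beta),
        "ci_low": exp(beta - 1.96 * se),
        "ci_high": exp(beta + 1.96 * se),
        "p": erfc(abs(z) / sqrt(2)),  # two-sided p value from the normal distribution
    }


# Toy data (invented): follow-up times, event flags, high-expression indicator.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
high = [1, 1, 0, 1, 0, 1, 0, 0]
result = cox_univariate(times, events, high)
print(result)
```

A hazard ratio above 1 for the high-expression indicator corresponds to the "unfavorable" direction discussed in this paper. A multivariate model extends the same idea to a vector of coefficients and is better left to an established statistics package.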

Application of OSlgg
In OSlgg, "Gene symbol," "Data Source," "Survival," and "Split patients by" are the four main parameters for assessing the prognostic value of a gene of interest (Figures 3, 4). Users first enter the official gene symbol in the "Gene symbol" input box. The "Data source" drop-down menu lets users pick either of the two independent cohorts (TCGA and GSE107850) (Figure 3B). Next, users select a cut-off by which patients are split into 2-4 groups according to the expression of the queried gene (Figure 3C). Users may further restrict the analysis to LGG subgroups defined by clinical factors such as histological type, IDH status, therapy outcome, gender, and treatment (Figures 3, 4). When the user clicks the "Kaplan-Meier plot" button, OSlgg receives the query and returns the results graphically on the web page, presenting the Kaplan-Meier survival curve, HR (with 95% confidence interval), and p value.
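The workflow above (split patients by an expression cut-off, then draw a survival curve per group) can be sketched as follows. This is a simplified illustration, not the OSlgg implementation: the helper names `split_high_low` and `kaplan_meier` are hypothetical, only a two-group split is shown, and the expression and follow-up values are invented.

```python
def split_high_low(expression, upper_fraction=0.5):
    """Label each sample 'high' if it falls in the top `upper_fraction`, else 'low'.

    Samples tied with the threshold value are all labeled 'high'.
    """
    if not 0 < upper_fraction < 1:
        raise ValueError("upper_fraction must be between 0 and 1")
    n = len(expression)
    k = max(1, int(n * upper_fraction))    # intended size of the high group
    threshold = sorted(expression)[n - k]
    return ["high" if v >= threshold else "low" for v in expression]


def kaplan_meier(times, events):
    """Return [(event_time, S(t))]; events[i] is 1 for an event, 0 for censoring."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Illustrative values only (not taken from either cohort).
expression = [5.1, 1.2, 8.4, 3.3, 7.9, 2.0, 6.5, 4.7]
months = [12, 60, 8, 48, 10, 55, 20, 36]
died = [1, 0, 1, 0, 1, 1, 1, 0]

labels = split_high_low(expression, upper_fraction=0.25)
for group in ("high", "low"):
    idx = [i for i, g in enumerate(labels) if g == group]
    curve = kaplan_meier([months[i] for i in idx], [died[i] for i in idx])
    print(group, curve)
```

Censored patients reduce the number at risk without lowering the curve, which is why follow-up data with incomplete observation can still be used.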

Discovery of Novel Potential Prognostic Biomarkers in OSlgg
To discover novel risk predictors for LGG, we analyzed the prognostic abilities of all known human genes using Cox regression. As a result, two genes, CD302 and FABP5, were identified as potential biomarkers; both were significantly associated with survival (OS, RFS, and PFS) in OSlgg (Figure 6 and Table 3). Moreover, patients with elevated CD302 or FABP5 expression exhibited worse survival in both the TCGA (OS and RFS) and GSE107850 (PFS) datasets, while patients with lower expression had better survival (Figure 6 and Table 3), indicating that both CD302 and FABP5 are unfavorable prognostic predictors.
To determine whether the prognostic significance of CD302 and FABP5 is caused by correlation with previously reported prognostic genes, correlation analysis between CD302/FABP5 and the 86 reported prognostic biomarkers was performed. It showed that CD302/FABP5 were positively correlated with 6 reported prognostic genes: RAB34, CHI3L1, VIM, YAP1, FTL, and MMP14 (Figure 7A). Among these, RAB34 is positively associated with both CD302 and FABP5, CHI3L1 is positively associated with FABP5, and the remaining four genes are all positively correlated with CD302 (Figure 7A). GSEA of LGG cases showed that gene sets involved in the JAK/STAT signaling pathway, cytokine receptor interaction, and primary immunodeficiency were enriched in cases with high CD302 expression (Figure 7B). The same analysis found that gene sets enriched in LGG cases with higher FABP5 expression included ECM receptor interaction, cytokine receptor interaction, and the JAK/STAT signaling pathway (Figure 7C). Moreover, LGG with CD302 overexpression presented GPR65 and PIK3CG up-regulation, while CHI3L1 and RAB36 were upregulated in tumors with FABP5 overexpression (Figures 7D,E). In addition, we found that GPR65, PIK3CG, and RAB36 have prognostic abilities in LGG: elevated expression of each was significantly associated with worse survival of LGG patients (Table S2 and Figure S3).
Figure 7 (caption, partially recovered): LGG patients were split into two subgroups according to CD302 or FABP5 expression, named as CD302 or FABP5 Upper 25% expression and Lower 75% expression. (B,C) Gene sets enriched in CD302 and FABP5 overexpressing LGG cases, respectively. (D,E) GSEA heat maps for differential expression genes enriched in CD302 and FABP5 overexpressing LGG cases, respectively.
As shown in Figure S4, copy numbers did not differ significantly between the higher and lower expression groups of either CD302 or FABP5, indicating that the prognostic significance of CD302 and FABP5 is not caused by genomic copy number changes.

Independent Prognostic and Clinical Significance of CD302 and FABP5
To further investigate the relationship between CD302/FABP5 and clinical factors, we analyzed the expression differences of CD302/FABP5 between LGG subgroups with distinct clinical features. The results showed that LGG patients with histologic grade 3 and those with progressive disease had significantly higher expression of CD302/FABP5, respectively (Figure S5). In addition, as shown in  Table 5).
Furthermore, we found that the prognostic abilities of CD302 and FABP5 were independent of the critical clinical features of LGG patients, including histologic grade, therapy, and primary therapy outcome (Figures 8, 9, Figures S6, S7). In detail, patients with CD302/FABP5 overexpression exhibited worse survival in both histologic grade 2 and 3 (Figure 8), in both stable and progressive disease (Figure 9), and under both radiotherapy and TMZ (temozolomide) therapy (Figures S6, S7), while no significant prognostic value of CD302/FABP5 was observed in patients with complete or partial response.

DISCUSSION
Gliomas are graded I to IV according to histology and clinical criteria. Grade II and III gliomas are designated low-grade glioma (LGG) (1)(2)(3)(4). Although LGG accounts for a minority of gliomas, it is the major cause of mortality for young adults (14). Although survival outcomes for patients diagnosed with LGG are better than those for high-grade glioma, LGG almost universally advances to high-grade glioma (5,8). Surgical resection is the major treatment for LGG. However, even after gross total resection (GTR), the survival rates of LGG patients remain low and the risk of tumor progression persists (9). Some low-risk patients remain progression-free without intervention, while high-risk patients suffer from progressive disease and may be given interventional treatment after diagnosis (6). Because patients with LGG have distinct clinical courses, it is necessary to classify them into subgroups with different risks to guide subsequent treatment.
In this study, we developed the web server OSlgg, with which users, including those with limited bioinformatics skills, can evaluate the prognostic value of genes of interest. To determine the reliability of OSlgg, we verified the prognostic roles of 86 previously reported LGG prognostic biomarkers, including IDH1, BIRC5, CDKN1B, PCNA, and MKI67. Furthermore, we identified two novel potential prognostic biomarkers for LGG patients, CD302 and FABP5. As a C-type lectin receptor, CD302 has roles in immunity and cell migration (35,36), acts as a prognostic biomarker in myeloma (37), and is a potential therapeutic target for acute myeloid leukemia (38). In addition, CD302 has been identified as a biomarker to categorize the metastases of neuroendocrine tumors (NET) (39), and it is reported to be overexpressed in high-grade NET (40). Fatty acid-binding protein 5 (FABP5) is involved in fatty acid transport and acts as a prognostic biomarker in cervical cancer, triple-negative breast cancer, and clear cell renal cell carcinoma (41)(42)(43). In addition, FABP5 was found to be expressed in 9 of 23 gliomas with moderate to strong cytoplasmic staining in the Human Protein Atlas (HPA) database, and was reported to be expressed in grade II (19/30) and grade III (22/31) astrocytoma (a histologic subtype of glioma) (44). The prognostic abilities of CD302 and FABP5 have not previously been reported in LGG. In our server, Cox regression analysis reveals that CD302 and FABP5 are significantly correlated with the survival outcomes of LGG patients: patients with lower expression of CD302 and FABP5 have better outcomes than patients with higher expression of these genes. We also found that elevated CD302/FABP5 expression was significantly associated with higher histologic grade and worse therapeutic outcome, and that CD302 and FABP5 were independent prognostic indicators of LGG.
A limitation of OSlgg is that only 720 LGG cases are currently available on our server. Once new datasets with expression profiling and clinical follow-up data become available, we will update OSlgg to expand the dataset and enhance its performance.
In summary, we developed a prognosis analysis web server OSlgg, which provides a platform for researchers and clinicians to evaluate the prognostic values of genes of interest, and may offer opportunities to facilitate the development of novel targeted strategies for LGG.

AUTHOR CONTRIBUTIONS
YA, QW, and XG developed the server, performed the evaluation of novel prognostic biomarkers, and drafted the paper. LZ, FS, GZ, and HD performed the validation of previously reported biomarkers. HL, YL, and YP collected LGG datasets. WZ, SJ, and YW contributed to data analysis and paper revision. All authors approved the final manuscript.