A Risk Classification System With Five-Gene for Survival Prediction of Glioblastoma Patients

Objective: Glioblastoma (GBM) is the most common and fatal primary brain tumor in adults. It is necessary to identify novel and effective biomarkers or risk signatures for GBM patients. Methods: Differentially expressed genes (DEGs) between GBM and low-grade glioma (LGG) in TCGA samples were screened out and weight correlation network analysis (WGCNA) was performed to confirm WHO grade-related genes. Five genes were selected via multivariate Cox proportional hazards regression analysis and were used to construct a risk signature. A nomogram composed of the risk signature and clinical characters (age, radiotherapy, and chemotherapy experience) was established to predict 1, 3, 5-year survival rate for GBM patients. Results: One hundred ninety-four DEGs in blue gene module were found to be positively related to WHO grade via WGCNA. Five genes (DES, RANBP17, CLEC5A, HOXC11, POSTN) were selected to construct a risk signature for GBM via R language. This risk signature was identified to independently predict the outcome of GBM patients, as well as stratified by IDH1 status, MGMT promoter status, and radio-chemotherapy. The nomogram was established which combined the risk signature with clinical factors. The results of c-index, ROC curve and calibration plot revealed the nomogram showing a good accuracy for predicting 1, 3, or 5-year survival of GBM patients. Conclusion: The risk signature with five genes could serve as an independent factor for predicting the prognosis of patients with GBM. Moreover, the nomogram with the risk signature and clinical traits proved to perform better for predicting 1, 3, 5-year survival rate.


INTRODUCTION
Glioblastoma (GBM) is the most common and aggressive type of primary brain tumor in adult. Despite comprehensive regimens including maximum surgical resection, radiation therapy and chemotherapy, the prognosis of GBM is notoriously poor, with a median survival of 14 months and the 5-year survival rate remaining at ∼5% (1). While intervention of these multimodal treatments cannot eradicate this devastating disease, therapeutic resistance and GBM recurrence were inevitable. Although temozolomide (TMZ) has been proven to prolong the survival of GBM patients as a first-line chemotherapeutic agent, recent studies show that an amount of patients with GBM develop resistance to TMZ during treatment (2), and the recurrence rate of GBM was up to 90% (3). These awful therapeutic outcomes were mainly attributed to glioma stem cells (GSCs) and heterogeneity in GBM (4,5). Likewise, several new drugs, such as monoclonal antibody targeting epidermal growth factor receptor variant III (EGFRvIII), have been proven to show therapeutic efficiency in some cancers, but not in glioma (6). Since only 30% of GBM cases contain EGFRvIII, this means a majority of GBM patients fail to benefit from EGFRvIII-targeted therapy (7,8). Therefore, it becomes particularly important to search for novel molecular biomarkers that precisely predict the prognosis and to choose appropriate individualized treatment strategies for patients with GBM.
With the progress of genetics and molecular biology, an increasing number of molecular biomarkers were discovered in glioma, for instance, IDH mutation, MGMT methylation, TERT promoter mutation, EGFR and P53 (9). As is known to all, IDH1/2 mutation and MGMT promoter methylation are two important biomarkers in glioma. IDH mutation mainly exists in low grade glioma and secondary GBM, and associates with prognosis and GBM subtype (10). Moreover, IDH phenotype was also reported to be potent to form a glioma CpG island methylator phenotype (G-CIMP) and to be related to genomic methylation and gene mutation, such as P53 and TERT mutation (10). MGMT promoter methylation accounts for ∼40% of GBM samples and associates with favorable prognosis of patients receiving radiotherapy and chemotherapy (11). Interestingly, it has been observed that IDH-mutated gliomas frequently carry MGMT promoter methylation and are sensitive to temozolomide (12). These findings indicate that there are cross talks among these key molecular biomarkers and a single gene cannot completely represent the characters of the glioma, as well as GBM. This may partially explain that GBM patients fail to take more advantages from some targeted small molecule inhibitors application (13). Therefore, risk signatures with correlative biomarkers have been developed, which have shown better performance in GBM treatment and survival prediction (14,15). In this study, we developed a risk signature with five genes associated with survival of GBM patients. On this basis, a nomogram including the risk signature and clinical factors was established and it proved to be effective in predicting the clinical outcome of patients with GBM.

Data of Glioma Patients in the Study
Gene expression and survival data of glioma in TCGA were downloaded from GlioVis (http://gliovis.bioinfo.cnio.es/) (16). six hundred twenty samples from TCGA GBMLGG (RNA-seq) were selected for screening differentially expressed genes between GBM and low-grade glioma (LGG). Five hundred twenty-five samples from TCGA GBM (HG-UG133A) were used to construct a clinical survival prediction model and internal validation.

Identification of Differentially Expressed Genes Between GBM and LGG
Based on 470 lower grade glioma (LGG, World Health Organization [WHO] grade II and III) (17,18) and 150 GBM samples in TCGA GBMLGG dataset, R language (edgeR package, R version 3.51) was performed to identify differentially expressed genes (DEGs). Genes with |log 2 (fold-change)|> 1 and false discovery rate (FDR) < 0.05 were considered as DEGs for further analysis.

Weighted Correlation Network Analysis for Discovering Grade-Related Gene Modules
To select glioma grade-related genes from DEGs, we performed weight correlation network analysis (WGCNA) (19). The expression data of DEGs and clinical data (WHO grade, age, gender, IDH status, survival time, and status) were imported and analyzed by R package WGCNA. The genes were classified into several gene modules using an appropriate soft-thresholding power which was calculated by the pickSoftThreshold function (20). The minimum gene size in each module was set as 10. The module eigengenes were calculated and similar modules were clustered and merged according to the module dissection threshold. The correlations between gene modules and clinical traits were calculated and visualized through a heatmap. In this research, we chose the module which is positively related to WHO grade for further study.

Construction and Evaluation of Risk Signature With Selected Genes
Univariate Cox proportional hazards regression analysis was applied to assess the relationship between the expression of DEGs and the overall survival (OS) of patients with GBM in TCGA GBMLGG (RNA-seq) and HG-UG133A platform, respectively. Common genes with P < 0.05 were sorted out and presented as a Venn diagram by R. We then performed multivariate Cox proportional hazards models and filtered the common genes by step function in R. A risk score formula was designed according to the multivariate Cox regression analysis results (18), as follows: The patients were divided into low-risk and high-risk groups according to the median risk score value. KM survival analysis and time-dependent receiver operating characteristic (ROC) curve analysis were used to evaluate the prognostic value.

Bioinformatics Analysis
DEGs between low-risk and high-risk groups with FDR < 0.05 were filtered by R language (edgeR package) and used for Gene ontology (GO) and KEGG pathway analysis via DAVID website (https://david.ncifcrf.gov/) (21). GO terms (FDR <0.05) and KEGG pathways (P-value <0.05) were screened out and visualized via R package ggplot2. Gene set enrichment analysis (GSEA, http://software. broadinstitute.org/gsea/index.jsp) were used to confirm the GO terms and KEGG pathways in the low-risk and highrisk groups (22). Normalized enrichment score (NES) and FDR were calculated to verify the statistical difference for GSEA analysis.

Construction and Evaluation of Clinical Survival Prediction Model
By combining with clinical data, a nomogram of clinical survival prediction model was established by using the package of "rms" in R. Samples from TCGA HG-UG133A platform were divided into training cohort (accounting for 70%) and validation cohort (accounting for 30%) by randomly using R package "caret." The inclusion criteria for data extraction in the predictive model were patients diagnosed with WHO grade IV glioma (GBM).    The exclusion criteria included patients with incomplete data such as survival status and time, radiotherapy, and chemotherapy records. The training cohort was used to construct the nomogram of clinical survival prediction model, and the validation cohort was applied for internal validation. Concordance index (Cindex), ROC curve analysis and calibration curve were used to measure the performance of the nomogram, which were conducted by R.

Statistical Analysis
Risk scores of the samples in GBM subtype, IDH1 status, and MGMT promoter were presented as mean ± standard deviation and calculated by Graphpad Prism 8.0. Statistical differences between and among groups were examined by two tailed t-test and one-way analysis of variance (ANOVA) followed by Dunnett's post-test, respectively. Kaplan-Meier survival analysis and Cox proportion hazards regression model were conducted with R and R package. P < 0.05 was regarded as statistically significant.

Identification of Differentially Expressed Genes Between GBM and LGG
GBM is one of the devastating malignancies with poor prognosis. To better construct a survival prediction signature for patients with GBM, we searched for differentially expressed genes

WGCNA Analysis Revealed Blue Gene Module Was Related to Glioma Grade
To identify genes associated with clinical traits, we collected the RNA-seq data of DEGs and clinical information (WHO grade, age, gender, IDH status, survival time and status), and performed WGCNA analysis. Firstly, the samples were clustered and basic clinical traits were displayed ( Figure 1B). A soft threshold power was then calculated and 10 was selected as the power value to produce a hierarchical clustering tree ( Figure 1C). The module dissection threshold was set at 0.15 to merge similar modules and 4 modules were generated ( Figure 1D). The relationships between gene modules and clinical traits were confirmed by Pearson correlation and exhibited in a heatmap ( Figure 1E). Among the modules, blue gene module contained 194 genes and was the most positively related to WHO grade (r = 0.77, P < 0.0001). In addition, the blue gene module was also correlated with age (r = 0.56), IDH status (r = −0.87), survival time (r = −0.32), and survival status (r = 0.56). Therefore, genes in the blue module were used for further study.

Construction of the Risk Signature With Five-Gene in GBM Cohorts
To select prognosis related genes, we performed univariate Cox proportional hazards regression analysis to analyze the genes in blue module in TCGA GBMLGG (RNA-seq) and HG-UG133A platforms. Seventeen overlapped genes significantly correlated with overall survival (P < 0.05) between the two platforms were obtained (Figure 2A; Supplementary Table 2). Next, multivariable Cox regression analysis was implemented to filter and optimize the genes for constructing risk signature.
Five genes (DES, RANBP17, CLEC5A, HOXC11, POSTN) were screened out, among which RANBP17 was defined as protective with HR < 1, whereas others were defined as risky with HR > 1 ( Figure 2B). The risk-score formula was constructed as follows: risk score = (0.5536 × expression level of DES) + (−0.7340 × expression level of RANBP17) + (0.0995 × expression level of CLEC5A) + (0.2810 × expression level of HOXC11) + (0.0566 × expression level of POSTN). The risk score for each patient in TCGA HG-UG133A platform was calculated (mean ±SD, 1.5290 ± 0.4039; Quartiles were 1.3257 at 25%, 1.5936 at 50%, and 1.7968 at 75%, respectively) and all the 525 patients were divided into high-risk or low-risk groups based on the median cutoff value of the scores. As shown in Figure 2C, GBM patients with high risk scores indicated poor prognosis. The AUC for the five-gene signature risk score model at 1, 3, and 5-year survival were 0.671, 0.706, and 0.796, respectively ( Figure 2D).
The results indicated that the risk signature can better predict 1, 3, and 5-year survival for GBM patients. With the increase of risk score, the expression level of RANBP17 was down-regulated, and the expression level of the other 4 genes were up-regulated (Figures 2E,F). In the mean-time, the number of alive patients decreased ( Figure 2G).

Application of the Risk Signature in Stratified GBM Cohorts
To further explore its clinical application, we investigated the relationship between the risk score and glioma subtype, IDH1 and MGMT promoter status, respectively. The mesenchymal subtype inclined to have higher risk scores than neural and proneural subtype ( Figure 3A). The risk scores of patients with IDH1 mutant type were lower than IDH1 wild type ( Figure 3B). This result was in accordance with the conclusion that IDH1 mutant in glioma was related to better patient prognosis (23). For MGMT promoter, the risk scores decreased in patients with methylated status (P < 0.01, Figure 3C), though the average risk scores between the two groups didn't differ largely. The relationships between the risk score and patient prognosis stratified by IDH1, MGMT promoter status were also explored. There was no significant statistical difference between highrisk group and low-risk group in GBM patients with IDH1 mutant (Figure 3D). This result might be mainly due to the insufficient number of patients. In IDH1 wild-type cohort, patients with low risk scores exhibited longer survival time than high risk group (Figure 3E). In terms of MGMT promoter, no matter of methylated or unmethylated state, the high-risk group indicated dismal prognosis compared with the low-risk group (Figures 3F,G). Furthermore, in consideration of the importance of radio-and chemo-therapy in the treatment of glioma, we analyzed the association between the risk score and the response to standard radio-and chemo-therapy. The patients with low risk scores exhibited favorable prognosis in either radiotherapy or chemotherapy (Figures 3H,I). These results revealed that the risk signature could serve as an independent factor for predicting the prognosis of patients with GBM.  To further investigate the functional roles and KEGG pathways associated with the risk signature, we first screened out the DEGs between the high-risk group and low-risk group. Ninety-five genes with FDR < 0.05 were selected for GO and KEGG pathway analysis via DAVID (Figure 4A;  Supplementary Table 3). We discovered that the five-gene risk signature was functionally associated with extracellular matrix related terms, including extracellular exosome, extracellular matrix, and extracellular matrix organization (FDR < 0.05, Figure 4B). Correspondingly, several KEGG pathways (P < 0.05) such as ECM-receptor interaction and focal adhesion pathways were also obtained ( Figure 4C). To further confirm these results, the samples were divided into high-risk and low-risk groups according to the median of risk scores, and GSEA were applied. Similar GO terms and KEGG pathways were observed via GSEA analysis (Figures 4D,E). Collectively, these results revealed that the five-gene risk signature was correlated to extracellular matrix and cell adhesion functions, which play vital roles in glioma invasion and progression (24,25).

Construction of a Clinical Survival Prediction Model via the Risk Signature Combined With Clinicopathologic Features
Since the risk signature had a better performance in predicting the prognosis of GBM patients, we explored its clinical significance combining with clinical characters (age, radiotherapy, and chemotherapy experience). Firstly, the samples in the TCGA HG-UG133A platform were divided into training cohort (364 cases) and validation cohort (155 cases) randomly (Table 1). Then, multivariable Cox regression analysis was performed to assess the selected variable's contribution in predicting prognosis of GBM patients. The results indicated that the factors, such as risk score, age, acceptance of radiotherapy and chemotherapy, were correlated with patients' survival significantly both in training cohort and validation cohort ( Table 2). A clinical survival prediction model was constructed based on the data in training cohort and presented in a nomogram for predicting 1, 3, 5-year survival ( Figure 5A). C-index, ROC curve and calibration plot were used to evaluate the efficiency of the clinical predictive model. The C-indexes in training cohort and validation cohort were 0.729 and 0.708, respectively. The area under the curves (AUC) of the nomogram for 1, 3, 5-year-survival were 0.771, 0.808, and 0.838 in validation cohort, respectively ( Figure 5B). In the training set, the area under the curves (AUC) for 1, 3, 5-year-survival were 0.796, 0.79, and 0.851, respectively (Supplementary Figure 1A). The calibration plot for the probability of survival at 1, 3, or 5-years showed an optimal agreement between the prediction and observation, both in the validation cohort (Figures 5C-E) and training cohort (Supplementary Figures 1B-D). These results above revealed that the nomogram demonstrated a good accuracy for predicting 1, 3, or 5-year survival of GBM patients.

DISCUSSION
So far, GBM is still a lethal disease without efficient therapeutic regimens. The failure to develop new treatments ascribes to a lack of validation of novel molecular targets, which are often performed in animal models and directly translated to human trials (26). Thus, exploration and validation novel molecular targets are not only necessary, but also very urgent. In the present study, we identified DEGs between GBM and LGG in TCGA data, and confirmed 17 genes significantly correlated with prognosis. Finally, five genes (DES, RANBP17, CLEC5A, HOXC11, POSTN) were selected to construct a risk signature for GBM. Among the five genes, POSTN is an ECM protein and is involved in various cellular processes, including epithelialmesenchymal transition (EMT) and cell migration (27). POSTN is highly expressed in glioma tissues and has been considered as a biomarker of glioma malignancy and recurrence (28,29). It has also been reported that POSTN recruits M2 tumor-associated macrophages and promotes glioma stem cells (GSCs) growth (30). CLEC5A is a spleen tyrosine kinase-coupled receptor, which is abundantly expressed in monocytes, macrophages and neutrophils, and critical for inflammation response (31,32). A recent study has shown that CLEC5A is upregulated in GBM significantly and is associated with poor prognosis (33). Furthermore, downregulation of CLEC5A can inhibit the capabilities of proliferation, migration, and invasion, and promotes apoptosis and G1 arrest in GBM cell lines (33). These results are consistent with our findings that the expression level of CLEC5A increased with the ascent of risk scores and CLEC5A was a risk factor in GBM. Although few have been reported about the other three genes in glioma, they have vital functions and might serve as potential targets for GBM. For instance, DES encodes the intermediate filament protein desmin, which is expressed in cardiac, skeletal, and smooth muscle cells, and its mutations can cause isolated cardiomyopathies and cardiac conduction diseases (34,35). A recent study demonstrated that desmin loss is observed in 92% malignant mesothelioma samples, 76% malignant effusions, 29% benign mesothelial hyperplasia tissues, but not in the reactive effusions (36). Thus, desmin may serve as a useful biomarker in the discrimination between reactive mesothelial proliferation and malignant mesothelioma. As a RanGTP-binding protein, RANBP17 belongs to the importin beta family and is preferentially expressed in the testis (37,38). RANBP17 is upregulated in dilated cardiomyopathy and ischemic cardiomyopathy samples, and may regulate the transport of different cargos in specific cardiomyopathies through enhancing the transcriptional activation of the EA2 transcription factors E12 and E47 (39). HOXC11 belongs to homeobox superfamily that are responsible for encoding transcription factors regulating development (40). HOXC6 and HOXC11 have been shown to induce differentiation of GOTO neuroblastoma cells into Schwannian cells via transcription activation of S100β (41). Moreover, HOXC11 is found to be closely correlated with the survival of patients with renal cell cancer, cervical cancer or breast cancer, and serves as a therapeutic target (40,42). This risk signature comprised of the five genes was identified to significantly correlate with the survival of GBM patients, as well as stratified by IDH1 status, MGMT promoter status, and radiochemotherapy. In addition, GO and KEGG pathway analysis were applied via DAVID and GSEA, and elucidated that the five-gene risk signature was mainly related to extracellular matrix and cell adhesion function. EMC organization and cell adhesion are indispensable biological processes in tumor development and progression (43,44), indicating the essential value of our signature. Nomograms have been applied extensively and exhibit favorable effects on predicting clinical risk signatures and outcomes in some cancers (45,46). For better clinical application, we combined the risk signature with clinical factors (age, radiotherapy, and chemotherapy experience) and established a nomogram, which was validated to have better performance for predicting the outcomes of patients with GBM. The nomogram contained four items, and predicted the 1, 3, 5-year survival rate based on the sum of the score in each item. This clinical prediction model aimed at precisely predicting the prognosis of GBM patients and corresponded to the idea of individual treatment. However, there were some deficiencies in this study. Firstly, the sample size was limited. Five hundred nineteen samples were incorporated, and 364 cases were used for constructing model. The second limitation was that the samples were downloaded from TCGA, and it didn't contain information about extent of tumor resection, which is a key factor closely related to survival time in patients with GBM (47). A collection of detailed clinical records and further validation should be carried out in future study. Despite the above shortcomings, this study still has its advantages and innovations. Firstly, we performed an accurate and widely used method, WGCNA (48), and confirmed genes associated with glioma grade, which is an clinical indicator directly associated with the prognosis of glioma patients. Secondly, the risk signature with five genes was proven to be an independent prognostic biomarker in GBM via Kaplan-Meier survival analysis and multivariable Cox regression analysis. In addition, on the basis of the risk signature and other clinical factors (age, radiotherapy, and chemotherapy experience), the nomogram can predict the 1, 3, 5-year survival rate precisely, thus providing evidences of treatment for GBM patients. Altogether, our study indicated the potential value of our model for predicting the survival of GBM patients.

DATA AVAILABILITY
Publicly available datasets were analyzed in this study. This data can be found here: http://gliovis.bioinfo.cnio.es/.

AUTHOR CONTRIBUTIONS
YW: conceptualization and writing-original draft. YW and GG: methodology. YW and XL: project administration. MZ: supervision. WZ and MZ: writing-review and editing.