Identification of a Tumor Microenvironment-Related Eight-Gene Signature for Predicting Prognosis in Lower-Grade Gliomas

Lower-grade gliomas (LrGG), characterized by invasiveness and heterogeneity, remain incurable with current therapies. Clinicopathological features provide insufficient information to guide optimal individual treatment and cannot predict prognosis completely. Recently, an increasing amount of studies indicate that the tumor microenvironment plays a pivotal role in tumor malignancy and treatment responses. However, studies focusing on the tumor microenvironment (TME) of LrGG are still limited. In this study, taking advantage of the currently popular computational methods for estimating the infiltration of tumor-associated normal cells in tumor samples and using weighted gene co-expression network analysis, we screened the co-expressed gene modules associated with the TME and further identified the prognostic hub genes in these modules. Furthermore, eight prognostic hub genes (ARHGDIB, CLIC1, OAS3, PDIA4, PARP9, STAT1, TAP2, and TAGLN2) were selected to construct a prognostic risk score model using the least absolute shrinkage and selection operator method. Univariate and multivariate Cox regression analysis demonstrated that the risk score was an independent prognostic factor for LrGG. Moreover, time-dependent ROC curves indicated that our model had favorable efficiency in predicting both short- and long-term prognosis in LrGG patients, and the stratified survival analysis demonstrated that our model had prognostic value for different subgroups of LrGG patients. Additionally, our model had potential value for predicting the sensitivity of LrGG patients to radio- and chemotherapy. Besides, differential expression analysis showed that the eight genes were aberrantly expressed in LrGG compared to normal brain tissue. Correlation analysis revealed that the expression of the eight genes was significantly associated with the infiltration levels of six types of immune cells in LrGG. In summary, the TME-related eight-gene signature was significantly associated with the prognosis of LrGG patients. They might act as potential prognostic biomarkers for LrGG patients, offer better stratification for future clinical trials, and be candidate targets for immunotherapy, thus deserving further investigation.


InTRODUcTIOn
Lower-grade gliomas (LrGG) are infiltrative and heterogeneous brain neoplasms that include World Health Organization (WHO) grade II and III diffuse gliomas (Patel et al., 2017). Because of their highly invasive characteristics, complete neurosurgical resection is unachievable for most patients. The residual tumor results in recurrence and malignant progression at variable intervals; some of these tumors recur and even progress to glioblastoma (WHO grade IV) within months, whereas others remain indolent for years (van den Bent et al., 2011;Zhou et al., 2017). The current treatment modalities for LrGG typically involve neurosurgical resection, observation, chemotherapy, and/or radiotherapy. However, none of these treatment options are curative for this disease (Kumthekar et al., 2015). The main purpose of treatment is to delay tumor progression and improve quality of life (Duffau and Taillandier, 2015). However, due to considerable heterogeneity between LrGG, an optimal treatment strategy against this disease at the individual level still remains a challenge (Duffau, 2018). From this perspective, it is necessary to develop reliable approaches for identifying subsets of patients at high risk of deterioration and to find novel molecular targets for the development of effective therapeutic strategies.
Recently, the tumor microenvironment (TME), including tumor-associated normal epithelial and stromal cells, immune cells, and vascular cells, has been observed to participate in cancer biology and has received increasing amounts of attention. Increasing studies show that a highly heterogeneous TME plays a substantial role in tumor malignancy and treatment responses and critically impacts immunotherapeutic strategies (Platten et al., 2014;Ma et al., 2018). For example, epithelial and stromal cells participate in tumor growth, malignant progression, and treatment resistance (Kalluri and Zeisberg, 2006;Straussman et al., 2012). Infiltrating immune cells, including macrophages, dendritic cells, mast cells, natural killer cells, and lymphocytes, also exhibit tumor-promoting features in a context-dependent manner (Fridman et al., 2012;Yoshihara et al., 2013). Thus, strengthening the knowledge of TME and highlighting the mechanism underlying its effects will ultimately contribute to the diagnosis and treatment of gliomas. Considering the importance of TME in cancer treatment, many computational methods have been developed to estimate the infiltration of tumor-associated normal cells in tumor samples using genomic approaches such as ESTIMATE (Estimation of STromal and Immune cells in MAlignant Tumours using Expression data), TIMER (Tumor Immune Estimation Resource), and CIBERSORT (Cell-type Identification By Estimating Relative Subsets Of RNA Transcripts) (Yoshihara et al., 2013;Newman et al., 2015;Li T. et al., 2017). These methods can facilitate a comprehensive understanding of tumor biology and the development of robust predictive models in long-term tumor management. In fact, increasing studies have successfully applied these algorithms to various cancers including prostate cancer, breast cancer, and colon cancer. (Ali et al., 2016;Alonso et al., 2017). However, application of these algorithms to explore the cellular networks underlying the glioma TME are still limited, especially in LrGG.
In the present study, through construction of gene co-expression networks using a weight gene co-expression network analysis (WGCNA) and the identification of TME-related gene co-expression modules, we aimed to identify the prognostic genes involved in the TME of LrGG and to set up an effective prognostic model for LrGG. Our findings might provide novel insights into the TME of LrGG and provide important clues for predicting the prognosis of LrGG and for the selection of individualized therapies for LrGG patients.

Data Sets
RNA sequencing data of The Cancer Genome Atlas (TCGA) LrGG (530 samples, 516 patients) containing RSEM normalized data and log2(RSEM+1) transformed data were obtained from the Broad GDAC firehose (http://gdac.broadinstitute.org/) and the University of California, Santa Cruz, Xena browser (UCSC Xena, https:// xenabrowser.net/), respectively. The clinical data of LrGG (516 patients, five of which had no survival data) were also downloaded from UCSC Xena. The stromal score (positively correlating with the presence of stroma in tumor tissue) and immune score (positively correlating with the level of immune cells infiltrations in tumor tissue) of TCGA LrGG dataset generated using the ESTIMATE algorithm were downloaded from https://bioinformatics. mdanderson.org/estimate/. Another cohort of LrGG patients (181 patients, nine of which had no survival data) was obtained from the Chinese Glioma Genome Atlas (CGGA, http://www.cgga. org.cn/), and their mRNA sequencing data (FPKM) and clinical data were downloaded. In the present study, the TCGA cohort was used as training data and the CGGA cohort was used for validation. The information of LrGG patients including survival information from both TCGA and CGGA datasets is shown in Table 1. Weighted Gene co-Expression network analysis Weighted gene co-expression network analysis (WGCNA) was performed using the "WGCNA" package (Langfelder and Horvath, 2008) in R language, version 3.5.3 (https://www.r-project.org). According to the instructions for WGCNA, the RSEM data were used for subsequent analysis. Prior to network construction, we checked the RNA sequencing data and removed the genes and samples with too many missing values as well as obvious outlier samples. Then, based on the criterion of approximate scale-free topology, we chose the soft thresholding power β to construct unsigned co-expression networks. Co-expression similarity was calculated, defined as S ij , which equals the absolute Pearson's correlation coefficient between gene i and j. Then, the similarity matrix was transformed to the weighted adjacency matrix, defined as Aij = S ij β . The topological overlap matrix (TOM) was derived from the weighted adjacency matrix. Based on the TOM, a hierarchical clustering tree (dendrogram) of genes was produced using a hierarchical clustering method. Finally, we used the standard method, Dynamic Tree Cut, to identify co-expression gene modules with a deep split level of two and a minimum module size of 30. A height cut of 0.25, corresponding to a correlation of 0.75, was used to merge similar modules. We calculated the Gene Significance (GS) and Module Membership (MM) to evaluate the gene relationship between ESTIMATE scores (immune score and stromal score) and each module. Modules significantly associated with the two scores (p value < 0.05) were screened out, and the genes in these modules were selected for further analysis.

Functional Enrichment analysis
Functional enrichment analyses, including gene ontology (GO) analysis comprising cellular component (CC), molecular function (MF), and biological process (BP), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, were performed using the clusterProfiler package in R language (Yu et al., 2012). The reference gene set for analyses is whole transcriptome as RefSeq transcript set. The P value adjusted by Benjamini and Hochberg method.

hub Gene Identification and Module Visualization
Hub genes were identified using the R package WGCNA. The thresholds for identifying hub genes in each target module were |MM| > 0.8 and |GS| > 0.2. Finally, we obtained 54, 15, and 8 hub genes in the green, salmon, and magenta module, respectively. Visualization of the targeted modules was performed using Cytoscape (Shannon et al., 2003), an open source software for visualizing molecular interaction networks.

Protein-Protein Interaction analysis
Protein-protein interaction (PPI) analysis of survival-related genes was performed using an online software, The Search Tool for the Retrieval of Interacting Genes (STRING, https:// string-db.org/) (Szklarczyk et al., 2019), and the PPI network was visualized using Cytoscape.

Survival analysis
Kaplan-Meier survival analysis and the Cox proportional hazard model were performed using R language packages (survival, survminer, and ggplot2).
construction of Prognostic Risk Score Model via the Least absolute Shrinkage and Selection Operator Method Least absolute shrinkage and selection operator (LASSO) is a penalization method to shrink and select variates for regression (Tibshirani, 1997). It has been widely used in the genetic data analyses of various cancers, including TCGA data (Zhao et al., 2015). In this article, univariate Cox regression analysis was first performed to identify 74 prognostic hub genes of three TMErelated gene co-expression modules in TCGA cohort. As one (LILRB4) of these 74 genes was missing in the CGGA LrGG cohort, the remaining 73 genes were selected for LASSO Cox regression analysis in TCGA LrGG cohort, and finally, 8 prognostic genes were selected to the construct a prognostic risk model. The formula for calculating the risk score was as follows: risk score = (-0.1880 * expression level of ARHGDIB) + (0.3131 * expression level of CLIC1) + (0.0546 * expression level of OAS3) + (0.2527 * expression level of PARP9) + (0.1052 * expression level of STAT1) + (0.1776 * expression level of PDIA4) + (0.2479 * expression level of TAGLN2) + (-0.2525 * expression level of TAP2).

Time Dependent Receiver Operating characteristic curve analysis
The Receiver Operating Characteristic (ROC) curve analysis is a widely used method for assessing the sensitivity and specificity of a continuous diagnostic marker for a binary disease variable (Kamarudin et al., 2017). Time-dependent ROC curve analysis was used to evaluate the performance, including the predictive accuracy and sensitivity, of our prognostic model within 1 year, 3 years, and 5 years of OS using the R package, survivalROC (Heagerty et al., 2000). The optimal cutoff of the risk score was calculated using the Youden index. In the TCGA cohort, the optimal cutoffs of the risk score for 1 year, 3 years, and 5 years of OS were 7.61036429, 7.37154483, and 7.37154483, respectively. In the CGGA cohort, the optimal cutoffs for 1 year, 3 years, and 5 years of OS were 1.790661239, 1.790661239, and 1.736271505, respectively. In both TCGA and CGGA cohorts, the patients were divided into highand low-risk groups based on the optimal cutoff for 5 years of OS, respectively. In TCGA LrGG cohort, 121 patients were classified in the high-risk group and 390 patients belonged to the low-risk group. In the CCGA LrGG cohort, there were 64 patients in the high-risk group and 108 patients in the low-risk group.

Differential Expression analysis
Differential expression analysis of eight genes within our model was performed using the online database Gene Expression Profiling Interactive Analysis (GEPIA). GEPIA  is an interactive web platform for gene expression analysis, which integrates the cancer samples from the TCGA database and normal samples from the GTEx database.

analysis of Immune Infiltration
TIMER is a comprehensive resource for the analysis of immune infiltrates across various cancers (https://cistrome.shinyapps.io/ timer/) (Li T. et al., 2017). An estimation of immune cells, including B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells, in TCGA LrGG was performed using TIMER. We analyzed the correlation between the risk model system (risk score and eight genes within the model) and infiltrating levels of six immune cells using an R package (psych). Spearman's correlation was calculated by the corr.test function in psych package, and scatter plots were generated using the ggplot2 package in the R language. We used the following standard to describe the strength of correlation for the absolute value of r: absolute values between 0 and 0.3 indicate a weak correlation; absolute values between 0.3 and 0.7 indicate a moderate correlation; absolute values between 0.7 and 1.0 indicate a strong correlation (Akoglu, 2018).

Immune Scores and Stromal Scores are Significantly associated With LrGG Subtypes and Prognosis
The ESTIMATE algorithm is based on a single sample Gene Set Enrichment Analysis, and its predictive ability has been well validated (Yoshihara et al., 2013). The stromal score and immune score positively reflect the presence of stroma cells and immune cells, respectively, in tumor tissues. First, we plotted the distribution of immune scores and stromal scores in the TCGA LrGG cohort. The results showed that both immune scores and stromal scores in grade III glioma were significantly higher than those in grade II glioma (Figures 1A, B). To reveal the molecular expression patterns of immune and stromal scores, we compared both immune scores and stromal scores in IDH (isocitrate dehydrogenase) subtypes [including IDH wildtype (IDH-Wt) and IDH Mutant (IDH-Mut)] as well as four transcriptome subtypes (including mesenchymal, classical, neural, and preneural) of LrGG. The results showed that both immune scores and stromal scores in IDH-Mut subtype samples were significantly lower than those in IDH-Wt subtype samples (Figures 1C, D). The immune scores of four transcriptome subtypes were significantly different and those of mesenchymal subtype patients were the highest, followed by the classical subtype, neural subtype, and proneural subtype ( Figure 1E). Separately, the stromal scores of transcriptome subtypes were similar to the results of immune scores, except there was no significant difference between the stromal scores of neural and proneural subtype samples ( Figure  1F). To determine the potentially clinical value of the immune score and the stromal score for patients with LrGG, the Kaplan-Meier survival analysis was performed. Based on the median of immune score and stromal scores, LrGG were divided into two groups, the high group (score > median) and low group (score ≤ median). The result showed that patients in the low immune score group had a significantly longer overall survival (OS) compared to patients in the high immune score group ( Figure 1G). Similarly, cases with a low stromal score had better prognosis than those with a high stromal score ( Figure 1H). In summary, these results indicate that both immune scores and stromal scores correlate significantly with LrGG subtypes, and that high scores predict a relatively poor prognosis for LrGG patients.

construction of Weight co-Expression network Using WGcna
To provide system-level insights and high sensitivity to small fold changes in genes, WGCNA can be applied to high-throughput microarray or RNA-seq datasets and can describe the correlation between gene modules and external conditions (Pei et al., 2017). Prior to WGCNA, the RNA sequencing data of TCGA LrGG (530 samples) was constructed into a matrix with gene IDs as row names and sample barcodes as column names. Further, genes and samples with too many missing values were removed and the genes were then ranked by variance from large to small; the top 5,000 genes were selected for WGCNA. After one outlier was identified and removed (Supplementary Figure S1), we constructed the co-expression network based on the remaining 529 samples with 5,000 genes via the WGCNA package. As shown in Figure 2A, β = 4, which is the lowest power for which the scale-free topology fit index reached 0.90, was selected to calculate the adjacencies. Based on TOM, a hierarchical clustering tree (dendrogram) of genes was produced using the hierarchical clustering function. After merging similar modules, we obtained 13 gene modules (Figures 2B-D) with size ranging from 37 to 1,426 genes. We assigned each co-expression module an arbitrary color for reference: black, blue, brown, green, greenyellow, magenta, pink, purple, red, salmon, tan, turquoise, and yellowturquoise. These modules contained 262,791,647,358,115,206,220,133,348,37,70,1,426, and 371 genes, respectively. As a single group, the non-co-expressed group was designated as 'gray' based on the WGCNA developer's rationale, and 16 genes in this group were removed.

Identifying the TME-associated Modules and Prognostic hub Genes of LrGG
To determine whether any module was associated with immune scores and stromal scores, a principal components analysis was performed to generate the module eigengene (ME). MEs provide single-column summary measures of the overall gene expression level of each co-expression module. Then we calculated the correlations between the scores and each co-expression module. The results showed that 11 modules including greenyellow, tan, purple, yellow, pink, salmon, green, magenta, brown, red, and turquoise were correlated with immune scores and stromal scores ( Figure 2E, p value < 0.05). Furthermore, GO enrichment analysis was performed to identify TME-related modules from the 11 abovementioned modules. The results revealed that the three co-expression modules, green, salmon, and magenta, were significantly associated with the TME (Supplementary Figure S2, Supplementary Table S1). Consistently, these three modules belong to the same branch of the clustering dendrogram of module eigengenes ( Figure 2B). However, the remaining modules were rarely correlated with TME (Supplementary Table S2). In a co-expression network, hub genes are defined as genes inside co-expression modules that have high connectivity (Langfelder and Horvath, 2008). Hub genes are highly co-expressed with other genes and play key roles in critical pathways FIGURE 1 | Immune score and stromal score correlate with LrGG subtype and outcome. (a-B) Both immune score and stromal score positively associated with the WHO grade in LrGG. (c-D) Both Immune score and stromal score associated with the IDH status in LrGG. (E-F) Both immune score and stromal score corelated with the transcriptome subtypes of LrGG. (G-h) Both immune score and stromal score associated with the prognosis of LrGG. *** P 0.001, ns P 0.05, (a-F) t-test, Frontiers in Genetics | www.frontiersin.org November 2019 | Volume 10 | Article 1143 (Uddin and Singh, 2017). To identify the hub genes of the three target modules, intramodular connectivity and module membership for all genes in each module were calculated using the WGCNA package. Based on the preset threshold (|MM| > 0.8 and |GS| > 0.2), we identified 54, 15, and 8 hub genes in the green, salmon, and magenta module, respectively. The co-expression network of the hub genes in each module was visualized using Cytoscape based on the weight, which indicated that these hub genes were highly connected ( Figure   3A). To identify the prognostic genes of LrGG patients from the 77 hub genes, univariate Cox regression analysis was performed and the result showed that 74 genes were significantly correlated with the OS of LrGG patients in TCGA cohort (p value < 0.05,  The eigengene adjacency heatmap of 13 co-expression models. The eigengene adjacency A IJ = (1 + cor (E I , E J ))/2. The table is color-coded by adjacency according to the color legend, which decreased in size from red to blue. (E) Correlation between the gene modules and immune scores as well as stromal score. Rows correspond to module eigengenes, columns correspond to traits. Each cell contains the corresponding correlation and p-value. The table is color-coded by correlation according to the color legend, which decreased in size from red to green.

Functional Enrichment and PPI analysis of Prognostic Genes
To further understand the underlying mechanism of these prognostic genes, functional enrichment analysis was performed. GO enrichment analysis showed the 70 prognostic genes significantly associated with immune-related terms. For GO analysis, a total of 366 terms of biological process (BP), 81 terms of cellular component (CC), and 21 terms of molecular function (MF) were identified to be statistically significant (adjusted p value < 0.05, Benjamini and Hochberg method, Supplementary Table S3). The top terms of BP included neutrophil degranulation, activation, neutrophil mediated immunity, and antigen processing and presentation ( Figure 4A). MF indicated enrichments predominantly involved in MHC protein binding, cell adhesion molecule binding, and antigen binding ( Figure  4B). As for CC, these genes showed significant enrichment in the MHC protein complex, vacuolar lumen, and collagencontaining extracellular matrix ( Figure 4C). Additionally, KEGG pathway analysis revealed that these genes were significantly enriched in immune related pathways such as antigen processing and presentation, Th1 and Th2 cell differentiation, Th17 cell differentiation, and Leukocyte transendothelial migration ( Figure 4D, Supplementary Table S3). To further understand the interaction among these genes, PPI networks (which contained 70 nodes and 366 edges) were constructed based on the STRING database ( Figure 4E). This network contained 70 nodes and 366 edges and was highly connected.

construction and Validation of Prognostic Risk Score Model for LrGG
Through the LASSO regression method, eight genes, including ARHGDIB, CLIC1, OAS3, PARP9, PDIA4, STAT1, TAGLN2, and TAP2, were selected to construct the prognostic risk score model (Figures 5A, B). The formula for calculating risk score was as follows: risk score = (-0.1880 * expression level of ARHGDIB) +  Figure 6A). Next, based on the optimal cutoff value (7.37154483) for the 5-year OS, our model presented good sensitivities and specificities to predict the 1-year (sensitivity = 0.906, specificity = 0.806), 3-year (sensitivity = 0.668, specificity = 0.889), and 5-year (sensitivity = 0.486, specificity = 0.915) OS, and we divided the patients in the TCGA LGG cohort into the high-risk group and low-risk group ( Figure 6B). The heatmap showed that these eight genes were remarkably overexpressed in the high-risk group ( Figure  6C). Additionally, differential expression analysis via the GEPIA revealed that all eight genes were significantly upregulated in LrGG samples compared with those in normal brain tissues (Supplementary Figure S3). To evaluate the prediction ability of risk score on OS and Relapse Free Survival (RFS) of LrGG patients, the Kaplan-Meier analysis was performed. The results showed that patients in the high-risk group had significantly shorter OS and RFS than those in the low-risk group based on the TCGA dataset (Figures 6D-E). Furthermore, these results were well validated in the CGGA LrGG cohort. In the CGGA cohort, the AUC of this model for 1-, 3-, and 5-year OS was 0.878, 0.909, and 0.892, respectively ( Figure 6F). Using the same formula and standard for grouping, our model also presented good sensitivities and specificities for predicting the 1-year (sensitivity = 0.991, specificity = 0.706), 3-year (sensitivity = 0.907, specificity = 0.820), and 5-year (sensitivity = 0.810, specificity = 0.891) OS, and the patients were divided into high-risk and low-risk group; the patients in the low-risk group showed a significantly better outcome than those in the high-risk group (Figures 6G-I) in the CGGA cohort. Taken together, these results suggest the potential favorable efficiency as well as general applicability of this eightgene model in predicting short-and long-term prognosis in patients with LrGG.

The Prognostic Model Is an Independent Predictor and a Valuable hierarchical Factor
To study the association between the risk groups classified based on our risk score and the clinicopathological factors (age, gender, histological grade) and molecular characteristics (IDH status and MGMT promoter methylation status) in LrGG, we evaluated the statistical difference of the latter parameters between high and low-risk groups using the chi-square test. The result demonstrated that these parameters, apart from gender, were significantly different in patients from high-and low-risk groups in both the TCGA and CGGA datasets ( Figure 7A, Table 3). To identify if our prognostic model was an independent prognostic factor in LrGG, both univariate and multivariate Cox analysis were performed. After adjusting with the abovementioned clinicopathological and molecular characteristics, our prognostic model was still a significantly predictive factor ( Table 4 and Table 5). Furthermore, stratified survival analysis was performed to evaluate the prognostic values of this prognostic model in different subgroups of LrGG. In the TCGA cohort, the results showed that this model stratified patients with grade II or grade III glioma well in the OS. When both grade and risk scores were considered, grade II glioma patients within the low-risk group showed the best outcomes, whereas grade III glioma patients belonging to the high-risk group presented the worst prognosis ( Figure 7B). Similarly, our model also stratified patients in different IDH subgroups, especially the IDH-Mut subgroup, remarkably well in the OS. When both IDH status and risk scores were considered, patients harboring IDH-Wt in the high-risk group had the worst outcome, whereas IDH-Mut patients in the low-risk group had the best prognosis ( Figure 7C). Furthermore, these results were well validated in the CGGA cohort (Figures 7D, E). Our model also stratified patients in different age subgroups (age >45 or age ≤45) notably well in both the TCGA and CGGA cohorts. When both age and risk scores were considered, high-risk patients with age > 45 had the worst OS, whereas low-risk patients with age ≤45 had the best OS ( Figures  7F, G). Collectively, these data suggest that our prognostic model is an independent prognostic factor as well as a valuable factor for stratification in LrGG.

The Prognostic Model Predicts the Sensitivity for Radio-and chemotherapy
Sine TME plays a substantial role in treatment responses and, as our signature was based on eight TME-related genes, we wondered whether this model could predict the sensitivity of LrGG for radio-and chemotherapy. Through Kaplan-Meier curve analysis, we found that LrGG patients who underwent radiotherapy belonging to the low-risk group had longer OS than those belonging to the high-risk group in the TCGA cohort ( Figure 8A). Consistently, we observed similar result in the CGGA cohort ( Figure 8B). For chemotherapy, we found our risk score significantly associated with the methylation status of the MGMT promoter, which can effectively predict the responsiveness of glioma to alkylating agents, the commonly used chemotherapeutic drugs for glioma (Esteller et al., 2000;Hegi et al., 2005;Song et al., 2015). The majority of patients (65.9%) with MGMT promoter unmethylation belonged to the high-risk group and the majority of patients (84.9%) with MGMT promoter methylation belonged to the low-risk group (Table 3). Furthermore, Kaplan-Meier curve analysis showed that the prognosis of patients who underwent chemotherapy in the low-risk group was better than that of patient underwent chemotherapy in the high-risk group in the CGGA LrGG cohort ( Figure 8C). Taken together, these results indicate that our prognostic model have the potential value for predicting the sensitivity of LrGG patient to radio-and chemotherapy.

The Prognostic Model correlates With Immune Infiltration in LrGG
Clinical research on immunotherapy has confirmed that the tumor-infiltrating lymphocytes within the tumor microenvironment have a predictive value for prognosis and treatment with immunotherapy in cancers (Spencer et al., 2016). Considering that our risk score was based on eight TME-related genes, we investigated whether our risk score was correlated with the infiltrating levels of six immune cells in the TCGA LrGG cohort, which were obtained from TIMER. The  Figure 9A). In addition, the immune and stromal scores of patients within the highrisk group were significantly higher than those of patients within the low-risk group (Supplementary Figure S4A). For immune cell infiltration, the infiltrating levels of six immune cells in high-risk patients were remarkably higher than those in low-risk patients (Supplementary Figure S4B). Next, we analyzed the correlation between the expression levels of eight TME-related genes and the infiltrating levels of six immune cells. The results showed that the expression of these eight genes showed significantly positive associations with immune cell infiltration (p < 0.05, Figure 9B). ARHGDIB expression presented a strong correlation with the infiltration levels of CD4+ T cells, neutrophils, macrophages, and dendritic cells (0.751 ≤ r ≤ 0.847), moderate correlation with the infiltrating levels of B cells (r = 0.605), and weak correlation with CD8+ T cell infiltration level (r = 0.173). CLIC1 expression exhibited

DIScUSSIOn
Glioma is a fatal tumor of the central nervous system, and none of the current available treatments are curative. In recent years, an increasing amount of studies have demonstrated that the tumor microenvironment plays a vital role in in tumor malignancy and responses to treatments including immunotherapy. Thus, a comprehensive understanding of the tumor microenvironment   in LrGG might help us find novel predictive biomarkers to guide individual treatment and might provide therapeutic targets to develop effective treatment strategies.
In this study, we used the weighted gene co-expression network analysis (WGCNA) to construct a weighted gene co-expression network in LrGG samples from the TCGA database. Due to several advantages such as reflecting the continuous nature of the underlying co-expression information, providing systems-level insights, and high sensitivity to low abundance and small fold-changes in genes without any information loss, WGCNA has been widely applied in various cancers (Pei et al., 2017;Maertens et al., 2018;Wan et al., 2018). Through principal components analysis, we initially screened out the co-expression modules that significantly correlated with the immune score and stromal score as calculated by the ESTIMATE algorithm. After performing GO enrichment analysis of these modules, we obtained three co-expression modules with a total of 601 genes involved in TME. In gene co-expression networks, hub genes are very important nodes with a maximal information exchange with other genes, and they are located centrally in the module (Langfelder and Horvath, 2008;Yang et al., 2014). Thus, intramodular hub genes were extracted with high gene significance and high intramodular connectivity from the three modules. Finally, we obtained 77 hub genes, of which 74 genes were associated with the OS of LrGG in TCGA database. Importantly, 70 genes were well validated in another independent LrGG cohort from the CGGA database, which further confirmed the reliability of these results. Indeed, some of these 70 genes were reported to be involved in glioma tumorigenesis or as being significant in predicting OS. For example, TAGLN2 (Transgelin-2), a member of the calponin family of actin-bundling proteins, has been reported to promote the development of glioma, and high TAGLN2 expression is associated with poor prognosis (Han et al., 2017). CLIC1 is the first member of the Chloride Intracellular Channel family; its expression has been correlated  with poor prognosis and it is known to modulate the cell cycle progression of glioblastoma stem cells Gritti et al., 2014). Thus, these results might provide novel insights into the TME of LrGG at the molecular level, and these prognostic genes might act as biomarkers and/or therapeutic targets for the diagnosis, outcome prediction, and treatment of LrGG in the future. The individual prognosis of LrGG patients varies greatly; however, both the histopathological features and recently established genetic biomarkers such as IDH, 1p/19q codeletion, ATRX, and TERT, of diffuse gliomas often fail to precisely predict a prognosis and completely explain this difference (Boots-Sprenger et al., 2013). Although IDH mutation strongly predicts good prognosis in glioma, most LrGG harbor IDH mutation and its predictive value in LrGG was not as good as that in GBM. On the other hand, adults with IDH-Wt LrGG do not have uniformly poor prognoses, nor is there a uniformly good outcome of LrGG with IDH mutation (Aibaidula et al., 2017;Liu et al., 2019). Thus, the LrGG need to be further stratified. More recently, some genetic prognostic models have been constructed and their predictive performances have been validated in glioma. For example, Zeng et al. (2018) identified and validated a threegene prognostic signature for LrGG by integrative Analysis of DNA Methylation and Gene Expression data. Yin et al. (2019) selected five genes from differentially expressed genes in glioblastoma to construct a prognostic model with potential in the prognosis prediction of GBM. Kang et al. (2019) developed a 5-CpG signature of miRNA methylation with prognostic values in non-G-CIMP GBM patients. It is thus obvious that the genes within the majority of constructed prognostic models are selected by differential expression analysis. However, a prognostic model based on TMErelated genes based on gene co-expression analysis was rarely reported for LrGG. Through differential expression analysis of known immune-related genes between IDH-WT and IDH-Mut LrGG, Qian et al. (2018) revealed that IDH was associated with the regulation of immune microenvironment in LrGG and reported an IDH-associated immune signature. However, focusing on IDH-associated immune-related genes might limit our understanding of TME in LrGG and the clinical value of their signature. In order to comprehensively understand the TME in LrGG, our analysis did not emphasize any known clinicopathological or molecular factors. In the present study, by combining the WGCNA and LASSO methods, we successfully developed a prognostic model with eight TMErelated genes for LrGG. Our prognostic model showed a favorable efficiency in predicting both short-and long-term prognoses for LrGG patients. Furthermore, stratified survival analysis demonstrated that this prognostic model still had a prognostic value for patients in different clinicopathological and molecular subgroups. Moreover, our risk model was an independent predictive factor for LrGG after adjusting for clinicopathological and molecular factors. In summary, our risk model has potential value for the prediction of prognoses in LrGG.
Despite many studies indicating that complete neurosurgical resection can effectively prolong the OS rate of LrGG, this is unachievable for most patients due to the invasive characteristics of LrGG, the location of the tumor, and other reasons. Patients with residual tumors might undergo adjuvant radiotherapy and chemotherapy to delay or control tumor progression. However, LrGG exhibits a differential response to radio-and chemotherapy, and some patients cannot even benefit from these treatments (Forst et al., 2014). Thus, identifying LrGG patients who are sensitive to these treatments is especially important. Through Kaplan-Meier curve analysis, we found that our model had the potential value of predicting LrGG patient sensitivity to both radiotherapy and chemotherapy. This might be related to the STAT1 signaling. Notably, we found that our prognostic model consists of three known or presumable targets (STAT1, OAS3 and TAP2) and one modulator (PARP9) of STAT1 signaling (Samuel, 2001;Zhang et al., 2015;Iwata et al., 2016;Narkwa et al., 2017). Additionally, these four genes were co-expressed in both TCGA and CGGA LrGG cohorts ( Figure  3A middle, Supplementary Figures S5A and B). Furthermore, an increasing amount of studies have indicated that STAT1 is associated with radio-and chemoresistance in multiple tumor entities (Weichselbaum et al., 2008;Khodarev et al., 2012). Indeed, STAT1 had been reported to be aberrantly expressed in glioblastoma and an overexpression of STAT1 predicted poor prognosis (Thota et al., 2014). Duarte et al. developed a signature of IFN/STAT1 signaling genes, which presented strong predictive value in the proneural subtype glioblastoma (Duarte et al., 2012). Similarly, our analysis also demonstrated that STAT1 was aberrantly expressed in LrGG and a high expression of STAT1 predicted poor prognosis of LrGG patients (Supplementary  Figures S5C, D). The time-dependent ROC analysis revealed that STAT1 alone presented favorable efficiency for predicting prognosis of LrGG in both TCGA and CGGA LrGG cohorts (Supplementary Figures S5E, F). However, comparing the AUC of our model and STAT1, we found our model had better predictive abilities than STATA1 alone.
In addition, the risk score and expression levels of all eight genes in our model exhibited a significantly positive association with the infiltrating levels of immune cells including B cells, CD4+ T cells, CD8+ T cells, dendritic cells, macrophages, and neutrophils, whereas a relatively low correlation was observed between the gene expression and CD8+ T cells. Accumulating research suggests that the efficiency of immunotherapy relies on an immunogenic TME, and tumor-infiltrating lymphocytes within the TME have a predictive value for treatment with immunotherapy in cancers (Spencer et al., 2016;Gasser et al., 2017). Association with infiltrating immune cells could be a useful criterion for selecting putative cancer vaccine targets . Additionally, it is confirmed that some of these genes are involved in the regulation of immune cells. The cleaved form of ARHGDIB (also called RhoGDIβ), cleaved by caspase-3 at Asp19, plays a role in the PMA− induced differentiation of THP−1 cells to macrophages (Ota et al., 2017). CLIC1, intracellular chloride channel protein 1, regulates macrophage phagosomal functions and regulates the dendritic cell processing of antigens for presentation to antigen-specific T cells (Salao et al., 2016). Furthermore, all of these eight genes were aberrantly expressed in LrGG samples compared to normal brain tissues. Taken together, our model system might have a predictive value for immunotherapy, and these 8 genes might be potential immunotherapy targets that deserve further study.
There are several limitations to our study. First, because of the limited clinical information of patients in public databases, we were unable to perform stratified survival analysis in more subgroups. Second, the performance of our prognostic model should be validated in more LrGG datasets. Third, all the results were based on public datasets and should be further confirmed by actual experiments.
In summary, we identified 74 prognostic hub genes from three TME-related gene co-expression modules and selected eight of these to construct a prognostic model (Figure 10). Time-dependent ROC curves analysis and survival analysis demonstrated that our model exhibited excellent efficiency for predicting the prognosis of LrGG and was a valuable hierarchical marker. Furthermore, our model had the potential value of predicting sensitivity for radio-and chemotherapy. In addition, these eight genes within our model were aberrantly expressed and were significantly associated with the infiltrating level of immune cells in LrGG, indicating their potential as targets for immunotherapy.

aUThOR cOnTRIBUTIOnS
JS analyzed the data, performed computational coding, and wrote the manuscript. QM and WL were involved in the design of the study. KX, YL, QX, GP, and JY collected the data. QL was involved in the design of the study, the manuscript review and editing, and the supervision of the entire work. All authors have approved the final manuscript.

FUnDInG
This study was supported by grants from the National Natural Science Foundation of China (No.81802974) acKnOWLEDGMEnTS We gratefully thank the China Scholarship Council for supporting Jun Su (201706370050).