Glycolysis Define Two Prognostic Subgroups of Lung Adenocarcinoma With Different Mutation Characteristics and Immune Infiltration Signatures

Increasing studies have proved that malignant tumors are associated with energy metabolism. This study was aimed to explore biological variables that impact the prognosis of patients in the glycolysis-related subgroups of lung adenocarcinoma (LUAD). The mRNA expression profiling and mutation data in large LUAD samples were collected from the Cancer Genome Atlas (TCGA) database. Then, we identified the expression level and prognostic value of glycolysis-related genes, as well as the fractions of 22 immune cells in the tumor microenvironment. The differences between glycolysis activity, mutation, and immune infiltrates were discussed in these groups, respectively. Two hundred fifty-five glycolysis-related genes were identified from gene set enrichment analysis (GSEA), of which 43 genes had prognostic values (p < 0.05). Next, we constructed a glycolysis-related competing endogenous RNA (ceRNA) network which related to the survival of LUAD. Then, two subgroups of LUAD (clusters 1 and 2) were identified by applying unsupervised consensus clustering to 43 glycolysis-related genes. The survival analysis showed that the cluster 1 patients had a worse prognosis (p < 0.001), and upregulated differentially expressed genes (DEGs) are interestingly enriched in malignancy-related biological processes. The differences between the two subgroups are SPTA1, KEAP1, USH2A, and KRAS among top 10 mutated signatures, which may be the underlying mechanism of grouping. Combined high tumor mutational burden (TMB) with tumor subgroups preferably predicts the prognosis of LUAD patients. The CIBERSORT algorithm results revealed that low TMB samples were concerned with increased infiltration level of memory resting CD4+ T cell (p = 0.03), resting mast cells (p = 0.044), and neutrophils (p = 0.002) in cluster 1 and high TMB samples were concerned with increased infiltration level of memory B cells, plasma cells, CD4 memory-activated T cells, macrophages M1, and activated mast cells in cluster 2, while reduced infiltration of monocytes, resting dendritic cells, and resting mast cells was captured in cluster 2. In conclusion, significant different gene expression characteristics were pooled according to the two subgroups of LUAD. The combination of subgroups, TMB and tumor-infiltrating immune cell signature, might be a novel prognostic biomarker in LUAD.


INTRODUCTION
Lung adenocarcinoma (LUAD) is a highly fatal cancer of the respiratory system worldwide, and the 5-year survival rate of LUAD is less than 17% (Hirsch et al., 2017). It is reported that application of biomarkers may provide effective prognostic values in LUAD. For example, CAV1 and DCN participate in regulating LUAD progression (Yan et al., 2019). Despite progress in molecular researches, more prognostic biomarkers need to be further explored in LUAD.
Glycolysis is one of the well-studied pathways of glucose metabolism. Normal mammalian cells are inhibited by glycolysis under aerobic conditions, while malignant tumor cells are active in glycolysis even under a sufficient-oxygen environment. This metabolic feature of aerobic glycolysis is called the Warburg effect, which is manifested by high glucose uptake rate, active glycolysis, and high lactic acid content of metabolites (Allard et al., 1994). Previous studies have confirmed that increased levels of glycolysis are involved in malignancy progression, such as proliferation, invasion, and migration (Ganapathy-Kanniappan and Geschwind, 2013;Jin et al., 2017). It is reported that downregulation of Barx2 promotes aerobic glycolysis and predicts a poor prognosis in non-small cell lung carcinoma (NSCLC) . Given that, understanding the mechanisms associated with glycolysis could be a major breakthrough for finding potential prognostic targets. In this study, we aimed to explore new biomarker strategies of glycolysis-related tumor subgroups, which was associated with tumor-infiltrating immune cells, a tumor mutational burden (TMB) in LUAD.
The immune system plays key roles in the surveillance and elimination of tumor cells. The malignant phenotype of tumors is determined not only by the internal activity of tumor cells but also by the tumor-infiltrating immune cells in the tumor microenvironment (Swann and Smyth, 2007), which can promote or suppress the development and growth of tumors (Shiao et al., 2011). Previous studies have shown that the infiltration by immune cells in tumor tissues may have a prognostic value (Fridman et al., 2012). Therefore, the signature of tumor-infiltrating immune cells may be regarded as a potential predictive prognostic biomarker.
Tumor cell mutations could change the function and expression of proteins, resulting in the appearance of tumorspecific neoantigens. T-cells then recognize these neoantigens, causing an antitumor response. Increasing the activation of immune cells through immunotherapy may remove immune-mediated tumor cells and ultimately improve the patients' prognosis. Researchers have identified that high levels of TMB may be more responsive to immunotherapy (Rooney et al., 2015;Fancello et al., 2019), and KRAS mutation can be regarded as a predictive biomarker in lung cancer (Hainsworth and Anthony Greco, 2016). Thus, further studies of mutation signature should be explored to identify prognostic biomarkers for LUAD.
One single prediction method always makes the results inaccurate. Therefore, in the present study, the combination of TMB and tumor-infiltrating immune cells to assess prognosis in different tumor subgroups of LUAD could provide more accurate and effective biomarkers.

Data Collection, Processing, and Validation
We firstly gathered RNA-sequencing data and prognostic information of LUAD from the GDC tool 1 . Then, we downloaded somatic mutation data of all patients processed by VarScan software, and the "maftools" package and "pheatmap" package in R were used to further analyze the mutation data. Subsequently, the "ConsensusClusterPlus" package was performed to construct consensus analysis and PCA was applied to verify the accuracy of the classification. Eventually, the "limma" package was performed to identify differentially expressed lncRNAs (DElncRNAs), differentially expressed miRNAs (DEmiRNAs), and differentially expressed genes (DEGs) with |logFC > 1| and p < 0.05. The independent cohort GSE72094 was downloaded from the Gene Expression Omnibus (GEO) database 2 for data validation. The dataset contains both expression levels of related genes and clinical information such as age, gender, survival status, and survival time.

Gene Set Enrichment Analysis (GSEA)
Gene set enrichment analysis 3 (Subramanian et al., 2005) was used to explore whether the selected gene sets revealed significant differences between the tumors tissues and normal tissues. We obtained glycolysis-related data (KEGG GLYCOLYSIS GLUCONEOGENESIS, BIOCARTA GLYCOLYSIS PATHWAY, HALLMARK GLYCOLYSIS, and REACTOME GLYCOLYSIS) from the MsigDB database 4 . p < 0.05 was the standard to assess whether to further investigate.

Functional Enrichment Analysis of DEGs and Protein-Protein Interaction (PPI) Network Construction Across Two Clusters
To figure out the functional enrichment of glycolysis-related DEGs, the "GOplot" package and "ClusterProfiler" package were used to further analyze by gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and p < 0.05 was considered as statistically significant. Then, a protein-protein interaction (PPI) network was constructed using the STRING database (Szklarczyk et al., 2015) and the cytoHubba plugin of Cytoscape software (Shannon et al., 2003).

Evaluation of Tumor-Infiltrating Immune Cells
CIBERSORT is a new algorithm used for calculating fractions of 22 immune cells subsets via RNA-seq or microarray data. The immune cells included seven types of T cells and three types of B cells, NK cells and various myeloid cells. We selected the samples with p < 0.05 to elevate the accuracy.
Validation of 43 Significant Glycolysis-Related DEGs Using Quantitative Reverse Transcription-Polymerase Chain Reaction (qRT-PCR) Total RNA was extracted from 16HBE and H1299 cells using the TRIzol reagent (Invitrogen, Carlsbad, CA, United States). After the purity and concentration of the total RNA were determined, the total RNA was reverse transcribed into cDNA using the PrimeScript RT reagent kit (Accurate Biology). The qRT-PCR was performed using the SYBR Green Premix Ex Taq II (Accurate Biology). The PCR conditions were set as follows: 95 • C for 30 s, followed by 40 cycles at 95 • C for 5 s and 60 • C for 30 s for each specific primer. Finally, the relative mRNA expression levels of 43 genes were calculated using the 2 − CT method. The primer sequences are listed in Supplementary Data Sheet 2.

Statistical Analysis
Statistical analysis was utilized by R software (version 4.0.3) and GraphPad Prism (version 7.0). The differences of expression levels between two groups were determined by Student's t-test. We used the "survival" packages to process the survival analysis, and survival curves were visualized by the Kaplan-Meier plotter 8 , which was examined by the log-rank test. The difference of infiltrating immune cells between the high-TMB group and the low-TMB group was determined by unpaired t-test, as well as the expression level of the glycolysis-related genes between cluster 1 and cluster 2. Differences were considered significant at p < 0.05.

Glycolysis Is Associated With the Tumorigenesis of LUAD
We firstly obtained the RNA sequencing data and clinical data of LUAD from TCGA. Then, gene set enrichment analysis (GSEA) was used to further analyze the relationship between glycolysis and LUAD. The GSEA results showed that BIOCARTA GLYCOLYSIS PATHWAY, HALLMARK GLYCOLYSIS, and REACTOME GLYCOLYSIS were significantly enriched in LUAD, which illustrated that glycolysis is involved in the tumorigenesis of LUAD ( Figure 1A). According to the GSEA results, 255 glycolysis-related genes were extracted and visualized by a heat map ( Figure 1B). The information of 255 glycolysis-related genes is summarized in Supplementary Data Sheet 2. In addition, the flow diagram of this study is illustrated in Figure 2.

Screening the Prognostic Value of Glycolysis-Related Genes in LUAD
To elucidate the relationship between the glycolysis-related genes and the prognosis of LUAD, we assessed the prognostic values of these 255 glycolysis-related genes using the Kaplan-Meier plotter database and found that only 43 genes were significantly associated with overall survival (all p < 0.05) (

Consensus Clustering of Glycolysis-Related Genes Identified Two Clusters of LUAD
Glycolysis plays crucial roles in the biological processes of LUAD, so we next visualized the expression levels and correlation of 43 prognostic glycolysis-related genes (Figure 3). Then, we constructed glycolysis subgroups of LUAD with different biological characters based on 43 glycolysis-related genes. Unsupervised consensus clustering analysis indicated that k = 2 was the optimal number of clusters (Figures 4A-C) and principal component analysis (PCA) was used to verify the classification by 43 glycolysis-related genes ( Figure 4D). Finally, a total of 534 patients were divided into cluster 1 and cluster 2, and the results showed that cluster 1 was associated with worse overall survival in LUAD patients compared with cluster 2 (p < 0.001) ( Figure 4E). The specific TCGA patient ID of each cluster is displayed in Supplementary Data Sheet 2.

Validation of Expression of 43 Glycolysis-Related Genes, Consensus Clustering, and Prognostic Value Using an Independent Cohort
In order to verify the feasibility of grouping, the gene expression profile of GSE72094 was used for further analyses. The heatmap plot was visualized to further exhibit the distribution of 43 prognostic glycolysis-related genes ( Figure 5A). As shown in Supplementary Figure 2, the expression of 42 of 43 prognostic glycolysis-related genes was significantly elevated and one gene was downregulated in LUAD tissues compared with adjacent normal lung tissues in GSE72094 (both p < 0.05). Meanwhile, qRT-PCR was also used to verify the expression of these genes. As expected, the results showed that 39 of the 43 genes showed the consistency with the above results, while four genes showed that there is no significance between 16HBE and H1299 (Supplementary Figure 3), which could be caused by the difference between tissue and cell lines. Furthermore, unsupervised consensus clustering analysis and PCA results showed that these two clusters are meaningful and can be verified using GSE72094 (Figures 5B-E). The survival analysis revealed that the joint of glycolysis and gene expression of 43 genes had a significant correlation with the prognosis of LUAD patients ( Figure 5F).

Construction of a 43-Glycolysis-Related Signature
After screening 43 genes, we also constructed a ceRNA network to explore potential upstream mechanisms which may account for the worse overall survival in cluster 1. Using the limma R package according to the standards (p < 0.05, | logFC| > 1), a total of 186 DElncRNAs (Supplementary Data Sheet 2) and 150 DEmiRNAs (Supplementary Data Sheet 2) were identified in LUAD. Firstly, the miRNAs targeted by 43 glycolysis-related genes  Table 3) of 61 DElncRNAs have prognostic values (all p < 0.05). Eventually, we constructed a 43-glycolysis-related ceRNA network in LUAD, including 13 DElncRNAs, 16 DEmiRNAs, and 43 glycolysisrelated genes (Figure 6).

Identification and Functional Enrichment Analysis of the DEGs in Each Cluster
To further comprehend the relationship between glycolysis subgroups of LUAD and different patient prognoses, we identified the DEGs in each cluster from TCGA. A total of 384 DEGs (Supplementary Data Sheet 2), including 94 upregulated genes in cluster 1 and 290 upregulated genes in cluster 2 were screened by the limma R package on the basis of p < 0.05 and | log2(FC)| > 1. Those upregulated DEGs in cluster 1 are downregulated in cluster 2, which means that the rest of the unmentioned downregulated DEGs in cluster 1 are also elevated in cluster 2. In fact, from different perspectives, the DEGs are also different. For this reason, we only displayed and discussed the upregulated genes in each cluster for their different prognosis.
To better realize the function of clustering, we further performed functional enrichment analysis. Upregulated DEGs of cluster 1 were mainly enriched in immune-related biological processes, such as complement activation, classical pathway, humoral immune response mediated by circulating immunoglobulin, and immunoglobulin-mediated immune response ( Figure 7A). More specifically, the immune-related functions in cluster 2 mainly include cell chemotaxis, myeloid leukocyte migration, leukocyte chemotaxis, and neutrophil chemotaxis ( Figure 7C). The results of KEGG enrichment analysis showed that cluster 1 and cluster 2 were mainly enriched in metabolism-related signaling pathways (Figures 7B,D). Moreover, the functional annotation of DEGs displayed that 10 GO terms were statistically significant (p < 0.05) (Figures 7E-H and Tables 4, 5). To identify the biological modules of DEGs in cluster 1, the PPI network was generated by using STRING and visualized by Cytoscape software. The cytoHubba plugin will sort genes by core degree through a variety of algorithms, and genes with high degrees tend to be hub genes. Then, the top 10 hub nodes were identified by the cytoHubba plugin, including HIST1H3B, HIST1H1B, HIST1H1D, HIST1H1E, HIST1H4C, HIST1H2BL, HIST1H2AH, HIST1H2BO, HIST1H3E, and HIST1H2AM ( Figure 7I).

Identification of the Mutation Profile Features From Each Cluster
The simple nucleotide variation data of 561 LUAD samples were downloaded from TCGA and processed by VarScan software. The landscape of mutation data was visualized using the "maftools" package. Mutation profile features indicated that missense mutation was the most common type in Variant Classification, single-nucleotide polymorphism formed the nucleus of variant type, and C>A accounted for more components than other single-nucleotide variants in cluster 1 and cluster 2. The top 10 mutated signatures were visualized by a horizontal histogram, and the results showed that the mutated SPTA1 (29%) and KEAP1 (29%) are unique in cluster 1 among the top 10 mutated genes, as well as USH2A (27%) and KRAS (24%) in cluster 2, which may be the underlying mechanism of grouping ( Figures 8A,B). Besides, the variant allele frequency (VAF) of LUAD is visualized in Figures 8C,D and a lollipop plot displayed these unique mutation points on the protein structure (Figures 8E,F). A waterfall plot displayed the mutation information in each sample, and in cluster 1, seven genes were mutated by >30%: TP53 (58%), TTN (56%), RYR2 (40%), MUC16 (38%), ZFHX4 (38%), CSMD3 (37%), and LRP1B (32%). In cluster 2, five genes were mutated by >30%: TP53 (44%), MUC16 (39%), TTN (38%), CSMD3 (34%), and RYR2 (32%) (Figures 9A,B). Besides, mutated oncogenic pathways were also visualized in each cluster (Figures 9C,D).

Identification of Immune Cell Infiltration Signatures of Each Cluster
The functional enrichment analysis showed that DEGs were involved in cell chemotaxis and immune-related biological processes, which lead us further to analyze the correlation of TMB with immune signatures in each cluster. Based on the CIBERSORT algorithm, we revealed the proportions of different immune cells in specific patients in each cluster. The percentage of 22 types of tumor-infiltrating immune cell in each cluster was visualized (Figures 10A,B). Subsequently, we indicated the differential abundance of immune cells in the low-TMB and high-TMB groups via violin plots. Memory resting CD4+ T cells (p = 0.03), resting mast cells (p = 0.044), and neutrophils  (p = 0.002) showed higher infiltrating levels in the low-TMB group in cluster 1 ( Figure 10C). Moreover, the infiltration levels of memory B cells (p = 0.044), plasma cells (p = 0.048), activated memory CD4+ T cells (p < 0.001), resting NK cells (p = 0.008), M1 macrophages (p < 0.001), and activated mast cells (p = 0.019) were higher in the high-TMB group in cluster 2, However, monocytes (p = 0.021), resting dendritic cells (p < 0.001), and resting mast cells (p < 0.001) showed a higher infiltrating level in the low-TMB group (Figure 10D).

DISCUSSION
Lung adenocarcinoma is one of the prevalent malignant tumors with increasing incidence, mortality, and poor prognosis (Nakamura and Saji, 2014;Hirsch et al., 2017). Meanwhile, LUAD displayed unique metabolic characteristics with high glycolytic activity, which is one of the most crucial hallmarks of cancer (Hanahan and Weinberg, 2011). It is reported that PPP1R14B-AS1 overexpression was associated with poor prognosis in LUAD . Inhibition of glycolysis could regulate the cell survival of LUAD (Farah et al., 2012). However, the relationship between glycolysis-related gene signatures and LUAD prognosis is still unclear so far. In this study, 43 differentially expressed glycolysis-related genes were related to the overall survival for LUAD patients. An appropriate recognition of tumor subgroups is necessary as it may affect patients' prognosis and consideration of molecular detection. On the basis of 43 prognostic genes, we classified two subgroups of LUAD (cluster 1 and 2) by applying unsupervised consensus clustering, which had different clinical prognosis values. Therefore, we analyzed the possible mechanisms of different prognosis from multiple perspectives. A subgroup-specific molecular study transforms the biological characteristics of LUAD into clinical prognostic subgroups to find potential biomarkers. Patients with different subgroups of LUAD had significantly different survival prognoses. In the present study, survival analysis results showed that cluster 1 was associated with worse overall survival for LUAD patients (p < 0.001). To explore the underlying upstream molecular 4 | The top 10 gene ontology (GO) terms of differentially expressed genes in cluster 1. Term  Genes  adj_pval   BP  GO:0006958  Complement activation, classical pathway  IGHV1-18, IGHV1-3, IGHV3-48, IGLV6-57, IGHV3-72, IGHV3-35,  IGHV3-15, IGHV1-58, IGHV4-34, IGHG3, IGLV3-25   2.53E-11   BP  GO:0002455  Humoral immune response mediated by  circulating immunoglobulin   IGHV1-18, IGHV1-3, IGHV3-48, IGLV6-57, IGHV3-72, IGHV3-35,  IGHV3-15, IGHV1-58, IGHV4-34, IGHG3, IGLV3-25   3.63E-11   BP  GO:0006956  Complement activation  IGHV1-18, IGHV1-3, IGHV3-48, IGLV6-57, IGHV3-72, IGHV3-35,  IGHV3-15, IGHV1-58, IGHV4-34, IGHG3, IGLV3-25   1.12E-10   BP  GO:0006910  Phagocytosis, recognition  IGHV1-18, IGHV1-3, IGHV3-48, IGHV3-72, IGHV3-35, IGHV3-15,  IGHV1-58, IGHV4-34, IGHG3 1. 35E-10   BP  GO:0016064  Immunoglobulin-mediated immune  response   IGHV1-18, IGHV1-3, IGHV3-48, IGLV6-57, IGHV3-72, IGHV3-35,  IGHV3-15, IGHV1-58, IGHV4-34, IGHG3, IGLV3- Note. BP, biological process. mechanism which may account for the worse overall survival in cluster 1, we constructed a gene signature related to glycolysis, including 13 DElncRNAs, 16 DEmiRNAs, and 43 glycolysisrelated genes and then verified the prognostic value of this gene signature in LUAD (all p < 0.05). In addition, the difference of gene expression level, immune infiltration signatures, and gene mutation infiltration signatures, and gene mutation information in cluster 1 was compared with that in cluster 2. The transformation of metabolic pathways is usually regulated by specific gene expression, and genetic signatures composed of multiple genes have been developed to predict the prognostic value of clinical cancer patients. According to the different clusters, we screened that 94 DEGs were enriched in immunerelated biological processes in cluster 1. The top 10 hub genes were selected to help identify subgroups, and we found that histone modifications, one of epigenetic modifications,  participated in the occurrence of cancer, which also could be regarded as a prognostic biomarker (Dworkin et al., 2009). Among these top 10 hub genes, HIST1H3B/C-K27Mmutated tumors exhibit a pro-angiogenic/hypoxic signature in diffuse intrinsic pontine gliomas (Castel et al., 2015). Copy number variations of HIST1H1B were connected with cellular development and proliferation in melanoma (Fidalgo et al., 2016). The expression of HIST1H2AH was higher in esophageal squamous cell carcinoma tissues than in adjacent non-tumorous tissues . Ten histone-coding genes here might regulate tumorigenesis to affect prognosis in cluster 1 for LUAD patients. Compared with a single prediction model, the effect of analysis results combined with multiple perspectives can provide a more accurate prediction. Mutation was regarded as a therapy target to improve the prognosis of multiply tumors. Previous researches have reported that targeting SGLT2 may intercept LUAD progression at early stages (Scafoglio et al., 2018). SETD2-mutated LUAD patients exhibited poor recurrence-free survival (Kadara et al., 2017). In this study, we found that the mutated SPTA1 (29%) and KEAP1 (29%) are unique in cluster 1 among the top 10 mutated genes, as well as USH2A (27%) and KRAS (24%) in cluster 2. Furthermore, dysregulation of signaling pathways can also change in cancer metabolism to affect patients' prognosis (Huang et al., 2016). For example, inactivation of the Hippo pathway is connected with the occurrence of various tumors (Panciera et al., 2017). Notch can promote glycolytic metabolism in T cell acute lymphoblastic leukemia (Palomero et al., 2007). Overexpressed SKA3 correlates with poor prognosis through the EGFR-PI3K-Akt axis in LUAD (Hu et al., 2020). Overexpression of CHAP2 may inhibit LUAD cell proliferation and correlate with high survival rate via the WNT signal pathway (Shang et al., 2019). In this study, the enrichment pathways of mutant genes are mainly WNT, PI3K, NOTCH, and Hippo signaling pathways in each cluster, which demonstrated that mutated genes contribute to different prognostic effects of LUAD through these pathways. However, the detailed mechanisms need further investigation.

Category ID
The highly acidic microenvironment produced by tumor glycolysis may affect the infiltration of immune cells to varying degrees, eventually leading to immune escape and tumor progression (Gill et al., 2016;Cascone et al., 2018). Meanwhile, immune cell infiltration has a double feature to affect tumor progression, not only inhibiting the occurrence of tumor but also playing a pro-tumor role (Terlizzi et al., 2014). It is reported that lymphocyte infiltration has been connected with improved survival in NSCLC (Zeng et al., 2016). Mutated genes may generate neoantigens that increase lymphocyte infiltration in the tumor microenvironment. In this study, we found that the high TMB group has higher fractions of memory B cells (p = 0.044), plasma cells (p = 0.048), activated memory CD4+ T cells (p < 0.001), resting NK cells (p = 0.008), M1 macrophages (p < 0.001), and activated mast cells (p = 0.019) in cluster 2, which indicated that glycolysis-related cluster 2 with high TMB contributed to an immunomodulatory tumor microenvironment. These results indicated that TMB could affect the immune cell infiltration signatures and high TMB tends to cause the chemotaxis of immune cells in LUAD.

CONCLUSION
In conclusion, this study is the first to report a 43gene prognostic signature related to glycolysis in LUAD. Furthermore, we constructed a complicated signature, among which the combination analysis of TMB and tumorinfiltrating immune cells was used to assess prognosis in different tumor subgroups of LUAD. Our study may provide a novel sight to realize the mechanisms of glycolysis and identify original gene targets for LUAD patients in the future.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.