Predicting Differences in Treatment Response and Survival Time of Lung Adenocarcinoma Patients Based on a Prognostic Risk Model of Glycolysis-Related Genes

Background: Multiple factors influence the survival of patients with lung adenocarcinoma (LUAD). Specifically, the therapeutic outcomes of treatments and the probability of recurrence of the disease differ among patients with the same stage of LUAD. Therefore, effective prognostic predictors need to be identified. Methods: Based on the tumor mutation burden (TMB) data obtained from The Cancer Genome Atlas (TCGA) database, LUAD patients were divided into high and low TMB groups, and differentially expressed glycolysis-related genes between the two groups were screened. The least absolute shrinkage and selection operator (LASSO) and Cox regression were used to obtain a prognostic model. A receiver operating characteristic (ROC) curve and a calibration curve were generated to evaluate the nomogram that was constructed based on clinicopathological characteristics and the risk score. Two data sets (GSE68465 and GSE11969) from the Gene Expression Omnibus (GEO) were used to verify the prognostic performance of the gene. Furthermore, differences in immune cell distribution, immune-related molecules, and drug susceptibility were assessed for their relationship with the risk score. Results: We constructed a 5-gene signature (FKBP4, HMMR, B4GALT1, SLC2A1, STC1) capable of dividing patients into two risk groups. There was a significant difference in overall survival (OS) times between the high-risk group and the low-risk group (p < 0.001), with the low-risk group having a better survival outcome. Through multivariate Cox analysis, the risk score was confirmed to be an independent prognostic factor (HR = 2.709, 95% CI = 1.981–3.705, p < 0.001), and the ROC curve and nomogram exhibited accurate prediction performance. Validation of the data obtained in the GEO database yielded similar results. Furthermore, there were significant differences in sensitivity to immunotherapy, cisplatin, paclitaxel, gemcitabine, docetaxel, gefitinib, and erlotinib between the low-risk and high-risk groups. Conclusion: Our results reveal that glycolysis-related genes are feasible predictors of survival and the treatment response of patients with LUAD.


INTRODUCTION
Lung cancer accounts for a large proportion of cancer-related human deaths worldwide (Barta et al., 2019;Bade and Dela Cruz, 2020). Because lung adenocarcinoma (LUAD) is a common pathological type of lung cancer (Travis et al., 2015), individualized therapy for LUAD has received increasing attention from clinicians. Because tumor occurrence and development are based on genetic changes (2012; Vogelstein et al., 2013;Zhu et al., 2015), response to therapies and overall survival (OS) is not necessarily the same in patients of the same gender, performance status score, age, and TNM stage when the influence of social, family, and economic factors is removed. Therefore, there is an urgent need to explore effective microscopic molecular biomarkers to predict the response and prognosis of LUAD patients.
Understanding the differences in metabolism and proliferation between tumor cells and normal cells is essential to predict the prognosis and clinical response to treatment in cancer patients. Cells mainly obtain energy to perform their biological activities through glycometabolism, and LUAD cells are not an exception to this rule. Previous studies have revealed that the most significant metabolic change in cancer cells is the appearance of the Warburg effect, which is manifested by increased aerobic glycolysis of tumor cells and dependence on glycolytic pathways to produce adenosine triphosphate (ATP) (Koppenol et al., 2011;Xu et al., 2015;Schwartz et al., 2017;Vaupel et al., 2019). In view of this unique metabolic alteration in tumor cells, many targeted treatments have been discovered and have improved over time (Danial et al., 2003;Coy et al., 2005;Xu et al., 2005;Pelicano et al., 2006;Yang et al., 2020). In addition, an increasing number of studies have used glycolysis-related genes to establish predictive models of tumor prognosis (Zhang et al., 2019;Liu et al., 2020;Tang et al., 2020;Wu et al., 2020;Zhou et al., 2020). The tumor mutation burden (TMB) is defined as the number of somatic mutations per megabase (Mb) of the genomic sequence interrogated of a tumor and can be used as a predictor of the efficacy of immune checkpoint inhibitors (ICI) in multiple tumors (Snyder et al., 2014;Rizvi et al., 2015;Dong et al., 2017). In recent years, several studies have evaluated the TMB as a prognostic molecular marker (Kang K. et al., 2020;Lv et al., 2020;Zhang et al., 2020). Since glycolysis-related genes can be used to establish an effective prognosis prediction model for LUAD, integrating TMB into the prediction model is a new and promising approach with inherent advantages.
Here, we conducted in-depth study using gene expression data from patients with LUAD based on data extracted from TCGA database and used differences in TMB to screen glycolysis-related genes. Subsequently, we used statistical tools to obtain a prognostic model and nomogram composed of glycolysisrelated genes, which exhibited good clinical applicability and produced a method for reliable prediction of prognosis. Furthermore, the risk score was associated with the tumor immune microenvironment and could be used to estimate the sensitivity of the LUAD patients included in this study to treatments.

Clinical Information and Gene Expression Data From Patients
Clinical characteristics, genetic mutations, and mRNA expression data from patients with LUAD were extracted from TCGA database (https://cancergenome.nih.gov/) for subsequent analyzes using Perl (version 5.32.1.1). The clinical characteristics of 522 patients with LUAD, including gender, age, T (tumor), N (lymph node), and M (metastasis) stages, clinical stage, survival status, and smoking history, are detailed in Table 1. We defined smoking history as: a lifelong nonsmoker (<100 cigarettes smoked in lifetime) = 1, current smoker (includes daily smokers and non-daily smokers or occasional smokers) = 2, current reformed smoker for >15 years = 3, current reformed smoker for ≤15 years = 4, current reformed smoker, duration not specified = 5).

Screening of Differentially Expressed Glycolysis-Related Genes
By searching glycolysis gene sets in the Gene Set Enrichment Analysis (GSEA) database (http://www.broadinstitute.org/ gsea/index.jsp), a total of 326 genes were obtained. The gene mutation data obtained were used to calculate the TMB value of each LUAD sample and were divided into high and low TMB groups based on the median value. Subsequently, the analysis of differentially expressed glycolysis-related genes between the high TMB group and the low TMB group was performed in R, with the following cut-off criteria: |log2fold change (logFC)| > 0.2; p-value < 0. 05; FDR (false discovery rate) < 0.05.

Gene Prognosis Model Development and Expression Validation
The 'survival' and 'glmnet' packages in R were used to perform the univariate Cox regression analysis of differentially expressed genes. LASSO regression analysis and multivariate Cox proportional hazards regression were performed. We use the following risk formula: Risk score = β gene1 × Expressiongene1 + β gene2 × Expressiongene2 + β gene3 × Expressiongene3 + + β genen × Expressiongenen, where β is the coefficient and Expressiongene indicates the expression level in patients with LUAD. LASSO is a popular machine-learning algorithm, which was extensively utilized in medical studies Liu et al., 2022a;Liu et al., 2022b;Liu et al., 2022c). ROC curves were FIGURE 1 | GSEA revealed that four gene sets were significantly enriched: (A) BioCarta glycolysis (B) glycolysis (C) REACTOME glycolysis, and (D) WP glycolysis and gluconeogenesis.
Frontiers in Genetics | www.frontiersin.org May 2022 | Volume 13 | Article 828543 3 used to evaluate the validity of the prognostic model using the 'survival ROC' package. Differences in survival times between the low-risk and high-risk groups was then assessed using the 'survival' package and the 'survminer' package in R, and Kaplan-Meier (KM) curves were plotted to display the results directly. Finally, by integrating GSE68465 and GSE11969 datasets, 536 LUAD patient data points were obtained, and patients were divided into high-and low-risk groups using the risk score formula. The Human Protein Atlas (HPA) database (https://www.proteinatlas.org/) was used to verify the expression of genes in the model in LUAD cells.

Comparison of Prognostic Model Prediction Performance
The gene prognostic model was compared with gender, age, stage T (tumor), stage N (lymph node), M (metastasis), clinical stage, and smoking history as an independent prognostic analysis. Subsequently, the AUC of the model was compared with that of the clinicopathological features and with that of the existing LUAD prognostic model with a similar number of genes.

Prognostic Performance of Risk Genes and Nomogram Construction
The prognostic value of the genes included in the model was verified using the Gene Expression Profiling Interactive Analysis (GEPIA) database (http://gepia.cancer-pku.cn/) and the Kaplan-  Meier (KM) plotter database (http://kmplot.com/analysis/). The results of differences in clinicopathological characteristics between the high-risk and low-risk groups were displayed in a strip chart. The differences in risk scores each clinicopathological feature are presented in the box plot. Where p < 0.001 = * **, p < 0. 01 = ** *, p < 0.05 = *. Furthermore, we built a nomogram to improve the application of the model in the clinical setting and validated it using the 'rms' package. A nomogram is a prognostic evaluation tool that can integrate several prognostic determinants, including molecular and clinicopathological parameters; and can calculate and visualize the numerical probability of clinical events using a relatively simple output and is a tool widely used in clinical oncology (Balachandran et al., 2015).

Immune Microenvironment and Therapeutic Response
To clarify the relationship between the immune microenvironment and the risk score, XCELL, TIMER, QUANTISEQ, MCPcounter, EPIC, CIBERSORT, and CIBERSORT-ABS were used to obtain differences in the distribution of immune cells in patients with LUAD from TCGA database. The expression of genes related to immune checkpoint inhibitors were compared between the low-risk group and the high-risk group. We showed the results and the p-value was labeled as follows: p < 0.001 = ***, p < 0.01 = ***, and p < 0.05 = *. The Tumor Immune Dysfunction and Exclusion (TIDE) database (http://tide.dfci.harvard.edu) was used to obtain scores of immunotherapy evaluation-related indicators of LUAD samples (Hoshida et al., 2007;Jiang et al., 2018). Meanwhile, a differential comparative analysis was performed using the number of responder and nonresponder patients evaluated after immunotherapy obtained from the website. Subsequently, we added the pRRophetic package to evaluate the differences in clinical responses between LUAD patients grouped by the gene prognostic model by calculating the IC50, or half maximal inhibitory concentration, of commonly used drugs in R.

Statistical Analysis
All statistical analyses were performed using R 4.1.2 (https:// www.r-project.org/). Univariate Cox regression analysis was used to screen for genes associated with prognosis with p < 0. Frontiers in Genetics | www.frontiersin.org May 2022 | Volume 13 | Article 828543 5 0001. Multivariate Cox proportional hazards regression and LASSO regression analysis were used to construct prognostic model. The KM survival curve analysis and a log-rank test were used to compare differences in overall survival time between the high-risk group and low-risk group. The relationship between risk score and clinicopathological features was analyzed using The Chi-Square test and Wilcoxon Signed-Rank Test with p < 0.05. Spearman's correlation analysis was performed to obtain the correlation between the distribution of immune cell and risk score. Differences between immune cells in risk groupings were analyzed using the Wilcoxon signed rank test with p < 0.0001. The Wilcoxon test was used to calculate the TIDE score and the T cell exhaustion potential of the tumor score between the high-and lowrisk groups. Chi-square test was used to obtain p-values to compare the differences in the number of immune responses in different groups.

Data Processed by GSEA
Using 'glycolysis' as the search keyword, the BioCarta, Hallmark, KEGG, REACTOME and WP gene sets were selected from the Molecular Signatures Database (MSigDB) to obtain a glycolysisrelated gene signature. After using the above detailed data to perform GSEA, significant differences were detected between the tumor tissue sample group and the normal tissue sample group among the gene sets ( Figures 1A-D), with normalized p values <0.05.

Establishment and Evaluation of a Glycolysis-Related Gene Model
By comparing the high TMB group with the low TMB group, 95 differentially expressed genes were obtained (Figure 2 and Supplementary Material 1). Univariate Cox analysis was used obtain 10 genes related to prognosis ( Table 2). LASSO regression analysis and multivariate Cox proportional hazards regression analysis were used to calculate the risk score to construct the prognostic model ( Figures 3A,B and Supplementary Table S1). Prognosis was estimated as follows: (0.00565 * expression level of FKBP4) + (0.03539 * expression level of HMMR) + (0.00638 * expression level of B4GALT1) + (0.00332 * SLC2A1) + (0.00387 * expression level of STC1). The areas under the ROCs (AUCs) of the 1-, 2-, and 3-years overall survival (OS) rate analysis for the 5-gene prognostic model were 0.687, 0.665 and 0.696, respectively, ( Figure 3C). Similar verification results were obtained in the GEO database, and the ROCs (AUCs) of the 1-, 2-, and 3-years OS rate analysis for the prognostic model were 0.712, 0.699 and 0.652, respectively, ( Figure 3D). Survival was significantly different between the two groups (p < 0.001), in TCGA database ( Figure 4C). The low-risk group in the GEO database also had a better prognosis (p = 0.028) ( Figure 4B). Immunohistochemical results of FKBP4, HMMR, B4GALT1, SLC2A1 and STC1 gene expression status in LUAD tissue and normal tissue obtained from HPA database are shown in Figure 4C. We conducted an independent prognostic analysis of age [p = 0.302, HR = 1.008, 95% CI (0.  Figure 5B). We also found that the AUC of the risk score of all patients was similar to the AUC of the clinical stage ( Figure 5C). The predictive performance of the prognostic model in this paper is significantly better than the results in other studies in the 3-years survival time ( Figure 5D) (Yue et al., 2019;Li J. et al., 2020;Yu and Tian, 2020;Yao et al., 2021;Zheng et al., 2021). Validation in the GEPIA database suggests that each gene in the model has a prognostic value ( Figures 6A-E), and similar results were also obtained in the Kaplan-Meier (KM) plotter database ( Figures 7A-E).

Clinical Evaluation and Nomogram Construction
Chi-square test results suggested statistically significant differences in gender, T stage, N stage, clinical stage, and smoking status between subgroups of prognostic models ( Figure 8A). We then found that T stage, N stage, clinical stage, and smoking status ( Figures 8B,C,E,F) were significantly correlated with the risk score, while there were no differences in risk scores between the M stage, age, and gender group ( Figures 8D,G,H) using the Wilcoxon signedrank test. Subsequently, to enhance the clinical utility of the gene model, we constructed a nomogram ( Figure 9A). The risk score, age, gender, smoking status, and clinical stage were basic elements included in the nomogram. Furthermore, the nomogram prediction results were generally consistent with the actual survival outcomes based on the prediction calibration curves of the 1-, 2-, and 3-years survival rates ( Figure 9B). The AUCs of the 1-, 2-, and 3-years survival rate predictions for the nomogram were 0.742, 0.723, and 0.728 ( Figure 9C).

Estimation of Tumor Immune-Related Cells and Molecules and Patient Therapeutic Response With the Gene Assessment Model
We determined that the presence of tumor immune-related cells, such as myeloid dendritic cells, CD4 + memory T cells, CD8 + T cells, endothelial cells, and M2 macrophage, was positively correlated with low risk, while common lymphoid progenitor, M0 macrophages, M1 macrophages, and NK cells were positively associated with high risk ( Figure 10A). Detailed results of the Wilcoxon-signed rank test are shown in Supplementary Material 2. Since it is not uncommon to use immune checkpoint inhibitors to treat LUAD in the clinic, our aim was to clarify whether ICI-related biomarkers were associated with the risk model and found that low risk scores were positively correlated with high expression of CD28 (p < 0.01), CD274 (p < 0.01), and LAG3 (p < 0.01) ( Figure 10B). Subsequently, we found in the TIDE database that the TIDE score of the low-risk group was higher than that of the high-risk group (p = 3.8e-10) ( Figure 11A). This suggested that the possibility of immune escape in the low-risk group Frontiers in Genetics | www.frontiersin.org May 2022 | Volume 13 | Article 828543 8 is higher than in the high-risk group, and the immunotherapy effect is worse than that in the high-risk group. The T cell exhaustion potential of the tumor score was higher in the high-risk group than in the low-risk group ( Figure 11B). Combined analysis of TIDE value and IFNG gene expression indicated that the number of effective samples for immune checkpoint blockade in the high-risk group was significantly higher than in the low-risk group (p = 2.48e-40) ( Figure 11C). Next, we explored the relationship between the risk score and the response to chemotherapy in patients with TCGA LUAD. Our results indicated that a low-risk score was correlated with a higher IC50 for medications such as gefitinib (p = 7.4e-05), erlotinib (p = 1.8e-05), cisplatin (p = 9.5e-07), docetaxel (p < 2.22e-16), gemcitabine (p = 3.2e-12), and paclitaxel (p < 2.22e-16) (Figures 11D-I).

DISCUSSION
Although survival rate and quality of life of patients with LUAD has improved with the development of multiple aggressive treatments, such as surgery, chemotherapy, targeted therapy, and radiotherapy; not every patient can benefit from these treatments. The reason for this phenomenon is that most of the genes altered in different lung adenocarcinoma patients are different. Therefore, existing guidelines recommend genetic testing for lung adenocarcinoma patients before treatment, such as EGFR mutations and anaplastic lymphoma kinase (ALK) fusion mutations (Zhu et al., 2008;Shan et al., 2015;Khan et al., 2018;Kang J. et al., 2020). However, there are still some patients with the same expression of the above-mentioned genes with different prognosis. For example, only some of these patients effectively respond to immune checkpoint inhibition therapy, while others still progress rapidly (Herbst et al., 2020). Thus, we should conduct in-depth research and discussion on the types and number of genes that need to be detected. To date, many studies have revealed that the integration of expression data of multiple molecular markers can effectively predict patient prognosis and their potential response to drugs. Of these, the breast cancer 21-Gene Expression Assay is one of the most welldeveloped panels; it can provide a prediction of patient prognosis, disease recurrence, and tumor metastasis and can be used to guide treatment plans and assist in the development of individualized patient treatment strategies (Sparano et al., 2018). Based on this research model, the research on molecular markers of lung adenocarcinoma is also in full swing in recent years. However, the research directions of these studies have been different, such as the development of an immune prognostic model , an autophagy-associated gene prognostic model (Wang X. et al., 2020), a ferroptosis-related gene prognostic model (Gao et al., 2021), and a glycolysis-related gene prognostic model (Liu et al., 2020), so it is not known whether any one approach is effective for all individuals. Therefore, continuously improving predictive model methods will provide a variety of treatment options for specific patients.
To obtain a more reliable prognostic model for LUAD, we used prognostic models constructed from glycolysis-related genes as a reference. First, glycolysis-related genes in LUAD were extracted and, based on differences in the TMB, differentially expressed glycolysisrelated genes were selected as the basis to construct the prognostic model. Following Cox regression analyses and LASSO regression analysis, we found that a prognostic model consisting of 5 glycolysisrelated genes achieved a better independent prognostic prediction performance, and the nomogram combined with the clinical characteristics of this model resulted in a better performance and more practical clinical application value. Subsequently, we searched the HPA database for the immunohistochemical data relative to these five genes. We also used the data in the GEO database for verification. Since there was a difference in the survival times between patients grouped according to the model, we investigated the reason for this difference. The results of our in-depth study revealed that there were differences in tumor pathological characteristics and immune responses between patients grouped according to glycolysis-related genes, as well as differences in sensitivity to therapeutic drugs. Therefore, we provide sufficient evidence to demonstrate that the gene model obtained in this study contributed to improve the prediction of LUAD patient response to clinical treatment.
For the models obtained in this study, in addition to the predictive performance comparison, we also compared the results of existing similar studies. Sun et al. reported that immune-related genes could be used to construct a prognostic model. However, their nomogram did not combine the model with clinicopathological characteristics, so it was impossible to evaluate the effects of age, gender, and stage for a specific patient . Although Xu et al. obtained prognostic biomarkers by analyzing the tumor microenvironment of LUAD, they did not calculate the AUC value of the model (Xu et al., 2020). Most risk models are based on detecting the expression level of the molecule of interest and by calculating the total risk score to determine the prognosis of the patient. The first requirement is to judge the accuracy of the model before considering whether it can be used in clinical practice. Li et al. found that RNA binding proteins could be prognostic signatures for LUAD; the obtained model had good prediction performance, and a nomogram was also constructed . However, the differences in the immune microenvironment between the groups based on that model have not been further explored. It is well known that patient prognosis is related to a variety of factors, and the aforementioned model is of limited use for predicting survival. In addition, the clinical treatment plan for patients is somewhat variable. Therefore, a model is more valuable if it also has the ability to predict the patient's response to treatment. Wu et al. validated a LUAD prognostic biomarker showed that there was a significant difference between the low-and high-risk groups.
Frontiers in Genetics | www.frontiersin.org May 2022 | Volume 13 | Article 828543 13 constructed using autophagy-related long noncoding RNAs (Wu et al., 2021), but the risk model did not specify the 1-, 2-, or 3-years survival rate AUCs in detail, nor did it analyze the relationship with clinicopathological features. Zhang et al. also constructed a prognostic model based on glycolysis-related genes, but did not specify the AUC values for 1, 2, and 3 years, and did not use the TMB in a differential analysis, nor was the model verified (Zhang et al., 2019). The number of genes in the model was greater than that in this study. Our prognostic model, based on the metabolic characteristics of the tumors, has the following favorable characteristics. First, it is supported by a strong theoretical basis. Glycolysis, as a metabolic characteristic of common tumor changes, has been confirmed to be an influencing factor by many researchers. Second, data screening was reasonable and the results obtained were reliable. Finally, the assessment of the nomogram and its ability to predict patient drug sensitivity provided improved clinical applicability.
Although the model we constructed presents the aforementioned advantages, there are also some shortcomings. For example, studies have shown seemingly contradictory results, the low-risk group has a higher degree of immune infiltration and a better prognosis, but the effects due to immunotherapy are poor. After reviewing the findings from study and referring to relevant prior studies, we presume that there may be several possible reasons for this result. First, we screened differentially expressed glycoly-related genes using the level of TMB. In the gene expression heatmap, we could find that the higher expression of the four genes in the model were all associated with a high TMB. The risk score was positively correlated with these four genes. Therefore, high TMB is also positively correlated with risk score. Current studies show that high TMB is positively correlated with the therapeutic effects of immune checkpoint inhibitors (Rizvi et al., 2015;Dong et al., 2017). Therefore, we believe that the results in this paper are reasonable, for the following reasons: 1) although the low-risk group showed greater immune cell infiltration, the cells that can directly affect immunotherapy are effector T cells; but in the microenvironment other immune cells can inhibit its function, thus making immunotherapy less effective (Cao et al., 2021). For example, this study showed that the low-risk group may have more macrophage M2 infiltration, as well as mast cells and neutrophils with dual roles. 2) The degree of influence of TMB on immunotherapy was higher than that of immune cell infiltration. The low-risk group was associated with a low TMB, and tumor cells with low TMB are not easily supervised by immune cells, even though there are more immune cells in the microenvironment, while a high TMB has a higher probability of triggering a T cell response (Jardim et al., 2021). Secondly, in previous studies, we also found that even though LUAD patients in the group achieved a superior response to immunotherapy, the actual patient survival was worse than the control group (Wang Q. et al., 2020), because the prognosis of tumor patients was influenced by multiple factors, including the sustainability of drug treatment, tolerability of adverse drug reactions, and response to radiotherapy. Moreover, it is a pity that we were not able to conduct in vitro studies to further verify these genes; thus, we can only extrapolate the function of these genes from existing studies. FKBP4 encodes a protein that plays a role in immune regulation and basic cellular processes are involved in protein folding and trafficking. Transcriptional activity and nuclear translocation of NF-κB is enhanced to promote LUAD progression through the formation of FKBP4/Hsp90/IKK and FKBP4/Hsp70/RelA complexes (Zong et al., 2021). As a target gene of mir-34A-5p, HMMR can promote tumor growth in LUAD through the HCG18/Mir-34A-5p/HMMR axis (Li et al., 2020c). As one of the beta-1,4-galactosyltransferase (beta4GalT) genes, B4GALT1 encodes an enzyme involved in glycoconjugates and lactose biosynthesis. Prior studies have found that it is associated with the prognosis of LUAD (Zhang et al., 2019), but the specific mechanisms involved have not been elucidated. SLC2A1 encodes a glucose transfer protein, which has been found to be associated with the prognosis of LUAD patients (Cheng et al., 2021), but the specific influencing mechanism has not been elaborated. STC1 can enhance glucose metabolism, ATP production, and lactic acid production of lung cancer cells under normoxic and hypoxic conditions, thus achieving anti-apoptotic properties (Ohkouchi et al., 2012). These genes deserve further investigation into the mechanisms and interactions of LUAD when experimental research resources become available in the future.

CONCLUSION
An LUAD risk prognostic signature consisting of 5 glycolysis-related genes was identified in this study. To predict the survival time of patients with LUAD and their potential response to therapeutics, the model obtained in this study has excellent performance. This is the first study to predict the survival and drug response of patients with LUAD by combining glycolysis-related genes with TMB. Furthermore, the combination of glycolysis-related genes and immune responses not only enhances the accuracy of the model but also leads to a new direction in immunotherapy.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: The Cancer Genome Atlas, Gene Expression Omnibus (GEO), accession numbers GSE68465 and GSE11969.

AUTHOR CONTRIBUTIONS
RZ wrote the manuscript. RZ, DD, and XW performed the statistical analysis. RZ, YD, RH, and CZ prepared the figures and wrote the figure legends. All authors approved the final version to be published.

ACKNOWLEDGMENTS
Thanks to AJE and Charlesworth Group for the language polishing.