Transcriptome Analyses Identify a Metabolic Gene Signature Indicative of Antitumor Immunosuppression of EGFR Wild Type Lung Cancers With Low PD-L1 Expression

Purpose With the development and application of targeted therapies like tyrosine kinase inhibitors (TKIs) and immune checkpoint inhibitors (ICIs), non-small cell lung cancer (NSCLC) patients have achieved remarkable survival benefits in recent years. However, epidermal growth factor receptor (EGFR) wild-type and low expression of programmed death-ligand 1 (PD-L1) NSCLCs remain unmanageable. Few treatments for these patients exist, and more side effects with combination therapies have been observed. We intended to generate a metabolic gene signature that could successfully identify high-risk patients and reveal its underlying molecular immunology characteristics. Methods By identifying the bottom 50% PD-L1 expression level as PD-L1 low expression and removing EGFR mutant samples, a total of 640 lung adenocarcinoma (LUAD) and lung squamous carcinoma (LUSC) tumor samples and 93 adjacent non-tumor samples were finally extracted from The Cancer Genome Atlas (TCGA). We identified differentially expressed metabolic genes (DEMGs) by R package limma and the prognostic genes by Univariate Cox proportional hazards regression analyses. The intersect genes between DEMGs and prognostic genes were put into the least absolute shrinkage and selection operator (LASSO) penalty Cox regression analysis. The metabolic gene signature contained 18 metabolic genes generated and successfully stratified LUAD and LUSC patients into the high-risk and low-risk groups, which was also validated by the Gene Expression Omnibus (GEO) database. Its accuracy was proved by the time-dependent Receiver Operating Characteristic (ROC) curve, Principal Components Analysis (PCA), and nomogram. Furthermore, the Single-sample Gene Set Enrichment Analysis (ssGSEA) and diverse acknowledged methods include XCELL, TIMER, QUANTISEQ, MCPcounter, EPIC, CIBERSORT-ABS, and CIBERSORT revealed its underlying antitumor immunosuppressive status. Besides, its relationship with somatic copy number alterations (SCNAs) and tumor mutational burden (TMB) was also discussed. Results It is noteworthy that metabolism reprogramming is associated with the survival of the double-negative LUAD and LUSC patients. The SCNAs and TMB of critical metabolic genes can inhibit the antitumor immune process, which might be a promising therapeutic target.


INTRODUCTION
Tyrosine kinase inhibitors (TKIs), as a milestone treatment against lung cancer, have demonstrated remarkable therapeutic effects in NSCLC. TKIs reversibly binds to the intracellular tyrosine kinase domain of the epidermal growth factor receptor (EGFR) by competing with ATP and inhibit activation of downstream signaling (1). Although in EGFR mutation-positive patients, erlotinib, gefitinib, and afatinib have achieved better progression-free survival (PFS) and overall survival (OS), most patients inevitably acquired resistance to TKIs within 12 months (2). Moreover, they are not always beneficial for EGFR mutation-negative patients. Programmed death-ligand 1 (PD-L1) is expressed on multiple malignant tissues and up-regulated within the tumor microenvironment, resulting in T-cell immunity resistance (3). Antibodies of PD-L1 can restore T cell function and enhance antitumor immunity (4).
For patients with EGFR mutation-negative and overexpressing PD-L1, T cell-based immunotherapies, which have been called immune checkpoint inhibitors (ICIs), were used as a choice because of their remarkable clinical response (5,6). Immunotherapy is the first-line treatment of advanced-stage NSCLC patients harboring EGFR/ALK (ALK receptor tyrosine kinase) wild type with PD-L1 expression ≥ 50% and second-line treatment when PD-L1 expression ranges between 1 and 50% (7). When the expression of PD-L1 ranges between 1-50%, ICIs is still a second-line treatment option along with chemotherapy (5,6). However, low expression of PD-L1, EGFR wild-type NSCLCs showed less therapeutic benefit and more adverse events. Therapies combined PD-L1 blockade and chemotherapy have achieved modest response rates but at the expense of more adverse effects (8)(9)(10). Considering that the carcinogenic mechanism and molecular basis of EGFR wild-type and low expression of PD-L1 NSCLC remain elusive, exploring an optimal treatment regimen is still ambiguous.
Accumulating evidence has suggested that metabolic reprogramming contributes to tumorigenesis and impacts the tumor microenvironment (11). Metabolic gene alterations and their prognostic roles have been observed in several tumor types like thyroid cancer, neuroblastoma, and melanoma, which indicate that altered metabolic genome may serve as novel tumor targets for therapies (12)(13)(14). However, it remains unknown whether metabolism-associated genes could mediate a key mechanism in EGFR wild-type and low expression of PD-L1 NSCLC.
We conducted comprehensive analyses integrating multiple sources in the present work, including gene transcriptome, somatic copy number alterations (SCNAs), mutations, and clinical parameters to uncover the impacts of metabolic genes in the double-negative (EGFR wild-type and low expression of PD-L1) LUAD and LUSC patients. The multiple platforms utilized in this study contain the Cancer Genome Atlas (TCGA), the Gene Expression Omnibus (GEO) database, and website resources like the cBioPortal platform and TIMER. The critical prognostic metabolic genes and a risk predict signature was generated to discuss specific immune status further.

Sample Datasets and Data Availability
Publicly gene expression and mutation format files of lung adenocarcinoma (LUAD) and lung squamous carcinoma (LUSC) with corresponding clinical data were downloaded from The Cancer Genome Atlas (TCGA) on 1 September 2020, which comprised of 904 EGFR wild type samples and 93 adjacent non-tumor samples. The definition of TMB refers to the number of somatic mutations, coding mutations, base replacement mutations, and insertion mutations per megabase in the genome. We calculated TMB as the number of all mutations/exon length (38 million) for each sample. By identifying the bottom 50% PD-L1 expression level as PD-L1 low expression and removing EGFR mutant samples, 640 tumor samples were finally extracted. The LUAD and LUSC expression dataset in Gene Expression Omnibus (GEO) were included as the validation cohorts. GSE3141 contained gene expression profile of 111 primary lung tumor samples and corresponding prognostic data. GSE14814 included 133 NSCLC samples with clinical data and microarray data.

Gene Expression Data Analysis
The metabolism-related genes were obtained from the Molecular Signatures Database (MSigDB) of the gene set enrichment analysis (GSEA) website. 944 genes in the KEGG (Kyoto Encyclopedia of Genes and Genomes) gene sets that correlated with the metabolism pathway were identified and extracted.
Differentially expressed metabolic genes (DEMGs) between tumor and normal tissues were targeted by using the R package limma. FDR > 0.05 and |log2FoldChange| > 1 were defined as the thresholds for screening DEMGs to identify upregulated and downregulated genes. We performed the Univariate Cox proportional hazards regression analyses to determine the prognostic genes associated with overall survival (OS). The intersect genes between DEMGs and prognostic genes were considered to put into the least absolute shrinkage and selection operator (LASSO) penalty Cox regression analysis.

Construction of the Prognostic Metabolic Signatures
The LASSO penalty Cox regression analysis was conducted on these intersect genes and to generate a metabolic gene signature. By calculating each subject's cox regression coefficients, patients were sub-grouped to the high-and low-risk depending on their median value of signature scores. The same signature formula was applied to the GEO validation cohorts. The Kaplan-Meier method was achieved by the survival and survivalROC packages. The timedependent Receiver Operating Characteristic (ROC) curve was used to predict clinical characteristics' accuracy and signature. The R package rms generated the nomograms. The calibration curve showed the calibration of the nomogram between the predicted risk and observed outcomes, representing the predicted and actual 3-Year overall survival (OS).
The current acknowledged methods such as XCELL (18), TIMER (19), QUANTISEQ (20), MCPcounter (21), EPIC (22), CIBERSORT-ABS (23), and CIBERSORT (24) were united to reveal the immunologic characteristics between groups. Diverse immune infiltrating cells were estimated by Spearman correlation analysis and Wilcoxon signed-rank test with risk scores and risk groups. The estimation files for the TCGA project calculate the immune infiltration statues were download from the TIMER website (http://timer.comp-genomics.org). The R packages ggplot2, ggtext, scales, and limma were used in this procedure.

The Cancer Immunome Atlas
The Cancer Immunome Atlas (TCIA) is an online searchable database that enables researchers to develop and test hypotheses on the impact of cancer genome on tumor microenvironment and immune characteristics, particularly concerning ICIs treatment response (25). The immunophenotypes of 20 solid cancers from TCGA are determined by cellular characterization of the immune infiltrates, showing potential tumor escape mechanisms. Using the machine learning method, tumor immunogenicity is identified, and a scoring scheme defined as immunophenoscore is generated. The immunophenoscore can be used as a favorable predictor of response to anti-CTLA-4 and anti-PD-1 antibodies, which were validated in two independent cohorts. A higher immunophenoscore indicates a better prognosis and better response to immunotherapy (26). Through TCIA, we obtained the immunophenoscore of TCGA-LUAD and TCGA-LUSC and the expression levels of CTLA-4 and PD-1. Patients were re-grouped according to the expression of CTLA-4 and PD-1, and then the immunophenoscore of the high-and low-risk patients were compared for responses to ICIs treatment.

Function Annotation
By taking the median risk value as the threshold, samples were divided into the high-and low-risk groups. General differentially expressed genes (DEGs) were screened between these two risk levels and were put into the Gene Ontology (GO) function annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis by R package GOplot.

Statistical Analysis
The student's t-test compared gene expression data between tumor samples and adjacent non-tumor samples. The Chisquare test for parametric distributions or the Wilcoxon test for nonparametric distributions was used for differences in proportions. All statistical analyses were performed with R software (Version 3.6.3) and Graphpad (Version 8.0.2). Pvalue < 0.05 was considered statistically significant if there are not specially mentioned.

Identification and Validation of the Prognostic DEMGs Signature in the TCGA Cohort
In 944 metabolism-related genes, 224 DEMGs were identified between tumor tissues and adjacent nontumorous tissues (73 down-regulated and 151 up-regulated, Figures 1A, B), of which 103 metabolic genes related to OS were extracted by univariate Cox regression analysis ( Figure 1C). After cross-matching, 19 DEMGs correlated with OS were finally generated and were put in LASSO Cox regression analysis to establish a prognostic  signature ( Figures 1D, E). Based on the optimal value of l ( Figures 1F, G), a signature comprised of 18 DEMGs was constructed as follows: risk score = ADCY9*-0.1096 + ACP4*0.4257 + GCLC*0.0124 + ALDOA*0.0196 + UCK2*0.0106 + PLA2G4B*-0.5342 + TXNRD1*0.0865 + PGM2*0 .04 96 + PTGIS*0 .294 6 + LDHA *0.0949 + PFKP*0.0838 + PTGES*0.0865 + ALDH1A2*-0.1811 + ISYNA1*-0.1000 + ACP5*-0.1516 + POLR2J3*0.4008 + ACSM5*-0.2192 + ADA*0.0094. According to the median cut-off value of risk score, the patients were divided into the high-and low-risk groups. The higher risk score was associated with worse OS in survival analysis ( Figure 1H). We used Principal Component Analysis (PCA) to evaluate the prognostic signature's effectiveness, which showed significantly different distribution patterns and suggested that the DEMGs signature can distinguish the high-and low-risk patients effectively (Figure 2A). The ROC curve analyses indicated that the AUC value of the DEMGs signature was higher than all the other risk factors ( Figure 2B), reaching 0.709. A nomogram was also built to incorporate the clinical parameters and risk score to predict 3-year OS. The nomogram's 1-, 3-and 5-year calibrate curve revealed that the predicted OS was very close to the actual OS ( Figures 2C, D), indicating high accuracy.
The DEMGs signature was tested in cohorts GSE3141 and GSE14814 for validation. We took the same formula to predict patients' survival, of which the Kaplan-Meier plot showed the patients in the high-risk group had a significantly poor survival time than those in the low-risk group (Figures 2E, F).

Associations Between the Prognostic Signature and Clinicopathologic Features
As shown in Figures 3A-L, we further tested the relationship between the signature and pathological parameters. In younger or older patients, female and male patients, early-or late-stage patients, the signature still showed the prognostic ability in the high-and low-risk groups.

GO Function Annotation for the Prognostic Signature
Patients were divided into the high-and low-risk groups depending on the median value of risk scores. The general DEGs between two risk levels were put into GO and KEGG functional annotation ( Figures 4E, F), which revealed that diverse immunology processes pathways like humoral immune response, leukocyte migration, leukocyte chemotaxis, and leukotriene metabolic process were involved.

Genetic Alterations of the Genes in the Prognostic Signature
The mutation data of 586 LUAD samples and 501 LUSC samples are available in the cBioPortal platform (http://www.cbioportal. org/). As shown in Figure 4D, the 18 signature genes showed varying levels of mutation in TCGA-LUAD and LUSC samples. The mutation rates of UCK2 and ACSM5 were significantly higher than that of other genes. Furthermore, we investigated the impacts of the 18 signature genes on immune infiltrations via the TIMER database (https:// cistrome.shinyapps.io/timer/), a comprehensive resource for systematical analysis of immune infiltrates across diverse cancer types (19). We evaluated the relationship between somatic copy number alterations (SCNAs) of 14 signature genes and immune cell infiltration. The SCNAs includes deep deletion (-2), arm-level deletion (-1), diploid/normal (0), armlevel gain (1), and high amplification (2). The immune infiltration level in each category was compared with the diploid/normal level. A p-value < 0.05 was considered significant, and a two-sided Wilcoxon rank-sum test was used between different categories. As shown in Figure 5, differential SCNAs categories statistically changed the immune cell infiltration levels, including B cell, CD8+ T cell, CD4+ T cell, macrophage, neutrophil, and dendritic cell infiltration.

Immune Profile Analysis
We characterized the immunology profile of EGFR wild type lung cancer samples with low PD-L1 expression by ssGSEA. In 29 immune gene sets, aDCs, B cells, CD8+ T cells, DCs, Macrophages, Mast cells, Neutrophils, NK cells, pDCs, T helper cells, Tfh, Th1 cells, Th2 cells, TIL, and Treg were associated with lower ssGSEA score in the high-risk group ( Figure 6A). Similarly, APC co-inhibition, APC co-stimulation, CCR, Check−point, Cytolytic activity, HLA, Inflammation−promoting, MHC class I, Parainflammation, T cell co−inhibition, T cell co−stimulation, Type I IFN ReSponse, and Type II IFN Response gained lower ssGSEA score in the high-risk group ( Figure 6B). The general immune cell types and immune response were significantly lower in high-risk double-negative (with EGFR wild and low PD-L1 expression) LUAD and LUSC, suggesting immunosuppression statuses in the high-risk patients.
Besides, The Spearman correlation analysis also indicated a negative relationship between risk score and multiple immune infiltration cells. As shown in Figure 7, most recognized immune infiltrating cells were negatively correlated with risk scores, including diverse T cells, B cells, and Macrophage. The Wilcoxon signed-rank test also confirmed the analysis of ssGSEA. In Figure 8, immunosuppression statuses in the highrisk patients were revealed by the lower levels of extensive immune infiltration cells such as T cell CD4+ central memory, T cell CD4+ Th2, T cell CD8+, T cell CD8+ naive, myeloid dendritic cell, and macrophage.

The Immunologic Landscape of TMB and Its Correlation With the Prognostic Signature
The definition of TMB is the number of mutations per DNA megabases (Mb). TMB has been reported to closely influence the response to ICIs (27)(28)(29) because the highly mutated burdens are associated with abundant neoantigens and susceptibility to immune cells (29,30).
Considering the extensive association between TMB and the tumor immune microenvironment, we further tested the  immunologic landscape of TMB and its correlation with the prognostic signature. As shown in Figure 4A, the high-risk group had a higher TMB level than the low-risk group, indicating a reverse TMB tendency compared with immunology status in the high-risk patients. When TMB alone was used as a prognostic indicator, there was no significant difference in survival between the low-and high-TMB groups by taking the median TMB value as cutoff ( Figure 4B). However, when the signature groups were added to TMB, the prognosis of patients in different subgroups showed distent survival differences ( Figure 4C). Patients with high-TMB and low-risk had the best prognosis among all the groups.
The  Figure 6C). Significant results (P < 0.05) were selected for subsequent analysis. Taking the median TMB value as a cutoff, the relative expression of 22 tumor-infiltrating immune cells in the low-and high-TMB samples were determined ( Figure 6D). B cells memory, T cells CD8, T cells CD4 memory activated, Tregs, Macrophages M0, Macrophages M1, and Dendritic cells resting showed infiltration differences relating to the two TMB levels. It is worth mentioning that B cells memory, T cells CD8, T cells CD4 memory activated, and Macrophages M1 play important roles in pro-inflammatory response while showing lower infiltration levels in the high-risk groups, which are consistent with the immunosuppressive status confirmed by ssGSEA.

Assessment of the Immunophenoscore From The Cancer Immunome Atlas to Predict the Response to ICIs
The negative correlation between risk score and immune infiltration cells was validated above. We further confirmed the relationship between risk score and immunotherapy response by TCIA. TCIA is a web-based database (https://tcia.at/home) that provides information on the cellular composition of tumor infiltrating lymphocytes, informing on response to checkpoint blocker immunotherapies of 20 solid cancers from TCGA. TCIA provided a composite score, the immunophenoscore, to reveal tumor immunologic heterogeneity of 20 valid cancers as an indication of tumor cell type might be susceptible to ICIs treatment. Higher immunophenoscores were positively correlated with better anti-cytotoxic lymphocyte antigen response to immunotherapies regardless of the expression of CTLA-4 and PD-1 ( Figures 3M-P).

DISCUSSION
As LUAD and LUSC patients with EGFR wild type can hardly respond to TKIs treatment, immune checkpoint inhibitors (ICIs) are the optimal first-line choices for them. For patients with double-negative, which means "EGFR wild type and low PD-L1 expression," combing PD-L1 antibody therapy and chemotherapy can improve outcomes but with more toxicities (31). Thus, new therapeutic strategies are needed for better efficacy.
Our study generated a prognostic DEMGs signature by comprehensive analyses of 640 LUAD and LUSC patients from TCGA, which were also validated in cohorts of the GEO database. Previous studies indicated that patients with EGFR wild type and low PD-L1 expression owned immunosuppressive status associated with less immune checkpoint protein expressions and lymphocyte infiltration (32,33). Meanwhile, our results showed that the high-risk patients evaluated by the DEMGs signature also expressed a similar immunosuppression status. The immune infiltration cell expressions evaluated by multiple methods were negatively related to risk score, and the pro-inflammatory factors like B cells memory, T cells CD8, T cells CD4 memory activated, and Macrophages M1 showed lower infiltration levels in the high-risk group. CD4+ memory T cells and B cells are localized and enriched in tertiary lymphoid structures, showing benefits on tumors' prognosis (34,35). Memory B cells function in both naïve and memory T cell responses as antigen-presenting cells, thus inducing an antitumor immune response (36). Besides, we found that although all the 18 signature genes had relatively stable mutation status in LUAD and LUSC, diverse forms of SCNAs in these genes could significantly inhibit immune infiltration in LUAD and LUSC. These may explain the underlying mechanism of antitumor immune deficiency in the double-negative highrisk patients.
Notably, studies reported that TMB could be used as a genomic biomarker that predicts favorable responses to ICIs. In a meta-analysis, patients with higher TMB who underwent ICIs had a better OS and PFS than those who received chemotherapy alone. While in patients with lower TMB, such ICI benefits were not statistically significant (37). In advanced    NSCLC patients, this association between high-level TMB and clinical benefit of ICIs was also observed (38). Moreover, higher TMB has been proved to be associated with a lower proportion of antitumor immune cells like macrophages M1, CD8 T cells, and B cells in papillary thyroid carcinoma (39), which was also confirmed by our result. Extensive analysis showed that different TMB levels in the double-negative LUAD and LUSC patients were correlated to variable immune infiltration cells.
Combining the signature with TMB, poor prognostic patients could be better identified, making our signature a new predictive biomarker for ICIs benefits in the double-negative LUAD and LUSC patients. The biological roles of 18 DEMGs signature genes are not fully investigated. Some of the signature genes are reported to be involved in tumor development. For example, ADCY9 encodes the protein adenyl cyclase type 9, which is related to the activation of intracellular production of cyclic AMP (40). ADCY9 has been reported to involve in the development of tumors. A higher expression level of ADCY9 was found in colon cancer and was an indicator of a bad prognosis (41). GCLC encodes a rate-limited enzyme in the GSH biosynthesis (42).  Abnormal expression of GCLC was reported in multiple types of tumors (43,44). ALDOA is coding genes of Aldolase A that involves in the glycolysis of cancer cells (45). It is related to the poor survival of various types of tumors, including pancreatic cancer, osteosarcoma, and LUSC (46)(47)(48). Similarly, TXNRD1 is also highly expressed in diverse malignancies and promotes tumor progression (49,50). UCK2, a type of rate-limiting enzyme of pyrimidine-nucleotide biosynthesis, is detected to up-regulate in several types of tumors, including neuroblastoma and hepatocellular carcinoma (51,52). Specifically, a high expression level of UCK2 has also been identified in stage IA lung cancer related to early recurrence, poor first progression survival, and short overall survival (53). In previous studies, PFKP is differentially expressed in glucose metabolic reprogramming in some cancers (54,55). Namely, PFKP is lowly expressed in seminomas and embryonal carcinomas but highly expressed in human breast tumor cells and lung cancer (56). ISYNA1 is associated with p53-related apoptosis in cancers like lung squamous cell carcinoma, bladder cancer, and pancreatic cancer (57,58), affecting tumor proliferation and clinical parameters. ACP5 encodes a metalloprotein enzyme that belongs to the acid phosphatase family, up-regulated in breast cancer and LUAD, associated with poor outcomes (59,60).
In conclusion, metabolic genes play an essential role in risk evaluation of LUAD and LUSC patients with EGFR wild type, low PD-L1 expression. Higher TMB and lower antitumor immune infiltration were found in the high-risk group, indicating poor prognosis in OS. Our related metabolic signature provides an adaptable way to identify potential therapeutic targets, especially SCNAs of the signature genes that could significantly inhibit the immune cells' infiltration. More biological roles of these metabolic genes are intended to explore for extended clinical significance.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
JZ, MW, and FZ handled the conceptualization and methodology. JZ and MW handled the software. MW and JX handled the validation. FZ was in charge of the original draft preparation. JZ and JX were responsible for the review and editing. All authors contributed to the article and approved the submitted version.