Identification of Key Genes Related to CD8+ T-Cell Infiltration as Prognostic Biomarkers for Lung Adenocarcinoma

Background CD8+ T cells are one of the central effector cells in the immune microenvironment. CD8+ T cells play a vital role in the development and progression of lung adenocarcinoma (LUAD). This study aimed to explore the key genes related to CD8+ T-cell infiltration in LUAD and to develop a novel prognosis model based on these genes. Methods With the use of the LUAD dataset from The Cancer Genome Atlas (TCGA), the differentially expressed genes (DEGs) were analyzed, and a co-expression network was constructed by weighted gene co-expression network analysis (WGCNA). Combined with the CIBERSORT algorithm, the gene module in WGCNA, which was the most significantly correlated with CD8+ T cells, was selected for the subsequent analyses. Key genes were then identified by co-expression network analysis, protein–protein interactions network analysis, and least absolute shrinkage and selection operator (Lasso)-penalized Cox regression analysis. A risk assessment model was built based on these key genes and then validated by the dataset from the Gene Expression Omnibus (GEO) database and multiple fluorescence in situ hybridization experiments of a tissue microarray. Results Five key genes (MZT2A, ALG3, ATIC, GPI, and GAPDH) related to prognosis and CD8+ T-cell infiltration were identified, and a risk assessment model was established based on them. We found that the risk score could well predict the prognosis of LUAD, and the risk score was negatively related to CD8+ T-cell infiltration and correlated with the advanced tumor stage. The results of the GEO database and tissue microarray were consistent with those of TCGA. Furthermore, the risk score was higher significantly in tumor tissues than in adjacent lung tissues and was correlated with the advanced tumor stage. Conclusions This study may provide a novel risk assessment model for prognosis prediction and a new perspective to explore the mechanism of tumor immune microenvironment related to CD8+ T-cell infiltration in LUAD.


INTRODUCTION
Lung adenocarcinoma (LUAD) is the most common type of lung cancer, accounting for 40% of all lung cancers (1)(2)(3). In recent years, the development of immunotherapy has changed the landscape of non-small cell lung cancer (NSCLC) therapy (4)(5)(6). Notably, the immunotherapy effects mainly rely on the immune responses, which are significantly influenced by the tumor microenvironment (7,8). CD8+ T cells are central effector cells in the tumor microenvironment, and previous studies have reported that highly infiltrating CD8+ T cells are beneficial to prognosis in most tumors, including LUAD (9)(10)(11)(12)(13)(14). However, the mechanism of CD8+ T-cell infiltration in the tumor microenvironment in LUAD is still unclear. Therefore, identifying novel biomarkers related to CD8+ T-cell infiltration may help explore the immune infiltration mechanism in LUAD.
With the rapid development of bioinformatics, new tools have arisen to identify novel biomarkers (15)(16)(17)(18)(19)(20)(21). For example, weighted gene co-expression network analysis (WGCNA) is an effective tool that mines related patterns between genes to identify relevant modules and hub genes in cancer (16), and it has been widely used to find biomarkers at the transcriptional level (17,18). Another bioinformatics tool, namely, Cell Type Identification by Estimating Relative Subsets of RNA Transcripts (CIBERSORT), is used to quantify the cellular composition of immune cells using a deconvolution algorithm based on gene expression data (19). This algorithm has been successfully used to approximate the level of immune cell infiltration in various cancers, such as prostate cancer and renal cancer (20,21).
Previously, many studies have focused on exploring the immunegenomic biomarkers, which may direct immunotherapy. For example, tumor mutational burden (TMB) may be a preferable choice for directing the first-line immuno-oncology agent management of advanced non-oncogene-addicted NSCLC patients (22,23). However, fewer studies are focusing on exploring prognostic biomarkers from the aspect of immune cell infiltration, which could be used not only to estimate prognosis but also to direct immunotherapy. In this study, to identify the hub genes related to CD8+ T immune cell infiltration and potential biomarkers of LUAD, we first used WGCNA to obtain differentiated gene expression modules based on gene expression data in The Cancer Genome Atlas (TCGA) database. The CIBERSORT algorithm was used to calculate the T-cell compositions of the samples. Those important modules and hub genes related to CD8+ T-cell infiltration were identified by correlation analysis of the WGCNA and CIBERSORT algorithm results. Furthermore, the immune and clinical characteristics of the hub genes were verified, and a risk score model based on the hub genes was built, which were significantly related to the prognosis of LUAD after least absolute shrinkage and selection operator (Lasso) regression analysis and multivariable Cox analysis. The model's performance was evaluated using receiver operating characteristic (ROC) curves, calibration curve, and stratification analysis. Gene Expression Omnibus (GEO) datasets were then conducted for external validation. Furthermore, we performed multiple fluorescence in situ hybridization of 98 LUAD tissues and 82 adjacent tissues to further verify the results of bioinformatics analysis. This is the first time that the WGCNA and CIBERSORT algorithm were used to identify the relevant biomarkers of infiltration of CD8+ T cells in LUAD and to further build a LUAD prognosis prediction model.

Analysis of the Differential Gene Expression
Differentially expressed gene (DEG) analysis was performed using the R package "edgeR" and visualized by volcano plot and heatmap. The heatmap and the volcano plot were done with the R packages "pheatmap" and "EnhancedVolcano," respectively. The threshold of DEGs was set at |logFC| >1 and false discovery rate <0.05.

Co-Expression Network Construction by Weighted Gene Co-Expression Network Analysis
With the use of the R package "WGCNA," a weight co-expression network was constructed based on the expression value of 8,807 DEGs (16). According to Pearson's correlation value between paired genes, a similarity matrix containing the expression levels of individual transcripts was built. Then, based on the equation, adjacency between the paired genes = |Pearson's correlation between the paired genes| b , the similarity matrix was converted into an adjacency matrix. The parameter b could amplify differences of correlation between genes. When b = 4, the adjacency matrix was converted into a topological overlap matrix. Finally, we used a bottom-up algorithm to classify genes with similar expression patterns into different modules.

Construction of Module Feature Relationships
With the use of the R package "CIBERSORT," the proportions of 22 types of immune cells in the samples were deduced according to the expressions of genes. The expression of signature genes was extracted to form a signature gene expression matrix. Combined with the known immune cell signature, the immune cell proportions of samples were calculated using "CIBERSORT," and finally, the proportions of relevant subtypes of T cells were extracted. Furthermore, the correlations between genes of modules in WGCNA and the subtypes of T cells were calculated by Pearson's test. The modules most significantly correlated with CD8+ T cells were selected for the subsequent analyses.

Enrichment Analysis of Functions and Signaling Pathways
The enrichment of functions and signaling pathways of genes in the identified hub module was conducted using the R package "clusterProfiler," and the threshold was p-value <0.05 and q-value <0.2. After the enrichment pathways were determined, a bubble map was plotted.

Identification of Hub Genes Associated With Infiltration of CD8+ T Cells and Prognosis
To further determine the central nodes in the modules related to immune cell infiltration, we imported the co-expression network of relevant modules of WGCNA into Cytoscape (https:// cytoscape.org/) and then screened the genes with high nodes according to a different degree. Furthermore, approximately onethird of the total genes in the modules were selected as hub genes according to the threshold of degree = 260. Meanwhile, all genes in the hub module were imported into the STRING database (https://string-db.org/), and then a protein-protein interaction (PPI) network was constructed. The network was imported into Cytoscape to search central nodes, and approximately one-third of the total genes were selected as hub genes when the degree = 5. Finally, a Venn plot integrated the WGCNA and STRING database results to identify the hub genes. After acquiring the hub genes, we used univariate Cox regression analyses to preliminarily screen the hub genes associated with the prognosis, and the genes with statistical significance (p < 0.05) underwent Lasso-penalized Cox regression analysis for further dimension reduction. Genes with statistical significance (p < 0.05) in Lasso-penalized Cox regression analysis were considered key hub genes associated with CD8+ Tcell infiltration and selected for the subsequent analyses.

Construction and Validation of a Prognostic Risk Model
Based on key hub genes associated with CD8+ T-cell infiltration, a prognostic risk model was constructed. The risk score was calculated as follows: risk score = (b A × gene A expression) + (b B ×gene B expression) ··· + (b N × gene N expression). To evaluate the model's performance, the "survival" package was used to draw a calibration curve, and the "survivalROC" package was used to draw ROC curves. An area under the ROC curve (AUC) >0.6 was considered as a good performance of the model. Patients were divided into high-risk or low-risk groups according to the median value of the risk score, and the Kaplan-Meier method with log-rank test was used to test the prognostic significance of the risk score. p < 0.05 was considered statistically significant.
The prognostic model was then validated by the GSE72094 dataset from the GEO database. The risk scores of samples were calculated as the formula shown above. The ROC curve and the calibration curve were drawn to evaluate the performance, as well as the Kaplan-Meier method and the log-rank test were used to compare the prognostic significance between the high-risk group and the low-risk group. Because EGFR mutation status is critical in LUAD, we estimated the prognostic model in wild-type and mutation-type EGFR samples in the construction and validation cohorts.

Correlation Between Key Hub Genes and Subtypes of Immune Cells
The correlations of key hub genes and immune cell subtypes were calculated online using the TIMER database (http://timer. cistrome.org/). Then, together with the correlation data, a heatmap was plotted to show the correlations, and scatter diagrams were shown for different key hub genes.

External Validations of Protein Expression by Multicolor Immunofluorescence
The lung cancer tissue array with 82 pairs of matched cancerous and adjacent tissues, as well as an additional 16 cases of cancer tissues (HLugA180Su07), was obtained from Shanghai Outdo Biotech. To assess the expressions of key hub genes, multicolor immunofluorescence (mIHC) was performed using an Opal 7color fluorescent IHC kit (PerkinElmer) combined with automated quantitative analysis (AQUA; Genoptix). First, the concentrations and order of the five antibodies were optimized, and a spectral library was built based on single-stained slides. The slides were first deparaffinized with xylene and ethanol, and antigen retrieval was done using a microwave. After incubation with freshly made 3% H 2 O 2 for 10 min, the tissues were blocked in a blocking buffer for another 10 min at room temperature. Then the tissues were incubated with the primary antibodies, followed by secondary horseradish peroxidase (HRP) antibodies (Cell Signaling Technology) and an opal working solution (Akoya Biosciences). Primary antibodies recognizing the following antigens were used: MZT2A (1:20; Abcam), ALG3 (1:25; Abcam), GAPDH (1:1,500; Abcam), GPI (1:3,000; Abcam), and ATIC (1:200; Abcam). The slides were then mounted with ProLong Gold Antifade Reagent with DAPI and scanned using a confocal microscope (LEICA, Japan). Fluorescence images were acquired using a Vectra 2 intelligent slide analysis system using Vectra 2.0.8 (PerkinElmer). The mean fluorescence intensities (MFIs) of MZT2A, ALG3, GAPDH, GPI, and ATIC were measured.

Immunohistochemistry of CD8+ T Cell
The distribution of CD8+ T cells in the lung cancer tissue array (HLugA180Su07) was evaluated by immunohistochemistry staining. The tissue array was incubated in a dry oven at 63°C for approximately 1 h, deparaffinized in xylene, and then rehydrated with graded ethanol solutions. After antigen retrieval, the array was incubated with a primary antibody against CD8 (DAKO, IR623) overnight at 4°C in a humidified chamber, followed by incubation at room temperature for 30 min with the secondary antibody (Envision+/HRP, Rabbit, DAKO). Subsequently, the tissue array was incubated with a 3,3′-diaminobenzidine (DAB) solution for 5 min. Finally, the tissue array was counterstained with hematoxylin. The immunostained slide was evaluated by two experienced pathologists blinded to clinicopathological characteristics, and the percentage of CD8 positive cells was annotated.

Subgroup Analysis to Evaluate the Performance of the Model
To test the performance of the model, the risk score in different subgroups of age, sex, T stage, N stage, etc., was evaluated in TCGA dataset, GEO dataset, and the external validation dataset. Moreover, the Kaplan-Meier method and log-rank test were used to evaluate the performance of prognosis prediction in different subgroups. Differences in risk scores between different clinical characteristics were analyzed by GraphPad Prism 7.0. A Student's t-test was used for comparison between two groups. ANOVA was used for comparison between three or more groups. p <0.05 was considered statistically significant.

The Clinical Characteristics of The Cancer Genome Atlas Cohort
After exclusion of those cases with deficient clinical information, 529 cases were included in this study. Of these, 478 were cancer tissues and 51 were normal tissues. The clinical characteristics are shown in Supplementary Table 1. The flowchart of this study is shown in Figure 1.

Identification of Differentially Expressed Genes and Construction of Gene Co-Expression Network
After comparing the expressions of LUAD tissues with those of normal tissues in TCGA-LUAD cohort, we identified 8,807 DEGs, including 2,172 upregulated genes and 6,635 downregulated genes (Figures 2A, B). The gene co-expression network was then constructed using the 8,807 DEGs. b = 4 (scale-free R 2 > 0.85) was set as the soft-threshold power to build a scale-free network ( Figure 2D). Furthermore, we used the dynamic hybrid cutting method to construct a hierarchical clustering tree. Each leaf on the tree represented an individual gene, and genes with similar expression data were gathered to form a tree branch representing a gene module. Eleven modules were generated ( Figures 2C, E).

Identification of Key Module Related to CD8+ T-Cell Infiltration
The proportion of immune cells in each sample was calculated based on gene expression by CIBERSORT. Seven T-cell subtypes were included: CD8+ T cells, CD4 naive T cells, CD4 memory resting T cells, CD4 memory activated T cells, follicular helper T cells, regulatory T cells (Tregs), and gamma delta T cells. Significantly, no CD4 naive T cells and gamma delta T cells were found. The proportion of each T cell subtype was extracted as the phenotype data, and its associations with the WGCNA modules were analyzed. The highest correlations were found between genes in the pink modules (275 genes) and CD8+ T cells (R 2 = 0.22, p < 0.01). Hence, the genes in the pink modules were used in the subsequent analyses ( Figure 3).

Function Enrichment Analysis
In the pink module, 275 genes were analyzed by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) function enrichment analyses. GO analysis showed that the main enriched pathways were RNA metabolic processes, glucose metabolic processes, mitochondrial matrix, mitochondrial inner membrane, heterogeneous enzymatic activity, and tRNA catalytic activity ( Figures 4A-C). The main pathways found enriched by KEGG were sugar metabolism and arginine/proline metabolism, and the enriched functions were mainly related to cell respiration ( Figure 4D).

Identification and Validation of Hub Genes Related to CD8+ T-Cell Infiltration
The genes in the pink module were imported into Cytoscape to build a co-expression network (Figure 5A), and a total of 93 hub genes were obtained at degree >260 ( Figure 5B). One hundred forty-eight interactions with the proteins encoded by the genes of the pink module were identified by the PPI network, and then 46 hub proteins were selected at the degree >5 ( Figure 5C). By overlapping the genes in Cytoscape and the PPI network, we acquired 117 hub genes related to CD8+ T-cell infiltration ( Figure 5D). Of them, VARS (originating from the PPI network) does not belong to the pink module or DEGs and thus was excluded from the subsequent analysis. Finally, 116 hub genes associated with CD8+ T-cell infiltration were obtained.

Identification of Prognosis-Related Key Genes and Construction of a Risk Assessment Model
To identify prognosis-related genes from the hub genes, 116 genes were subjected to univariable Cox regression analysis. In total, 34 genes were found significantly associated with the prognosis ( Figure 6A). We then used Lasso-penalized Cox regression analysis and identified five genes independently correlated with prognosis ( Figure 6B). Based on the expressions and correlation coefficients of these five genes, a risk assessment model was established, where Risk Score = MZT2A * 0.035 + ALG3 * 0.084 + ATIC * 0.104 + GPI * 0.125 + GAPDH * 0.134 ( Figure 6C).

Validation of Risk Assessment Model
Patients were divided into high-risk and low-risk groups, with the median score at 6.53. The patients with the high-risk group showed a poorer 5-year overall survival (OS) compared with the patients in the low-risk group (low-risk vs. high-risk = 44.6% vs. 31.0%, p < 0.01) ( Figure 7A). The ROC curves showed that the AUC of OS at 5-year was 0.643, which suggests that the prediction of the risk model has a good performance ( Figure 7B). The calibration curve of the model suggested that the predicted 5-year OS closely correlated with the actual 5-year OS ( Figure 7C). The subgroup analysis of the risk model suggested that the model had a good prediction performance in patients with wild-type EGFR status or mutation-type EGFR status (Figures 7D-I).
The GSE72094 dataset was used to validate the risk model (Supplementary Table 2). After scoring, the cases were divided into high-risk and low-risk groups, with the median score at 5.34. The result of the Kaplan-Meier curve was similar to that of TCGA cohort ( Figure 8A). Furthermore, the calibration curve, ROC curve, and the AUC (0.62) implied that this risk model had good prediction performance in the external dataset ( Figures 8B, C).
We also assessed the model in the patients with wild-type or mutation-type EGFR status. The results showed that the prognosis of the low-risk group did not have a significant difference from that of the high-risk group in the mutation-type cohort, which may be caused by the small sample size. However, the performance of the model in the wild-type cohort was good ( Figures 8D-I).
Since our bioinformatics analysis was based on RNA sequences, we performed a multicolor immunofluorescence (mIHC) experiment on 98 LUAD tissues and 82 adjacent tissues from the protein perspective to validate the model. The follow-up time of the cohort was 1-94 months, and the median follow-up time was 39 months (interquartile range: 15-57). The median survival time was 50 months. The baseline characteristics of the cohort are shown in Supplementary Table 3. The expressions of five proteins were primarily located in the cytoplasm (Figure 9). We calculated the risk score based on fluorescence intensities and then divided the cohort into the low-risk group and high-risk group according to the median risk score. The results showed that the prognosis of the high-risk group was significantly poorer than that of the low-risk group ( Figure 10A). The ROC curve, the calibration curve, and the AUC = 0.655 showed good performance ( Figures 10B, C). In the subgroup analysis of EGFR status, the wild-type EGFR cohort showed similar results with the overall cohort. However, the mutation-type EGFR cohort results did not show significant differences between the prognosis of the high-risk group and the low-risk group. In addition, the performance of the risk model was poor, which may be caused by the small sample size (Figures 10D-I). We found that the risk score was significantly higher in the advanced T stage, N stage, and TNM stage. Furthermore, we found that the risk score was significantly negatively correlated to the infiltration of CD8+ T cells, which validated our bioinformatics analysis ( Figure 11A). Figures 11B-E.    according to the EPIC algorithm (p < 0.01). These results implied that the expression of those genes might be negatively correlated with the infiltration of CD8+ T cells. The heatmap of key genes is shown in Figure 12.

Subgroup Analysis
The Kaplan-Meier curves were plotted in subgroups of data from TCGA database, including age, gender, T stage, N stage, M stage, and TNM stage. Although the prognosis of the high-risk group and the low-risk group was not significantly different in the age <60 group or the N0 group, the high-risk group had a poor prognosis in other subgroups ( Figure 13). Similarly, the model underwent subgroup analysis with the GEO and mIHC cohorts, and the same results were found (Figures 14, 15). The differences of risk scores among age, gender, M stage, N stage, T stage, and TNM stage were tested in TCGA cohort. The results showed that the risk scores were higher in men, M1 stage, N2 or N3 stage, T3 or T4 stage, and TNM stage III or IV (Supplementary Figure 1). Similarly, differences in risk scores were detected in the GEO cohort and the mIHC cohort. The results showed that the advanced TNM stage had a higher score. Furthermore, the distribution of the risk scores between different EGFR statuses was tested in the GEO cohort, and the results showed that the risk scores were significantly higher in the wild type. However, in the mIHC cohort, we found that the risk score distribution in EGFR status was not significantly different. Hence, whether these five genes are related to EGFR status needs further exploration (Supplementary Figures 2, 3).

DISCUSSION
LUAD is the most common type of lung cancer (24). Nowadays, surgery combined with chemotherapy, targeted therapy, or immunotherapy is the primary treatment strategy for most LUAD patients (25)(26)(27)(28)(29). For non-oncogene advanced LUAD, chemoimmunotherapy is an essential treatment strategy. In recent years, several reports found that the immune microenvironment plays a ALG3 (a-1,3-mannose glycosyl transferase) belongs to the ALG family and is located on the chromosomal region 3q27.1. ALG3 upregulation is related to lymph node metastasis of esophageal squamous cell carcinoma (35) and the proliferation of cervical cancer cells (36). ALG3 expression is higher in NSCLC tissues than in normal tissues and is associated with a higher T stage, lymph node metastasis, tissue differentiation, and prognosis (37). Similar to MZT2A, there are no reports on the relationship between ALG3 and CD8+ T-cell infiltration, which we will further explore.
ATIC (5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase/IMP cyclohydrolase) encodes a bifunctional protein and catalyzes the last two steps in de novo synthesis of purines. ATIC is overexpressed in hepatic cell carcinoma and is associated with a poor prognosis in patients (38). The fused protein between ATIC and anaplasia lymphoma kinase (ALK, a common oncogene) was discovered in lymphoma patients (39,40). Interestingly, frame-shift mutations and missense mutations of ATIC were found in a case of radiation sensitivity, and biochemical research showed that purine biosynthesis involving ATIC might help with DNA damage repair (41). Hence, these results imply that ATIC may be  involved in tumorigenesis and may influence the survival of cancer cells. Additional research would be needed to explore the mechanism of ATIC and its relationship with CD8+ Tcell infiltration. GPI (glucose-6-phosphate isomerase) is an enzyme that participates in the glycolysis pathway. GPI is a cytoplast dimer that can catalyze the conversion from glucose-6-phosphate to fructose-6-phosphate. GPI is a protein similar to the autocrine movement factors involved in the migration and invasion of tumor cells and angiogenesis (42). In various cancers, the expression of GPI is induced by c-Myc, and HIF-1 is overexpressed at the same time (43,44). HIF-1 can induce GBE1 upregulation, which would decrease CCL5 and CXCL10 secretion, hindering the recruitment of CD8+ T lymphocytes (45,46). GPI can also induce the protein expression of matrix metalloproteinase-3 and thereby promote the invasiveness of tumors (47). GPI, which is overexpressed in renal cancer, plays a role in tumor progression and is negatively correlated with the clinical prognosis of patients (48). However, the role of GPI in lung cancer has not been investigated to date. GAPDH (glyceraldehyde-3-phosphate dehydrogenase) is one of the housekeeping proteins, and the mechanism of its anaerobic conversion to glucose critically regulates tissue regeneration and tumor growth (49,50). Cancer cells can persistently survive under metabolic stress, anoxia, or starvation; and their glycolysis capacity must be improved by the Warburg effect, such as to improve the activity of enzymes involved in this function. Herein, the total glycolysis flux rate is precisely decided by the conversion stage from GALP (glyceraldehyde-3-phosphate) to biphosphoglyceride and is regulated by the activity of GAPDH (51). As an essential factor in the speed-limiting step of glycolysis, it plays a pivotal role in the energy metabolism of cancer cells. Hence, increased GAPDH activity will increase glycolysis rate and promote tumor growth, leading to poor prognosis (52). Anoxia is one of the major phenomena during tumor growth and activates the HIF-1a transcription factor to upregulate GAPDH expression (53,54). In addition, upregulation of GAPDH may enhance HIF-1a transcription and activity, restricting the recruitment of CD8+ T lymphocytes (45,46,55). Moreover, the high activity of GAPDH increased the mobility of cancer cells, and epithelialmesenchymal transition (EMT) markers are significantly associated. Colon cancer cell chromatin immune precipitation experiments proved the direct interaction between GAPDH and SPI transcripts, leading to the upregulation of the main regulatory factor in EMT, the zinc finger protein SNAI1 (Snail) (56)(57)(58). The initiation of GAPDH synthesis may be a protection mechanism for tumor cells to regulate metabolism and improve survival under anoxic conditions. Therefore, our results indicate that the expression of GAPDH may influence prognosis.
Based on these five genes, we established a risk score model. We found that the risk score could reasonably predict the prognosis of LUAD, and it was negatively related to the CD8+ T-cell infiltration and correlated with the advanced tumor stage. These results implied that these five genes might play a role in the infiltration of CD8+ T cells into the immune microenvironment. Among these genes, GAPDH and GPI may influence the infiltration of CD8+ T cells through the HIF-1/GBE1 pathway (45,46,55). Furthermore, the risk score was significantly upregulated in tumor tissues and correlated with advanced tumor stage. The validation of the overall cohort results by GEO dataset and the cohort of tissue microarray was consistent with the results of TCGA. Because the EGFR status is critical in LUAD, we also performed the subgroup analysis of the EGFR status. The performance of the risk model in both wild type and mutation type was good in TCGA cohort, as well as in the GEO cohort. However, the prediction accuracy is deficient in patients with mutation-type EGFR status in the mIHC cohort due to the small sample size. In a future study, we will increase the sample size of patients with mutation-type EGFR status to verify the risk model.
Nevertheless, this study also has some limitations. First, the mechanisms about how the genes affect the infiltration of CD8+ T cells were not explored in this report, but they will be investigated in a future study. Secondly, we constructed a risk score model that depended on gene expression but did not consider gene mutation, methylation, or other genetic events that can affect the occurrence and progression of cancers. In a subsequent study, we may consider more genetic modification to make our risk model more accurate. Finally, a large-sample prospective study is needed to further validate the clinical applicability of the risk score.

CONCLUSIONS
In conclusion, this study may provide a novel risk assessment model for prognosis prediction and a new prospect for exploring the mechanism of tumor immune microenvironment related to CD8+ T-cell infiltration in LUAD.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The study was approved by the ethics committee of National Cancer Center, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, and all enrolled patients had given written informed consent to participate in this study.

FUNDING
This work was supported by the National Key R&D Program of China (Grant No. 2020YFE0202200).