Identification of a Prognostic Model Based on Immune-Related Genes of Lung Squamous Cell Carcinoma

Immune-related genes (IRGs) play considerable roles in tumor immune microenvironment (IME). This research aimed to discover the differentially expressed immune-related genes (DEIRGs) based on the Cox predictive model to predict survival for lung squamous cell carcinoma (LUSC) through bioinformatics analysis. First of all, the differentially expressed genes (DEGs) were acquired based on The Cancer Genome Atlas (TCGA) using the limma R package, the DEIRGs were obtained from the ImmPort database, whereas the differentially expressed transcription factors (DETFs) were acquired from the Cistrome database. Thereafter, a TFs-mediated IRGs network was constructed to identify the candidate mechanisms for those DEIRGs in LUSC at molecular level. Moreover, Gene Ontology (GO), together with Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, was conducted for exploring those functional enrichments for DEIRGs. Besides, univariate as well as multivariate Cox regression analysis was conducted for establishing a prediction model for DEIRGs biomarkers. In addition, the relationship between the prognostic model and immunocytes was further explored through immunocyte correlation analysis. In total, 3,599 DEGs, 223 DEIRGs, and 46 DETFs were obtained from LUSC tissues and adjacent non-carcinoma tissues. According to multivariate Cox regression analysis, 10 DEIRGs (including CALCB, GCGR, HTR3A, AMH, VGF, SEMA3B, NRTN, ENG, ACVRL1, and NR4A1) were retrieved to establish a prognostic model for LUSC. Immunocyte infiltration analysis showed that dendritic cells and neutrophils were positively correlated with IRGs, which possibly exerted an important part within the IME of LUSC. Our study identifies a prognostic model based on IRGs, which is then used to predict LUSC prognosis and analyze immunocyte infiltration. This may provide a novel insight for exploring the potential IRGs in the IME of LUSC.

Immune-related genes (IRGs) play considerable roles in tumor immune microenvironment (IME). This research aimed to discover the differentially expressed immune-related genes (DEIRGs) based on the Cox predictive model to predict survival for lung squamous cell carcinoma (LUSC) through bioinformatics analysis. First of all, the differentially expressed genes (DEGs) were acquired based on The Cancer Genome Atlas (TCGA) using the limma R package, the DEIRGs were obtained from the ImmPort database, whereas the differentially expressed transcription factors (DETFs) were acquired from the Cistrome database. Thereafter, a TFs-mediated IRGs network was constructed to identify the candidate mechanisms for those DEIRGs in LUSC at molecular level. Moreover, Gene Ontology (GO), together with Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, was conducted for exploring those functional enrichments for DEIRGs. Besides, univariate as well as multivariate Cox regression analysis was conducted for establishing a prediction model for DEIRGs biomarkers. In addition, the relationship between the prognostic model and immunocytes was further explored through immunocyte correlation analysis. In total, 3,599 DEGs, 223 DEIRGs, and 46 DETFs were obtained from LUSC tissues and adjacent non-carcinoma tissues. According to multivariate Cox regression analysis, 10 DEIRGs (including CALCB, GCGR, HTR3A, AMH, VGF, SEMA3B, NRTN, ENG, ACVRL1, and NR4A1) were retrieved to establish a prognostic model for LUSC. Immunocyte infiltration analysis showed that dendritic cells and neutrophils were positively correlated with IRGs, which possibly exerted an important part within the IME of LUSC. Our study identifies a prognostic model based on IRGs, which is then used to predict LUSC prognosis and analyze immunocyte infiltration. This may provide a novel insight for exploring the potential IRGs in the IME of LUSC.

INTRODUCTION
Lung cancer remains a leading factor leading to cancer-related deaths worldwide (1). Lung cancer is associated with a high mortality compared with that of breast cancer (BC), prostate cancer (PCa), colorectal cancer (CRC), and leukemia (1,2). Lung squamous cell carcinoma (LUSC) occupies about 20-30% nonsmall cell lung cancer (NSCLC) cases, which causes an annual of 400,000 deaths around the world (3,4). For TNM stage II LUSC cases, the survival at 5 years is 40%, while that for LUSC cases at pathological TNM stage IV is <5% (5). Moreover, the prognostic biomarkers that can be used in a prediction model for LUSC patients are still lacking (6,7).
Recently, immunotherapy has been widely recognized to be the efficient therapy for many cancer types (8)(9)(10)(11). Recently, Fan et al. had identified reliable markers for predicting the immunotherapy effect on non-small cell lung cancer (NSCLC) (12). Currently, immunotherapy has been considered as the potentially efficient therapy in tumor patients (12). Prat et al. estimated the correlations of immune-related genes (IRGs) expression profiles in squamous NSCLC (sqNSCLC) cases with advanced non-squamous NSCLC (non-sqNSCLC) after PD-1 blockade (13). Furthermore, several clinical studies promote tumor immunology development within LUSC (14). Recently, Li et al. used IRGPs to construct the personalized prognostic model to predict the prognosis for early non-sqNSCLC patients (15). However, the prognostic significance of IRGs and clinical relevance in LUSC have not been illustrated so far.
This study aimed to obtain the differentially expressed genes (DEGs), differentially expressed IRGs (DEIRGs), and differentially expressed TFs (DETFs) in LUSC, so as to establish a Cox prediction model based on the DEIRGs to predict the prognosis for LUSC. The regulatory network between DEIRGs and DETFs possibly exerts an important part in exploring the underlying mechanisms at molecular level. Meanwhile, correlation analysis of immunocytes and risk score also sheds new light on the tumor immune microenvironment (IME) status.

Differential Expression Analysis in LUSC
All transcriptome RNA-Seq data, IRGs, and cancer TF targets were differentially analyzed using the "limma R" package (http://www.bioconductor.org/packages/release/bioc/ html/limma.html) (18), according to the thresholds of adjusted false discovery rate (FDR) P-value of <0.01 and absolute fold change (log2) of >2. DEIRGs were obtained from DEGs based on the ImmPort database, whereas DETFs were extracted from DEGs using the Cistrome database.

Functional Analyses for DEIRGs in the Context of LUSC
For exploring the functions among those DEIRGs in terms of their expression profiles, Gene Ontology (GO), together with Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, was conducted on DEIRGs using the Database for Annotation, Visualization, and Integrated Discovery (DAVID: https://david.ncifcrf.gov/) (19). Upon GO analysis, a difference of P< 0.05 indicated statistical significance. Furthermore, the GOCircle as well as GOChord plotting was obtained using GOplot R package (https://cran.r-project. org/web/packages/GOplot/citation.html) (20). In addition, KEGG pathway enrichment analysis was performed using the "cluster profile R" package (http://www.bioconductor. org/packages/release/bioc/html/clusterProfiler.html) (21), and a difference of P<0.05 indicated statistical significance. For additionally exploring the associations of DEIRGs with KEGG pathways, the Cytoscape software (version 3.6.1) was employed to construct the pathway-IRGs network for visual analysis.

Prognosis-Related DEIRGs and TFs Mediated IRGs Regulatory Network
We used the "survival R" package to perform the univariate Cox regression analysis to obtain prognosis-related DEIRGs in LUSC (P < 0.05). TFs stand for the significant molecules that can regulate gene expression level directly. Therefore, exploring the mechanism of TFs in the regulation of prognosisrelated DEIRGs is of great necessity. The Cistrome Cancer database provides the regulatory interactions between TFs and transcriptomes from TCGA profiles (http://cistrome.org/ CistromeCancer/CancerTarget/) (17). To further examine the molecular regulatory mechanisms of prognosis-related DEIRGs, a co-expression network between prognosis-related DEIRGs and DETFs was constructed, according to the thresholds of p-value filter of <0.001 and standard coefficient filter of >0.4.      cases were classified into low or high-risk group according to the median risk score value. The receiver operating characteristics (ROC) curve is based on the specificity and sensitivity of various critical values of continuous diagnostic tests judged according to the binary gold standard (22). To evaluate the specificity and sensitivity of the prognostic model, the ROC curve was performed to examine the signature of DEIRGs (low vs. high risk) on overall survival (OS). Moreover, the area under the ROC curve (AUC) values were determined for evaluating the prognostic model to reveal the prognostic biomarkers in LUSC, with values of 0.5-0.7 representing moderate, 0.7-0.9 representing better, and more than 0.9 representing superior values.

Correlation Analysis Between Clinical Features and DEIRGs in Prediction Model of LUSC
In order to further explore the correlations between DEIRGs in prediction model and clinical characteristics of LUSC, we used the "beeswarm" R package to analyze the correlations between clinical features (age, gender, pathological T stage, pathological N stage, pathological M stage, and pathological TNM stage) and the expression levels of 10 DEIRGs in the prediction model.

Infiltration of Immunocytes
Tumor Immune Estimation Resource (TIMER), an online database, is able to analyze and visualize the tumor-infiltrating immunocyte levels (23). It reanalyzes gene expression data of 10,897 TCGA samples from 32 types of cancers for identifying correlation between immunocyte infiltration and other characteristics, incorporating dendritic cells, neutrophils, macrophages, CD4 T cells, CD8 T cells, and B cells (https:// zenodo.org/record/57669#.Xeezu9V5uMo).

Statistical Analysis
The "limma R" package in R software was used for differential analysis, whereas the "GOplot R" and "cluster profile R" packages were adopted for functional enrichment analysis. The prediction model was constructed and applied in univariate as well as multivariate Cox regression analysis. Besides, the "survival ROC R, " "survival R, " "risk Plot R, " and "beeswarm R" packages were adopted to validate the prognostic model in LUSC. The independent t-test was performed to validate the heterogeneities in clinical characteristics. A difference of P < 0.05 indicated statistical significance.

DEGs, DEIRGs, and DETFs
In the present study, a total of 551 tissues were analyzed, which included 502 LUSC as well as 49 normal tissue specimens.  (Figures 1A-F). Figure 2 presents the flow diagram of this research.

Functional Analyses for DEIRGs
For explaining biological functions of DEIRGs in LUSC patients, functional enrichment analyses were conducted. According to GO analysis results, 5 GOs displayed significant difference (P < 0.05), among which, "GO: 0005576 extracellular region" was the most significant GO term (Figures 3A,C). Figure 3B shows the correlations between the top 30 statistically significant DEIRGs and corresponding GO terms. Furthermore, the "cluster profile R" package was used for KEGG pathway enrichment analyses. The dot plot shows the 10 most significant pathways with the highest enrichment levels of DERGs within the KEGG database ( Figure 3D).  In addition, the bar plot indicates the 12 most significant pathways with the highest enrichment levels of DEIRGs within the KEGG database ( Figure 3E). Those 21 statistically significant pathways in the KEGG database were selected to construct the "pathway-DEIRGs" network ( Figure 3F). The 21 statistically significant pathways in the KEGG database are shown in Table 4. A difference of P < 0.05 indicated statistical significance.

Univariate Cox Regression Analysis and Regulatory Network of Prognosis-Related DEIRGs and DETFs
Univariate Cox regression analysis was analyzed to identify prognosis-related DEIRGs in LUSC (P < 0.05). Then, 37 OSrelated DEIRGs were identified, incorporating 31 high-risk DEIRGs and 6 low-risk DEIRGs (Figure 4A). For exploring the molecular mechanism of prognosis-related DEIRGs, we constructed the TFs-IRGs regulatory network. A total of 26 prognosis-related DEIRGs and 13 DETFs were shown in the network ( Figure 4B). As shown in Figure 4B, CENPA had a negatively relationship with A2M, TIE1, ENG, and ACVRL1. ARRB1 had a negatively relationship with TP63, SNAI2. SOX2 had a negatively relationship with ICAM1 and TNFRSF10C. The coefficient filter >0.4 and the p-value filter <0.001 were set as the threshold to indicate statistical significance. Table 5 shows the regulatory network between DETFs and prognosis-related DEIRGs in LUSC.

Establishment of the 10 DEIRGs-Based Prediction Model in LUSC
The prognostic DEIRGs were screened through univariate Cox regression analyses in LUSC, including 37 OS-related DEIRGs (P < 0.05) ( Figure 4A). Then, the 37 DEIRGs were selected to incorporate into multivariate Cox regression analysis, which suggested that 10 DEIRGs might serve to be the prognostic factors to independently predict LUSC prognosis. These 10 DEIRGs were finally screened for constructing the prediction model ( Table 6). Besides, expression profiles of these 10 DEIRGs were then linearly combined to build up the prediction model (F) the significantly statistically different 21 pathways were used Cytoscape software for constructing a pathway-IRG network with differential expression IRGs. The green rectangles indicate the pathways, the red circles indicate the up-regulation differential expression IRGs, the blue circles indicate the down-regulation differential expression IRGs.       Table 6. Based on the median riskscore value, 431 cases who had intact survival time and status data were classified as high-(n = 215) or low-(n = 216) risk group. According to survival analysis based on the prediction model with the 10 DEIRGs, LUSC cases of high-risk group were associated with notably poor prognosis compared with those of low-risk group (P = 0) ( Figure 5A). To evaluate the specificity and sensitivity of the prognostic model, we performed the ROC curve, for which the AUC value was 0.709, illustrating that the DEIRGs-based prediction model achieved better accuracy in survival monitoring ( Figure 5B). The riskscore curve and survival status data of both groups of patients are exhibited in Figures 5C,D, respectively. As shown in Figure 5E, the expression of 10 DEIRGs was profiled.

Independent Prognosis Analysis
The prognostic IRGs were screened for predicting the prognosis for LUSC cases. Eventually, altogether 37 DEIRGs showed significant correlation with overall survival (OS) (P < 0.05) ( Figure 6A). Meanwhile, univariate independent prognostic analysis showed that, pathological M stage and the riskscore were related to OS, and the difference was of statistical significance (P < 0.001). Moreover, the multivariate independent prognostic analysis showed that, pathological M stage and the riskscore might serve as the independent prognostic factors to predict the survival for LUSC (Table 7; Figure 6B). The bold values represent that the prediction model of IRGs could be act as independent prognostic factors.

Relationships Between Differential IRGs in Prediction Models and Clinical Features in LUSC
Relationships between DEIRGs in risk model and clinical characteristics, including gender, age, pathological classification, pathological T stage, pathological N stage, and pathological M stage, were analyzed ( Table 8). As observed from Figure 7, the expression levels of AMH, CALCB, ACVRL1 and NR4A1 were significantly different in LUSC at pathological I-II stage compared with those at III-IV stage (P < 0.05) (Figures 7A-D).
The expression levels of SEMA3B, AMH, CALCB, and GCGR were significantly different in LUSC at pathological M0 stage relative to those at M1 stage (P < 0.05) (Figures 7I-L). The expression of VGF was conspicuously different in LUSC at pathological N0 stage relative to that at N1-N3 stage (P = 0.005) ( Figure 7M).

Correlation Analysis Between DEIRGs in Prediction Model and Immunocyte Infiltration in LUSC
To figure out whether DEIRGs precisely reflected the status of LUSC IME, correlation analysis was carried out to examine the relationship between DEIRGs in the LUSC prediction model and immunocyte infiltration (Figure 8). As shown in Figure 8, B cells, CD4-T cells, CD8-Tcells, Macrophages were not associated with the riskScore of the prediction model (P > 0.05) (Figures 8A-E). Dendritic cells and neutrophils had a positively relationship with DEIRGs in prediction model (P < 0.05) (Figures 8D,F).

DISCUSSION
In recent years, IRGs show increasing importance to cancer development and immunotherapies (24)(25)(26)(27). However, transcriptome studies on IRGs, the relationships of IRGs with clinical characteristics, and the molecular mechanisms have not been performed yet. In the present work, the Cox prediction model was established for revealing IRGs specific to LUSC, so as to predict LUSC prognosis. The regulatory network between IRGs and TFs revealed the potential novel molecular mechanisms in LUSC. In this study, the DEIRGs obtained in LUSC might play a vital role in predicting the prognosis for LUSC. More importantly, an individualized Cox prediction model with DEIRGs was adopted for measuring immunocyte infiltration and evaluating the clinical prognosis.
In recent years, the prognostic or predictive biomarkers associated with the tumor IME are promising to identify new molecular targets and to enhance the treatment for patients during immunotherapy development (28)(29)(30)(31)(32)(33)(34). Several recent studies reveal the prognostic biomarkers in tumor IME for predicting tumor prognosis. Li et al. found that four IRGs were identified as the biomarkers to predict the prognosis for breast cancer (35). Pan et al. discovered that 149 immune genes were identified as the prognostic genes in tumor IME to predict ESCC prognosis (36). Yang et al. suggested that the diagnostic immune score (DIS) and prognostic immune score (PIS) showed diagnostic and prognostic significance for cancers in the digestive system (37). Nowadays, prognostic biomarkers related to the tumor IME in lung cancer are still lacking.
A study demonstrates that the NSCLC IME may be adopted to be the potential prognostic biomarkers to predict patient prognosis after receiving immune checkpoint inhibitor treatments (37). However, the molecular mechanism of prognosis-related DEIRGs associated with DETFs in LUSC tumor IME is not examined yet. The present study was carried out to explore DEIRGs and establish the IRGs-based prognostic model in LUSC IME to reveal the prognostic biomarkers to predict LUSC diagnosis and prognosis.
According to functional enrichment analysis in this work, DEIRGs showed the highest enrichment levels within tumorrelated typical pathways, including the JAK-STAT signal transduction pathway, the TGF-β signal transduction pathway, the PI3K-Akt signal transduction pathway, the MAPK signal transduction pathway, and the TNF signal transduction pathway. According to recent research, alterations in MET-activation and JAK2-inactivation are the independent factors that affect the response to immune checkpoint inhibitors like PD-L1 in lung cancer (25). As suggested in one study, the combination of MEK and PD-L1 inhibitors in pre-clinical and ex-vivo NSCLC models exerts an important part in predicting patient sensitivity to such therapies (38).
For further exploring the underlying mechanisms of DEIRGs at molecular level, the TF-mediated IRGs network was constructed in the present study, so as to reveal the significant TFs regulating DEIRGs in this network. FOXA2, TP63, FLI1, TCF21, EPAS1, ERG, GATA6, FOS, CENPA, SOX2, RXRG, NR4A1, and SNAI2 were the DETFs that might regulate the DEIRGs in LUSC. Tang et al. discovered that curcumin inhibited the growth of human NCI-H292 LUSCs by up-regulating FOXA2 expression (39). FLI1 acts as a novel oncogenic diver to promote the metastasis of small cell lung cancer (SCLC). LncRNA LINC00163 serves as the tumor suppressor through transcriptionally up-regulating TCF21 expression in inhibiting the development of lung cancer (40). In addition, the hypoxic-stabilized EPAS1 proteins transactivate DNMT1, which further promotes the hypermethylation of EPAS1 promoter and down-regulates EPAS1 mRNA expression in NSCLC (41). A recent study showed that CENPA could act as a novel diagnostic biomarker in lung adenocarcinoma (42). Compared with previous studies, our study had first constructed the co-expression DETFs-prognosis-related DEIRGs regulatory network in LUSC using bioinformatics analysis. From the network in our study, the DETFs positively and negatively regulated the DEIRGs, which shed new light on exploring the DEIRGs mechanisms in LUSC at molecular level.
In our study, univariate as well as multivariate Cox regression analysis was carried out to constructed the DEIRGs-based Cox prediction model. Eventually, the 10 DEIRGs in prediction model played critical parts in predicting LUSC prognosis. In addition, the AUC was 0.709, while the P-value between high and low risk groups was 0, which indicated that the Cox prediction model might be able to accurately estimate LUSC prognosis. Univariate as well as multivariate independent  prognosis analysis in our study indicated that, the pathological M stage and the riskScore of the Cox prediction model might serve as the independent prognostic factors to predict LUSC prognosis. Furthermore, 10 DEIRGs were applied for correlation analysis between the expression profiles of IRGs and clinical characteristics. Semaphorin 3B (SEMA3B) can be used to be the tumor suppressor gene to suppress the Akt signal transduction pathway, which is achieved via the neuropilin-1 receptor in lung cancer cells (43). The AMH/AMHR2 axis provides a novel insight to illustrate the TGF-β/BMP resistance-associated signaling in NSCLC (44). Epigenetic modifications facilitate the expression of VGF, up-regulate its protein expression, and promote epithelial-mesenchymal transition (EMT) progression as well as kinase inhibitor resistance within NSCLC (45). GCGR acts as a member of the prognostic model, which can exert an important part in predicting LUSC survival (46). In addition, NR4A1 exerts an important part in the regulation of TGFβinduced invasion and migration of lung cancer cells (47).
Our study first used bioinformatic analysis to integrate the clinical characteristics of LUSC with the expression profiles of 10 DEIRGs to explore the statistically significant DEIRGs for forecasting LUSC diagnosis and prognosis. Finally, immunocyte correlation analysis was conducted using the contents of the TIMER database of six types of immunocytes. According to our results, dendritic cells and neutrophils exhibited a significantly positive regulatory relationship with the riskscore of the Cox prediction model. Compared with previous studies, the present study presented the new signature in which DEIRGs were selected as the center, which might be used to predict LUSC prognosis. Furthermore, the DEIRGs might act as the prognostic biomarkers and immune status monitor for predicting LUSC prognosis. We explored the relationships between DEIRGs in prediction model and immunocyte infiltration to reflect the status of IME in LUSC. Dendritic cells and neutrophils were positively correlated with DEIRGs in prediction model, which indicated that the high infiltration levels of dendritic cells as well as neutrophils might be identified in high-risk LUSC patients. These results showed that DEIRGs in prediction model could act as predictor for predicting immunocyte infiltration in LUSC. A study demonstrated that STAT3 and NF-κB signaling pathways were simultaneously attenuated in dendritic cells of lung cancer (48). A study showed that the tumor-associated CD66b neutrophils were correlated with adverse prognostic factors of NSCLC (49). The number of mature dendritic cells were positively correlated with survival time in NSCLC patients (50), While our findings showed that dendritic cells had a positively relationship with the riskScore of the prediction model. These results demonstrated that the prediction model based on DEIRGs could predict the status of immunocyte infiltration in LUSC.
Our prognostic model, which was constructed based on 10 DEIRGs in LUSC, indicated favorable clinical viability. It showed that DEIRGs performed moderately in the ROC curve, and were associated with age, gender, pathological TNM stage, and metastasis. This predictive model may provide a treatment plan based on the immunocyte infiltration degree revealed by DEIRGs.
In conclusion, our study identifies the DEGs, DEIRGs, and DETFs by bioinformatics analysis from TCGA, ImmPort and Cistrome databases. A TFs-IRGs network is performed to reveal the possible mechanism for DEIRGs in LUSC at molecular level. Additionally, the Cox prediction model is constructed for identifying the prognostic independent factors to predict LUSC prognosis. Immunocyte correlation analysis is also performed to identify the relationships between the immune status and the clinical outcomes for LUSC patients.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/supplementary material.