Discovery and Verification of an Immune-Related Gene Pairs Signature for Predicting Prognosis in Head and Neck Squamous Cell Carcinoma

The study of IRGPs to construct the prognostic signature in head and neck squamous cell carcinoma (HNSCC) has not yet elucidated. The objective of this study was to explore a novel model to predict the prognosis of HNSCC patients. The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) datasets were set as training and validation cohorts, respectively. The least absolute shrinkage and selection operator (LASSO) and time-dependent ROC were employed to screen the highest frequency immune-related gene pairs (IRGPs) and their best cut-off value. Survival analysis, Cox regression analysis were applied to discover the effects of selected IRGPs signature on survival outcomes. The immune cell proportions were deconvoluted by the CIBERSORT method. After a couple of filtering, we obtained 22 highest frequency IRGPs. The overall survival time of HNSCC patients with a high score of IRGPs was shorter as compared to the ones with a low score in two independent datasets (P < 0.001). Six kinds of immune cells were found to be differentially distributed in the two different risk groups of HNSCC patients (P < 0.001). GO and GSEA analysis showed these differentially expressed genes enriched in multiple molecular functions. The new IRGPs signature probably confers a new insight into the prognosis prediction of HNSCC patients.


INTRODUCTION
Head and neck squamous cell carcinoma (HNSCC) is one of the most common cancers globally, and more than 600,000 cases are diagnosed annually (Leemans et al., 2011). Smoking, heavy alcohol, and HPV infection are the main potential causes of HNSCC development. Local or distant metastasis is the major factor for cancer-associated death (Huang et al., 2018). Despite multimodal treatments, such as surgery, chemoradiotherapy, and targeted therapies, the 5-year survival rate of HNSCC patients remains poor Zhang et al., 2020). Interestingly, even patients with the same clinical symptoms, TNM stage, and treatments would have different prognoses (Karam and Raben, 2019). Therefore, the inaccurate effects of the TNM stage on predicting the prognosis of HNSCC patients lead to a clinically meaningful puzzle to explore and develop some novel and better molecular models.
Immunity is a disease defense system composed of a series of biological structures and processes in an organism (Montecino-Rodriguez et al., 2013). It is composed of various proteins, cells, organs, and tissues. Disorders of the immune system can lead to a variety of diseases (O'Byrne and Dalgleish, 2001;de Visser et al., 2006). Many studies have confirmed that the immune system plays an important role in the occurrence and development of tumors. Charlotte Lecerf et al. (2019) recently found OX40L, PDGFRB, and PD-1 act as the potential prognostic biomarkers and major therapeutic targets. Moreover, blockade to the PD1/PD-L1 axis has been approved by the FDA and shows groundbreaking progress in improving HNSCC patients' survival time (Tsangaris et al., 1969). Discovering and understanding the relationship between immune-related genes and HNSCC could be crucial.
The tumor microenvironment (TME) includes various cell types and extracellular components, such as vasculature, cancer-associated fibroblasts, immune cells, tumor-associated endothelial cells, and extracellular matrix (Wu and Dai, 2017). Accumulated studies show that TME influences therapeutic response and clinical outcomes. Regulation of TME by CTX/L-NIL could improve the sensitivity of tumor cells of chemoradiotherapy resistance (Hanoteau et al., 2019). The expression level of PD-L1 in exosomes of HNSCC correlated with the clinical outcome of patients. The blocking of PD-L1 + exosomes weakens immune suppression in HNSCC (Theodoraki et al., 2018).
Previous studies showed many gene-expression-based models for predicting cancer patient prognosis (Resteghini et al., 2018). Unfortunately, none of them could be implemented in routine clinical practice because of some issues like overfitting on small discovery data sets and lack of sufficient validation. With more and more large-scale gene expression profiles published online, it could identify potentially more reliable prognosisrelated biomarkers in HNSCC. However, these diverse largescale data also represent a daunting challenge. For instance, the traditional method to normalize gene expression is a difficult task given the potential biological heterogeneity among data sets and technical biases across measurement platforms. Instead, the relative ranking of gene expression may be a good way to eliminate the requirement for data scaling and normalization (Li et al., 2017), producing a robust result in various applications.
Considering that the important roles of immune-related genes, there are emerging studies that calculate the score of immune-related gene pairs (IRGPs) based on its relative raking and report a significant correlation between IRGPs and the prognosis of cancer, including hepatocellular carcinoma, colorectal cancer, gastric cancer, lung adenocarcinoma and cervical cancer (Wu et al., 2019;Zhang et al., 2019;Nie et al., 2020;Sun et al., 2020;Zhao et al., 2021). To our best knowledge, no study applying IRGPs to construct the prognostic signature in HNSCC. Here, we used TCGA datasets, GEO datasets, and immune-related genes from the ImmPort database to establish and validate the immune prognostic signature, which was constructed by 22 immune-related gene pairs consisting of 28 unique immune-related genes. The relationships of IRGPs with clinical features and underlying molecular function enrichments and immune cell proportions were further discovered in our study.

Acquisition of Relevant Public Data
The RNA-seq expression data and clinical data of 525 HNSCC patients in the training group were downloaded from the TCGA dataset 1 , which was used as the training cohort. Two independent datasets were used as the validation group were downloaded from the GEO dataset in this study, including GSE65858 (N = 270, Illumina HumanHT-12 V4.0 expression bead chip) (Wichmann et al., 2015) and GSE2379 (N = 34, Affymetrix Human Genome U95 Version 2 Array and Affymetrix Human Genome U95A Array) (Cromer et al., 2004). For each dataset, the probe ID was converted to the corresponding gene symbol according to its annotation file without making further standardization. If more than one probe matches the same gene, the average value is calculated as the expression value of the gene. The immunerelated gene list was download from the ImmPort database 2 .

Establishment and Validation of an Immune Gene-Related Prognostic Model
All the 2,498 immune-related genes from the ImmPort database were selected for the candidates' immune-related genes to be intersected with the TCGA dataset and GEO dataset, respectively. The immune-related genes obtained from the above three cohorts are intersected to obtain a matrix that included 145 co-IRGs. During screen immune-related genes (IRG), the median absolute deviation must be greater than 0.5. Next, each gene will have a chance to be paired up with another gene in the matrix. The estimated score of the gene pair is up to the reduction from the gene counts. If the gene pair i is made up of gene i and gene i+1 . The score will be 1 if the expression count of gene i is higher than gene i+1 ; otherwise, its score will be 0. More importantly, we should discard the gene if the percentage of the score was 1 or 0 was less than 20%. We used LASSO to get the maximum information but the parsimonious model (iteration = 1,000). The IRGPs' signature was calculated as follows: Risk score = (Expr genepair−1 × Coef genepair−1 ) + (Expr genepair−2 ×Coef genepair−2 ) + . . . + (Expr genepair−n × Coef genepair−n ). Expr means the expression value of gene pairs, and Coef means the regression coefficient derived. Then we used the TCGA survival data, risk score, and time-dependent ROC at 3 years in the TCGA dataset to screen the highest frequency IRGPs and calculate the cut-off value to determine the patient with high risk or low score. The capability of this model to predict overall survival (OS) was visualized and compared by Kaplan-Meier, univariate and multivariate Cox proportional hazard analysis. The external dataset GSE65858 and GSE22379 were used for the validations.

Correlations Between IRGPs and Tumor-Infiltrating Immune Cells Proportions
The CIBERSORT algorithm, a method for characterizing cell composition from gene expression profiles of complex tissues , was used to explore the relationship between tumor immune cells and IRGPs. It should enable large-scale analysis for inferring immune cells, cellular biomarkers in tumor transcriptomes . The proportions of 22 immune cell types were estimated in the TCGA cohort. A radar map is drawn for comparing these immune cell distributions between the patients with a high score and the ones with a low score according to the results of the CIBERSORT algorithm.

Gene Enrichment Analysis
Gene ontology (GO) analysis was used to discover the enriched signaling pathway based on the genes in the IRGPs prognostic signature in HNSCC. Gene set enrichment analysis (GSEA) was used to understand the potential biological mechanisms these genes in IRGPs prognostic signature may involve.

Statistical Analysis
All statistical analyses were performed using R (version 4.0.2 3 ).
The log-rank test was used in the Kaplan-Meier analysis. The ROC curve was performed by the "survivalROC" package. The "survival" package was used to perform survival analysis curves, univariate Cox survival analysis and multivariate Cox survival analysis. The radar map was drawn by the "fmsb" package. The GSEA curve and the bubble chart of GO enrichment analysis were performed by the "fgsea" and "ggplot2" packages. For all tests, the significant differences were considered at P < 0.05.

Construction and Validation of IRGPs Signature in HNSCC
The flow chart of this study is shown in Figure 1. The characterizations of the training and validation cohort are summarized in Table 1. As showing in Table 2, 22 highest frequency IRGPs were obtained from 2,498 immune-related genes in the ImmPort database after a couple of filtering mentioned above. The best cut-off value for the determination 3 https://www.r-project.org/ of high or low scores was calculated by ROC analysis in the TCGA dataset of HNSCC patients (sensitivity = 0.701, specificity = 0.812) (Figure 2A). The Kaplan Merrier analysis showed that patients with a high score of IRGPs had been predicted a poorer outcome as opposed to a low score of IRGPs (P < 0.001, Figure 2B), which was further validated in the GSE65858 cohort (P < 0.05, Figure 2C) and GSE2379 (P = 0.032, Figure 2D).

The Differential Tumor-Infiltrating Immune Cells Between HNSCC Patients With Different IRGPs
There are many different proportions of immune cells found in HNSCC patients with high-score over low-score of IRGPs (Figure 4). As shown in the radar map (Figure 4A), we found that naïve B cells, Plasma cells, Resting mast cells, CD8 + T cells,  and Treg cells were significantly enriched in patients with a low score of IRGPs than the high score of IRGPs (P < 0.001) (Figures 4B-F). However, M0 macrophages were significantly higher in the high score group than the low score group ( Figure 4G). The comparisons about the other immune cells in the two groups of patients were shown in Supplementary  Figure 1. Furthermore, we explored the potential associations between the IRGPs' signature and 22 immune cells by Pearson correlation analysis. We found that the signature risk score was significantly and negatively correlated to resting mast cells

Performance of IRGPs on Predicting HNSCC Prognosis
We applied the ROC analysis to explore the predictive ability of the IRGPs' signature on the 5-year overall survival in the two validated HNSCC cohorts. The value of under the curve (AUC) in GSE65858 and GSE2379 was 0.765 and 0.905, respectively (Figures 6A,B).

Enrichment Analysis Explored Potential Signaling Pathways
Go analysis showed that the signature of IRGPs in the TCGA cohort of HNSCC was involved in the positive regulation of cell activation, regulation of lymphocyte activation, regulation of immune effector process, humoral immune response, plasma membrane and phagocytosis ( Figure 6C). According to the GSEA analysis, we revealed positive regulation of cell activation, regulation of lymphocyte activation, regulation of immune effector process, regulation of gene expression epigenetic, phagocytosis, mRNA binding, membrane signaling receptor complex, production of molecular mediator, and humoral immune response (Figure 7). Based on their normalized enrichment score, a lot of enriched signaling pathways were identified. It mainly including regulation of cell activation, regulation of lymphocyte activation, regulation of immune effector process, phagocytosis, and humoral immune response, which may be the immune-related pathways of significant enrichments in HNSCC. It provides possible immunologic concepts to support the prediction of IRGPs for HNSCC prognosis.

DISCUSSION
HNSCC is the sixth most prevalent malignant tumor worldwide.
The immune system is the main defense system of people against diseases, including cancers (Huse, 2017). Exploring the mechanisms between the immune-related genes and cancer prognosis may open a new perspective for predicting the outcomes of HNSCC patients. Currently, immunotherapy, receiving extensive attention all around the world, has been demonstrated as a groundbreaking remedy for cancer patients by blocking CTLA-4 or PD-1 pathways (Yang, 2015). Besides, researchers show that human peripheral blood T lymphocytes transduced by CD19-CAR could treat leukemia well (Brentjens et al., 2003). Therapeutic cancer vaccines and tumor neoantigens are other options for joining the armies of immunotherapy. Due to a lack of ideal vaccine design and the existence of   an immunosuppressive tumor microenvironment, therapeutic vaccination remains suboptimal (Melief et al., 2015). Continued efforts to understanding the role of the immune microenvironment will provide better methods for the treatment of HNSCC. So, we applied 22 immunes relate gene pairs to construct a prognostic signature and further explore its potential role in HNSCC prognosis. A previous study has reported that immune-related genes provide a model for predicting outcomes in HNSCC (She et al., 2020). However, a limitation is the lack of robust normalization to reduce batch effects if different platforms or sources are involved. Therefore, the first step needs to standardize all genes to reduce the errors caused by different samples and platforms. To overcome this limitation, we adopted a method named IRGPs (validated in a variety of cancers; Peng et al., 2018) to assess the score of the sample by using the comparable relative score of a pair of gene expressions in the same sample, which could remove the biases from the different platforms and samples.
In this study, 515 HNSCC patients from TCGA datasets act as the training group, the data of 270 patients from GSE65858 and 34 patients from GSE2379 act as the validation groups. Lasso penalized Cox regression was used to construct a 22-IRGPs signature and ROC curve analysis was used to calculate the best cut-off value. The HNSCC patients with a high or low score of IRGPs have a significant difference in survival analysis.
Univariate and multivariate Cox regression analyses showed the high score was an independent prognostic factor. Besides, we used two external cohorts to confirm the conclusions, which remain the same as in the training group. Consequently, our signature has high accuracy and could be a potential prognostic factor in HNSCC. We found that naïve B cells, Plasma cells, Resting mast cells, CD8 + T cell, and Treg cells were significantly enriched in patients with a low score of IRGPs by immunocyte infiltration analysis, which may play an inhibitory role in the development of HNSCC. It has report showed that the proportions of naïve B cells, resting mast cells, and CD8 + T cells in non-cancer tissues were significantly higher than HNSC tissues (Krishna et al., 2018). It is further confirmed the accuracy of the results of our study. Furthermore, some research also found that plasma cells were associated with distant cancer metastasis, clinical stage. Naïve B cells and Tregs were significantly associated with pathological grade (Liang et al., 2020). These differential proportions of tumor-infiltrating cells may be reasons for the different overall survival observed and validated between a high and low score of IRPGs and associated with the IRGP interactions in HNSCC patients.
The 22 IRGPs in our model consisted of 28 immune-related genes, including Wnt5a, CCL1, CCL5, CCL9, HLA-DOB, IFN, and CD247, which mainly involved some important pathways and cellular processes such as proinflammatory, antimicrobial responses, antigen identification, and immune cell signaling. Wnt signaling regulates a series of signaling pathways, including cell proliferation, differentiation, adhesion, and motility (Wend et al., 2010). Wnt5a act as a non-canonical Wnt ligand that plays a critical role during growth and development (Oishi et al., 2003). The abnormal activation and inhibition of Wnt5a lead to the genesis in tumors. Increased Wnt5a level promotes the ability of invasion in metastasis melanoma (Weeraratna et al., 2002). In contrast, Wnt5a acts as a potential inhibitor. The increased Wnt5a level inhibits cell growth and migration in thyroid carcinoma (Kremenevskaja et al., 2005). Furthermore, several reports have already suggested that Wnt5a plays an important role in cancerassociated inflammation. Wnt5a induced gastric epithelial cells to produce IL-1, leading macrophages gathered in gastric mucosa inducing gastric inflammation (Asem et al., 2016).
Most of the research reported that chemokines and their receptors play an important role in mediated cancer progress and metastasis. For instance, CCL1 derived from tumorassociated macrophages promotes breast cancer metastasis through activating NF-κB/SOX4 signaling . CXCL9 predominantly mediates lymphocytic infiltration to the focal sites and suppresses tumor growth (Tokunaga et al., 2018). The knockout of NLRX1 could significantly decrease IFN-I dependent T cell infiltration and inhibit tumor progression (Luo et al., 2020). HLA-DOB consists of a class II molecule and a beta chain and expresses on the antigen-presenting cells and B lymphocytes (Pruitt et al., 2009). The above descriptions indicated that immune response contributes to the development and progression of carcinomas in the interaction of the immune gene. It is consistent with our results that IRGPs were significant associated with some kinds of immune cells, including CD8 + T cells, Treg cells, and Resting mast cells (Li et al., 2021). The immune-related genes inside our IRGPs might play an important role in the tumor microenvironment of HNSCC. The GSEA and GO enrichment analysis implied that regulation of positive regulation of cell activation, regulation of lymphocyte activation, regulation of immune effector process, regulation of gene expression epigenetic, phagocytosis, mRNA binding, membrane signaling receptor complex, production of molecular mediator and humoral immune response might be the significant enrichment of related pathways in HNSCC, which give us more confidence in the capability of IRGPs to predict the HNSCC prognosis as we all know that the expression of gene reduced can promote most of diseases development. For example, mice with the knockout of EIF4E-BP1, a direct target of mTOR, formed more and larger lesions than its wild type mice . The humoral immune response is an important part of the immune system. Since tumors originate from containing selfantigens of autologous cells, the abnormal exposure of these antigens can activate the autoimmune response to promote the development of cancer (Zaenker et al., 2016). Therefore, we suggested that the 22 IRGPs signature was constructed in our research act as an important role in the development and prognosis of HNSCC. The reason that we chose two datasets GSE65858 and GSE22379 as validation cohorts is that these two databases of HNSCC contain the most complete clinical data. Although the second database contains a relatively small number of patients, it contains comprehensive patients' information such as survival time, survival status, and TMN staging that we want. The clinical data of age in the multivariate COX regression analysis among three groups are not in the same range, P < 0.001, =0.036, =0.479, respectively, in Figures 3B,D,F. Therefore, we don't think age can be used as a prognostic factor for HNSCC.
However, we are obliged to admit the limitations of our research. Firstly, although we used the validation group to prove the training group, the number of samples is not large enough. Secondly, it can be more persuasive if we further validate our findings through RT-QPC or IHC.
In conclusion, we established a new IRGPs signature for predicting the prognosis in HNSCC, which may provide a possible tool to help clinical physicians to give HNSCC patients consult about the prognosis.

CONCLUSION
We developed a new IRGPs signature prognostic model in HNSCC. It probably confers a new insight into the prognosis prediction of HNSCC patients, which may provide some clues for future personalized medicine decisions.

DATA AVAILABILITY STATEMENT
The raw data of this study are derived from the TCGA database (https://portal.gdc.cancer.gov/) and GEO data portal (https: //www.ncbi.nlm.nih.gov/geo/; Accession Nos.: GSE65858 and GSE2379), which are publicly available databases. The R codes will be available upon request.

AUTHOR CONTRIBUTIONS
MH: conception, revision for important intellectual content, and supervision. JH: interpretation or analysis of data. XF: preparation of the manuscript. All authors contributed to the article and approved the submitted version.