Integrated Analysis of RNA-Binding Proteins Associated With the Prognosis and Immunosuppression in Squamous Cell Carcinoma of Head and Neck

RNA-binding proteins (RBPs) interacting with target RNAs play essential roles in RNA metabolism at the post-transcription level. Perturbations of RBPs can accelerate cancer development and cause dysregulation of the immune cell function and activity leading to evade immune destruction of cancer cells. However, few studies have systematically analyzed the potential prognostic value and functions of RBPs in squamous cell carcinoma of head and neck (SCCHN). Here, for the first time, we comprehensively identified 92 differentially expressed RBPs from The Cancer Genome Atlas (TCGA) database. In the training set, a prognosis risk model was constructed with six RBPs, including NCBP2, MKRN3, MRPL47, AZGP1, IGF2BP2, and EZH2, and validated by the TCGA test set, the TCGA all set, and the GEO data set. In addition, the risk score was related to the clinical stage, T classification, and N classification. Furthermore, the high-risk score was significantly correlated with immunosuppression, and low expression of EZH2 and AZGP1 and high expression of IGF2BP2 were the main factors. Thus, the risk model may serve as a prognostic signature and offer highlights for individualized immunotherapy in SCCHN patients.


INTRODUCTION
Globally, squamous cell carcinoma of head and neck (SCCHN) represents the sixth most common malignancy, with increasing incidence and over 300,000 deaths annually (Ferlay et al., 2015;Siegel et al., 2020). The major causes of SCCHN include alcohol consumption, tobacco use, and human papilloma virus (HPV) infection (Hill and D'Andrea, 2019). Despite advances in multimodal treatments, including surgery, chemotherapy, and radiotherapy, the 5-year survival rate has not notably improved (Cramer et al., 2019). Hence, new reliable and prospective biomarkers are urgently required for efficient diagnosis and prognosis assessment and the development of therapeutic strategies to decrease the mortality rates of SCCHN patients.
RNA-binding proteins (RBPs) serve as post-transcriptional regulators interacting with target RNAs. Because RBPs play essential roles in RNA stability, alternative splicing, modification, translation, translation, and degradation, they impact the function and destiny of transcripts in the cell and maintain cellular homeostasis (Anantharaman et al., 2002;Mitchell and Parker, 2014;Pereira et al., 2017). Previous studies have shown that dysfunction of RBPs can eventually lead to multiple diseases ranging from hereditary diseases to cancers (Chelly and Mandel, 2001;Chénard and Richard, 2008;Neelamraju et al., 2018). In all, 1542 human RBPs, accounting for 7.5% of all protein-coding genes, interacting with all known RNA types have been identified utilizing deepsequencing approaches (Gerstberger et al., 2014;Beckmann et al., 2015), which provide a rare opportunity for systematic analysis of RBP genes in cancers. However, few studies have comprehensively analyzed the relationship between RBPs and the prognosis of squamous cell carcinoma of head and neck (SCCHN).
Here, to comprehensively analyze the prognostic value and potential function of RBPs in SCCHN, we obtained gene expression profiles of SCCHN patients from The Cancer Genome Atlas (TCGA) database to construct a prognosis risk model. Interestingly, our study showed that the high-risk score was associated with immunosuppression.

Data Sets
The RNA sequencing (RNA-Seq) data and clinical information were downloaded from the TCGA database 1 of 500 SCCHN patients with 44 adjacent normal samples. For TCGA data, we selected 498 SCCHN patients with follow-up data and randomly divided them into two groups: the TCGA training set (n = 298, Supplementary Table 1) and the TCGA test set (n = 298, Supplementary Table 2). Additionally, the GSE65858 data set was downloaded from the Gene Expression Omnibus (GEO) database 2 , as an external independent verification set with RNA-Seq data and clinic information. We performed data analysis utilizing R project (version 3.6.3) 3 . Clinical features of HNSCC patients of TCGA and GEO databases were shown in Table 1.

GO and KEGG Pathway Analyses
To analyze the function of DERBPs, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed by the enrichplot R package (Yu et al., 2012). GO terms included biological process (BP), cellular component (CC), and molecular function (MF). For the analysis results, both P-value and FDR < 0.05 were defined as statistical significance.

Protein-Protein Interaction (PPI) Network Construction
We constructed the PPI network of DERBPs to investigate protein interactions using STRING (version 11.0) 4 (Szklarczyk et al., 2019) according to a combined score >0.4. Then, the PPI network was visualized by Cytoscape software (version 3.7.1) (Smoot et al., 2011). Furthermore, the Molecular Complex Detection (MCODE, version 1.6.1) (Bader and Hogue, 2003) plug-in in Cytoscape was utilized to screen the key modules based on Degree Cutoff = 2, Node Score Cutoff = 0.2, K-Core = 2.

Construction of a Prognostic Risk Model
To identify overall survival (OS)-associated DERBPs, we performed univariate Cox regression analysis. We chose the 4 https://string-db.org/ candidate prognostic genes according to P-value < 0.05. Subsequently, the multigene prognostic risk model was constructed by Lasso regression analysis in the TCGA training set. We calculated the risk score of each patient using the regression coefficients of each candidate gene according to the following computational formula: Coef gene k * Exp k Here, n is the number of the candidate genes of the prognostic risk model, gene k is the k th candidate gene, Coef is the estimated regression coefficient of the candidate genes from the Lasso regression analysis, and Exp k is the mRNA expression level of the k th candidate gene. Then, we clustered the SCCHN patients into high-risk and low-risk groups with the median the risk score of the TCGA training set. The association between the candidate genes and risk scores were shown using the hierarchical cluster heat map.

Gene Set Enrichment Analysis
Gene set enrichment analysis (GSEA) is an analytical method used to estimate significant differences between two biological conditions to determine specific functional gene sets (Subramanian et al., 2005). In our research, GSEA was performed utilizing GSEA (version 4.0.3) 5 with the Molecular Signatures Database (MSigDB) (Liberzon et al., 2011). C2 curated gene sets, and a list of significantly different gene sets between the high-risk and low-risk groups was generated. Gene sets, performed 1,000 times for each analysis, with p < 0.05 and FDR < 0.25 were defined as significantly enriched.

Evaluation of Immune Scores and Immune Cell Infiltration
The ESTIMATE (Estimation of Stromal and Immune cells in Malignant Tumor tissues using Expression data) algorithm is a method used to calculate the immune and stromal scores of tumor samples. The immune and stromal scores of SCCHN samples TCGA data set was calculated by the estimate R package (Yoshihara et al., 2013).
In addition, we assessed the composition fraction of tumor-infiltrating immune cells of each SCCHN sample by CIBERSORT 6 . CIBERSORT is an algorithm used to characterize the cell composition of complex tissues according to gene expression profiles (Newman et al., 2015).

Statistical Analysis
All statistical analyses were performed utilizing R project (version 3.6.3). Wilcoxon rank-sum test was a non-parametric statistical hypothesis test mainly used for comparisons between two groups and Kruskal-Wallis test was suitable for two or more categories. Survival analysis was estimated using the Kaplan-Meier curve with the log-rank test. The diagnostic values of the risk score and other clinical factors were evaluated utilizing ROC curve analysis. The correlation between the variables was identified by Spearman's rank correlation test. P < 0.05 was identified as statistically significant.

Analysis of Differentially Expressed RBPs in SCCHN Samples
We analyzed the expression profiles of 1,542 human RBPs (Supplementary Table 3 including sex chromosomes X and Y (Supplementary Figure 1), in 498 SCCHN with 44 normal tissues in the TCGA data set. Then, we identified 92 differentially expressed RBPs (DERBPs), including 74 upregulated and 18 downregulated DERBPs (FDR < 0.05 and |logFC| > 1). The DERBPs are listed in Supplementary Table 4 and are visualized by the heat map ( Figure 1A) and the volcano plot ( Figure 1B).

Function Analysis of DERBPs in the TCGA Data Set
The potential function of DERBPs in the TCGA data set was analyzed utilizing GO and KEGG pathway enrichment analyses. The top 10 enriched GO terms of BP, CC, and MF for DERBPs were displayed using the scatter plot (Figures 2A-C). The most significant enriched terms of BP, CC, and MF were associated with defense response to virus, cytoplasmic ribonucleoprotein granule, and catalytic activity, acting on RNA, respectively. The enriched pathways of KEGG pathway analysis were also demonstrated with the scatter plot ( Figure 2D). The results showed that the DERBPs might be related to measles, influenza A, hepatitis C, RNA transport, Epstein-Barr virus infection, mRNA surveillance pathway, cytosolic DNA-sensing pathway, RIG-I-like receptor signaling pathway, and RNA degradation. The functional analyses revealed that the DERBPs are mainly related to RNA metabolism.

PPI Network Construction
The PPI network of DERBPs was constructed using STRING according to combined scores > 0.4, and then the results Frontiers in Genetics | www.frontiersin.org were visualized by Cytoscape software (Figure 3A), in order to better understand the potential interactions among DERBPs. In addition, the key modules of the PPI network were screened utilizing MCODE and two modules were selected (Figures 3B,C).

Identification of a Prognostic Risk Model in the TCGA Training Set
To identify prognostic DERBPs of SCCHN patients, the expression profiles of the 92 DERBPs in the TCGA training set were analyzed using univariate Cox regression analysis. Moreover, six prognosis-associated DERBPs, including NCBP2, MKRN3, MRPL47, AZGP1, IGF2BP2, and EZH2, of the TCGA training set are exhibited by forest plot (Figure 1C). Then, a prognostic risk model of six prognosis-associated DERBPs was constructed utilizing LASSO regression analysis (Supplementary Figure 2). The information and the coefficient values of the six genes are shown in Table 2. The prognostic risk score of The SCCHN patients in the TCGA training set were divided into low-risk and high-risk groups according to the median cutoff value of risk scores (0.2962). Survival analysis demonstrated that the overall survival (OS) of the high-risk group was significantly worse than that of the low-risk group (P < 0.0001, Figure 4A). The receiver operating characteristic (ROC) curve analysis showed that the area under the ROC (AUC) value was 0.712, higher than other clinical factors ( Figure 4B). The risk scores and survival status of SCCHN patients in the training set were ranked with dot plots (Figures 4C,D). The expression patterns of six genes in the high-risk and low-risk groups of the training set are demonstrated using the heat map ( Figure 4E), which indicated that high expressions of NCBP2, MKRN3, MRPL47, and IGF2BP2 serve as risk factors associated with the highrisk score, while high expressions of AZGP1 and EZH2 act as protective factors associated with the low-risk score.

Verification of the Prognostic Risk Model in the TCGA Data Sets
To validate the prognostic risk model, independent validation data sets were used to test. According to the risk model from the training data set, all SCCHN patients in the TCGA test data set were also segregated into high-risk and low-risk groups. Kaplan-Meier curve analysis showed the survival of the high-risk group was worse than that in the low-risk group (P < 0.001, Figure 5A). The AUC value of the risk score in the TCGA test set was 0.626 using the ROC curves analysis, higher than other clinical parameters ( Figure 5B). The association between the expression patterns of the six risk genes and the risk score was consistent with the training set ( Figure 5C). A similar analysis also was performed in the TCGA data set; the results were also consistent with the training set (Figures 5D-F).

Verification of the Prognostic Risk Model in the GEO Data Set
Further, the GEO data set (GSE65858) with 270 SCCHN patients was used as an external independent data set to validate the risk model. The patients in the GEO test set were also classified into high-risk and low-risk groups, and the prognoses of the highrisk group were also significantly worse than those of the low-risk group (P < 0.0001, Figure 6A). The AUC value of the risk score was 0.602, also higher than other clinical parameters, except for T classification (Figure 6B). The risk scores and survival status of SCCHN patients were also shown with dot plots (Figures 6C,D). The association between the expression profiles of the six genes and the risk score in the GEO data set was also in line with the training set ( Figure 6E).

Association Between the Risk Score and the Clinical Parameters of SCCHN Patients
The clinical parameter subgroup analysis of the risk score was shown (Figures 7A-E and Table 3), and the results revealed that the risk score of SCCHN patients with stages III-IV, T3-4, and N + were higher than that with stages I-II, T1-2, and N0, respectively (P < 0.0001, P < 0.01, and P < 0.05, respectively).
However, the risk scores between the subgroups of grade and M classification were not statistically significant (P = 0.147 and P = 0.347, respectively). In addition, we analyzed the association between the risk score and other clinical parameters using logistic regression in the TCGA data set ( Table 4). The level of risk score was significantly associated with clinical stage (P < 0.01), T classification (P < 0.05), and N classification (P < 0.05). However, it was not correlated with other clinical parameters, including age (P = 0.817), gender (P = 0.234), histological grade (P = 0.344), and M stage (P = 0.347). These results suggested that the risk score was closely associated with the progression of SCCHN.

GSEA Analysis of the Risk Score-Associated Signaling Pathway
GSEA analysis was performed to unravel significantly enriched pathways of the high-risk and low-risk groups in the TCGA data set. The top 10 enriched pathways of the high-risk group and thirty enriched pathways of the low-risk group were demonstrated (Supplementary Table 5). Enriched pathways with significant differences (FDR < 0.25, NOM p < 0.05) were selected ( Table 4). The results demonstrated that protein degradation and export related pathways were significantly enriched in the high-risk group ( Figure 8A); however, immune, inflammatory response and fatty acid metabolism were significantly enriched in the low-risk group (Figures 8B-D). Intriguingly, the B cell receptor signaling pathway and T cell receptor signaling  pathway were enriched in the low-risk group (Figures 8E,F), which indicated that the high-risk score may be associated with

Association Between the Risk Score and Tumor Immunity
For the hint from the GSEA results that the high-risk score may be associated with tumor immunosuppression, we performed the ESTIMATE to identify the immune/stromal score of the TCGA data set. Our results showed that tumor samples in the low-risk group had higher immune scores than those in the high-risk group (P < 0.0001, Figure 9A). In addition, the risk score was significantly and negatively correlated with the immune score in SCCHN samples by Spearman's rank test (R = -0.16, P < 0.001, Figure 9B), and there was no significant correlation between the risk score and the stromal score in SCCHN samples (Figures 9C,D).
Moreover, we identified the composition of infiltrating immune cells of SCCHN samples in the TCGA data set using the CIBERSORT to analyze immune cells between the risk subgroups  Figure 9E). Consistent with the GSEA results, the result revealed that SCCHN samples in the high-risk group contained a lower fraction of naïve B cells (P < 0.001), CD8 T cells (P < 0.01), and follicular helper T (P < 0.05) compared with those in the lowrisk group. These results suggested that the high-risk score was associated with immunosuppression.

Correlation of the Genes of the Risk Model With the Three Immune Cell Types
In line with the association between the risk score and the above three immune cell types (Figure 10A), we investigated the association between three types of immune cells with the expression levels of six genes in the risk model (Figures 10B-G). Consistently with the expression patterns of the risk genes, the reduction of naïve B cells was associated with the low expression of EZH2 (P < 0.001) and AZGP1 (P < 0.001).
In addition, the decrease of CD8 T cells was related to the low expression of EZH2 (P < 0.001) and the high expression of IGF2BP2 (P < 0.01). Moreover, the asthenia of follicular helper T cells was associated with the low expression of EZH2 (P < 0.001) and the high expression of IGF2BP2 (P < 0.01). Hence, the risk genes EZH2, AZGP1, and IGF2BP2 play key roles in immunosuppression of SCCHN.

DISCUSSION
Evasion of immune destruction of cancer cells is one key hallmark of cancer (Hanahan and Weinberg, 2011), and RBPs can regulate the function and activity of immune cells, which eventually may be linked to immune surveillance evasion of cancer cells through managing the RNA metabolism at the post-transcription level (Kafasla et al., 2014;Pereira et al., 2017).
In addition, accumulating studies have reported that dysregulated expression of RBP-facilitated cell proliferation, invasion, and metastasis and pluripotency and stemness in multiple cancers (Yu et al., 2007;Dong et al., 2019;Soni et al., 2019;Velasco et al., 2019;Elcheva et al., 2020;Pascual et al., 2020). However, few studies have analyzed the expression patterns and roles of RBPs in SCCHN. In our study, for the first time, we comprehensively analyzed the expression patterns and potential functions of RBPs in SCCHN.
Here, we initially comprehensively analyzed the associations between the prognosis of SCCHN and 92 differentially expressed RBPs (DERBPs, Figures 1A,B). Subsequently, we constructed a prognosis risk model in the training set with six RBPs, including NCBP2, MKRN3, MRPL47, AZGP1, IGF2BP2, and EZH2 (Figure 1C), which showed a robust performance for predicting prognosis compared with clinical parameters in training and multiple validation sets (Figures 4-6).
In this prognostic risk model, AZGP1 and EZH2 served as protective factors, while NCBP2, MKRN3, MRPL47, and IGF2BP2 acted as risk factors. Low AZGP1 expression was significantly associated with increased risk of biochemical relapse in margin-positive localized prostate cancer (Yip et al., 2011;Bruce et al., 2016). AZGP1 as a cancer suppressor inhibited cell proliferation, migration, and invasion via TGF-β and PTEN/Akt signaling pathways in pancreatic cancer and hepatocellular carcinoma (Kong et al., 2010;Tian et al., 2017). These studies consistently with our results suggested AZGP1 as an anticancer gene. Previous studies have shown that EZH2 silencing reduced cancer cell growth, migration and invasion in SCCHN (Liu et al., 2013;Chang et al., 2016), but which lack transgenic animal experiments and does not involve the influence of EZH2 on tumor microenvironment regulation of tumor genesis and development. IGF2BP2 overexpression was observed in pancreatic cancer, colorectal cancer, and SCCHN, which promoted cancer cell proliferation by activating the PI3K/Akt signaling pathway (Ye et al., 2016;Wang et al., 2019;Xu et al., 2019;Deng et al., 2020). These studies consistently with our results suggested that IGF2BP2 served as an oncogene. However, the roles of NCBP2, MKRN3, and MRPL47 in cancers are still unclear. Our study was first to suggest that they can be acted as risk factors in a prognostic risk model of SCCHN.
Interestingly, our GSEA results unveiled that the B cell receptor (BCR) signaling pathway and the T cell receptor (TCR) signaling pathway enriched in the low-risk group (Figures 8E,F), which indicated that they may be asthenic in the high-risk group. BCR and TCR signalings have been shown pivotal for B cell and T cell proliferation and development for adaptive immunity, and their abnormalities could lead to immunodeficiency (Burger and Wiestner, 2018;Tan et al., 2019;Berry et al., 2020;Takeuchi et al., 2020). Therefore, we identified the correlation between the risk score and the immune score and analyzed the composition of immune cells between the risk subgroups in SCCHN samples of the TCGA data set. As we confirmed, there was a negative correlation between the high-risk score and tumor immune score, and the highrisk group contained lower fractions of naïve B cells, CD8 T cells, and follicular helper T cells compared with the low-risk group (Figure 9). Hence, the results revealed that the high-risk score may be an essential factor for B and T cell growth and differentiation leading to tumor immunosuppression, and the low expression of EZH2 and AZGP1 and high expression of IGF2BP2 were the main factors of tumor immunosuppression in the risk model (Figure 10). During the humoral immune response, EZH2 expression was remarkably elevated in the B cells of the germinal center (GC) (Caganova et al., 2013;Herviou et al., 2019), which directly inhibited cell cycle inhibitors of B cells, including CDKN1A (Béguelin et al., 2013(Béguelin et al., , 2017. Similar to B cell development, EZH2 promoted generation and differentiation of mature T lymphocytes via preventing p53 stabilization to suppress CNKN2 (Jacobsen et al., 2017). In addition, EZH2 depletion in CD8 + T cells restrained the amplification of antigen-specific effector cells after pathogenic microorganisms infection (Gray et al., 2017;Chen et al., 2018). Albeit the roles of AZGP1 and IGF2BP2 in the immune response have not been investigated, our study was first to suggest that AZGP1 downregulation and IGF2BP2 upregulation may act as suppressors in tumor immune response in SCCHN.
Although we identified a prognostic risk model with six RBPs and revealed that the high-risk score was significantly associated with cancer immunosuppression, the results of our study performed with bioinformatics analysis were not robust enough needing to be confirmed utilizing experimental approaches. Thus, multicenter studies with larger sample sizes are required.

CONCLUSION
In summary, in our study, we developed a robust prognostic risk model with six differentially expressed RBPs. The results showed that the risk score has great potential as a prognostic and immunosuppression state biomarker in SCCHN patients. Therefore, the risk model may act as a prognostic signature and offer highlights for individualized immunotherapy of SCCHN patients.

AUTHOR CONTRIBUTIONS
ZL contributed to this project design and final approval of the manuscript. GH, JYa, QJ, LL, HP, YW, SL, YT, and JYu performed the data analysis, interpretation, visualization, and drafting. All authors contributed to the article and approved the submitted version.  Table 3, except the B cell receptor signaling pathway and T cell receptor signaling pathway.