The RNA Methylation Modification 5-Methylcytosine Impacts Immunity Characteristics, Prognosis and Progression of Oral Squamous Cell Carcinoma by Bioinformatics Analysis

Disorders pertaining to 5-methylcytosine (m5C) modifications are involved in the pathological process of many diseases. However, the effect of m5C on the tumorigenesis and progression of oral squamous cell carcinoma (OSCC) remains unclear. In this study, we integrated the genomic and clinical data of 558 OSCC samples to comprehensively evaluate m5C modification patterns. Based on 16 m5C methylation regulators, two m5C modification clusters were identified with distinct tumor immune microenvironment (TIME) characteristics and prognosis in OSCC. We then performed weighted gene co-expression network analysis (WGCNA) to identify m5C modification cluster-related modules. Genes in the selected module were chosen to construct the m5Cscore scoring system for evaluating m5C modification pattern in individual OSCC patients. Patients with a high m5Cscore had higher immune, stromal, and ESTIMATE scores; lower tumor purity score; lower immune activity; and higher tumor mutational burden. The overall survival rate and progression-free survival rate were markedly worse and the tumor recurrence rate was higher in OSCC patients with a high m5Cscore. Furthermore, patients with oral leukoplakia who also had a high m5Cscore had a higher risk of deterioration to OSCC. This study demonstrated that m5C modification patterns might affect the TIME in OSCC. m5Cscore may provide a new approach for predicting the prognosis and progression of OSCC.

INTRODUCTION 5-methylcytosine (m5C) is one of the longest-known post-transcriptional RNA methylation modifications and widely exists in mRNA, rRNA, and tRNA transcripts (Van Haute et al., 2019) (Schumann et al., 2020)(Bohnsack et al., 2019. m5C plays a crucial regulatory role in various cellular processes, including RNA processing, protein translational regulation, and stress responses (Liu et al., 2017). At the post-transcriptional level, m5C modification is dynamically regulated by mediator proteins known as "writers," "erasers," and "readers" (Trixl and Lusser, 2019). The "writers" comprise methyltransferase including NOP2, NSUN2-7, DNMT1, DNMT3A, DNMT3B, and TRDMT1, which add a methyl group at the C5 position of RNAs. The "erasers" are TET2, TET3 and ALKBH1, which serve as demethylases to remove methyl groups from m5C. The "readers," such as ALYREF and YBX1, recognize and bind to m5C sites on mRNA (Chen Y. S. et al., 2021) (Sibbritt et al., 2013) (Fu et al., 2014). m5C modification has been verified to involve in the development of many diseases, such as cancers. Chen et al. found that NSUN2 and YBX1 promoted oncogenesis in human bladder urothelial carcinoma by targeting the m5C methylation site in the untranslated region of HDGF3 (Chen et al., 2019a). NSUN2 is reported to upregulate in the human breast carcinoma and colon carcinoma tissues, compared with healthy tissues (Frye and Watt, 2006).
Oral squamous cell carcinoma (OSCC) is the most common epithelial malignancy in the oral area and is characterized by high recurrence rates, poor survival outcomes, and high cervical lymph-node metastasis rates (Gődény, 2014) (Bray et al., 2018) (Warnakulasuriya, 2009). Recent epidemiological investigations indicated that smoking, alcohol consumption, betel-nut chewing, diet, and nutrition are high-risk factors related to the oncogenesis and development of OSCC (Mashberg et al., 1993) (Chen et al., 2008). It is commonly thought that OSCC arises de novo or precedes latent lesions in the oral cavity (Choi and Myers, 2008). Leukoplakia is the most common potentially precancerosis in the oral area (Quan et al., 2020). Approximately 0-36% cases of leukoplakia may progress to OSCC (van der Waal, 2014) (Lee et al., 2000) (Chaturvedi et al., 2020). Over the past decades, great achievements have been made in the diagnosis and treatment of OSCC through methods such as imaging technologies, surgery, chemotherapy, molecular targeted therapies, and immunotherapy (Chai et al., 2020) (Kaidar-Person et al., 2018). However, the prognosis of OSCC remains poor, the main reasons for which are metastasis and diagnosis at advanced stages (Zhang et al., 2019). Therefore, further investigation into the mechanisms involved in the oncogenesis and development of OSCC might help improve diagnostic accuracy at an early stage, as well as the treatment effective rate.
It has been verified that dysfunction of the tumor immune microenvironment (TIME) affects the oncogenesis and development of OSCC (Quan et al., 2020) (Bhat et al., 2021). There is growing evidence that OSCC/head and neck squamous cell carcinoma (HNCSS) is highly immunosuppressive, mediated by recruitment of host immunosuppressive cells and inhibitory mediators (Jewett et al., 2006) (Tong et al., 2012). Patients with OSCC usually have worse antigen-presenting ability, lower infiltrating-lymphocyte viability, and lower immune infiltrating cell counts than healthy samples (Lasisi et al., 2013) (Barth et al., 2004) (Grimm et al., 2016). Furthermore, the rapid deterioration of oral leukoplakia into OSCC has also be proven in an immunosuppressed liver transplant recipient (Hernández et al., 2003). Nevertheless, further investigations in the underlying regulatory mechanisms are still needed.
In this study, we integrated the genomic and clinical data of 558 OSCC samples to comprehensively evaluate m5C modification patterns, and we correlated m5C modification patterns with prognosis and TIME characteristics. Weighted gene co-expression network analysis (WGCNA) was performed to identify m5C modification cluster-related modules. Genes in the selected module were chosen to construct the m5Cscore scoring system for evaluating m5C modification pattern in individual OSCC patients. The TIME characteristics and prognosis between m5C modification patterns and m5Cscore groups (high or low) were distinct. The m5C modification patterns and the m5Cscore scoring system might help in the prediction of the prognosis and progression of OSCC.

Data Processing
The mRNA expression dataset and clinicopathological characteristic of OSCC patients were extracted from The Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov/) and the NCBI GEO database (https://www.ncbi.nlm.nih.gov/geo/). In total, 558 OSCC samples, 32 normal samples, and 101 oral leukoplakia samples were screened out for further evaluation. 322 OSCC samples and 32 normal control samples were extracted from TCGA-HNSCC cohort, the cancerous sites included alveolar ridge, hard palate, floor of the mouth, the base of the tongue, oral cavity, buccal mucosa, and oral tongue. GSE65858 includes 83 OSCC patients; GSE41613 contains 97 OSCC samples; GSE26549 includes 86 oral leukoplakia patients; GSE31056 includes 22 OSCC patients; and GSE85195 contains 34 OSCC patients and 15 oral leukoplakia patients. Detailed information on these datasets is provided in Supplementary  Table S1. The clinicopathological information of OSCC patients is provided in Supplementary Table S2. The data processing were performed as we previously described (Gao et al., 2021). To remove the batch effects, the RNA sequencing data (FPKM format) data of OSCC patients in TCGA HNSCC was transformed into transcripts per kilobase million (TPM) format and then combined with microarray data in GSE65858. Copy number variations (CNVs) were obtained from the UCSC Public Hub and analyzed using the RCircos package in R. Somatic mutation expression data were downloaded from TCGA and analyzed using the maftools package in R.

Estimation of Tumor Immune Microenvironment Characteristics
We conducted single-sample gene set enrichment analysis (ssGSEA) to quantify the relative amount of 27 immune cell types in the TIME of OSCC using the GSVA and GSEABase packages. These gene sets acquired from recent studies were provided in Supplementary Table S3 (Charoentong et al., 2017) (He et al., 2018. ESTIMATE method was used to estimate the proportions of stromal cells (SCs), immune cells (ICs) in the TIME of OSCC. Tumor purity was predicted based on the SCs, ICs, and ESTIMATE scores. The deconvolution approach CIBERSORT was used to investigate the differences in 22 immune cell subtypes among the m5C-modification clusters. The "signature matrix" of 547 genes was acquired from CIBERSORT (https://cibersort.stanford.edu/ index.php), namely, LM22.txt. The differences in physiological processes in different m5C-modification clusters were evaluated by gene set variation analysis (GSVA). Gene sets "c2.cp.kegg.v7.4. symbols" acquired from the MSigDB (http://www.gsea-msigdb.org/ gsea/msigdb) were used for the GSVA analysis.

Construction of Gene Co-expression Network
We used WGCNA to explore m5C-modification cluster-related modules. Based on the top 50% of the most variable genes (7,949 genes), the gene co-expression network was constructed. The correlation of the expression of these filtered genes was calculated, and β 6 was selected as optimal soft threshold for building a scale-free topological network. Topological overlap matrix (TOM) analysis was performed to cluster the adjacency matrix of gene expression. The modules were identified with the criterion set as the mini-size of module gene numbers was 40, and a cut height of 0.9. The module with the strongest relationship with m5C-modification clusters was selected for further analysis.

Generation of m5Cscore
M5Cscore was adopted for the quantitative analysis of m5Cmodification patterns in an individual patient with OSCC. This method was established as we previously described (Gao et al., 2021). Briefly, the prognostic value of each gene in the selected module was evaluated by Cox regression analysis. Subsequently, principal component analysis (PCA) was conducted to calculate the principal components (PCs) 1 and 2, which were used for m5Cscore calculations.
where i is the expression of the selected genes.

Statistical Analysis
All statistical analyses were implemented using R software version3.5.3 (https://www.r-project.org/). The Kaplan-Meier analysis and log-rank analysis were performed to evaluate the progression-free survival (PFS) and overall survival (OS) between clusters. We conducted Pearman correlation analysis to explore the correlation between m5Cscore and known gene signatures and the correlation coefficient of gene expression. Biological gene signatures were retrieved from previous studies (Couzin-Frankel, 2013) (Melero et al., 2021. Student's t-test was used to evaluate the statistical differences between two groups. Kruskal-Wallis tests and one-way ANOVA tests were performed when there were more than two groups. In all cases, the p-value was two-sided, and a p-value < of 0.05, was considered statistically significant.

Depiction of m5C-Modification Clusters and Related Biological Functions in OSCC
To comprehensively evaluate the association among these m5C regulators, we conducted consensus clustering analysis to stratify 405 OSCC patients into two independent clusters: clusters A and B (  shows that the expression of m5C regulators in the two clusters was significantly distinct. The expression of m5C regulators was noticeably higher in cluster B. The OS analysis indicated that compared with cluster A, prognosis of patients in cluster B was markedly poorer (p 0.020) ( Figure 2C). What's more, in GSE41613, OSCC patients were also classified into two clusters (Supplementary Figures  S2D-F). Most m5C regulators were highly expressed in  Figure S2G). The OS outcomes in cluster A were worse than those in cluster B (p 0.047) ( Figure 2D).
We further evaluated the effect of m5C modification patterns on the TIME in OSCC. CIBERSORT analysis results indicated that compared with cluster A, cluster B had markedly lower fractions of ESTIMATE analysis for OSCC patients in high-or low-m5Cscore groups, including immune score (H), stromal score (I), ESTIMATE score (J), tumor purity scores (K). (L) The abundance of immune infiltrating cells between high-and low-m5Cscore group by ssGSEA analysis. *p < 0.05; **p < 0.01; ***p < 0.001; ns not significant. m5C, 5-methylcytosine; OS, overall survival; OSCC, oral squamous cell carcinoma; PFS, progression-free survival.
Frontiers in Bioengineering and Biotechnology | www.frontiersin.org December 2021 | Volume 9 | Article 760724 7 resting dendritic cells (DCs), activated natural killer (NK) cells, neutrophils, and CD8 + T cells, and markedly higher fractions of resting memory CD4 + T cells and M0 macrophages ( Figure 2E). Furthermore, ssGSEA analysis showed compared with cluster A, cluster B had lower immune activity in most immune cell types, included CD8 + T cells, tumor-infiltrating lymphocytes (TILs), and B cells, which indicated a deeper immunosuppressed state in cluster B ( Figure 2F). GSVA enrichment analysis revealed that cluster B was significantly enriched in ubiquitin-mediated proteolysis, mismatch repair-related pathways, progesterone-mediated oocyte maturation, and cell-cycle. Cluster A was markedly enriched in cytokinecytokine receptor interaction, neuroactive ligand receptor interaction, and PPAR signaling pathway-related pathways ( Figure 2G). These findings imply that m5C-related modification patterns might participate in TIME regulation in OSCC.

m5C Phenotype-Related Genes in OSCC
To identify the m5C-modification cluster-related modules, we constructed a gene co-expression network ( Figure 3A). β 6 was selected as the optimal soft threshold for building a scalefree topological network (Supplementary Figure S3A). As illustrated in Figure 3B; 11 modules were identified. The yellow module demonstrated the highest correlation with the m5C cluster (r 0.64, p 3e-48). In addition, genes in the yellow module were markedly co-expressed (weighted correlation 0.83, P < 1e-200) ( Figure 3C). 53 genes in this module were screened out for further analysis, with the screening criteria of |MM| ≥ 0.8 and |GS| ≥ 0.2 (Supplementary Table S4).
Next, we conducted a consensus clustering analysis based on m5C cluster-related genes. OSCC patients were divided into two m5C gene clusters: gene clusters A and B ( Figure 3D; Supplementary Figures S4B-D). Cluster B had the highest expression of most m5C regulators and poorer prognosis (p 0.032) (Figures 3E,F). We verified and obtained the same outcomes in the validation cohort (p < 0.001) ( Figure 3G; Supplementary Figures S3E-H).

m5Cscore Scoring System and Related Prognostic and Immune Characteristics in OSCC
We calculated the m5Cscore for individual OSCC patients using PCA. Patients with high-m5Cscore showed poorer OS than those with low-m5Cscore ( Figures 4A,B). A consistent prognosis result was verified using GSE41613 ( Figure 4C). The PFS of patients in high-m5cscore group was also poorer than those in low-m5Cscore group in GSE65858 ( Figure 4D). We further evaluated the m5Cscore between patients in different tumor stages, nodal stages, and clinical stages. As shown in Figures 4E-G, m5Cscore progressively increased with the progression of the T stage ( Figure 4E), N stage ( Figure 4F), and clinical stage ( Figure 4G). At different T and N stages, OSCC patients with a high m5Cscore showed poorer OS (Supplementary Figures S4 A-D). m5Cscore and m5C clusters and m5C gene clusters had a positive correlation, i.e., m5Cscore were higher in m5C cluster B and m5C gene cluster B (Supplementary Figures S4 E-F). In addition, compared with low-m5Cscore group, the SCs, ICs, and ESTIMATE scores in the high-m5Cscore group were noticeably lower, while the tumor purity score was noticeably higher (Figures 4H-K). What's more, the immune activity of most types of immune cells in the high-m5Cscore group were significantly lower, compared with low-m5Cscore group ( Figure 4L), which further suggests an association between a suppressive TIME and high m5Cscore in OSCC patients.
To verify the correlation between tumor mutational burden (TMB) and m5Cscores, we performed TMB quantification analysis in the high-and low-m5Cscore groups. In both m5Cscore groups, the somatic mutation rates of TP53, TTN, CDKN2A, NOTCH1, and CASP8 were relatively higher. Generally, the proportion and extent of somatic mutations in the high-m5Cscore group were higher and more extensive (Figures 5A,B). In addition, there was a high correlation between high TMB and high m5Cscore (p 0.0028) ( Figures  5C,D), as well as a worse prognosis ( Figure 5E). The combination of a high m5Cscore and high TMB suggested an extremely poor OS for patients with OSCC ( Figure 5F). Besides, m5Cscore was positively correlated with antigen processing machinery, CD8 + T effector, and immune checkpoint, but negatively correlated with cell cycle, Wnt target, mismatch repair, and cell cycle regulator by Spearman correlation analysis ( Figure 5G).

Predictive Effect of m5Cscore in OSCC Progression
We further investigated other predictive roles of m5Cscore in OSCC progression. Tumor recurrence is a crucial prognostic factor for OSCC. In this study, we compared the m5C regulators expression and m5Cscores between OSCC patients with and without recurrence in GSE31056. As illustrated in Figure 6A, there were five regulators-NOP2, DNMT1, DNMT3A, DNMT3B, and TET3-that were upregulated in OSCC patients with recurrence. Surprisingly, for patients with recurrence, m5Cscore was markedly higher than those without recurrence ( Figure 6B). Moreover, 100% of patients in the high-m5Cscore group experienced recurrence in the course of OSCC ( Figure 6C). We then analyzed the m5Cscore in patients with oral leukoplakia (Figures 6D-F). The m5Cscore was also markedly higher in patients with oral leukoplakia who developed OSCC. Three m5C regulators-NOP2, DNMT3B, and ALYREF-were upregulated ( Figure 6D). We also verified the expression levels of m5C regulators in the microarray dataset GSE85195, which contained 34 OSCC patients and 15 oral leukoplakia patients. The expression of eight m5C regulators was upregulated in OSCC patients compared with that in oral leukoplakia patients ( Figure 6G). These findings suggest that m5C methylation regulators might affect OSCC progression and that m5Cscore might has a predictive effect on the recurrence of OSCC and the deterioration of oral leukoplakia.

DISCUSSION
M5C-associated disorders are widely involved in the pathological process of many diseases, including cancer (Chen et al., 2019b).
Frontiers in Bioengineering and Biotechnology | www.frontiersin.org December 2021 | Volume 9 | Article 760724 8 However, the effects of m5C on the occurrence and progression of OSCC are still unclear. In this study, we analyzed the roles of m5C regulators in the TIME and prognosis in OSCC and developed the m5Cscore scoring system. The results of these analyzes indicated m5C modification patterns and m5Cscore have potential predictive ability in the prognosis and progress of OSCC.
In this work, based on sixteen m5C regulators, two m5C modification patterns were identified. Compared to cluster A, cluster B was characterized by worse survival outcomes and higher expression levels of m5C regulators. Accumulated evidence reveals that OSCC is an immunosuppressive disease. In the TIME of OSCC, immune cells such as TILs, DCs, NK cells, and T-cells are in an inactive state (Jewett et al., 2006). Our ssGSEA analysis results showed that compared with cluster A, cluster B had lower immune activity in the above immune cell types, which suggests a more suppressive state of the TIME in cluster B. Recent reports have demonstrated that TILs widely affect the TIME in OSCC (Boxberg et al., 2019) (Quan et al., 2016) (Chen J. et al., 2021). Among the TILs, high expression of CD8 + T cells has been proven to be related to a better prognosis in HNSCC (Nakano et al., 2001). Moreover, a high CD4/CD8 ratio could be regarded as a marker of poor survival outcomes in cancer patients (Sato et al., 2005) (Quan et al., 2016). Our CIBERSORT and ssGSEA analysis results showed that the expression level of CD8 + T cells in cluster B were noticeably lower, while the expression level of CD4 + T cells was markedly higher than that in cluster A. Combined with the OS analysis results, our analysis results were in accordance with previous Frontiers in Bioengineering and Biotechnology | www.frontiersin.org December 2021 | Volume 9 | Article 760724 10 findings. Therefore, we speculated that m5C-related modification patterns might influence the TIME of OSCC, potentially affecting eventual survival outcomes.
We then constructed the m5Cscore scoring system to evaluate the m5C modification pattern in individual OSCC patients. Patients with a high m5Cscore had a more suppressive TIME state and worse OS outcomes compared with patients with a low m5Cscore. In addition, a higher m5Cscore was associated with poor survival outcomes and high TMB. These results indicate that m5Cscore may provide a new approach for predicting the TIME state and prognosis in OSCC.
For OSCC patients, locoregional recurrence after surgical excision is a problem that could negatively affect prognosis (Anderson et al., 2015). Recent studies have reported that the recurrence rate of OSCC was 16.0-32.7%; the survival rate was markedly lower in patients with local recurrence (33.3%), compared with that in patients without local recurrence (94.3%) (Yanamoto et al., 2012) (Wang et al., 2013). In this study, we analyzed the m5Cscore in OSCC patients with or without recurrence. Surprisingly, the m5Cscore in patients with recurrence was markedly higher, compared with those without recurrence. Five m5C regulators-NOP2, DNMT1, DNMT3A, DNMT3B, and TET3-were significantly upregulated in OSCC patients with recurrence. We speculate that there is a correlation between m5C methylation modification and OSCC recurrence, which is worthy of further in-depth study.
Approximately 0-36% of oral leukoplakia cases are expected to progress to OSCC (van der Waal, 2014) (Lee et al., 2000) (Chaturvedi et al., 2020). However, in clinics, there is still a lack of sensitive and effective methods to predict the deterioration of oral leukoplakia. Our analysis indicated that the m5Cscore was markedly higher in patients with oral leukoplakia who developed OSCC. The expression of three m5C regulators-NOP2, DNMT3B, and ALYREF-was upregulated in oral leukoplakia patients from two microarray datasets, GSE26549 and GSE85195. These three gene signatures may serve as deterioration biomarkers for patients with oral leukoplakia. However, further experimental and clinical experiments are needed to verify this hypothesis.
Human papillomavirus (HPV) infection was once thought to be a trigger for OSCC. However, recent studies have dismissed this viewpoint. No significant difference exists in the survival outcomes between HPV-positive and HPVnegative OSCC patients (Kansy et al., 2014) (Jiang and Dong, 2017). In this study, GSE65858 contained 70 HPVnegative and 13 HPV-positive OSCC patients. Our analysis results indicated that no significant difference existed in the prognosis (p 0.109) and m5Cscore (p 0.9) between HPVnegative and HPV-positive OSCC patients (Supplementary Figures S4G-H). In addition, the validation cohort GSE41613 containing 97 HPV-negative OSCC patients. We obtained almost consistent results for m5C-regulator expression, TIME characteristics, and survival outcomes in GSE41613, which suggested our findings is stable and reliable.
To summarize, this study revealed that m5C modification patterns play a crucial role in the regulation of the TIME in OSCC and can affect the prognosis and progress of OSCC patients. The constructed m5Cscore scoring system could predict the TIME state and the prognosis of individual OSCC patients, including the survival outcome, tumor recurrence, and deterioration of oral leukoplakia to OSCC. The m5C modification patterns and the m5Cscore scoring system might help in the prediction of the progress and prognosis of OSCC.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

AUTHOR CONTRIBUTIONS
LG, YK, and XH: conceptualization; LG, RC and LZ: investigation, methodology, formal analysis, and writing-original draft preparation; LG, MS and MM: data curation, Review and Editing; YK, XH, and KO: funding acquisition and supervision.

ACKNOWLEDGMENTS
We would like to thank Editage (www.editage.com) for English language editing.