The Expression and Effection of MicroRNA-499a in High-Tobacco Exposed Head and Neck Squamous Cell Carcinoma: A Bioinformatic Analysis

Background: Few studies have directly investigated the differential expression of microRNAs (miRNAs) in head and neck squamous cell carcinoma (HNSCC) with low, medium, and high tobacco exposure. The purpose of this study is to screen the differentially expressed miRNAs and to investigate their clinical significance and potential biological mechanisms in the three groups of HNSCC. Methods: The datasets of HNSCC were obtained from The Cancer Genome Atlas (TCGA). The edgeR package was used to determine differentially expressed miRNAs and genes among the three groups of HNSCC. Statistical methods were applied to assess the clinical significance of miRNA and its correlation with genes. The correlation between gene expression and clinical characteristics was analyzed using weighted gene co-expression network analysis (WGCNA). Three online databases were used to predict the target genes of miRNAs. More importantly, qRT-PCR was employed to verify the differential expression of miRNAs and genes in our patients. Results: 32 differentially expressed miRNAs and 1,820 differentially expressed genes were found among the three groups of HNSCC. Patients with high expression of hsa-miR-499a had lower overall survival than the ones with low expression in high-tobacco exposed HNSCC. Cox regression analysis found that high expression of hsa-miR-499a and female were independent risk factors for prognosis in high-tobacco exposed HNSCC. Chi-square test found that hsa-miR-499a was associated with N stage in high-tobacco exposed HNSCC. WGCNA identified four gene modules associated with N stage in high-tobacco exposed HNSCC. Then three online databases were used to predict potential target genes for hsa-miR-499a, which were AEBP2 and ZNRF1. Pearson correlation analysis showed that hsa-miR-499a was negatively correlated with AEBP2 and ZNRF1. qRT-PCR supported bioinformatic results that hsa-miR-499a, AEBP2, and ZNRF1 were differentially expressed among the three groups of HNSCC in our patients. Conclusion: 32 differentially expressed miRNAs and 1,820 differentially expressed genes were successfully identified in HNSCC with low, medium, and high tobacco exposure. The patients with high expression of hsa-miR-499a had poor prognoses compared with patients with low expression in high-tobacco exposed HNSCC. Hsa-miR-499a was associated with N stage in high-tobacco exposed HNSCC. AEBP2 and ZNRF1 were the potential target genes of hsa-miR-499a.


INTRODUCTION
Head and neck squamous cell carcinoma (HNSCC) is the sixth most common human cancer (1). Most patients have locally advanced disease, and more than 50% of patients relapse within 3 years (2). About two-thirds of HNSCC patients have advanced disease, usually involving regional lymph nodes metastasis (3). HNSCC remains a clinical challenge, especially for patients diagnosed with the advanced or recurrent disease, HNSCC is almost fatal (4). Although significant efforts have been made to understand HNSCC, its mechanisms of occurrence and development has not yet been elucidated. Obviously, the exact mechanisms for exploring the occurrence and development of HNSCC is of great significance.
On one hand, to date, many studies have shown that tobacco exposure is an important risk factor for HNSCC (5)(6)(7)(8). Cumulative tobacco tar exposure value is closely related to HNSCC (9). On the other hand, abnormal expression of miRNA has now been demonstrated in many cancer types (10,11). The correlation between the abnormal expression of some miRNAs and outcome of HNSCC patients is indeed reported (12,13). However, the role of miRNAs in HNSCC patients with different level of tobacco exposure remains unknown.
To further explore the relationship between patient clinical characteristics and gene expression, we apply co-expression network analysis. Weighted gene co-expression network analysis (WGCNA) is widely used to analyze large-scale datasets and to find modules for highly related genes (14), and it is successfully used to explore the association between gene expression and clinical characteristics, and to identify candidate biomarkers (15).
The present study combined and analyzed the miRNA-seq data, gene expression data (mRNA-seq data) and clinical data in The Cancer Genome Atlas (TCGA) database. Using bioinformatic analysis, this study identified differentially expressed miRNAs and genes in HNSCC with low, medium, and high tobacco exposure, and explored the possible role of hsa-miR-499a in high-tobacco exposed HNSCC. WGCNA was used to predict potential gene modules associated with N stage of high-tobacco exposed HNSCC. The potential target genes of hsa-miR-499a, i.e., AEBP2, ZNRF1, were predicted using three online databases and Pearson correlation analysis. In addition, we observed the independent effect of lifetime tobacco exposure value on the expression of hsa-miR-499a, AEBP2, and ZNRF1, excluding the effect of age at initial diagnosis. Furthermore, we validated the differential expression of hsa-miR-499a, AEBP2, and ZNRF1 in HNSCC patients with low, medium, and high tobacco exposure using tumor tissues of HNSCC patients from our research center by qRT-PCR.

Patient Selection, miRNA and Gene Screening
The HNSCC datasets were downloaded from TCGA (https:// cancergenome.nih.gov/). Our research included laryngeal, hypopharyngeal, oropharyngeal, and tonsillar carcinomas of HNSCC. A total of 182 HNSCC patients were found in TCGA database, which included 118 patients with lifetime tobacco exposure value. Those HNSCC patients with complete information containing miRNA-seq data, gene expression data (mRNA-seq data) and clinical data could be included in our study. Ten HNSCC patients were excluded due to lack of miRNA-seq or mRNA-seq data. And finally, 108 HNSCC patients were selected for subsequent analysis, and their clinical characteristics were shown in Supplementary Table 1.
In TCGA database, numerically calculated values represent lifetime tobacco exposure, defined as the number of cigarettes per day multiplied by the number of years of smoking divided by 20. The HNSCC patients were categorized into three groups based on the lifetime tobacco exposure value which was <30 for lowtobacco exposed HNSCC (n = 31), ≥30 and <50 for mediumtobacco exposed HNSCC (n = 36), and ≥50 for high-tobacco exposed HNSCC (n = 41).
With normalized data, analysis of differentially expressed miRNAs and genes was performed using the edgeR package in R software. The fold change (FC), P-value and false discovery rate (FDR) were calculated. Next, the differentially expressed miRNAs we found in the three groups of HNSCC were further divided into low and high expressions with the median of each miRNA expression level as the cut-off value in lowtobacco, medium-tobacco, and high-tobacco exposed HNSCC, respectively (Supplementary Table 2).
To verify the bioinformatic analysis results, we further collected 22 cases of smoking patients with HNSCC (including 15 cases of laryngeal cancer, 4 cases of hypopharyngeal cancer, 1 case of oropharyngeal cancer, and 2 cases of tonsillar cancer) from the Ruijin Hospital, School of Medicine, Shanghai Jiaotong University. Our HNSCC patients (or their parents or guardians) have signed the written informed consent form. The use of human tissue samples and clinical data has been approved by the Ruijin Hospital Ethics Committee. Similarly, we used the lifetime tobacco exposure value to group patients. The patients with lifetime tobacco exposure value <30 were included in low-tobacco exposed group, ≥30 and <50 were included in medium-tobacco exposed group, and ≥50 were included in high-tobacco exposed group (including 4 low-tobacco exposed HNSCC, 6 medium-tobacco exposed HNSCC, and 12 high-tobacco exposed HNSCC). We collected patient's tumor specimens and recorded the patient's clinical characteristics. The general characteristics of these patients were comparable (Supplementary Table 3). The entire study design followed appropriate medical ethics principles.

Association of miRNAs With Patients' Prognosis and Clinical Characteristics
Kaplan-Meier survival analysis (log-rank method) and univariate/multivariate Cox proportional hazard regression analysis were used to explore the relationship between each parameter and overall survival of the patients in TCGA database. Chi-square test was used to assess the relationship between differentially expressed miRNAs and clinical characteristics of patients.

Gene Expression Profiling Data Processing
Gene expression data (mRNA-seq data) was obtained from TCGA database. A total of 32,216 genes were identified for each sample. The variance analysis was performed, and it was ranked from large to small. The top 25% of the genes (8,054 genes) with larger variance were selected for WGCNA analysis.

WGCNA Co-expression Network Construction
The expression data profile of these 8,054 genes was constructed to a gene co-expression network using the WGCNA package in R software (15). An adjacency matrix was constructed using the WGCNA function adjacency by calculating the Pearson correlation between all pairs of genes in all selected samples. In this study, the power of β = 7 (scale-free R 2 = 0.9) was used as a soft threshold parameter to ensure a scale-free network. To further identify functional modules in the co-expression network with these 8,054 genes, the adjacency matrix was used to calculate the topological overlap measurement (TOM) representing the overlap in the shared neighbors.

Identification of Clinical Significant Modules
The module eigengenes (MEs) were considered to be a representation of the gene expression profile in the module. Correlation and P-value between the module and clinical characteristics were evaluated by calculating the MEs. In the correlation between the module and clinical characteristics, red represents a positive correlation with clinical characteristics, and green represents a negative correlation with clinical characteristics.

Gene Ontology and Pathway Enrichment Analysis
The Database for Annotation, Visualization and Integrated Discovery (DAVID) is a database for annotating gene functions and visualizing them. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of the genes within significant modules were performed using the DAVID (16) (version 6.8; https://david.ncifcrf.gov/) online tool: Functional Annotation Tool. The ontology contained three categories: biological processes (BP), cellular components (CC), and molecular functions (MF). Enriched GO terms and KEGG pathways were determined based on the adjusted critical criteria of P < 0.05.

Differential Expression Analysis Among Three Different Age Groups of Non-smoking HNSCC
Because there were differences in age at initial diagnosis among the low-tobacco, medium-tobacco, and high-tobacco exposed HNSCC (Supplementary Table 1, P = 0.02), we performed this additional analysis. Sixty four non-smoking HNSCC patients were found in TCGA database. Those HNSCC patients with complete information containing miRNA-seq data, gene expression data (mRNA-seq data) and clinical data could be included in this additional analysis. Five non-smoking patients were excluded due to lack of miRNA-seq or mRNA-seq data. And finally, 59 non-smoking HNSCC patients were selected for this additional analysis, and their clinical characteristics were shown in Supplementary Table 1.
The non-smoking patients were categorized into three groups based on the age at initial diagnosis which was ≤55 for low-age group (n = 19), >55 and <65 for medium-age group (n = 21), ≥65 for high-age group (n = 19). With normalized data, analysis of differentially expressed miRNAs and genes was performed using the edgeR package in R software.

Quantitative Reverse Transcription-Polymerase Chain Reaction (qRT-PCR)
Total RNA was isolated from HNSCC samples using TRIzol reagent. MiRNA first strand cDNA synthesis (Tailing Reaction) and microRNAs qPCR (SYBR Green Method) kits were purchased from Sangon Biotech for miRNA qRT-PCR reactions. HiScript III RT SuperMix for qPCR Kit and ChamQ Universal SYBR qPCR Master Mix were purchased from Vazyme Biotech for mRNA qRT-PCR reactions. MiRNA levels were normalized to U6 gene expression, mRNA levels were normalized to GAPDH gene expression, and relative expression levels were analyzed using the Ct method. The primers were designed and purchased from Sangon Biotech. The following primers were used: hsa-miR-499a-5p, forward,

Statistical Analysis
Statistical analysis was performed using SPSS Statistics software (version 24.0) and R software (version 3.4.2; https://www. r-project.org/). P-value <0.05 was considered statistically significant. The column diagram was graphed with Graphpad Prism software (version 7.0a).

Differentially Expressed miRNAs and Genes
From TCGA database, we noted that 1,881 miRNAs and 32,216 genes were analyzed for each patient. After restricting the cutoff criteria (logFC > 1, FDR < 0.05), we found 3 differentially expressed miRNAs and 637 differentially expressed genes between low-tobacco and medium-tobacco exposed HNSCC, 23 differentially expressed miRNAs and 805 differentially expressed genes between medium-tobacco and high-tobacco exposed HNSCC, and 25 differentially expressed miRNAs and 1,038 differentially expressed genes between low-tobacco and hightobacco exposed HNSCC (Tables 1-3, Supplementary Table 4). These results contain 32 differentially expressed miRNAs and 1,820 differentially expressed genes.

Prognostic Value of Differentially Expressed miRNAs in the Three Groups of HNSCC
The survival data were extracted from TCGA database. We identified the effects of 32 differentially expressed miRNAs on survival rate in the three groups of HNSCC. We found that patients with low expression of hsa-miR-499a (Figure 1) and hsa-miR-129-2 (Supplementary Figure 1), and high expression of hsa-miR-508 (Supplementary Figure 2) had higher survival rates than the ones with high expression of hsa-miR-499a and hsa-miR-129-2, and low expression of hsa-miR-508 in hightobacco exposed HNSCC. While different expression levels of hsa-miR-499a, hsa-miR-129-2 and hsa-miR-508 had no effect on patients' survival in low-tobacco and medium-tobacco exposed HNSCC. Twenty nine other miRNAs were not found to have an effect on overall survival in the three groups of HNSCC (P > 0.05).
In addition, we evaluated the prognostic value of hsa-miR-499a, hsa-miR-129-2, hsa-miR-508, and the clinical parameters in the three groups of HNSCC patients by univariate and multivariate Cox proportional hazard regression analysis. We found that high expression of hsa-miR-499a (HR, 3.28) and female (HR, 3.49) were risk factors for prognosis in hightobacco exposed HNSCC. Furthermore, high expression of hsa-miR-499a (HR, 3.26) and female (HR, 3.38) were found to be independent risk factors for prognosis in high-tobacco exposed HNSCC ( Table 4). We did not find the prognostic value of hsa-miR-499a in low-tobacco and medium-tobacco exposed HNSCC (Supplementary Table 5). However, we did not find the prognostic value of hsa-miR-129-2, hsa-miR-508 in the three groups of HNSCC (P > 0.05). Therefore, we chose hsa-miR-499a for further analysis.
Identifying the Clinical Significance of hsa-miR-499a in the Three Groups of HNSCC To identify the clinical significance of hsa-miR-499a, we evaluated the association between the expression of hsa-miR-499a and clinical characteristics of the patients with high-tobacco exposure ( Table 5), low-tobacco and medium-tobacco exposure (Supplementary Table 6). In high-tobacco exposed HNSCC, the expression of hsa-miR-499a was associated with N stage (P < 0.01). In low-tobacco and medium-tobacco exposed HNSCC, no correlation between the expression of hsa-miR-499a and clinical characteristics were found.

Weighted Co-expression Network Construction and Key Modules Identification
To further explore the genes associated with N stage in hightobacco exposed HNSCC, we performed a WGCNA analysis. The samples of high-tobacco exposed HNSCC patients (n = 41) were clustered using average linkage method and Pearson correlation method. We excluded 1 outlier sample and finally included 40 samples for analysis (Figure 2). Constructing a WGCNA needs the best soft-thresholding power to which co-expression similarity is raised to calculate adjacency. Therefore, we performed a network topology analysis of various soft-thresholding powers to have relatively balanced scale independence and average connectivity of WGCNA. In this study, the power of β = 7 (scale-free R 2 = 0.9) was selected as the soft-thresholding parameter to ensure a scale-free network (Figure 3).
Through dynamic tree cut and merged dynamics, 47 different gene modules were generated in a hierarchical clustering tree from 40 samples, and each module marked by a different color was displayed through a tree diagram, wherein each tree branch FIGURE 1 | Survival curve for the three groups of HNSCC using Kaplan-Meier analyses (log-rank method). In low-tobacco (A, P = 0.11) and medium-tobacco (B, P = 0.73) exposed HNSCC, the patients with various expression levels of hsa-miR-499a were not different in overall survival. In high-tobacco exposed HNSCC, the patients with low expression of hsa-miR-499a had higher overall survival rates than the ones with high expression (C, P = 0.02). constituted a module and each leaf in the branch was one gene. As shown in Figure 4A, the horizontal line defined the threshold, by merging similar modules, 46 distinct gene modules were identified (Figure 4B). We selected the modules that were negatively correlated with N stage and P < 0.01. Four modules (lightsteelblue1, tan, violet, lightcyan) were found to have association with N stage (Figure 4C, Supplementary Table 7), and those modules were selected as the significant modules for further analysis.

GO and KEGG Pathway Enrichment Analysis
The genes of significant modules were categorized into 3 functional groups, i.e., BP, CC, and MF. The genes in the BP group were mainly enriched in cell-cell adhesion, cell migration, angiogenesis, regulation of cell proliferation, and Ras protein signal transduction. The genes in the CC group were significantly enriched in cell-cell adherens junction, cytoplasm, cytosol, extracellular space, and membrane. The genes in the MF group were mainly enriched in cadherin binding involved in cell-cell adhesion, serine-type endopeptidase inhibitor activity, protease binding, structural molecule activity and motor activity. According to KEGG pathway analysis, our results demonstrated that these genes were mainly involved in MAPK signaling pathway, protein processing in endoplasmic reticulum, viral carcinogenesis, endocytosis and choline metabolism in cancer. These results indicated that the genes of significant modules were mainly involved in cell-cell adherens junction (Figure 5, Supplementary Table 8).

Predicting Potential Target Genes of hsa-miR-499a
The potential target genes of hsa-miR-499a were predicted using three online databases we introduced above. Next, we selected the top 50 genes in each significant module according to the degree of connectivity within the module. To improve the accuracy of the prediction, we performed the Pearson correlation analysis between hsa-miR-499a and 1,820 genes differentially expressed among the three groups of HNSCC. According to P < 0.05, we found 342 genes associated with hsa-miR-499a, in which AEBP2 (Cor = −0.343, P = 0.028) and ZNRF1 (Cor = −0.330, P = 0.035) were negatively correlated with hsa-miR-499a (Supplementary Table 9). Finally, we took the intersection of the above three results. AEBP2 and ZNRF1 were identified as potential target genes for hsa-miR-499a ( Figure 6A).

Independent Effect of Lifetime Tobacco
Exposure Value on the Expression of hsa-miR-499a, AEBP2, and ZNRF1 There were differences in age at initial diagnosis among the low-tobacco, medium-tobacco, and high-tobacco exposed HNSCC (Supplementary Table 1, P = 0.02). In order to further determine the independent effect of lifetime tobacco exposure value on the expression of hsa-miR-499a, AEBP2, and ZNRF1, we performed differential expression analysis among the three nonsmoking groups of HNSCC. After excluding the effect of tobacco exposure, we did not find differential expression of hsa-miR-499a, AEBP2, and ZNRF1 between low-age group and medium-age group, medium-age group and high-age group, as well as lowage group and high-age group (Supplementary Table 10, FDR > 0.05). These results indicated that the differential expression of hsa-miR-499a, AEBP2, and ZNRF1 in HNSCC with low, medium, and high tobacco exposure was due to the different lifetime tobacco exposure value, rather than the different age at initial diagnosis.
Validation of the Differential Expression of hsa-miR-499a, AEBP2, and ZNRF1 To validate the results of bioinformatic analysis, we conducted a qRT-PCR for evaluation of hsa-miR-499a-5p, hsa-miR-499a-3p, AEBP2, and ZNEF1 expression levels of tumor samples in lowtobacco, medium-tobacco, and high-tobacco exposed HNSCC patients in our hospital. We found that the expression of hsa-miR-499a-5p and hsa-miR-499a-3p in high-tobacco exposed HNSCC was higher than those in low-tobacco and mediumtobacco exposed HNSCC, and the expression of hsa-miR-499a-5p and hsa-miR-499a-3p in medium-tobacco exposed HNSCC was higher than those in low-tobacco exposed HNSCC. Meanwhile, the expression of AEBP2 and ZNRF1 in hightobacco exposed HNSCC was lower than those in low-tobacco and medium-tobacco exposed HNSCC, and the expression of AEBP2 and ZNRF1 in medium-tobacco exposed HNSCC was lower than those in low-tobacco exposed HNSCC (Figure 7). The qRT-PCR results validated the differentially expressed hsa-miR-499a, AEBP2, and ZNRF1 we found in the analysis of the TCGA datasets.
In addition, we used the TargetScan database to predict the binding site of hsa-miR-499a-5p to AEBP2, and the binding site of hsa-miR-499a-3p to ZNRF1 (Figure 6B). Then, we performed the Pearson correlation analysis of expression levels between hsa-miR-499a and AEBP2, ZNRF1 in our own patients. The results showed that hsa-miR-499a-5p was negatively correlated with AEBP2, and hsa-miR-499a-3p was negatively correlated with ZNRF1 (Figure 8, Table 6), which was consistent with the results from the analysis of TCGA datasets.

DISCUSSION
HNSCC is a serious hazard to human health, and it is easy to relapse even after comprehensive treatment. The 5-year overall survival rate of HNSCC patients is <50%. Although the treatment of HNSCC has improved over the past few decades, the prognosis of patients with advanced HNSCC remains poor. Therefore, it is extremely important to study the mechanisms of HNSCC occurrence and progression.
Previous studies have confirmed that smoking is an independent risk factor for the occurrence and progression of HNSCC (5)(6)(7)(8), and lifetime cumulative tobacco tar exposure is associated with cancer risk (9). Most studies categorize tobacco exposure as "forever" and "never" or according to the number of years of smoking. The former does not consider the important impact of smoking cessation on the risk of tobacco-attributable cancer. The latter did not take into account the impact of smoking volume on the risk of cancer (20). In TCGA database, numerically calculated values represent lifetime tobacco exposure, defined as the number of cigarettes per day multiplied by the number of years of smoking divided by 20 (21,22). Lifetime tobacco exposure value includes the number of cigarettes smoked and the time of smoking. Even patients who have quit smoking can still assess the effects of tobacco on tumorigenesis and tumor development. On the other hand, the abnormal expressions of miRNAs play an important role in HNSCC (23,24). Studies have shown that the recruitment of miRNA-induced silencing complexes (miRISC) is linked to mRNA targets to reduce target protein production (25). However, in HNSCC with different tobacco exposure, the roles of miRNAs remain unknown. To our knowledge, this study is the first to use miRNA-seq data, gene expression data (mRNA-seq data) and clinical data from HNSCC patients in TCGA database to analyze differentially expressed miRNAs and genes in HNSCC with low, medium and high tobacco exposure, and to explore the relationship between differentially expressed miRNAs and clinical characteristics, and to predict the potential target genes of miRNA by WGCNA, online databases and Pearson correlation analysis.
After analyzing the TCGA datasets, we found 32 differentially expressed miRNAs and 1,820 differentially expressed genes among the three groups of HNSCC, including hsa-miR-499a, which was upregulated in high-tobacco exposed group compared to low-tobacco (logFC = 3.9279, FDR = 0.0005) and mediumtobacco (logFC = 2.7289, FDR = 0.0075) exposed groups. Although hsa-miR-499a had a trend to upregulate in mediumtobacco exposed group compared to low-tobacco exposed group (logFC = 1.1367, P = 0.0124), we could not currently consider hsa-miR-499a to be upregulated in medium-tobacco exposed group due to FDR = 0.3034 in the analysis of TCGA datasets. However, the qRT-PCR of our own HNSCC patients showed that the expression of hsa-miR-499a-5p and hsa-miR-499a-3p in high-tobacco exposed HNSCC was higher than those in low-tobacco and medium-tobacco exposed HNSCC, and the expression of hsa-miR-499a-5p and hsa-miR-499a-3p in medium-tobacco exposed HNSCC was higher than those in low-tobacco exposed HNSCC. Therefore, hsa-miR-499a has a potential tendency to increase expression with increasing tobacco exposure. The results of differential expression analysis mean different tobacco exposed HNSCC may lead to different regulatory modes of miRNAs and genes. Most importantly, our research has gained new information that can guide further research. Previous other studies have shown that the 32 differentially expressed miRNAs we find are related to the biological characteristics of tumors. Such as hsa-miR-509 promotes cisplatin-induced apoptosis in ovarian cancer cells (26), hsa-miR-506 enhances the sensitivity of human colorectal cancer cells to oxaliplatin (27), hsa-miR-129 regulates cisplatin-resistance in human gastric cancer cells (28), hsa-miR-503 enhances the radiosensitivity of laryngeal carcinoma cells (29) and so on. These miRNAs are closely related to chemoradiotherapy sensitivity. Because of different tobacco exposure, miRNAs in HNSCC patients are differentially expressed, which means the clinical characteristics of patients may also be different, and the patient's treatment may change, for example, the way of chemoradiotherapy.
In our research, KM survival analysis and Cox regression showed that patients with high expression of hsa-miR-499a had a poor prognosis compared with the ones with low expression of hsa-miR-499a in high-tobacco exposed HNSCC. Chi-square test showed that hsa-miR-499a was associated with N stage in high-tobacco exposed HNSCC. In low-tobacco and mediumtobacco exposed HNSCC, we did not find the clinical significance of hsa-miR-499a.
It has been reported that hsa-miR-499a can have dual and opposite functions in various cancer types, either as a tumor suppressor miRNA or as a tumorigenic miRNA. Hsa-miR-499a is highly expressed in colorectal cancer patients with lymph node metastasis and promotes the migration and invasion of colorectal cancer by regulating the expression of FOXO4 and PDCD4 (30). In contrast, hsa-miR-499a inhibits the expression of EST1 and blocks the invasion and migration of HepG2 cells. This means that hsa-miR-499a may have tumor suppressor function in the pathogenesis of hepatocellular carcinoma by targeting ETS1 (31). ETS1 has been reported as an oncogene in previous studies, and in most cancers, ETS1 expression is associated with poor survival (32). Similarly, the expression of hsa-miR-499a is low in oral squamous cell carcinoma tissue compared with the corresponding adjacent normal tissue. It was also found that the expression level of hsa-miR-499a is related to advanced disease and survival rate (33). However, the role of hsa-miR-499a needs further investigation in HNSCC.
We further performed WGCNA in high-tobacco exposed HNSCC to explore gene co-expression modules associated with N stage. A total of 8,054 genes were used to construct a coexpression network and 46 modules were identified. We selected the modules that were negatively correlated with N stage and P < 0.01. Four modules (lightsteelblue1, tan, violet, lightcyan) were found to have association with N stage.
We used three online databases (DIANA, TargetScan, miRDB) to predict potential target genes of hsa-miR-499a. To improve the accuracy of the prediction, we performed the Pearson correlation analysis between hsa-miR-499a and 1,820 genes differentially expressed among the three groups of HNSCC. According to P < 0.05, we found 342 genes associated with hsa-miR-499a, in which AEBP2 (Cor = −0.343, P = 0.028) and ZNRF1 (Cor = −0.330, P = 0.035) were negatively correlated with hsa-miR-499a. After taking the intersection of the genes associated with N stage obtained by WGCNA, the target genes of hsa-miR-499a predicted by three online databases, and genes co-expressed with hsa-miR-499a, AEBP2 and ZNRF1 were found to be potential target genes for hsa-miR-499a. Some studies have reported the role of ZNRF1. CBX8 triggers the progression of hepatocellular carcinoma by increasing the expression of hsa-miR-365a-3p down-regulated ZNRF1 (34). Deletion of ZNRF1 may be involved in the mechanism of B-cell acute lymphoblastic leukemia associated with PAX5 alteration (35). This indicates that ZNRF1 is a tumor suppressor gene.
Studies have reported the function of AEBP2 in tumors. Deletion of other members of AEBP2 and polycomb suppression complex 2 is associated with secondary AML following chronic myeloproliferative neoplasms and myelodysplastic syndromes (36). The absence of AEBP2 is also pathologically important in children with new-onset AML (37). These results indicate that the absence of AEBP2 leads to the occurrence and progression of leukemia and AEBP2 is a tumor suppressor gene.
Furthermore, although there were differences in age at initial diagnosis among the low-tobacco, medium-tobacco, and hightobacco exposed HNSCC, the additional analysis indicated that the differential expression of hsa-miR-499a, AEBP2, and ZNRF1 FIGURE 7 | The results of qRT-PCR showed that (A) the expression of hsa-miR-499a-5p and hsa-miR-499a-3p in high-tobacco exposed HNSCC was higher than those in low-tobacco and medium-tobacco exposed HNSCC, and the expression of hsa-miR-499a-5p and hsa-miR-499a-3p in medium-tobacco exposed HNSCC was higher than those in low-tobacco exposed HNSCC. (B) The expression of AEBP2 and ZNRF1 in high-tobacco exposed HNSCC was lower than those in low-tobacco and medium-tobacco exposed HNSCC, and the expression of AEBP2 and ZNRF1 in medium-tobacco exposed HNSCC was lower than those in low-tobacco exposed HNSCC (*P < 0.05, **P < 0.01).

FIGURE 8 |
The scatter plot showed a negative correlation between hsa-miR-499a-5p and AEBP2 (A), and the scatter plot showed a negative correlation between hsa-miR-499a-3p and ZNRF1 (B) in 12 high-tobacco exposed HNSCC patients of our hospital. in HNSCC with low, medium, and high tobacco exposure was due to the different lifetime tobacco exposure value, rather than the different age at initial diagnosis.
In our bioinformatic analysis, hsa-miR-499a, AEBP2, and ZNRF1 were differentially expressed in HNSCC with low, medium, and high tobacco exposure. Our own patients' results also supported the differential expression of hsa-miR-499a, AEBP2, and ZNRF1 among the three groups of HNSCC by qRT-PCR.
Our current study is a preliminary investigation of the clinical significance and potential biological mechanisms of hsa-miR-499a in HNSCC with low, medium, and high tobacco exposure. We will further use our clinical samples to verify the clinical and prognostic value of hsa-miR-499a in HNSCC with low, medium, and high tobacco exposure. Our research may reveal the underlying mechanisms of hsa-miR-499a in HNSCC oncogenesis and provide novel strategy in HNSCC treatment.

CONCLUSION
Our study successfully identified 32 differentially expressed miRNAs and 1,820 differentially expressed genes in HNSCC with low, medium, and high tobacco exposure. The patients with high expression of hsa-miR-499a had poor prognoses compared with patients with low expression in hightobacco exposed HNSCC. Hsa-miR-499a was associated with N stage in high-tobacco exposed HNSCC. The potential target genes of hsa-miR-499a were identified as AEBP2 and ZNRF1 based on WGCNA, three online databases and Pearson correlation analysis. Even so, further clinical and experimental research are needed to verify these conclusions.

DATA AVAILABILITY
The datasets generated for this study can be found in TCGA, https://cancergenome.nih.gov/.

ETHICS STATEMENT
Our HNSCC patients (or their parents or guardians) have signed the written informed consent form. The use of human tissue samples and clinical data has been approved by the Ruijin Hospital Ethics Committee.

AUTHOR CONTRIBUTIONS
S-QG is responsible for research design, data collection, bioinformatic analysis, qRT-PCR experiments, and manuscript writing. MX is responsible for data collection and statistical analysis. M-LX is responsible for providing tumor samples and guiding research ideas. Y-MS guides research ideas, data interpretation, and experimental methods. HZ guides research ideas, design, research methods, and manuscript revision.