Analysis of Differentially Expressed Long Non-coding RNAs and the Associated TF-mRNA Network in Tongue Squamous Cell Carcinoma

Accumulating evidence indicates that long non-coding RNAs (lncRNAs) play crucial roles in tongue squamous cell carcinoma (TSCC) tumorigenesis. However, the comprehensive regulation of lncRNAs-transcription factors (TFs)-messenger RNAs (mRNAs) in TSCC remains largely unknown. The purpose of this study was to identify aberrantly expressed lncRNAs and the associated TF-mRNA network in TSCC. To explore lncRNA and mRNA expression profiles and their biological functions in TSCC, we surveyed the lncRNA and mRNA expression profiles of TSCC and adjacent tissues using next-generation RNA sequencing in six patients. Thousands of significantly differentially expressed lncRNAs (DELs) and mRNAs (DEGs) were identified. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed to demonstrate the principal functions of the significantly dysregulated lncRNAs and genes. A total of 40 DELs were screened between TSCC and adjacent non-cancerous tissues. Results obtained from GEPIA and StarBase confirmed the expression levels of nine pivotal DELs obtained in our study. Three of the nine deregulated DELs were identified to have a significant impact on the overall survival of TSCC patients, which were evaluated with GEPIA and StarBase. LncMAP was used to predict the lncRNA-TF-mRNA triplets in TSCC. Furthermore, based on these results, we established lncRNA-TF-mRNA coexpression networks for the up- and downregulated lncRNAs using Cytoscape. We also found that among the nine pivotal lncRNAs, there is limited research on the abnormally expressed lncRNAs, including RP11-54H7.4, CTD-2545M3.8, RP11-760H22.2, RP4-791M13.3, and LINC01405, in TSCC pathogenesis. This is the first study to show that RP11-54H7.4, LINC00152, and LINC01405 can be acted as a prognostic target for TSCC. Our findings provide a novel perspective and lay the foundation for future research on the potential roles of lncRNAs, TFs, and mRNAs in TSCC. Verification of the potential lncRNA-TF-mRNA regulatory networks will provide a more comprehensive understanding of the pathogenesis of TSCC.


INTRODUCTION
As the most lethal and commonly occurring oral malignancy, oral squamous cell carcinoma (OSCC) accounts for 95% of all oral cancer diagnoses and causes more than 500,000 deaths per year (1). Approximately 25-40% of OSCCs are diagnosed as tongue squamous cell carcinoma (TSCC) (2). TSCC is characterized by frequent lymphoid metastasis, a high rate of regional recurrence, and a poor prognosis. Moreover, despite great progress in surgical techniques, diagnostic procedures, chemotherapy, and radiotherapy, the 5-year overall survival rate of TSCC patients has not improved significantly over the past decades. Therefore, a better understanding of the molecular mechanisms behind the initiation and progression of TSCC may facilitate the diagnosis and treatment of this malignant tumor.
Long non-coding RNA (lncRNA) represents a class of RNA species that are transcribed predominantly by RNA polymerase II, with a length exceeding 200 nucleotides and no apparent protein-coding potential, as indicated by the lack of strongly translated open reading frames (ORFs) (3). lncRNA plays a crucial role in diverse biological systems through genomic imprinting, cell cycle regulation, and cell differentiation and has been linked to a number of human diseases, especially cancer (4,5). Recently, growing evidence has demonstrated that lncRNAs may also have essential roles in TSCC. For example, overexpression of the lncRNA FALEC represses malignant behaviors in TSCC (6). Upregulation of the lncRNA HOTTIP in TSCC patients indicates a poor clinical prognosis (7). In TSCC cell lines, cell growth, invasion, and migration abilities are associated with the overexpression of MALAT-1 and AFAP1-AS1 via the Wnt signaling pathway, while the knockdown of MALAT-1 in TSCC cells leads to the upregulation of certain SPRR proteins (8)(9)(10). High expression of the lncRNA UCA1 has been linked to the migratory ability of epithelial cancer cells and regional lymph node metastasis in TSCC (11). Moreover, the lncRNAs KCNQ1OT1 and SNHG17 act as competing endogenous RNAs (ceRNAs) to regulate cell proliferation and cancer progression in TSCC (12,13).
In our study, we analyzed the expression profiles of lncRNAs and mRNAs in TSCC tissue through high-throughput sequencing. In addition, we identified molecular function, cellular component, biological process, and pathway analyses enriched by mRNA-associated lncRNAs and differentially expressed mRNAs in TSCC. Furthermore, our work revealed several pivotal lncRNAs related to the overall survival of TSCC patients. Finally, to optimize the regulatory mechanism of lncRNAs, we further explored several reliable transcription factors (TFs) in the regulatory regions of lncRNAs and constructed a lncRNA-TF-mRNA coexpression network.

Clinical Samples and Data Processing
Tumors and adjacent tongue tissues were obtained from patients with TSCC who underwent surgery at The First Affiliated Hospital of Fujian Medical University. Following surgical resection, six pairs of tissue samples were immediately immersed in liquid nitrogen and then stored at −80 • C. Written informed consent was obtained from all participants in accordance with the Declaration of Helsinki, and the study protocol was approved by the Ethics Committee of The First Affiliated Hospital of Fujian Medical University (Fuzhou, China).
Next-generation RNA sequencing assays were performed to detect the mRNA and lncRNA expression profiles by KangChen Biotech (Shanghai, China) using an Illumina HiSeq 4,000 system (Illumina, San Diego, CA, USA). Solexa pipeline version 1.8 was used to align the reads to the genome, generate raw counts corresponding to each known gene (a total of 17,242 genes), and calculate the reads per kilobase per million (RPKM) values. The differentially expressed lncRNAs (DELs) and mRNAs (DEGs) were identified through fold change/p-value/FDR filtering [fold change ≥1.5, p < 0.05, and FDR (adjusted p-value) <0.05].
Then, the data associated with TSCC were also retrieved from The Cancer Genome Atlas (TCGA) database (https:// portal.gdc.cancer.gov/) and were downloaded through GDC data transfer tool, including the RNA sequencing (RNA-Seq) of transcriptome profiling and clinical data. Then, in the predictive survival related model, we excluded 27 samples because they lacked complete clinical data. Finally, 146 TSCC samples and 15 normal control samples were collected in our study. EdgeR package in R (version 3.6.3) was used to identify the DELs. After that, we used the annotation file in GTF format (Homo_sapiens.GRCh38.100.chr.gtf) to identify and annotate DELs with the thresholds of |log2FC|>2.0 and FDR < 0.05.

Cell Lines
Three human TSCC cell lines, CAL-27, SCC-9, and SCC-25 were purchased from the American Type Culture Collection (ATCC; Manassas, VA, USA), and the human normal oral keratinocyte (hNOK) cell line was obtained from the Institute of Biochemistry and Cell Biology of the Chinese Academy of Sciences (Shanghai, China). CAL-27 cells and hNOK cells were grown in DMEM (Gibco) supplemented with 10% FBS (Invitrogen, Carlsbad, CA, USA). SCC-9 and SCC-25 cells were cultured in DMEM/F-12 (Gibco) supplemented with 10% FBS and 0.4 µg/ml hydrocortisone. All cells were maintained at 37 • C in an incubator supplied with 5% CO 2 .

RNA Extraction and Quantitative Real-Time PCR Analysis
Total RNA was extracted from tissues or cells using TRIzol reagent (Invitrogen, CA, USA) according to the manufacturer's instructions. Total RNA was reverse transcribed into cDNA using a PrimeScript RT reagent kit (TaKaRa). RT-qPCR analyses were performed using SYBR Green Master Mix (TaKaRa). This process was performed using an Applied Biosystems 7,500 Real-Time PCR System. The results were normalized to the expression of glyceraldehyde-3-phosphate dehydrogenase (GAPDH). The relative expression level was calculated by the 2 − Ct method. The primers were provided by the SunYa Company.

Filtering DELs and DEGs
DELs and DEGs were selected using the following cut-off criteria: for DELs: the top 20 upregulated and downregulated lncRNAs, for DEGs: |log2-fold change| >2 and p < 0.05. Heatmap charts and volcano plots were established the gplots package in R software. Frontiers in Oncology | www.frontiersin.org

Gene Set Enrichment Analysis (GSEA)
GSEA is a calculation method used to determine whether a given gene set is significantly different between different groups. The genes in these gene sets have a certain degree of correlation (e.g., the same biological function, located close to each other on the chromosome or defined by themselves according to a certain standard). However, LncRNA can not be directly used for GSEA as mRNA, but the correlation between each DEL and all mRNAs can be calculated. Then, GSEA of DELs can be performed using GSEAPreranked in GSEA software.
The Gene Ontology (GO) project provides a controlled vocabulary to describe gene and gene products attributed in any organism (http://www.geneontology.org). GO covers three domains: biological process, cellular component, and molecular function. The lower the p-value is, the more significant the GO

Survival Analysis of the Candidate lncRNAs
The GEPIA and StarBase databases were used to evaluate the effects of DELs on the overall survival of patients with TSCC. The final lncRNA correlated with overall survival was thus regarded as the pivotal lncRNA. p < 0.05 was considered statistically significant using the log-rank test. The cut-off value between two groups was set as the ''median.'' Establishing the lncRNA-TF-mRNA Network LncMAP (http://bio-bigdata.hrbmu.edu.cn/LncMAP/index.jsp) was used to analyze the lncRNA-TF-mRNA triplets in TSCC. A Venn diagram was drawn to determine the intersection between the mRNA in triplets and DEGs. The expression levels of these overlapping mRNAs were validated with GEPIA, and the correlation of the TF and mRNA was analyzed with StarBase. On the basis of these findings, we constructed a lncRNA-TF-mRNA regulatory network using Cytoscape (version 3.7.2) software to visualize their interactions. lncRNA-TF-mRNA triplets that showed no correlations with other triplets were excluded from the network.

Identification of the Prognostic Biomarkers
After analyzing the effects of pivotal DELs on the overall survival of patients with TSCC, we verified our findings in the TCGA difference analysis results, and it confirmed that LINC00152, LINC01405, RP11-54H7.4, and RP11-760H22.2 were simultaneously identified to be differentially expressed in TSCC. Thus, we also made the expression scatter and pairing plots of this four pivotal lncRNA by R software.

Statistical Analysis
Data are presented as the mean ± standard error (SE). Measured data were compared between groups using Student's paired ttests or independent t-test. All statistical analyses were performed using SPSS and R software (version 22; IBM, Armonk, NY, USA) and presented graphically in GraphPad Prism 5.0 (GraphPad, La Jolla, CA, USA). Making the scatter diagram and paired plot by wilcoxon test in R software. All tests were two-tailed, and p < 0.05 was considered statistically significant.

Identification of the Candidate DELs and DEGs in TSCC
Tumor and adjacent tissue samples were collected from six patients with TSCC. The main clinical characteristics of the

The GSEA Results of DELs
We also calculated the correlation between each DEL and all mRNAs so that GSEA could be performed using GSEAPreranked  data showed that the biological processes associated with the aberrantly regulated lncRNAs were cellular respiration, oxidative phosphorylation, and ATP synthesis coupled electron transport. The cellular components associated with the aberrantly regulated lncRNAs were the oxidoreductase complex mitochondrial respiratory chain and respiratory chain. The molecular functions associated with the aberrantly regulated lncRNAs were NADH dehydrogenase activity and oxidoreductase activity. KEGG pathway enrichment analysis of the significant DELs was performed to understand gene-related pathways and molecular interactions. Our results showed that oxidative phosphorylation, RNA transport, and mRNA surveillance pathway were associated with the dysregulated lncRNAs. Meanwhile, the GSEA results of LINC00152, LINC01405, RP11-54H7.4, and RP11-760H22.2 were showed independently in Supplementary Table 2.

Delineation of GO and KEGG Pathway Analyses of DEGs
GO enrichment analysis of the DEGs may reveal the roles of the significantly differentially regulated lncRNAs. Our data also showed that the biological processes associated with the upregulated mRNAs were immune system processes and the immune response ( Figure 5A). The downregulated mRNAs were mostly enriched in the generation of precursor metabolites and energy, energy derivation by oxidation of organic compounds, and muscle system processes ( Figure 5B).
Regarding the cellular components, most of the upregulated mRNAs were related to the cytoplasm, intracellular and cytosol, and the downregulated mRNAs were mostly related to the contractile fiber, myofibril and sarcomere. Moreover, the molecular functions mainly associated with the upregulated mRNAs were protein binding, peptide binding, and amide binding, while those associated with the downregulated mRNAs were oxidoreductase activity, the structural constituent of muscle and actin binding. Importantly, KEGG pathway analysis of the DEGs revealed 20 pathways that could play pivotal roles in the mechanism of TSCC tumorigenesis, including the cell cycle, metabolic and NOD-like receptor signaling pathway (Figures 5C,D).

Clinical Prognostic Significance of the Pivotal DELs for TSCC Patients
Then, the significance of the nine key DELs on the overall survival of patients with TSCC was evaluated by GEPIA and StarBase independently. According to the results obtained from GEPIA (Figure 6), RP11-54H7.4, and LINC00152 are related to poor overall survival in patients with HNSC, while according to StarBase (Figure 7), RP11-54H7.4 and LINC01405 are associated with a poor prognosis for patients with HNSC. Clinical prognostic information on LINC01405 was unavailable in GEPIA, and details on CTD-2545M3.8 were not accessible in StarBase.

Construction and Analysis of the lncRNA-TF-mRNA Network
Next, LncMAP was applied to determine lncRNA-TF-mRNA triplets in TSCC. Seven of the nine pivotal DELs associated with TFs were traced. Venn diagrams were used to identify 33 overlapping mRNAs (i.e., overlapping mRNA in triplets and DEGs; Table 3). The expression of the intersecting mRNAs was further confirmed by GEPIA. The expression of S100A12, HSPB3, GAMT, FABP5, ASB16, CPA4, LIMCH1, MLXIPL, PDZK1IP1, TRIM55, USP13, XIRP1, and ZNF106 was not significantly different between HNSC and normal tissues (Supplementary Figures 1, 2). Moreover, the coexpression of TFs and overlapping mRNAs was analyzed with StarBase (Supplementary Figures 3, 4).
Pairs of closely related TFs-mRNAs were used to build a regulatory network. Ultimately, we successfully constructed and visualized the lncRNA-TF-mRNA network, including the upregulated ( Figure 8A) and downregulated ( Figure 8B) DELs, with Cytoscape.

Screening Biomarkers
Furthermore, we also compared our findings with the DELs screened by TCGA database. As expected, several biomarker expression levels in our results were consistent with TCGA. A few DELs worth noting are RP11-54H7.4, LINC00152, LINC01405, and RP11-760H22.2. The expression of RP11-54H7.4, LINC00152, LINC01405, and RP11-760H22.2 were then analyzed in the TSCC and normal tissues, and their expression in the paired groups (Figure 9). Moreover, several pivotal GSEA results of them were performed by GSEA software independently (Figure 10). LINC00152, and LINC00511 was consistent with the sequencing results ( Figure 11). All primers information are listed in Supplementary Table 3.

DISCUSSION
Various studies have demonstrated that lncRNAs are abnormally expressed in diverse tumor types and have broad application prospects in cancer diagnosis, monitoring, prognosis, and treatment response prediction (7,16,17). TSCC is the most common type of OSCC, and its pathogenesis involves the dysregulation of gene networks, including both protein-coding genes and non-coding RNAs (6)(7)(8). To date, multiple lncRNAs have been identified in TSCC, providing new perspectives for exploring the molecular mechanism of TSCC pathogenesis (18). In the current study, we sought to screen aberrantly expressed lncRNAs by performing RNA-seq analysis of TSCC tumor tissues. These data indicate that 51,349 transcripts, 38,567 of which are protein coding, 4,224 mRNAs and 1,470 lncRNAs are differentially expressed between TSCC tumor and adjacent tissues. Through data preprocessing and limma package analysis, we identified 40 candidate DELs in TSCC. Then, 15 dysregulated lncRNAs were further verified using qRT-PCR, the results may be related to the heterogeneity between different TSCC cell lines. We also selected nine pivotal lncRNAs through an online public database. Among them, in TSCC, RP11-54H7.4, MIR31HG, LINC00152, and LINC00511 were upregulated, while H19, CTD-2545M3.8, RP11-760H22.2, RP4-791M13.3, and LINC01405 were consistently downregulated. The most well-known lncRNA, MIR31HG, can act as a HIF-1α coactivator to modulate hypoxia signaling pathways in oral cancer (17). It has also been reported that the lncRNA MIR31HG facilitates HNSC cell proliferation and tumorigenesis via HIF1A and p21 by promoting cell cycle progression and inhibiting cell apoptosis (19). A previous study showed that LINC00152 might promote the growth and invasion of OSCC by regulating miR-139-5p (20). The same study proved that LINC00511 could modulate TSCC progression (21). Various findings have revealed that H19 plays a critical role in the regulation of TSCC migration and invasion (22,23). Our results are consistent with those of previous studies, and we also found that there is limited research on RP11-54H7.4, CTD-2545M3.8, RP11-760H22.2, RP4-791M13.3, and LINC01405 in TSCC pathogenesis.
GO analysis of the DELs revealed that some biological processes and molecular functions, such as cellular respiration, and oxidative phosphorylation could be involved in the progression of TSCC. Annotation of the most significant KEGG pathways associated with the DELs manifested the important pathways that could play pivotal roles in the mechanism of TSCC tumorigenesis, including oxidative phosphorylation, RNA transport, and mRNA surveillance pathway, suggesting that dysregulated lncRNAs may have a great influence on these targets by regulating associated pathways in TSCC. However, the functions of most lncRNAs are not well-understood. The most significant GO terms of the DEGs were immune system processes, the immune response, and protein binding. KEGG pathway analysis of the DEGs revealed 20 pathways that could play pivotal roles in the mechanism of TSCC tumorigenesis, including the cell cycle, metabolic, and NODlike receptor signaling pathway. The results suggest that these lncRNA-TF-mRNA Network in TSCC FIGURE 11 | qRT-PCR validation of significant DELs from sequencing (***p < 0.001, **p < 0.01, and *p < 0.05).
pathways might contribute significantly to the pathogenesis and progression of TSCC.
This study found that RP11-54H7.4, LINC00152, and LINC01405 are significantly aberrantly expressed based on the RNA-seq expression profile and associated with poor clinical outcomes in TSCC patients, the results are consistent with prior studies (24,25). The expression of these lncRNAs was further confirmed in GEPIA and StarBase. Besides, the expression of RP11-54H7.4, LINC00152, LINC01405, and RP11-760H22.2 were also verified with TSCC samples in TCGA. Thus, RP11-54H7.4, LINC00152, and LINC01405 could represent prognostic biomarkers of TSCC.
Research shows that lncRNAs might play a role in trans-acting regulation via TFs, while the interactions between lncRNAs and TFs may ameliorate the expression levels of their target genes (26). The expression of target genes could also be regulated by TFs combining cis-elements at promoter locations (27). It is known that some TFs are involved in TSCC pathogenesis.
To further investigate the functions of DELs in TSCC, seven of nine pivotal DELs associated TFs, namely, SMAD2, E2F6, E2F4, CEBPA, STAT1, MYC, ERG, and NFYB, were further analyzed. Research has demonstrated that SMAD2 acts as a predictor of overall survival in OSCC patients (28). Additionally, SMAD2 directly binds to the lncRNA MACC1-AS1 promoter and thus increases MACC1-AS1 expression in nasopharyngeal carcinoma (29). Studies have also shown that silencing the lncRNA CEBPA-AS1 can regulate OSCC cell proliferation, cell apoptosis, migration, and invasion by targeting CEBPA and via a novel pathway, CEBPA/Bcl2 (30). Similarly, abundant studies demonstrated that E2F4, STAT1, MYC, ERG, and NFYB may play an important role in the pathogenesis and progression of various cancers, including OSCC, and the underlying mechanism may be related to dysregulated lncRNA (31)(32)(33)(34)(35)(36)(37). All of these studies confirm that the TFs screened in our research play a critical role in tumor pathogenesis. However, the underlying mechanisms remain unknown. Therefore, we hypothesized that these TFs play vital roles in the tumorigenesis of TSCC by regulating lncRNAs and mRNAs. In addition, the relationship between lncRNAs and TFs requires further investigation.
The overlapping genes were screened by comparing the mRNA in triplets and DEGs. Some vital mRNAs, including MYO1B, ACTN1, PYGL, S100A12, and SLC7A5, are associated with TSCC (38)(39)(40). We also found that many TF expression levels were significantly correlated with the expression of multiple protein-coding genes. We thus deduce that these TFs might be correlated with the carcinogenesis of TSCC by regulating coexpressing genes. Therefore, a coexpression network was built to further investigate the relationship of the dysregulated TFs and mRNAs. We were able to successfully build the lncRNA-TF-mRNA network by combining all the results. The lncRNA-TF-mRNA coexpression network was constructed to predict lncRNA function. More critically, the interaction network of the significantly dysregulated lncRNAs with their TFs and coding genes was delineated, which might provide new clues for elucidating the underlying mechanism of TSCC.

CONCLUSION
To conclude, we found a profile of dysregulated lncRNAs, TFs, and mRNAs that could serve as prospective clinical biomarkers because of their tissue specificity and association with the tumorigenesis and progression of TSCC. Our study might lay a foundation for further functional research on lncRNAs-TFs-mRNAs in TSCC. The composite analysis of lncRNAs, TFs, and differentially coexpressed genes may provide a more comprehensive understanding of the pathogenesis of TSCC. Our results also suggest that specific lncRNAs, TFs, and mRNAs could be valuable for the diagnosis and treatment of tongue cancer, contribute to the application of lncRNAs in cancer and provide deep insights into the biological mechanism of TSCC. Intriguingly, this is the first study to show that RP11-54H7.4, LINC00152, and LINC01405 can be acted as a prognostic target for TSCC.
Similar to other approaches, our study also had certain limitations. First, in the survival analysis, because of the lack of clinical data that matched our data, it was difficult to further validate some prognosis-related lncRNAs. Furthermore, the interaction between lncRNAs, TFs, and mRNAs still lacks experimental verification. These deficiencies will be improved by complementing these data in the future.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm. nih.gov/geo/, GSE149008/b.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Ethics Committee of the First Affiliated Hospital of Fujian Medical University. The patients/participants provided their written informed consent to participate in this study.