LncRNA–miRNA–mRNA Networks of Gastrointestinal Cancers Representing Common and Specific LncRNAs and mRNAs

Gastrointestinal (GI) cancers are responsible for approximately half of cancer-related deaths, highlighting the need for the identification of distinct and common features in their clinicopathological characteristics. Long ncRNA (lncRNAs), which are involved in competitive endogenous RNA (ceRNA) networks with critical roles in biological processes, constitute a substantial number of non-coding RNAs. Therefore, our study aimed to investigate the similarities and differences in the ceRNA networks of The Cancer Genome Atlas (TCGA)-GI cancers. We performed a comprehensive bioinformatics analysis of ceRNA networks for TCGA-GI cancers in terms of the deferential mRNA, lncRNA, and miRNA expression levels, ceRNA networks, overall survival analysis, correlation analysis, pathological cancer stages, and gene set enrichment analysis. Our study revealed several common and distinct mRNAs and lncRNAs with prognostic values in these networks. It was specifically noteworthy that MAGI2-AS3 lncRNA was found to be shared in almost all GI cancers. Moreover, the most common shared mRNAs between GI cancers were MEIS1, PPP1R3C, ADAMTSL3, RIPOR2, and MYLK. For each cancer ceRNA network, we found that the expression level of a number of lncRNAs and mRNAs was specific. Furthermore, our study provided compelling evidence that several genes, most notably KDELC1, can act as novel proto-oncogenes in cancers. This, in turn, can highlight their role as new prognostic and therapeutic targets. Moreover, we found cell cycle and extracellular matrix structural constituent as the top shared KEGG and molecular function, respectively, among GI cancers. Our study revealed several known lncRNAs and known and unknown mRNAs in GI cancers with diagnostic and prognostic values.


INTRODUCTION
The gastrointestinal (GI) system consists of the substantial cellular mass in the human body, and its cancers are among the most common malignancies in different populations, accounting for 35% of the total deaths due to cancers (Arnold et al., 2020). Over the past decade or two, tremendous efforts have been invested into the identification of the molecular and biological processes responsible for the development of these cancers, mainly for colorectal cancer, hepatic cancer, gastric cancer, and head and neck squamous cell carcinomas (Sharma et al., 2018). In particular, identification of distinct and common features in their molecular and biological processes and also their clinical presentations can help shed light on the development and identification of diagnostic and therapeutic biomarkers.
More than 90% of the mammalian genome is transcribed into non-coding RNA (ncRNA), with a considerable number of them consisting of long ncRNAs (lncRNAs), transcripts over 200 nucleotides (nt) long (Sulayman et al., 2019). Increasing evidence has shown that lncRNAs are involved in competitive endogenous RNA (ceRNA) networks playing critical biological functions including the regulation of major cellular processes, namely, proliferation, differentiation, apoptosis, and stress response (Niu et al., 2020b;Li et al., 2021). Therefore, our study aimed to investigate the similarities and differences in the ceRNA networks of TCGA-GI cancers with more focus on lncRNAs. We included GI cancers with 11 and more than 11 matched normal tissues. These cancers were as follows: colon adenocarcinoma (COAD), rectal adenocarcinoma (READ), esophageal carcinoma (ESCA), stomach adenocarcinoma (STAD), head and neck squamous cell carcinoma (HNSC), and liver hepatocellular carcinoma (LIHC). Since HNSC is involved in the mucosa of the aero-digestive tract and also occurs in the oral cavity and salivary glands, we considered this cancer in our study as well.
Here, we report different and common shared lncRNAs and mRNAs involved in the ceRNA networks of these cancers. Moreover, our study shows the common and specific pathways in these cancers. Our finding also shows that a combination of lncRNAs and mRNA with prognostic values can promote their use for diagnostic and therapeutic aims. Furthermore, we propose a number of mRNAs that can function as new proto-oncogenes or tumor suppressors in their corresponding cancers, highlighting their diagnostic and therapeutic aspects.
Since our study reveals common shared and distinct mRNAs, lncRNAs, miRNAs, KEGG, GO-MF, GO-BP, and GO-CC among GI cancers, the identified results can help other researchers to use the data for further functional studies and differential diagnostic and prognostic strategies.

Data Collection and the Differential Expression Levels of lncRNAs, mRNAs, and miRNAs
The Cancer Genome Atlas (TCGA) (https://www.cancer.gov/ about-nci/organization/ccg/research/structural-genomics/tcga) contains raw data of genomic, epigenomic, transcriptomic, and proteomic experiments from over 20,000 primary tumor tissues and their matched normal counterparts from 33 cancer types. In this study, we used an R/Bioconductor package, GDCRNATool developed by Ruidong Li et al.  to download, organize and analyze lncRNAs, mRNAs, and miRNAs and clinical data of patients with GI cancers (HNSC, ESCA, STAD, LIHC, COAD, and READ) from TCGA.
Based on the following advantages, we used GDCRNATool for our analysis: One-GDCRNATools is an easy-to-use package for researchers with little coding experience. Two-It includes all data needed for performing the entire analysis smoothly in a very convenient way to download, organize, and perform comprehensive RNA expression analysis of TCGA data, with main focus on construction of the lncRNA-mRNA-miRNArelated ceRNA networks in cancer. Three-It gives not only genes involved in ceRNA networks but also all differentially expressed protein-coding genes, non-coding genes, and their GO terms and pathways. Therefore, it also has a flexibility to do other analyses that are not in the scopes of ceRNA network analysis. As an example, the differential expression data extracted from GDCRNATool can be used for considering the functional studies. Four-Moreover, to show that our differential expression data have an acceptable performance comparable to the state-ofthe-art methods, we validated our expression data by looking for the expression of one gene (UBE2C) that was shown to be upregulated in all cancers (Dastsooz et al., 2019). Our differential expression data extracted from GDCRNATool also showed its higher expression in all these GI cancers. So, this tool can validate the differential expression data.
To validate our results, we also performed all analyses for the example that the developer of the tool did in their paper (for CHOL cancer). After confirmation of consistency between our results and theirs, we performed our analyses for GI cancers. So, we confirmed that this method of analysis is reproducible for all TCGA cancers. In our study, several analyses were carried out using GDCRNATools, which included differential gene expression analysis, univariate survival analysis, competing endogenous RNA network analysis, and functional enrichment analysis.
The appropriate data of these GI tumors and their matched adjacent non-tumor tissues were extracted. We started our analysis with the following sample numbers of RNA sequencing, miRNA, and clinical data for each cancer: COAD: 521 mRNA data, 465 miRNA data, and 459 clinical data; ESCA: 173 mRNA data, 200 miRNA data, and 185 clinical data; HNSC: 546 mRNA data, 569 miRNA data, and 528 clinical data; LIHC: 424 mRNA data, 425 miRNA data, and 377 clinical data; READ: 177 mRNA data, 165 miRNA data, and 171 clinical data; STAD: 407 mRNA data, 491 miRNA data, and 443 clinical data.
In this package, we downloaded RNA sequencing, mature miRNA, and clinical data from TCGA. Then, we parsed RNA sequencing metadata, filtered duplicated samples in this RNA metadata, and filtered non-primary tumor and non-solid tissue normal samples in them. Then, we parsed miRNAs metadata, filtered duplicated samples in these miRNAs data, and filtered non-primary tumor and non-solid tissue normal samples in the miRNAs metadata. Next, we separately merged each RNA sequencing, miRNAs, and clinical data. After that, we normalized RNA sequencing data and miRNAs data (using gdcVoomNormalization). We then used the gdcDEAnalysis function to have a whole differential expression and the gdcDEReport function for differentiation expression of each mirRNA, lncRNA, and mRNA. Using the gdcCEAnalysis function considering data of differential expression of lncRNA and protein-coding genes, and extracting data of Starbas tool, we could decipher miRNA-lncRNA and miRNA-mRNA interactions and ceRNA networks of each GI cancer.
We used this package to have the gdcParseMetadata function for the efficient organization of RNA and clinical data. With this tool, we applied the gdcFilterDuplicate function to remove duplicated samples and gdcFilterSampleType to filter out samples without primary tumors or matched normal counterparts. Raw counts data were normalized using the gdcVoomNormalization function, which applied the TMM method in edgeR and the voom method in limma. Low-expression genes with logcpm less than one were excluded from the analysis. In this R package, the gdcDEAnalysis function, which included limma, edgeR, and DESeq2, was used for the identification of differentially expressed genes (DEGs) and miRNAs between primary GI tumors and their matched normal tissues. Visualization methods were volcano, scatter, and bubble plots, and also three simple shiny apps were used in GDCRNATools. All the figures were plotted using the ggplot2 package included in this tool.

Construction of the ceRNA Network
Using GDCRNATool, we investigated ceRNA networks in GI cancers. This tool with its gdcCEAnalysis function considers hypergeometric test (significantly commonly shared miRNAs between lncRNA and mRNA are detected), Pearson correlation analysis (lncRNA and mRNA with positive correlations are picked up), regulation similarity analysis (commonly shared miRNAs with similar function for regulation of the lncRNA and mRNA are considered), and sensitivity Pearson partial correlation to construct ceRNA networks. Using this tool, we first extracted several miRNAs that were correlated among both the lncRNAs and mRNAs. Then, this tool identified lncRNAs and mRNAs with positive expression correlation (Pearson correlation coefficient). Finally, we gained miRNAs with similar regulative effects on lncRNAs and mRNAs. In our analysis, we applied lncTarget (to get miRNA-lncRNA interactions) and pcTarget (to consider miRNA-mRNA interactions) data along with gdcCEAnalysis function to extract miRNA-target interactions predicted from several datasets. GdcCEAnalysis function considers data of miRNA-mRNA and miRNA-lncRNA interaction databases, for example, the interaction between miRNA and mRNA and also mRNA predicted by online datasets such as Starbas. In our analysis, lncRNA-miRNA-mRNA interactions extracted as edges and nodes were visualized in Cytoscape 3.7.2.

mRNA-lncRNA Correlation Analysis
To find the main mRNA in each ceRNA network of GI cancers, using GDCRNA package with gdcCEAnalysis function, we also investigated lncRNAs-mRNAs correlation.

Survival Analysis
We performed univariate survival analysis using the gdcSurvivalAnalysis function in the GDCRNA package with the selection of Kaplan-Meier (KM) analysis. In this analysis, the patients were divided into high-and low-expression groups according to the median.

Pathological Tumor Stages
We looked for expression of lncRNAs and mRNAs identified in ceRNA networks of GI cancers across their pathological tumor stages using GEPIA2 web server (http://gepia2.cancer-pku.cn/ #index) (Tang et al., 2019).

Protein-Protein Interaction Network
We used the STRING database (https://string-db.org) (Szklarczyk et al., 2019) to analyze the possible protein-protein interactions between possible novel biomarkers identified in ceRNA networks of these cancers.

Functional Enrichment Analysis
Using GDCRNATool, we investigated the role of the genes that were found to be involved in the GI cancers, in biological processes (BP), molecular functions (MF), and cellular components (CC). Using gdcEnrichAnalysis, we also carried out Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG).

Differential Expression Profile of ceRNA Network Components in GI Cancers
In the current study, we found different numbers of up-and downregulated mRNAs, lncRNAs, and miRNAs in HNSC, ESCA, STAD, COAD, READ, and LIHC given in Table 1. Regarding COAD, these deferentially expression numbers and the lncRNAs were also previously listed (Poursheikhani et al., 2020). Regarding STAD and LIHC, some deferentially expression numbers have been reported by Zhang et al. (2020a) and Zheng and Yu (2021), respectively.

] (Supplementary
Our findings showed that LIHC, COAD, and READ had more ceRNA networks. It is worth noting that our study revealed MAGI2-AS3 lncRNA to be shared among almost all ceRNA networks of GI cancers, except for ESCA, indicating the involvement of the same pathways in GI cancers (Supplementary Table S2, Supplementary Figure S1, Figure 1). Furthermore, PVT1 was common between LIHC, ESCA, and STAD; GAS5, SNHG1, and SNHG20 were common between LIHC, COAD, and READ; KCNQ1OT1 was common between STAD, HNSC, COAD, and READ; H19 was found to be common between LIHC, HNSC, and COAD; and finally, MIR17HG was common between STAD, COAD, and READ. We also investigated commonly shared lncRNAs between COAD and READ (CRCs) and found the presence of MAGI2-AS3, GAS5, SNHG1, KCNQ1OT1, SNHG20, and MIR17HG in both cancers. We were also able to distinguish COAD and READ using differential expression of H19, OR2A1-AS1, and MALAT1 (specific for COAD) and SNHG15, AC015813.1, and CCDC183-AS1 (specific for READ) (Supplementary Table S2 and Supplementary Figure S1).
For each cancer ceRNA network, we found that the expression level of a number of lncRNAs was specific to each cancer: SNHG12, AC005154.1, CCDC18-AS1, and SNHG7 were specific for LIHC; SNHG3, TMEM161B-AS1, and AC093010.3 were specific for ESCA; SNHG15, AC015813.1, and CCDC183-AS1 were specific for READ; C1RL-AS1, LINC00707, and LINC00958 were specific for HNSC; OR2A1-AS1 and MALAT1 were specific for COAD ( Figure 1 and Supplementary Table S2). There was no specific lncRNA for ceRNA network of STAD. However, in comparison with other GI cancers, it was possible to distinguish it from other cancers (Figure 1 Figure S1).
We mentioned that MAGI2-AS3 is shared in approximately all GI cancers. Then, we looked for proteins involved in its network and its significant interaction with them. We found several important ones shown in Supplementary Table S3. In all MAGI2-AS3 networks, it has interacted with the following miRNAs: has-miR-374a-5p and has-miR-374b-5p. Its main mRNA partners that have positive correlations are MEIS1, PPP1R3C, ADAMTSL3, RIPOR2, and MYLK. It seems that the function of MAGI2-AS3 is mainly dependent on its interaction with these five positive correlated genes in COAD, READ, and STAD. However, in LIHC, it is dependent on ADAMTSL3, RIPOR2, and MYLK, and in HNSC, it is dependent on the interaction with MEIS1 and PPP1R3C (Supplementary Table  S3 and Supplementary Figure S1).

OS Analysis
In the GI-ceRNA networks, we identified that the overexpression of LINC00958 and LINC00707 in HNSC and overexpression of SNHG20 and SNHG12 in LIHC were correlated with reduced OS and worse prognosis. However, overexpression of SNHG20 in READ and PVT1 in STAD and low expression of MAGI2-AS3 in STAD were correlated with good prognosis (Figure 2). Moreover, OS analysis for mRNAs involved in ceRNA networks of these cancers showed prognostic values of several mRNAs given in Table 3 and Supplementary Figure S2. Among them, we identified some new prognostic biomarkers (reduced OS with worse prognosis) that have not been reported or supported by functional studies in their corresponding tumors as follows: upregulation of KDELC1 (POGLUT2) in HNSC, downregulation of RIPOR2 and SMOC1, and upregulation of TMEM164, B3GNT5, ZNF607, and ANKRD13B in LIHC. These genes showed reduced OS with worse prognosis. The downregulated genes with reduced OS and worse prognosis may function as tumor suppressors, but those overexpressed mRNAs with reduced OS and worse prognosis can act as proto-oncogenes. It worth noting that for LIHC, combinations of prognostic values of SNHG20 lncRNA and BUB1 mRNA, and also SNHG12 lncRNA and TMEM164 mRNA (associated with reduced OS with worse prognosis), could strengthen the predictive value of these markers for this cancer. Each of the lncRNA-mRNA combinations was in the same ceRNA network.

Pathological GI Tumor Stages
We found that the expression of some lncRNAs with worse prognosis in ceRNA networks of GI cancers had significant alteration across cancer stages, indicating their roles in cancer progression and invasion. These lncRNAs were SNHG12 and SNHG20 in LIHC (Figure 3).
In addition to the lncRNAs, we found that several mRNA with worse prognosis in their corresponding networks showed significant altered expression across tumor stages, indicating their roles in cancer progression and invasion. These genes

Functional Enrichment Analysis
In our study, firstly, we extracted the 10 top GO terms for BP, CC, and MF in the TCGA-GI cancers. Then, we compared these GO terms between GI cancers (Supplementary Figure S4 and Supplementary Table S4). Our study revealed some common and specific GO terms for these cancers given in Supplementary Tables S5,6, respectively. These terms were previously shown in COAD cancer (Poursheikhani et al., 2020); Frontiers in Genetics | www.frontiersin.org January 2022 | Volume 12 | Article 791919 6 however, in our study, we looked for common and specific terms in GI cancers. For MF, our study revealed extracellular matrix structural constituent (GO:0,005,201) as the top shared term among all GI cancers (Supplementary Table S5). In relation to the CC, we identified collagen-containing extracellular matrix (GO:0062023), which was shared between all GI cancers (Supplementary Table S5). For BP, the most significantly shared GO terms in these cancers were extracellular matrix organization (GO:0030198), which was shared among all studied malignancies except LIHC. In addition, regulation of chromosome segregation (GO: 0051983) and mitotic nuclear division (GO:0140014) were shared among all investigated cancers excepting READ (Supplementary Table S5). Regarding KEGG, we found Cell cycle (hsa04110) as the top shared one among all GI cancers (Supplementary Table S5). However, we found that some BP, CC, and MF terms were exclusively specific for each cancer (Supplementary Table S6).

DISCUSSION
Our study revealed differential expression of mRNAs, lncRNAs, and miRNAs in ceRNA networks of GI cancers. We highlighted the important function of several lncRNAs and mRNAs specific for each cancer or commonly shared between GI cancers, which might be used for diagnostic and therapeutic purposes.
Our study showed that MAGI2-AS3 lncRNAs were common between most of these cancers, representing possible involvement of the same pathways in different GI cancers. In ceRNA networks of each cancer, we also found that deferential expression of some lncRNAs was specific to each cancer.
We found several lncRNAs as prognostic biomarkers including LINC00958 and LINC00707 in HNSC; SNHG20 and SNHG12 in LIHC (correlated with reduced OS with worse prognosis); SNHG20 in READ; and PVT1 and MAGI2-AS3 in STAD (correlated with good prognosis). We found that over-expression of SNHG12 and SNHG20 in LIHC (Figure 3) had significant alteration across cancer   stages, representing their roles in cancer progression and invasion. These lncRNAs identified in our study were previously reported in their corresponding cancers (Yang et al., 2015;Zhang et al., 2016;Li et al., 2019;Tamang et al., 2019;Feng et al., 2020a;Niu et al., 2020a;Martinez-Barriocanal et al., 2020;Shen et al., 2020;Zeng et al., 2020;Zhang et al., 2020b;Zheng et al., 2020) but a comprehensive investigation of distinct or shared lncRNAs in these cancers has not yet been conducted. Notably, we found a number of mRNAs with prognostic values (reduced OS with worse prognosis), which were not previously reported or supported by functional studies in their corresponding human tumors as follows: over-expression of KDELC1 in HNSC; over-expression of TMEM164, B3GNT5, ZNF607, and ANKRD13B; and under-expression of RIPOR2 and SMOC1 in LIHC. The role of ZNF607 and mainly KDELC1 has not been clearly elucidated in cancers. Among these mRNAs, TMEM164, ZNF607, and ANKRD13B showed significant altered expression across tumor stages, indicating their role in cancer progression and invasion. Therefore, functional studies focused on these prognostic biomarkers in their corresponding cancers are suggested. Moreover, among GI cancers, four mRNAs were more common in most of them, which included MEIS1 and PPP1R3C among COAD, READ, STAD, and HNSC; and ADAMTSL3 and MYLK among COAD, READ, STAD, and LIHC. Furthermore, we found several important shared mRNAs between ceRNA networks of COAD and READ ( Table 2, CRC, cancer cluster 3:CC-3).
According to the findings of the current study, we strongly proposed a novel proto-oncogene, KDELC1 (POGLUT2), in cancers since it is mainly localized in nucleus and endoplasmic reticulum, it specifically targets extracellular EGF repeats of important proteins involved in Notch signaling pathways, it has interactions with several main proteins involved in cancers, and it is upregulated in several TCGA cancers. Moreover, our study revealed several mRNAs associated with worse prognosis in cancer and showed their significant altered expression across tumor stages.
Our study found that in ceRNA networks of LIHC, combination of prognostic values of SNHG20 lncRNA and BUB1 mRNA, and also SNHG12 lncRNA and TMEM164 mRNA (associated with reduced OS with worse prognosis), could be used as collective prognostic values for this cancer.
In this study, we found extracellular matrix structural constituent as the top shared MF term, collagen-containing extracellular matrix as the top shared CC, and Cell cycle (hsa04110) as the top shared KEGG among all these GI cancers. However, extracellular matrix organization, except for LIHC, and regulation of chromosome segregation and mitotic nuclear division, both except for READ, were found to be the most significantly shared BP terms among the GI cancers.
In conclusion, we found several common and distinct prognostic mRNAs and lncRNAs in GI cancers. The most important finding was the involvement of MAGI2-AS3 lncRNA in almost all GI cancers. Another interesting result was the identification of the most common shared mRNAs in these cancers. Moreover, our study revealed a number of specific lncRNAs and mRNAs in GI cancers. Furthermore, our study reported several genes, most notably KDELC1, that can have proto-oncogenic roles in FIGURE 4 | KDELC1 protein network extracted from STRING webserver.
Frontiers in Genetics | www.frontiersin.org January 2022 | Volume 12 | Article 791919 8 cancers. In addition to the abovementioned findings, we reported the top shared KEGG, GO-MF, GO-BP, and GO-CC among GI cancers. Overall, our study showed that bioinformatic studies regarding mRNA-miRNA-lncRNA networks of TCGA data can help identify prognostic biomarkers. The identified common shared and distinct mRNAs, lncRNAs, miRNAs, KEGG, and GO-terms among GI cancers can be used by other researchers for further functional experiments and also for diagnostic and prognostic approaches.