Original Research ARTICLE
Construction of a Competitive Endogenous RNA Network for Pancreatic Adenocarcinoma Based on Weighted Gene Co-expression Network Analysis and a Prognosis Model
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, China
Pancreatic adenocarcinoma (PAAD) is a pancreatic disease with considerable mortality worldwide. Because of a lack of obvious symptoms at the early stage, most PAAD patients are diagnosed at the terminal stage and prognosis is usually poor. In this study, we firstly obtained RNA sequencing data of 181 patients with PAAD from The Cancer Genome Atlas (TCGA) database to identify early diagnostic biomarkers for PAAD. Survival-related mRNAs were identified using a weighted gene co-expression network analysis (WGCNA), and then a linear prognostic model of seven long non-coding RNAs (lncRNAs) was established using univariate and multivariate Cox proportional hazards regression analyses, which is verified using a time-dependent receiver operating characteristic (ROC) curve analysis. Finally, according to the survival analysis, we constructed a survival-related competing endogenous RNA (ceRNA) network. Our results showed that: (1) The upregulated genes related to cell cycle-related pathway (including homologous recombination, DNA replication and mismatch repair) in PAAD can increase the proliferation ability of cancer cells; (2) The 7-lncRNA signature can predict the overall survival (OS) of PAAD patients; and (3) The key mRNAs and lncRNAs are involved in mutual regulation in the ceRNA network.
Pancreatic adenocarcinoma (PAAD) is a serious pancreatic disease, pancreatic ductal adenocarcinoma (PDAC) accounts for >90% of all pancreatic cancer (Kleeff et al., 2016). Although the incidence of PAAD is very low, it is still the fourth leading cause of cancer-related death in the United States, and it is expected to become the second leading cause by 2030. The overall 5-year survival rate is about 5–7%, and the average survival time is about 6 months (Rahib et al., 2014; Siegel et al., 2018; Orth et al., 2019). There is no reliable method for screening for and early detection of PAAD, so most patients are diagnosed at an advanced stage of the disease (Tesfaye et al., 2018). PAAD has very few common genetic mutations and no clearly clinically relevant biomarkers. In this respect, the study on PAAD lags far behind other solid tumors (Gallego et al., 2017; Gallmeier and Gress, 2018). The most commonly mutated genes are KRAS (GTPase) and cyclin-dependent kinase inhibitor 2A (CDKN2A) (Caldas and Kern, 1995; Dunne and Hezel, 2015) that related to PAAD. Because of the genetic heterogeneity of pancreatic cancer, the intricate stromal microenvironment, and the complex interplay with the immune system, the disease is difficult to investigate (Krempley and Yu, 2017). To reduce mortality and improve the detection and risk classification of PAAD, early diagnostic biomarkers (Le et al., 2016) and therapeutic targets urgently need to be determined.
Long non-coding RNAs (lncRNAs) are generally defined as RNA transcripts >200 nucleotides that do not encode a polypeptide (Jathar et al., 2017), which are located in the nucleus and cytoplasm of eukaryotic cells. A large number of experimental studies have shown that some lncRNAs play an important role in the occurrence and development of malignant tumors (Yan et al., 2015; Fu et al., 2016; Li et al., 2016). MicroRNAs (miRNAs) are a kind of small RNA that play a role in gene silencing and translation inhibition by binding to target mRNAs (Ambros, 2004). The regulatory network between miRNAs and target genes is very complex, and the network can regulate most physiological activities, including tumorigenesis, metastasis and metabolism (Di Leva et al., 2014; Rupaimoole and Slack, 2017; Alamoudi et al., 2018). In recent years, a competing endogenous RNA (ceRNA) hypothesis has been proposed and developed to explain the relationships between coding genes (encoding mRNAs) and non-coding genes (encoding miRNAs, lncRNAs, circRNAs, etc.) in cells and regulation of mRNA expression. Some miRNAs can recognize and bind to microRNA response elements (MREs) in mRNAs, lncRNAs and circRNAs. Through MREs, lncRNAs act as miRNA “sponges,” leading to changes in the levels of mRNAs regulated by miRNAs (Salmena et al., 2011; Tay et al., 2014). Several studies have confirmed that the lncRNA-miRNA-mRNA ceRNA network is related to the development of many cancers (Karreth and Pandolfi, 2013; Zhou et al., 2014; Qi et al., 2015; Fang et al., 2018; Gong et al., 2019). There are similar studies in pancreatic cancer. AFAP1-AS1 promotes the growth and invasion of pancreatic cancer by upregulating the IGF1R oncogene through sequestration of miR-133a (Chen et al., 2018a). And lnc-Sox2ot promotes EMT and stemness by acting as a ceRNA in PDAC (Li et al., 2018b). These lncRNAs can also be regarded as biomarker candidates. These studies show that these lncRNAs can regulate the development of pancreatic cancer by acting as a ceRNA. Through ceRNA, we can find more biomarker candidates. However, the study on PAAD are very limited, and there is still a lack of comprehensive analysis of lncRNAs and miRNAs related to PAAD based on high-throughput sequencing and large-scale samples.
In this study, the RNA sequencing data of 181 patients with PAAD from the TCGA database were used to identify survival-related mRNAs by weighted gene co-expression network analysis (WGCNA) (Langfelder and Horvath, 2008), and a 7-lncRNAs linear prognostic signature was then established using univariate and multivariate Cox proportional hazards regression analyses and verified using a time-dependent receiver operating characteristic (ROC) curve analysis. At last, according to the survival analysis, we constructed a survival-related ceRNA network.
Materials and Methods
Data Acquisition and Preprocessing
RNA expression data and clinical information of PAAD patients were obtained from the TCGA database (https://portal.gdc.cancer.gov/). The RNA data of the 181 samples involved both Illumina HiSeq RNA-seq and miRNA-seq data. Among them, 150 cases of ductal and lobular neoplasms and 27 cases of adenoma and adenocarcinomas were included, and 4 cases of paracancerous samples were used as controls, and we deleted one sample without miRNA data. The clinical data used in the analyses included gender, survival status, cancer status, age (>60/ ≤ 60), grade, stage and tumor, node, metastasis (TNM) stage. All data can be downloaded free of charge. The mRNAs and lncRNAs were identified using GENCODE (https://www.gencodegenes.org/), and the mRNAs and lncRNAs not included in the database were removed. Eventually, 19600 mRNAs and 15129 lncRNAs were obtained for the subsequent analyses. The raw counts for differential analysis and the GdcVoomNormalization function of GDCRNATools (Li et al., 2018a) was used to normalize the data for WGCNA and prognostic model analysis. The raw count data were normalized using the trimmed mean of M-values (TMM) method implemented in “edgeR” (Robinson et al., 2010) and then transformed using voom in limma for subsequent analysis (Law et al., 2014). GSE62452 is obtained from GEO database (Yang et al., 2016). We downloaded the expression matrix that the author has processed, and this data set contains 65 cases with complete clinical information. We download PACA-AU data set from ICGC database (https://icgc.org/), which include 91 samples with complete clinical information.
Analysis of Differentially Expressed mRNAs, miRNAs, and lncRNAs (DEmRNAs, DEmiRNAs and DElncRNAs)
R package “edgeR” was applied to analyze the difference between 177 cancer samples and 4 paracancerous samples. Because of the limitation of paracancerous samples, we used “upSample” and “SMOTE” in “caret” package to resample the samples and get the intersection of all results. The differentially expressed genes (DEGs) in the data set with |log2 (fold change)|≥1 and adjusted P ≤ 0.05 were selected for the subsequent analyses. 1152 DEmRNAs, 60 DEmiRNAs and 97 DElncRNAs were identified.
Construction of Weighted Gene Co-expression Network of DEmRNAs
We conducted a weighted gene co-expression network analysis (WGCNA) to analyze the interaction between genes, which can describe the patterns of the gene expression profiles. WGCNA was used to evaluate the correlations between the 1152 DEmRNAs and patients' clinical information. The pairwise Pearson correlation was applied to evaluate the weighted co-expression relationship among all the data set subjects in the adjacency matrix. Average linkage hierarchical clustering was used to check whether there are outliers in 177 samples. We calculated the dissimilarity for samples dendrogram and removed some outliers. Parameter β could emphasize strong correlations between genes and penalize weak correlations. The mean connectivity and scale independence of network modules were analyzed using the gradient test under different power values, which ranged from 1 to 20. According to the scale-free topology criterion, select a soft threshold power (β) that can make the correlation of nodes in the co-expression networks get 0.9. The green module in the WGCNA analysis was related to the PAAD OS. Using the STRING database (https://string-db.org/) for a protein–protein interaction (PPI) analysis. After the weighted correlations were determined, the characteristics of the network were displayed using Cytoscape 3.7.1 (http://www.cytoscape.org/). Using the DiseaseMeth2.0 (http://bio-bigdata.hrbmu.edu.cn/diseasemeth/) for methylation analysis. T-test was used to analyze the methylation level of gene promoter regions (2kb upstream of TSS to 0.5 kb downstream) in cancer tissues and paracancerous tissues. Absolute Methylation Difference > 0.2 was considered as differential expression. The function of the green module genes was annotated using the “clusterProfiler” package of Bioconductor (Yu et al., 2012). Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Set Enrichment Analysis (GSEA) enrichment. GSEA is a statistical method to assess whether a priori defined set of ordered genes shows statistically significant, concordant differences between two different biological statuses (Damian and Gorfine, 2004). For each marker gene, we divided the TCGA data set into two groups according to the median of expression level. According to the differential analysis of high and low expression groups, we sorted the DEGs according to logFC, then constructed the ordered gene set and enriched with GSEA. The results were sequenced according to P-value. The result was based on the threshold of P < 0.05 and enrichment score >1.0.
Construction of lncRNA Prognostic Signature for Predicting PAAD OS
The TCGA data set was divided into a training set (105) and a test set (72). In the training set, univariate Cox proportional hazards regression was carried out using the DElncRNAs, and those with P < 0.05 were selected for further analysis. A model was then selected step by step using the Akaike information criterion (AIC) to avoid overfitting; multivariate Cox proportional hazards regression analysis was performed to generate the lncRNA-based prognostic signature and we consider genes with P < 0.05 as independent prognostic indicators. After training and getting, the model is evaluated by leave-one-out-bootstrap cross validation (CV) in package “riskRegression” and “pec.” Then we use least absolute shrinkage and selection operator (Lasso) regression in R packages “glmnet” to verify the results (family=“cox”, maxit = 1000, nfolds = 5). The model calculated the Risk Score (RS) as follows:
Where expr_level is the lncRNA expression and β represents the regression coefficients of lncRNAs in the multivariate Cox proportional hazards regression analysis. Patients were divided into high- and low-risk groups according to the median prognostic score. A receiver operating characteristic (ROC) curve analysis was then used to evaluate the performance of the prognostic signature. To further examine the relationship between lncRNAs and OS, we divided the TCGA data into a training set (75%) and a test set (25%) We trained three machine learning models (XGBoost [XGB], linear support-vector machine [SVM] and Random Forest [RF]) with 5-folds cross validation. Then, we used a ROC curve analysis to assess the performance of these classification methods.
Construction of ceRNA Network
GDCRNATools was used to construct the ceRNA network of the three kinds of RNA. Based on the spongescan algorithm, the correlations among the expression levels of mRNAs, miRNAs and lncRNAs were explored further using five databases: TargetScan (Fromm et al., 2015), miRcode (Jeggari et al., 2012), miRTarBase (Chou et al., 2016), starBase (Li et al., 2014) and miRWalk (Sticht et al., 2018). A mRNA was considered to be a true miRNA target if the interaction was supported in TargetScan, miRTarBase, starBase and miRWalk, and an lncRNA was considered to be a true miRNA target if the interaction was supported in miRcode.
Survival Analysis of PAAD
The prognosis model was constructed by Cox regression in package “survival,” and the result was verified by Lasso-Cox regression in package “glmnet.” To further evaluate whether a gene is related to the PAAD OS, the patient samples were divided into high-expression and low-expression groups based on the median gene expression. We used the R package “survival” and “pec” for survival analysis. Then we use “survminer” to generate Kaplan–Meier curves. Several methods (log-rank, Wilcoxon, Fleming-Harrington, Tharone-Ware and Peto-Peto) was used to compare survival time between the two groups, and P < 0.05 was considered to be statistically significant.
DEmRNAs, DEmiRNAs and DElncRNAs in PAAD
The RNA data of 181 PAAD patients, involving 177 cancer samples and 4 paracancerous samples, were obtained from the TCGA database. In order to prevent the error caused by limitation of paracancerous samples, we resample the data for differential analysis between cancer tissues and paracancerous tissues. Genes with |log (fold change)| ≥ 1 and P ≤ 0.05 were defined as DEGs. According to this standard, we obtained 1152 DEmRNAs, comprising 611 (53.03%) upregulated genes and 541 (46.97%) downregulated genes; 60 DEmiRNAs, comprising 36 (60%) upregulated genes and 24 (40%) downregulated genes; and 97 DElncRNAs, comprising 59 (60.82%) upregulated genes and 38 (39.18%) downregulated genes (Figure 1). The complete differentially expressed RNA data are shown in Supplementary Table S1.
Figure 1. Description of DEmRNAs, DEmiRNAs, and DElncRNAs. (A) Volcano plot showing DEmRNAs (green), DEmiRNAs (yellow), and DElncRNAs (red), The X axis represents logFC, the Y axis represents log10 (P value). Genes with |logFC| ≥ 1 and P ≤ 0.05 were defined as DEGs. Heatmap showing the normalized expression of (B) DEmRNAs, (C) DEmiRNAs and (D) DElncRNAs. T indicates cancer tissue, N indicates paracancerous tissue.
Key Modules Related to PAAD OS
After obtaining the 1152 DEmRNAs, we used WGCNA to analyze the correlation between gene and clinical information. Hierarchical clustering was performed before analysis. We found that some samples have abnormal clustering results. We calculated the dissimilarity of samples, and found that the samples were divided into two groups (160 vs. 18 samples) near the cut-off value of 80. Some of them were far away from other samples. We considered these 18 samples as outliers and deleted them, and the remaining 160 PAAD samples were used for the subsequent analyses. The soft threshold power (β) of 5 was selected according to the scale-free topology criterion. After choosing the appropriate β (β = 5) to classify genes with similar expression profiles into gene modules, the average linkage hierarchical clustering was conducted according to the topological overlap matrix (TOM) based on dissimilarity measurement, and divide each module into different colors (Figure 2A). The heatmap showed the correlations of the modules and the patients' gender, survival status, cancer status, age (>60/ ≤ 60), grade, stage and TNM stage (Figure 2B). The highest correlation was between the green module and OS (r = 0.28, P = 3e-04) (Figure 2C). We selected the 85 genes in the green module (Figure 2D) for Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses, and we found that the cell cycle, the p53 pathway and other important pathways in cancer were enriched (Figure 2E; Massague, 2004; Stracquadanio et al., 2016; McFarlane and Wakeman, 2017; Connor et al., 2019).
Figure 2. Weighted gene co-expression network analysis (WGCNA) network. (A) Cluster dendrogram of DEmRNAs. Each branch represents a single gene. Height indicates the Euclidean distance. Each color indicates a single module. (B) Hierarchical clustering tree of the TCGA-PAAD samples. Dendrogram tips are labeled with the TCGA-PAAD unique name. Height indicates the Euclidean distance. (C) Heatmap showing the Pearson correlation between modules and patients' clinical information. The numbers represent correlation coefficients and P-value. (D) Correlation analysis showed that genes in green module were significantly correlated with OS (cor = 0.76) (E) KEGG enrichment analysis of genes in the green module, ranked by P-value. The size of each dot represents the number of genes enriched in the pathway.
Next, we explored the protein–protein interaction (PPI) network in the green module (Supplementary Figure S1), the average connectivity (number of nodes interacting with other node in the network) in the network was found to be 29.33. Such a high degree of average connectivity suggests that these genes are likely to have synergistic effects. We performed survival analysis on the genes in the green module, and we found that four genes: maternal embryonic leucine zipper kinase (MELK), Aurora kinase A (AURKA), kinesin family 23 (KIF23) and checkpoint kinase 1 (CHEK1) were significantly related to OS (P < 0.05). These genes showed different expression levels in different pathological stages of PAAD (Figures 3A,B). They were at the center of the PPI network and they were also in the ceRNA network. We then explored the methylation level of the promoter regions of these genes in cancer tissues and paracancerous tissues (Supplementary Figure S2). Significant difference of methylation level in cancer tissues was found by T-test (P < 0.05). The higher the expression of these genes, the lower the methylation level, and the higher the risk for patients in the survival analysis. GSEA enrichment results showed that they are mainly related to DNA replication, mismatch repair, homologous recombination and other pathways (Supplementary Figure S3), which shows that these genes play an important role in cell division. Tumors have abnormal proliferation and abnormal cyclin kinase expression leads to uncontrolled proliferation of tumor cells, which often have germline mutations in homologous recombination repair genes, so they depend on checkpoint proteins related to DNA damage, such as CHEK1, to induce G2 block and integrate threonine kinase (ATM and ATR) signals to repair damaged DNA. The co-expression and interactions of these genes were associated with poor clinical outcomes (Figure 3C). In order to verify the prognostic value of these four key genes, we used the expression profiles of 65 cases in GSE62452 and 91 cases in ICGC with complete clinical information. The results showed that these four genes have important prognostic value (Supplementary Figure S4).
Figure 3. Four key genes in the green module. (A) Expression of four genes by grade in PAAD patients (due to the lack of expression data for G4, only the expression for G1–G3 is shown). P-value, 95% CI, and other information are shown at the top of the figure. (B) Relationships between the four genes and the cancer status among PAAD patients. (C) Survival analysis of the four genes. Patients were divided into high-expression and low-expression groups according to the median gene expression.
LncRNA Prognostic Signature
The data set was divided into training set (105) and test set (72). In the training set, the univariate Cox regression was applied and obtained 22 lncRNAs related to patients' OS (Table 1). Then these lncRNAs was used as covariates and the multivariate Cox analysis was performed to screen independent prognostic indicators, and the Akaike information criterion (AIC) was used for model selection. Finally, 7 lncRNAs was selected as PAAD independent prognostic indicators and a 7-lncRNAs prognostic model were generated. In the training process, leave-one-out-bootstrap CV was used to verify the model. After several times of verification, we acquired a relatively stable result. According to this model, a risk score was generated for each patient. The risk score was calculated as follows: Risk Score = (−0.6798 × expr_MIR600HG) + (0.5330 × expr_LINC00941) + (−0.3057 × expr_CASC8) + (0.3093 × expr_UNC5B-AS1) + (−0.4540 × expr_AL365277.1) + (0.6676 × expr_AL049555.1) + (−0.2785 × expr_AC005056.1). Four of these lncRNAs (MIR600HG, LINC00941, UNC5B-AS1, and AL04955.1) were shown to be independent prognostic indicators (Table 2), which is consistent with the Kaplan–Meier curve of the 7-lncRNA signature (Supplementary Figure S5). Three genes were associated with high risk of PAAD (LINC00941, UNC5B-AS1, and AL049555.1), while four genes were associated with protection from PAAD (MIR600HG, CASC8, AL365277.1, and AC005056.1). According to the median of risk score (1.226066), the data set was divided into high-risk group and low-risk group, and Kaplan-Meier curve is shown in Figure 4. The overall survival time of high-risk group is significantly lower than that of low-risk group [the median OS was 15.6 vs. 49.4 months, hazard ratio [HR] = 3.98, 95% confidence interval [CI]: 2.22–7.13, P < 0.0001] (Figure 5A).
Figure 4. Kaplan–Meier survival curves of the training set, test set and total set. Survival curves of (A) training set (B), test set and (C) total set. Patients were divided into high- and low-risk groups according to the median prognostic score. log-rank, Wilcoxon, Fleming-Harrington, Tharone-Ware, and Peto-Peto was used to compare the differences of survival curves between two groups and get consistent results (P < 0.0001). The P-value of plots based on log-rank test.
Figure 5. LncRNA prognostic signature model and least absolute shrinkage and selection operator (Lasso) regression results. (A) Patients were divided into high- and low-risk groups according to the median prognostic score. The prognostic score, OS and expression levels of 7 lncRNAs in the two groups are shown. (B) Regression coefficient diagram using Lasso regression. (C) ROC curves of the 7-lncRNA prognostic signature for predicting 1-, 3-, and 5-year survival. (D) Diagnostic efficiency, based on the AUC, of 7-lncRNA signature models constructed using different machine learning models (random forest, support-vector machine [SVM] and XGBoost). (E) Boxplot showing the expression levels of 7 lncRNAs in the high- and low-risk groups. **p < 0.01, ****p < 0.0001.
The prognostic scores (risk scores) of patients from the PAAD data set were analyzed by the 7-lncRNA expression levels. A similar result was found using Lasso regression (the model, which involved five lncRNAs, and four of them included in the 7-lncRNA signature (MIR600HG, AL365277.1, UNC5B-AS1, and LINC00941) (Figure 5B). A time-dependent ROC curve analysis was used to evaluate the prognostic value of the 7-lncRNA signature. Based on the TCGA data set, the area under the curve (AUC) for the 7-lncRNA signature was 0.785, 0.803 and 0.813 for 1-, 3-, and 5-year survival, respectively (Figure 5C). The 7-lncRNA signature was used to build three machine learning models (random forest, SVM, and XGBoost) in training set to predict the PAAD OS. ROC curves was used to evaluate the prediction performance of the three machine learning models in test set. The results showed that the AUC values for SVM, XGB and RF were 0.958, 0.907, and 0.932, respectively (Figure 5D). The expression level of 7-lncRNA signature in high-risk group and low-risk group was significantly different (Figure 5E). After the prognostic performance in 177 patients with clinical information was evaluated by univariate Cox proportional hazards regression we found the cancer status (P < 0.001), grade (P = 0.013), stage (P = 0.01) and 7-lncRNA (P < 0.001) signature could be used as independent prognostic indicators of PAAD (Table 3, Supplementary Figure S6). We further used these clinical information as covariates to conduct a multivariate Cox proportional hazards regression analysis. The results showed that the 7-lncRNA signature (HR 2.68, P = 0.001) and cancer status (HR = 1.76, P = 0.012) were independent prognostic factors for PAAD OS.
ceRNA Network Related to PAAD OS
DEmRNAs, DEmiRNAs and DElncRNAs was used to construct a global ceRNA network. Unfortunately, we found that UNC5B-AS1 and AC005056.1 were not in the global ceRNA network. To simplify the network results, we used the 4 key mRNAs (AURKA, MELK, KIF23, and CHEK1) and 5 key lncRNAs (MIR600HG, LINC00941, CASC8, AL049555.1, and AC005056.1) as the center to prune the global ceRNA network, and then we obtained the key-nodes ceRNA network. We performed survival analysis on these nodes, and removed the nodes that were not significantly related to OS (Figure 6A), and we finally obtained seven mRNAs, six miRNAs, eight lncRNAs and 32 edges in the survival-related ceRNA network, including the 4 key mRNAs and 5 key lncRNAs mentioned above (Figure 6B). The survival analysis showed that upregulation of 6 mRNAs (ANLN, TOB1, AURKA, MELK, KIF23, and CHEK1), 3 miRNAs (miR-135b-5p, miR-424-5p, and miR-203a-3p) and 4 lncRNAs (AL049555.1, LINC00941, LINC01588, and CASC8) was associated with poor PAAD OS, while down regulation of the remaining 6 genes (GOLAG8B, miR-129-5p, MIR600HG, AL365277.1, AC005674.2, and AP004608.1) in PAAD indicated a longer survival time.
Figure 6. CeRNA network related to PAAD overall survival (OS). (A) Kaplan–Meier curves of 3 mRNAs (TOB1, GOLGA8B and ANLN),3 miRNAs (203a-3p, 424-5p, and 135b-5p) and 3 lncRNAs (AP004608.1, AC005674.2, and AL365277.1) were selected for display. (B) CeRNA network related to PAAD OS.
PAAD, the most common form of pancreatic cancer, is still one of the most aggressive and fatal cancers in the world (Kleeff et al., 2016). In the past, great efforts have been made to elucidate the molecular mechanism underlying the pathogenesis of PAAD at the level of coding and non-coding genes, and to find molecular targets related to PAAD. For predicting the prognosis of PAAD, molecular signatures are likely to be better than single biomarkers. Research based on a single-omics level limits deep exploration of the molecular mechanisms underlying PAAD, and each gene in the cells does not play an independent role. The concept of the ceRNA network was proposed in recent years. LncRNAs act as miRNA “sponges” to prevent miRNAs from binding to mRNAs (Salmena et al., 2011). A system analysis based on the hypothesis of the mRNA-miRNA-lncRNA ceRNA network may lead to a molecular signature with better prognostic value in PAAD than a single gene.
The multi-omics or meta-analysis has been applied to identify the biomarkers of pancreatic adenocarcinoma especially for PDAC (Chen et al., 2018a; Klett et al., 2018; Mishra et al., 2019). Mishra et al. Identified survival associated genes using multi-omics data from PDAC patients (Mishra et al., 2019). And Klett et al. identified 17 genes that were previously recognized as PDAC biomarkers (Klett et al., 2018) and several genes (B3GNT3, DMBT1, DEPDC1B) and lncRNAs (PVT1 and GATA6-AS) have not been reported in PDAC before, which is important for developing early detection and effective treatment regimens for PDAC (Mishra et al., 2019). At present, the focus of multi-omics study based on PAAD is to associate gene expression profile with other omics data (DNA methylation, Copy Number Variation [CNV]) (Raman et al., 2018; Mishra et al., 2019). However, the relationship between the mRNA-miRNA-lncRNA expression patterns is not discussed. Studies have reported the important role of ceRNA network in the tumor progression of PAAD. Chen et al. confirmed that lncRNA AFAP1-AS1 promotes the growth and invasion of PDAC by upregulating the IGF1R oncogene through sequestration of miR-133a (Chen et al., 2018a). Li et al. was also observed that tumor-derived exosomal lncRNA Sox2ot enhances EMT and stemness in PDAC by acting as ceRNA (Li et al., 2018b). These studies limited to a few specific genes, which does not take the advantage of high-throughput sequencing and large-scale samples. At the same time, each gene in ceRNA network will be subject to the interaction of other genes. It is not good to describe its regulation by focusing on a few marker genes. In this study, we report the results of multi-omics analysis based on ceRNA network. Our purpose is to explore the relationship between the mRNA-miRNA-lncRNA expression patterns. In order to identify the hub genes, we screened the hub genes from DEmRNAs and DElncRNAs by WGCNA and Cox regression. We constructed a ceRNA network based on the prognosis of patients by using key mRNAs and lncRNAs. To the best of our knowledge, this is the first study to investigate the specific ceRNA network in pancreatic cancer by two-way selection, instead of “lncRNA-miRNA-mRNA” or “mRNA-miRNA-lncRNA” order pattern. We also considered the interaction of other common target genes. Inspiringly, a novel mRNA-miRNA-lncRNA triple regulatory network was constructed and each genes in this network possessed a significant prognostic value in pancreatic cancer. And our results are still valid for other types of adenocarcinoma.
In our study, WGCNA was used to identify the green module (involving 85 genes) that was related to PAAD OS, and 4 genes (AURKA, MELK, KIF23, and CHEK1) were found specifically expressed in PAAD and played an important role in the development of PAAD, which may become the prognostic markers. AURKA belongs to the aurora kinases family, and its up-regulation in pancreatic cancer is related to the poor overall survival (Xie et al., 2017). And AURKA binds to cyclin-dependent kinases (CDKs) and a variety of other proteins, such as polo-like kinase 1 (PLK1) and Aurora kinase B (AURKB), controls the progression from S (mitosis) stage and G2 stage to M (mitosis) stage (Otto and Sicinski, 2017). KIF23 is a member of the kinesin family (KIF), which is mainly responsible for cytokinin regulation. Abnormal expression of KIF23 can cause tumorigenesis and cancer development (Li et al., 2019). AURKA and KIF23 are also downstream targets of YAP/TEAD. YAP pathway has been identified as an important prognostic marker of PDAC (Rozengurt et al., 2018). Their abnormal expression in PDAC may be related to the regulatory mechanism of YAP pathway. Zhou et al. found that MELK and KIF23 may be eigengenes related to the progression of pancreatic cancer (Zhou et al., 2018). MELK is a cell cycle-dependent protein kinase belonging to the KIN1/PAR-1/MARK family (Tassan and Le Goff, 2004), which regulates intracellular signal transduction, and thereby influences various biological processes in combination with a variety of proteins, including cell cycle, carcinogenesis, cell proliferation, and apoptosis (Jiang and Zhang, 2013). MELK is up-regulated in pancreatic cancer and other types of solid tumors and plays an important role in the formation and maintenance of tumor stem cells (Lu et al., 2019). Previous study suggested that MELK could control the migration of normal and transformed pancreatic duct cells (Chung et al., 2014). Moreover, OTSSP167, the MELK-targeting compound, inhibits the growth of various human cancers, including breast, lung, prostate and pancreatic cancers (Chung et al., 2012). Pan et al. found that CHEK1 was activated in stage IV in PDAC, which was related to tumor invasion (Pan et al., 2018). CHEK1 and cell cycle-related proteins (AURKA, etc.) have synergistic effects. It is reported that simultaneous inhibition of CHEK1 and AURKA leads to cell cycle arrest and apoptosis of cancer cells (Alcaraz-Sanabria et al., 2017). We found that abnormal expression of these 4 genes may lead to abnormal proliferation of cancer cells. We speculate these genes may have synergistic effects in cancer cells that enable them to jointly regulate the cell cycle of PAAD, which agree with the statement of the cell cycle played a vital role in PAAD (Zhou et al., 2018). We used the TCGA training data set to conduct a Cox proportional hazards regression analysis, and finally obtained a 7-lncRNA signature related to the prognosis of patients with PAAD. The 7-lncRNA signature, cancer status, grade and stage were associated with the PAAD OS by univariate Cox regression. According to multivariate Cox regression, the 7-lncRNA signature and cancer status were independent prognostic factors for OS. To further assess the prediction ability of the 7-lncRNA signature, three kinds of machine learning models (SVM, random forest, XGBoost) were constructed based, and used to predict the overall risk of patients in the TCGA data set. Although research on the roles of these lncRNAs in PAAD is very limited, some lncRNAs have been identified in the literature as biomarkers. For example, MIR600HG is a potential therapeutic target and molecular biomarker in PAAD (Song et al., 2018); LINC00941is thought to be associated with tumor cell proliferation and metastasis in gastric cancer (Liu et al., 2019), which affects genes and proteoglycans in cancer, the Hippo signaling pathway, cancer pathways, the cell cycle and leukocyte transendothelial migration (Luo et al., 2018); Hu et al. Showed that CASC8 could reduce the glycolysis of bladder cancer cells and inhibit the growth of bladder cancer cells (Hu et al., 2017). In papillary thyroid carcinoma, UNC5B-AS1 regulates the proliferation, invasion and migration of cancer cells (Guo et al., 2019). In our study, these lncRNAs were found exhibiting the abnormal expression patterns in PAAD, and the Cox regression results show that they can be used as independent prognostic indicators of PAAD. The abnormal expression of key-lncRNAs led to poor OS, which further confirmed its role as a biomarker.
In the process of constructing the ceRNA network, we performed a survival analysis of nodes in the network. The results showed that 7 mRNAs, 6 miRNAs, and 8 lncRNAs were related to the PAAD OS, among which the upregulation of 13 of them was related to poor OS and the downregulation of the other six was associated with a longer survival period among PAAD patients. We connected these potential prognostic markers through the ceRNA network. They have special expression patterns in PAAD and may play an important role in the development of PAAD. Studies have shown that the abnormal expression of miR-203a-3p, miR-129-5p, miR-424-5p, and miR-135b-5p is related to tumor development (Wu et al., 2013; Chaudhary et al., 2017; Zhang et al., 2017; Chen et al., 2018b; Qiu et al., 2019). The abnormal expression of these miRNAs in PAAD may affect the expression of the key mRNAs and effects of the key lncRNAs, which may lead to poor OS. These miRNAs may also be biomarker candidates for PAAD OS. According to the ceRNA network hypothesis, miRNAs can bind to MREs and thereby interact with mRNAs and lncRNAs and thus affect the interactions between them (Salmena et al., 2011; Tay et al., 2014), which indicates that miRNA plays a pivotal role in the ceRNA network.
In conclusion, we identified a central mRNA-miRNA-lncRNA ceRNA network consisting of 7 mRNAs, 6 miRNAs and 8 lncRNAs, and described the regulatory pattern of mRNAs and lncRNAs in PAAD through ceRNA network, we found that the abnormal expression of each hub gene in ceRNA network will lead to poor OS. At the same time, we propose a new perspective to describe the regulatory mechanism of prognostic markers in PAAD through ceRNA network, which is a good way to identify genes related to OS of PAAD. Our findings may provide clues for the development of a promising tool for predicting the OS of PAAD.
Data Availability Statement
Publicly available datasets were analyzed in this study. The source code for this study can be found at https://github.com/nnlrl/PDAC.
JW completed the data analysis. JX and XL participated manuscript composition. All authors read and approved the final manuscript.
This work was financially supported by the National Natural Science Foundation of China (81471001).
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fbioe.2020.00515/full#supplementary-material
Alamoudi, A. A., Alnoury, A., and Gad, H. (2018). miRNA in tumour metabolism and why could it be the preferred pathway for energy reprograming. Brief. Funct. Genomics 17, 157–169. doi: 10.1093/bfgp/elx023
Alcaraz-Sanabria, A., Nieto-Jimenez, C., Corrales-Sanchez, V., Serrano-Oviedo, L., Andres-Pretel, F., Montero, J. C., et al. (2017). Synthetic lethality interaction between aurora kinases and CHEK1 inhibitors in Ovarian cancer. Mol. Cancer Ther. 16, 2552–2562. doi: 10.1158/1535-7163.MCT-17-0223
Chaudhary, A. K., Mondal, G., Kumar, V., Kattel, K., and Mahato, R. I. (2017). Chemosensitization and inhibition of pancreatic cancer stem cell proliferation by overexpression of microRNA-205. Cancer Lett. 402, 1–8. doi: 10.1016/j.canlet.2017.05.007
Chen, B., Li, Q., Zhou, Y., Wang, X., Zhang, Q., Wang, Y., et al. (2018a). The long coding RNA AFAP1-AS1 promotes tumor cell growth and invasion in pancreatic cancer through upregulating the IGF1R oncogene via sequestration of miR-133a. Cell Cycle 17, 1949–1966. doi: 10.1080/15384101.2018.1496741
Chou, C. H., Chang, N. W., Shrestha, S., Hsu, S. D., Lin, Y. L., Lee, W. H., et al. (2016). miRTarBase 2016: updates to the experimentally validated miRNA-target interactions database. Nucleic Acids Res. 44, D239–D247. doi: 10.1093/nar/gkv1258
Chung, C. H., Miller, A., Panopoulos, A., Hao, E., Margolis, R., Terskikh, A., et al. (2014). Maternal embryonic leucine zipper kinase regulates pancreatic ductal, but not beta-cell, regeneration. Physiol. Rep. 2:12131. doi: 10.14814/phy2.12131
Chung, S., Suzuki, H., Miyamoto, T., Takamatsu, N., Tatsuguchi, A., Ueda, K., et al. (2012). Development of an orally-administrative MELK-targeting inhibitor that suppresses the growth of various types of human cancer. Oncotarget 3, 1629–1640. doi: 10.18632/oncotarget.790
Connor, A. A., Denroche, R. E., Jang, G. H., Lemire, M., Zhang, A., Chan-Seng-Yue, M., et al. (2019). Integration of genomic and transcriptional features in pancreatic cancer reveals increased cell cycle progression in metastases. Cancer Cell 35, 267–282.e267. doi: 10.1016/j.ccell.2018.12.010
Fang, X. N., Yin, M., Li, H., Liang, C., Xu, C., Yang, G. W., et al. (2018). Comprehensive analysis of competitive endogenous RNAs network associated with head and neck squamous cell carcinoma. Sci. Rep. 8:10544. doi: 10.1038/s41598-018-28957-y
Fromm, B., Billipp, T., Peck, L. E., Johansen, M., Tarver, J. E., King, B. L., et al. (2015). A uniform system for the annotation of vertebrate microRNA genes and the evolution of the human microRNAome. Annu. Rev. Genet. 49, 213–242. doi: 10.1146/annurev-genet-120213-092023
Fu, X. L., Liu, D. J., Yan, T. T., Yang, J. Y., Yang, M. W., Li, J., et al. (2016). Analysis of long non-coding RNA expression profiles in pancreatic ductal adenocarcinoma. Sci. Rep. 6:33535. doi: 10.1038/srep33535
Gong, J., Jiang, H., Shu, C., Hu, M. Q., Huang, Y., Liu, Q., et al. (2019). Integrated analysis of circular RNA-associated ceRNA network in cervical cancer: observational study. Medicine 98:e16922. doi: 10.1097/MD.0000000000016922
Guo, K., Chen, L., Wang, Y., Qian, K., Zheng, X., Sun, W., et al. (2019). Long noncoding RNA RP11-547D24.1 regulates proliferation and migration in papillary thyroid carcinoma: Identification and validation of a novel long noncoding RNA through integrated analysis of TCGA database. Cancer Med. 8, 3105–3119. doi: 10.1002/cam4.2150
Hu, R., Zhong, P., Xiong, L., and Duan, L. (2017). Long noncoding RNA cancer susceptibility candidate 8 suppresses the proliferation of bladder cancer cells via regulating glycolysis. DNA Cell Biol. 36, 767–774. doi: 10.1089/dna.2017.3785
Jeggari, A., Marks, D. S., and Larsson, E. (2012). miRcode: a map of putative microRNA target sites in the long non-coding transcriptome. Bioinformatics 28, 2062–2063. doi: 10.1093/bioinformatics/bts344
Jiang, P., and Zhang, D. (2013). Maternal embryonic leucine zipper kinase (MELK): a novel regulator in cell cycle control, embryonic development, and cancer. Int. J. Mol. Sci. 14, 21551–21560. doi: 10.3390/ijms141121551
Klett, H., Fuellgraf, H., Levit-Zerdoun, E., Hussung, S., Kowar, S., Kusters, S., et al. (2018). Identification and validation of a diagnostic and prognostic multi-gene biomarker panel for pancreatic ductal adenocarcinoma. Front. Genet. 9:108. doi: 10.3389/fgene.2018.00108
Li, C., Zhou, L., He, J., Fang, X. Q., Zhu, S. W., and Xiong, M. M. (2016). Increased long noncoding RNA SNHG20 predicts poor prognosis in colorectal cancer. BMC Cancer 16:655. doi: 10.1186/s12885-016-2719-x
Li, J. H., Liu, S., Zhou, H., Qu, L. H., and Yang, J. H. (2014). starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res. 42, D92–D97. doi: 10.1093/nar/gkt1248
Li, R., Qu, H., Wang, S., Wei, J., Zhang, L., Ma, R., et al. (2018a). GDCRNATools: an R/Bioconductor package for integrative analysis of lncRNA, miRNA and mRNA data in GDC. Bioinformatics 34, 2515–2517. doi: 10.1093/bioinformatics/bty124
Li, Z., Jiang, P., Li, J., Peng, M., Zhao, X., Zhang, X., et al. (2018b). Tumor-derived exosomal lnc-Sox2ot promotes EMT and stemness by acting as a ceRNA in pancreatic ductal adenocarcinoma. Oncogene 37, 3822–3838. doi: 10.1038/s41388-018-0237-9
Liu, H., Wu, N., Zhang, Z., Zhong, X., Zhang, H., Guo, H., et al. (2019). Long non-coding RNA LINC00941 as a potential biomarker promotes the proliferation and metastasis of gastric cancer. Front. Genet. 10:5. doi: 10.3389/fgene.2019.00005
Luo, C., Tao, Y., Zhang, Y., Zhu, Y., Minyao, D. N., Haleem, M., et al. (2018). Regulatory network analysis of high expressed long non-coding RNA LINC00941 in gastric cancer. Gene 662, 103–109. doi: 10.1016/j.gene.2018.04.023
Mishra, N. K., Southekal, S., and Guda, C. (2019). Survival analysis of multi-omics data identifies potential prognostic markers of pancreatic ductal adenocarcinoma. Front. Genet. 10:624. doi: 10.3389/fgene.2019.00624
Orth, M., Metzger, P., Gerum, S., Mayerle, J., Schneider, G., Belka, C., et al. (2019). Pancreatic ductal adenocarcinoma: biological hallmarks, current status, and future perspectives of combined modality treatment approaches. Radiat. Oncol. 14:141. doi: 10.1186/s13014-019-1345-6
Pan, Z., Li, L., Fang, Q., Zhang, Y., Hu, X., Qian, Y., et al. (2018). Analysis of dynamic molecular networks for pancreatic ductal adenocarcinoma progression. Cancer Cell Int. 18:214. doi: 10.1186/s12935-018-0718-5
Qiu, Z., Wang, X., Shi, Y., and Da, M. (2019). miR-129-5p suppresses proliferation, migration, and induces apoptosis in pancreatic cancer cells by targeting PBX3. Acta Biochim. Biophys. Sin. (Shanghai). 51, 997–1007. doi: 10.1093/abbs/gmz096
Rahib, L., Smith, B. D., Aizenberg, R., Rosenzweig, A. B., Fleshman, J. M., and Matrisian, L. M. (2014). Projecting cancer incidence and deaths to 2030: the unexpected burden of thyroid, liver, and pancreas cancers in the United States. Cancer Res. 74, 2913–2921. doi: 10.1158/0008-5472.CAN-14-0155
Robinson, M. D., McCarthy, D. J., and Smyth, G. K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140. doi: 10.1093/bioinformatics/btp616
Rozengurt, E., Sinnett-Smith, J., and Eibl, G. (2018). Yes-associated protein (YAP) in pancreatic cancer: at the epicenter of a targetable signaling network associated with patient survival. Signal Transduct Target Ther. 3:11. doi: 10.1038/s41392-017-0005-2
Song, J., Xu, Q., Zhang, H., Yin, X., Zhu, C., Zhao, K., et al. (2018). Five key lncRNAs considered as prognostic targets for predicting pancreatic ductal adenocarcinoma. J. Cell. Biochem. 119, 4559–4569. doi: 10.1002/jcb.26598
Stracquadanio, G., Wang, X., Wallace, M. D., Grawenda, A. M., Zhang, P., Hewitt, J., et al. (2016). The importance of p53 pathway genetics in inherited and somatic cancer genomes. Nat. Rev. Cancer 16, 251–265. doi: 10.1038/nrc.2016.15
Tesfaye, A. A., Kamgar, M., Azmi, A., and Philip, P. A. (2018). The evolution into personalized therapies in pancreatic ductal adenocarcinoma: challenges and opportunities. Expert Rev. Anticancer Ther. 18, 131–148. doi: 10.1080/14737140.2018.1417844
Wu, K., Hu, G., He, X., Zhou, P., Li, J., He, B., et al. (2013). MicroRNA-424-5p suppresses the expression of SOCS6 in pancreatic cancer. Pathol. Oncol. Res. 19, 739–748. doi: 10.1007/s12253-013-9637-x
Xie, Y., Zhu, S., Zhong, M., Yang, M., Sun, X., Liu, J., et al. (2017). Inhibition of aurora kinase a induces necroptosis in pancreatic Carcinoma. Gastroenterology 153, 1429–1443.e1425. doi: 10.1053/j.gastro.2017.07.036
Yan, X., Hu, Z., Feng, Y., Hu, X., Yuan, J., Zhao, S. D., et al. (2015). Comprehensive genomic characterization of long non-coding RNAs across human Cancers. Cancer Cell 28, 529–540. doi: 10.1016/j.ccell.2015.09.006
Yang, S., He, P., Wang, J., Schetter, A., Tang, W., Funamizu, N., et al. (2016). A novel MIF signaling pathway drives the malignant character of pancreatic Cancer by targeting NR3C2. Cancer Res. 76, 3838–3850. doi: 10.1158/0008-5472.CAN-15-2841
Zhang, Z., Che, X., Yang, N., Bai, Z., Wu, Y., Zhao, L., et al. (2017). miR-135b-5p Promotes migration, invasion and EMT of pancreatic cancer cells by targeting NR3C2. Biomed. Pharmacother. 96, 1341–1348. doi: 10.1016/j.biopha.2017.11.074
Zhou, X., Liu, J., and Wang, W. (2014). Construction and investigation of breast-cancer-specific ceRNA network based on the mRNA and miRNA expression data. IET Syst. Biol. 8, 96–103. doi: 10.1049/iet-syb.2013.0025
Zhou, Z., Cheng, Y., Jiang, Y., Liu, S., Zhang, M., Liu, J., et al. (2018). Ten hub genes associated with progression and prognosis of pancreatic carcinoma identified by co-expression analysis. Int. J. Biol. Sci. 14, 124–136. doi: 10.7150/ijbs.22619
Keywords: the cancer genome atlas, pancreatic adenocarcinoma, weighted gene co-expression network analysis, prognostic signature, competing endogenous RNA network
Citation: Wang J, Xiang J and Li X (2020) Construction of a Competitive Endogenous RNA Network for Pancreatic Adenocarcinoma Based on Weighted Gene Co-expression Network Analysis and a Prognosis Model. Front. Bioeng. Biotechnol. 8:515. doi: 10.3389/fbioe.2020.00515
Received: 16 January 2020; Accepted: 30 April 2020;
Published: 28 May 2020.
Edited by:Fengfeng Zhou, Jilin University, China
Reviewed by:Hauke Busch, University of Lübeck, Germany
Ka-Chun Wong, City University of Hong Kong, Hong Kong
Copyright © 2020 Wang, Xiang and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Xueling Li, email@example.com