MiRNA-Mediated Subpathway Identification and Network Module Analysis to Reveal Prognostic Markers in Human Pancreatic Cancer

Background Pancreatic cancer (PC) remains one of the most lethal cancers. In contrast to the steady increase in survival for most cancers, the 5-year survival remains low for PC patients. Methods We describe a new pipeline that can be used to identify prognostic molecular biomarkers by identifying miRNA-mediated subpathways associated with PC. These modules were then further extracted from a comprehensive miRNA-gene network (CMGN). An exhaustive survival analysis was performed to estimate the prognostic value of these modules. Results We identified 105 miRNA-mediated subpathways associated with PC. Two subpathways within the MAPK signaling and cell cycle pathways were found to be highly related to PC. Of the miRNA-mRNA modules extracted from CMGN, six modules showed good prognostic performance in both independent validated datasets. Conclusions Our study provides novel insight into the mechanisms of PC. We inferred that six miRNA-mRNA modules could serve as potential prognostic molecular biomarkers in PC based on the pipeline we proposed.


INTRODUCTION
Pancreatic cancer (PC) is one of the most lethal digestive system tumors characterized by rapid onset, high malignancy and high mortality (Lei et al., 2017). In contrast to the steady increase in survival for most cancers, the 5-year survival of PC patients remains less than 5% (Siegel et al., 2017); however, there has been substantial progress in both diagnostic and therapeutic techniques. Prognosis for surgically treated patients is difficult, and detection of new biomarkers is urgently required in order to accurately predict PC patient outcome and better understand the associated molecular mechanisms.
MicroRNAs (miRNAs) are short, endogenous non-coding RNAs that participate in post-transcriptional gene regulation. Increasing research on PC has revealed that, miRNAs play an important role in the development of PC (Peng et al., 2015;Song et al., 2015;Imamura et al., 2017). Recent literature indicates miR-339-5p can suppress the invasion and migration of PC cells via direct regulation of ZNF689 (Yu et al., 2019). MiR-137-3p, as a direct target of circ-LDLRAD3k contributed to repressing the proliferation, migration and invasion of PC cells when circ-LDLRAD3 was downregulated (Yao et al., 2019). Although an increasing number of disease-relevant genes and miRNAs have been identified through microarray and next-generation sequencing, the precise functional mechanism that contributes to the pathology of this complex disease remains unclear.
Pathway analysis is the first choice to gain insight into biological processes and understand the underlying mechanisms of complex diseases. Several studies have shown that subpathways, rather than complete biological pathways, are abnormal in disease phenotypes . As such, many methods have been developed to identify biological pathways or subpathways (Feng et al., 2016Han et al., 2018Han et al., , 2020Liu et al., 2019;Ning et al., 2019;Yamaguchi et al., 2019;Li et al., 2020). For example, the Over-Representation Analysis (ORA) method (Breitling et al., 2004) identifies biological pathways by evaluating the extent to which the genes in a gene set of interest appear in given predefined pathway using a hypergeometric test. Another approach is topology enrichment analysis framework (TEAK) (Judeh et al., 2013). TEAK identifies linear and non-linear subpathways and scores them. Emerging evidence suggests that miRNAs play important roles in biological pathways, acting as regulators of pathway output or as important pathway targets (Ooi et al., 2011). Sidiropoulos et al. (2014) found Kallikrein-related peptidase 5 could induce miRNA-mediated anti-oncogenic pathways in breast cancer. Al-Ghezi et al. (2019) demonstrated that combination of 9-tetrahydrocannabinol and cannabidiol could induce attenuation of neuroinflammation through miRNAmediated signaling pathway. A series of studies focused on identifying miRNA-mediated subpathways for deciphering disease mechanisms (Vrahatis et al., 2015(Vrahatis et al., , 2016Ning et al., 2019). The Subpathway-Gmir  method identifies miRNA-mediated metabolic subpathways relevant to various diseases by integrating genes of interest, miRNAs, and pathway topologies through building miRNA-regulated metabolic pathways. Vrahatis et al. (2015) developed an effective method for capturing miRNA-mediated signaling subpathways by integrating paired miRNA/mRNA expression data. They subsequently developed time-vaRying enriCHment integrOmics Subpathway aNalysis tOol (CHRONOS) by integrating time series mRNA/miRNA expression data with KEGG pathway maps and miRNA-target interactions (Vrahatis et al., 2016). These studies either focused on identifying miRNA-mediated pathways/subpathways associated with complex diseases, or focused on the impact of miRNA on diseases as regulators or targets of these pathways/subpathways analysis of these pathways/subpathways. However, few studies have analyzed relationships between patient survival and modules extracted from the special network constructed by miRNA-mediated subpathways associated with complex diseases such as PC.
In this study, we propose a novel pipeline that can be used to identify prognostic molecular biomarkers by identifying miRNA-mediated subpathways associated with complex diseases and further extracting modules from a comprehensive miRNAgene network (CMGN). Briefly, 105 significant miRNA-mediated subpathways were found to be associated with PC. The CMGN was further constructed using these subpathways, and 10 miRNA-mRNA modules were extracted. Functional analyses revealed that the majority of these modules were enriched among cancerrelated gene ontology (GO) terms. Finally, an evaluation of the association between survival and the level of gene and miRNA expression in the modules found that six out of the 10 miRNA-mRNA modules were able to discriminate between two groups of PC patients with different clinical outcomes. Together, we provide an effective pipeline for analyzing the relationships of patient survival and modules extracted from the special network constructed by miRNA-mediated subpathways associated with complex diseases. The findings of our study provide novel insight into the mechanisms of PC and identify six modules that may have prognostic significance for predicting the survival of PC patients.

MATERIALS AND METHODS
The aim of this study was to identify significant subpathways associated with PC and evaluate the prognostic value of miRNA-gene modules in the CMGN constructed using these subpathways. Firstly, differentially expressed genes (DE genes) and miRNAs (DE miRNAs) in PC tissues were identified using microarray gene expression profile data of PC. These DE genes and DE miRNAs were subsequently mapped into undirected pathway graphs embedded by miRNA (UPEMs) to detect the miRNA-mediated subpathways associated with PC. The statistically significant subpathways were obtained and defined as PC-relevant subpathways. A comprehensive miRNA-gene network was built by merging common nodes and edges based on the PC-relevant subpathways. Finally, we detected functional modules in the network and further evaluated their value in relation to PC patient survival. An overview of the pipeline is presented in Figure 1.

PC Dataset
The gene and miRNA expression profiles, including 14 solid-pseudopapillary neoplasms samples, six PC samples, six neuroendocrine tumor samples and five non-neoplastic pancreatic tissue samples, were initially analyzed by Park et al. (2014). We downloaded the expression profile data from the Gene Expression Omnibus (GEO) database (GSE43795 for miRNAs and GSE43796 for mRNAs) (Park et al., 2014). PC and non-neoplastic pancreatic tissue samples were used to identify DE genes and DE miRNAs associated with PC, using the significance analysis of microarrays (SAM) (Tusher et al., 2001) method. An FDR < 0.001 as used for genes and FDR < 0.05 for miRNAs. We identified 1,586 DE genes and 107 DE miRNAs. Additionally, two independent datasets were used to evaluate the prognostic power of the miRNA-mRNA modules. For independent validation dataset 1, a microarray dataset including mRNA expression profiles and clinical information with early stage PC (GSE57495) from 63 patients was downloaded from the GEO database (Edgar et al., 2002). For independent validation dataset 2, an RNA-seq dataset of mRNAs and miRNAs for primary PC tumors, was retrieved from the Cancer Genome Atlas (TCGA) database 1 . The same TCGA barcode structure was used for both clinical data and RNA-seq data. A total of 177 samples with clinical follow-up information were retained for further analysis.

Disease-Associated Genes and miRNAs
Disease-associated coding genes were derived from OMIM (Amberger et al., 2011) and GAD (Becker et al., 2004). Diseaseassociated miRNAs were derived from the miRCancer (Xie et al., 2013), miR2Disease , and HMDD (Li et al., 2014) databases. Genes and miRNAs of the significant miRNAmediated subpathways associated with PC were considered as disease-associated coding genes and miRNAs according to the associations between them and PC recorded in at least one of these databases.

UPEM Construction
The UPEM was constructed based on KEGG pathways and the experimentally validated miRNA-target interaction data. In this study, 343 KEGG pathways involving 152 metabolic and 191 nonmetabolic pathways were used. First, each pathway was converted into an undirected graph with the genes serving as nodes using our previously developed R package "iSubpathwayMiner" . Second, we examined verified targets genes of each miRNA whether appeared in the converted pathway graphs. If its targets were contained in the pathway, we added the miRNA node into the pathway and linked it to their validated targets. Finally, we generated UPEMs that combined the miRNAs and miRNA-target gene interaction edges. We used the 343 pathways embedded by miRNAs to identify the PC risk subpathways.

Identification of miRNA-Mediated Subpathways and CMGN Construction
The "lenient distance" similarity method  was used to identify the miRNA-mediated 1 http://tcga-data.nci.nih.gov/tcga/ subpathways in the UPEMs according to coding genes and miRNAs as the input. Notably, the DE miRNAs and DE genes were mapped into UPEMs at the same time as the signature nodes. The shortest path between any two signature nodes in the mapped pathway was computed. Next, the two signature nodes and the molecules contained in the shortest path between them were divided into the same candidate node set, with the requirement that the length of the shortest path be less than or equal to n. The corresponding subgraph was extracted according to each candidate node, and set as a subpathway if the node number was no less than s. The parameters were set to n = 1 and s = 10. Statistically significant subpathways were further extracted by performing a hypergeometric test with a p-value < 0.001. The p-value was calculated according to the following equation: t g + t mir x m g + m mir − t g − t mir n g + n mir − x m g + m mir n g + n mir where m g (m mir ) represents the number of coding genes (miRNAs) in the entire human genome, of which t g (t mir ) represents the number of miRNAs involved in a given subpathway, and n g (n mir ) represents the number of DE genes (DE miRNAs), of which r g (r mir ) represents the number of DE miRNAs involved in the same subpathway. The CMGN was constructed using all the genes/miRNAs extracted from the statistically significant subpathways and all inherited edges. Finally, for each miRNA in the network, a Pearson correlation between the level of miRNA expression and all genes in the network were computed, and the edges representing the strong negative correlation (r < −0.7) between the miRNA and its target genes were specially marked.

Identification of Disease Related Modules in the miRNA-Gene Network
We extracted the modules from the miRNA-gene network using the Clique Percolation Method (CPM) implemented by CFinder (Adamcsek et al., 2006) software with a value of parameter k = 4 (k-clique size) for the richest modular structure. CFinder is a program that can rapidly locate and visualize overlapping, densely interconnected groups of nodes in undirected graphs. We interpreted a community constructed from adjacent cliques of the same size k in the CPM as a module. A k-clique represented a complete subgraph on k nodes, and two k-cliques were considered to be adjacent if they shared k−1 nodes. A community comprised a set of k-cliques in which all nodes could be reached via chains of adjacent k-cliques from each other.

GO and Cancer Hallmark Analysis
Gene ontology analyses were performed using an R package clusterProfiler (Yu et al., 2012) for the coding genes of a given module. A p-value threshold of 0.01 was used to indicate statistical significance. The GO categories obtained from a previous study (Plaisier et al., 2012) were used as proxies for the characteristic hallmark traits of cancer.

Functional Analysis of Modules With a Hypergeometric Test
A hypergeometric test was performed on each module to evaluate the extent to which the genes and miRNAs in the module overlapped with the nodes of statistically significant subpathways. The p-value was calculated according to the following equation: where m is the total number of unique nodes in the subpathways located by simultaneously mapping DE miRNAs and DE genes into UPEMs as the signature nodes; t is the number of nodes in the chosen subpathway of interest; n is the number of nodes in a given module; r is the number of common nodes between the chosen subpathway and the given module. A p-value threshold of 0.001 was used to indicate statistical significance.

Survival Analysis
For the specific molecules (genes/miRNAs) in a given module, a univariate Cox regression analysis was carried out to evaluate the association between survival and the expression levels of molecules in the module. A risk score formula was used to evaluate the association between survival and molecule combinations in the given module and calculated using a linear combination of the expression levels of molecules weighted according to their respective Cox regression coefficients from the univariate Cox regression analysis as follows: where b i is the Cox regression coefficient of molecule i from the given module, n is the number of molecules in the given module, and Exp(i) is the expression value of molecule i in the corresponding patient. Cancer patients were classified into high and low-risk groups according to the median risk score. For single gene survival prediction, the median expression value of each gene was used as a cut-off to distinguish two groups of PC patients as having either a low or high relative gene expression. A Kaplan-Meier survival analysis was performed for the two patient groups using the R survival package, and statistical significance was assessed using a two-tailed log-rank test. A p-value threshold of 0.05 was used to indicate significance.

Identification of PC-Relevant Subpathways Mediated by miRNAs
Using the SAM method to test the PC and non-neoplastic pancreatic tissue samples, 1,586 DE genes and 107 DE miRNAs were identified at the FDR level of 0.001 and 0.05, respectively. After mapping these DE genes and DE miRNAs into UPEMs at the same time as signature nodes, 105 significant subpathways were identified at a strict cut-off value of FDR < 0.001 (see section "Materials and Methods"). These significant subpathways, referred to as PC-relevant subpathways, varied from 10 to 85 genes/miRNAs (mean: 24 genes/miRNAs), and were associated with 105 distinct complete pathways (Supplementary Table 1).
The top 20 PC-relevant subpathways are listed in Table 1. The coverage rate of known cancer-associated genes and miRNAs were tested for each PC-relevant subpathway. These known cancer-associated genes and miRNAs were derived from the OMIM, GAD, miRCancer, miR2Disease, and HMDD databases. As a result, it was found that each PC-relevant subpathway was associated with an average of 36.7% known cancerassociated genes and miRNAs, some of which even reached 73.7% (Supplementary Table 2). We further focused on the two most significant PC relevant subpathways. The first was path: 04010_1 (FDR = 0), an important sub-region within the MAPK signaling pathway (Figure 2A), which was previously reported to be a highly conserved pathway that transfers extracellular signals into cellular proliferation signals (Aguirre-Ghiso et al., 2003). The mitogen-activated protein kinase (MAPK) is one such complex interconnected signaling cascade with frequent involvement in oncogenesis, tumor progression and drug resistance. One study demonstrated the role of MAPK signaling during the initial steps of pancreatic carcinogenesis (Collins et al., 2014). Based on the topological structure, miR -320a was a DE miRNA with a higher degree in this subpathway. Moreover, the overexpression of miR-320a strongly contributes to PC pathogenesis, including the characteristics of increased proliferation, invasion, metastasis, drug-resistance and the epithelial-to-mesenchymal transition . In addition, several DE genes were enriched in this subpathway, among which MYC, NRAS, RAC2 were known PC related genes, and most of them play key roles in PC. The study by Adrian et al. suggested that individuals with constitutively decreased TGFBR1 expression may have a decreased risk of PC (Adrian et al., 2009). MAPK9 has also been identified as a potentially promising biomarker in exploratory studies of PC (Bracci et al., 2012). More importantly, SNPs in the inflammatory pathway genes MAPK8IP1 and SOCS3 were associated with increased overall survival in patients undergoing potentially curative resection and may be used as markers to predict PC patient survival (Reid-Lombardo et al., 2013).
The second significant subpathway was path: 04110_1 (FDR = 0), which is one of the most important cell cycle signaling pathways ( Figure 2B) that regulates both cell division and apoptosis. DNA damage readily leads to dysregulation of the cell cycle, which is an essential step in the initiation and development of human cancer (Nan et al., 2016). The 04110_1 subpathway consisted of 64 genes/miRNAs, 15 of which were known PC-associated genes/miRNAs and 30 of which were DE genes/miRNAs. According to the topological structure of the pathway, miRNAs with a higher degree are more important in the subpathway region. It was found that miR-34a, a non-differential and disease-associated miRNA, exhibited the highest degree. Moreover, this miRNA has been reported to serve as a diagnostic biomarker for PC (Long et al., 2017). Hu et al. (2013) proposed  were known PC-associated genes, and eight were differentially expressed. Among these differentially expressed target genes, cell division cycle 20 (CDC20), an anaphase-promoting complex activator, has been observed to be over-expressed in a variety of human cancers and to play an oncogenic role in tumorigenesis . E2F3, as the target gene regulated by miR-217, has been shown to be involved in PC cell growth, invasion and apoptosis (Yang et al., 2017). In addition, miR-215 exhibited the second highest degree and is dysregulated in several human malignancies, including PC. It is speculated that miR-215 may act as a tumor suppressor in PC, and could serve as a novel therapeutic target for miRNA-based therapy (Li Q.W. et al., 2015). Interestingly, we found that a miRNA set consisting of miR-92a, miR-24, let-7a, let-7b, let-7c, miR-193b, and miR-31 was closely correlated with PC. Each of these miRNAs regulates at least one differentially expressed cell cyclerelated gene, involving cyclin B1 (CCNB1), cyclin B2 (CCNB2), and cyclin-dependent kinase 1 (CDK1). Among them, let-7b, let-7c, and miR-31 are known PC-associated miRNAs, and others have also been associated with PC in previous studies (Listing et al., 2015;Xiao et al., 2017). Our results demonstrate that the PC-relevant subpathways identified by integrating DE genes, DE miRNAs and pathway topologies were closely related to PC.

Global Properties of the CMGN
Since key genes and miRNAs might participate in multiple subpathways, the CMGN was constructed by merging common nodes and edges of PC-relevant subpathways (see section "Materials and Methods"). In the network, there were 3,030 edges between 91 miRNA nodes and 640 gene nodes, of which 48 edges had a strong negative correlation between the level of miRNA and target gene expression (r < −0.7) (Figure 3A). We evaluated the degree of node distribution in the network and observed a power-law and exponential distribution, respectively ( Figure 3B). Therefore, the CMGN displayed scale-free characteristics, indicating that it was not random but organized according to a core set of principles in its structure that distinguished it from randomly linked networks (Barabasi et al., 2011). The known PC-associated miRNAs and genes were further mapped to the network, and 93 PC-associated molecules (39 known PC-associated miRNAs and 54 known PC-associated genes) were found to be involved in the network. The topological characteristics of the network were examined, revealing that these known PC-associated miRNAs and genes showed a significantly higher degree and betweenness centrality than the other miRNAs and genes (Figures 3C,D). The specific topological patterns reflected the functional importance of the known PC-associated miRNAs and genes in the CMGN. A higher degree indicated that the nodes were likely to be hubs and had high probabilities of engaging in essentially biological functions. A higher betweenness centrality implied that they acted as bridges connecting different network components and controlling communication.
All of the nodes' topological features of the network were ranked and the top 10 genes and miRNAs of each dimension are listed in Table 2. It was observed that all top 10 genes were differentially expressed, three of which (NRAS, MYC, and EGFR) have been well-described as known PC-associated genes. MAPK3 (also known as ERK1/2) is a member of the MAP kinase family that exhibited the highest degree, and appeared in 55 PC-relevant subpathways. MAPK3 is activated by upstream kinases, which results in its translocation to the nucleus where it phosphorylates nuclear targets. Moreover, MAPK3 plays an important role in the MAPK/ERK cascade. Indeed, MAPK3 inhibitors can inhibit the growth of PC cells through the RAS-RAF-MEK-ERK pathway (Walters et al., 2013). MAPK9 exhibited the second highest degree and was inversely correlated with the expression of seven miRNAs in our study, including four known PC-associated miRNAs (miR-93, miR-20a, miR-17, and miR-320a). MAPK9 has previously been identified as a potentially promising biomarker in exploratory studies, and was observed to be overexpressed in PC patients (300 cases, 300 controls) (Bracci et al., 2012). In addition, miR-320a had the highest degree among miRNAs and appeared in 92 PC-relevant subpathways. The study by Wang et al. (2016) found that miR-320a over-expression promoted PC cell proliferation, migration and invasion, and demonstrated that miR-320a suppressed PDCD4 mRNA expression in 5-Fluorouracil-resistant human PC cells. These findings suggest that miR-320a may serve as a potential target for PC therapy . When applied to clinical serum samples, miR-320a could accurately predict late chronic pancreatitis (Xin et al., 2017). Together, these results demonstrate that the CMGN based on PC-relevant subpathways can provide insight into cancer-associated transcriptional regulatory networks. Thus, the CMGN may be able to identify key factors that participate in multiple pathways.

The Identification of Key Functional miRNA-mRNA Modules in the CMGN
Analysis of the CMGN can provide a global view of the regulatory relationships involved in PC-relevant subpathways. To reveal the detailed modular organization of the CMGN, the functional modules in the network were mined using the Clique Percolation Method implemented by CFinder software with a value of parameter k = 4 (see section "Materials and Methods"). A total of 30 modules were detected, among which greater attention was paid to the miRNA-mRNA modules. There were 12 modules that contained at least one miRNA and mRNA, two of which were absolutely included in the other modules. Therefore, the larger modules were retained. Ultimately, 10 non-redundant miRNA-mRNA modules were obtained, consisting of 4 ∼ 23 genes/miRNAs ( Figure 4A).
For each miRNA-mRNA module, a GO analysis of the coding genes from a given module was carried out based on the GO terms (see section "Materials and Methods"), and each module was annotated with the enriched functions of the corresponding gene set (Supplementary Figure 1). Processes for the maintenance of cell homeostasis (e.g., cell cycle regulation and cytosolic calcium ion homeostasis) and cancer development-related processes (e.g., cell-matrix adhesion and JAK-STAT cascade and glycolysis) were highly represented. Cancer is a complex disease characterized by select hallmarks of cancer, including resistance to cell death, tissue invasion and metastasis, as well as the induction of angiogenesis (Hanahan and Weinberg, 2011). These hallmarks provide a framework for understanding the remarkable diversity of various cancers. Thus, through comparing the GO categories used as proxies for Of these, 48 edges represented a strong negative correlation (r < -0.7) between the miRNA and its target genes marked in green. Disease-associated genes/miRNAs in the CMGN exhibited specific topological characteristics compared to the other nodes. (B) The network revealed a power law distribution with a slope of -1.343 and R 2 = 0.816. The X-axis indicates the degree of node distribution. The Y -axis indicates the number of nodes according to the X-axis. (C) and (D) Disease-associated nodes had a higher degree and betweenness centrality than the other nodes. Data are shown as the mean ± SEM. Disease-associated and other nodes are indicated in red and blue along the X-axis. The average degrees of the two groups of nodes are indicated by the Y -axis.   Figure 4B). We tested the extent to which the molecules overlap between each miRNA-mRNA module and PC-relevant subpathways using a hypergeometric test with a p-value < 0.0001 and annotation proportion ≥ 70% (see section "Materials and Methods"). As a result, nine out of 10 miRNA-mRNA modules were annotated in at least one PC-relevant subpathway (Supplementary Table 3). We found that the modules, that overlapped with other modules, were enriched in more PC-relevant subpathways, indicating that these modules may have multiple functions. For instance, module M5, which overlapped with M3, M4, and M6, was enriched in six PC-relevant subpathways: (1) path: 04510_1 that belonged to focal adhesion; (2) path: 05410_1 that belonged to hypertrophic cardiomyopathy; (3) path: 05414_1 that belonged to Dilated cardiomyopathy; (4) path: 05412_1 that belonged to arrhythmogenic right ventricular cardiomyopathy; (5) path: 04810_1 that belonged to regulation of actin cytoskeleton; and (6) path: 04520_1 that belonged to adherens junction. Interestingly, three of these subpathways enriched by module M5 were associated with cardiomyopathy. To our knowledge, advanced cancer can induce fundamental changes in metabolism and promote cardiac atrophy and heart failure. Thackeray et al. (2017) discovered systemic insulin deficiency in cachectic cancer patients and demonstrated that cancer-induced systemic insulin depletion contributes to cardiac wasting and failure. In addition, low-dose insulin supplementation was found to attenuate these processes in mice with advanced melanoma or colon carcinoma.
Module M9 contained four molecules (ERBB2, EGFR, PDGFD, and miR-21), three of which are known PC-associated genes and miRNAs. Studies have shown that increasing EGFR activity can over-activate downstream pro-oncogenic signaling pathways (e.g., RAS-RAF-MEK-ERK MAPK, and AKT-PI3K-mTOR pathways), which can then activate many biological outputs that may contribute to cancer cell proliferation (Wee and Wang, 2017). EGFR was also identified as a hub in the CMGN, indicating that EGFR is able to crosstalk with other molecules. Studies have demonstrated that a shorter EGFR intron 1 CA repeat length is associated with a worse PC clinical prognosis (Tzeng et al., 2007). ERBB2, also known as HER2, was functionally characterized by an extraordinarily strong catalytic kinase activity, and represents a key oncoprotein that can trigger cornerstone intracellular signaling events for cell growth and survival, further leading to increased signal transduction and activation of the MAPK and PI3K/Akt pathways (Wong and Hurvitz, 2014). Moreover, Cheng et al. (2017) identified the ERBB2 exon17 mutation as an independent factor associated with overall survival among metastatic PC patients. Dillhoff et al. (2008) demonstrated that miR-21 was significantly overexpressed in PC. Although we could not find any direct evidence to support an important role of PDGFD in PC, we found that PDGFD could regulate many cellular processes, including cell proliferation, apoptosis, transformation, migration, invasion, angiogenesis and metastasis (Wang et al., 2010). Recent studies (Walters et al., 2013;Ogawa et al., 2015) have also shown that PDGFD is closely related to various cancers.

miRNA-mRNA Modules Are Potential Prognostic Biomarkers for PC
To assess the prognostic performance of these miRNA-mRNA modules, a survival analysis was performed on each of the 10 miRNA-mRNA modules. After testing all the coding genes of the given module with the risk score model (see section "Materials and Methods"), eight out of 10 modules were found to be significantly related to the prognosis in independent validation dataset 1 (Figure 5A). In each of the eight modules, the patients in independent validation dataset 1 were divided into either a highrisk score or low-risk score group. Patients in the high-risk-score group had a significantly shorter overall survival than those in the low-risk score group. To test whether the whole miRNA-mRNA module showed better prognostic performance than a single coding gene, we calculated the Kaplan-Meier p-value of every single coding gene of the modules in independent validation dataset 1. Five out of eight modules (module M1, M3, M5, M7, and M9) showed better prognostic performance than any single coding gene in the module (Figure 5B). More interesting, each coding gene of module M3, M7, and M9 was almost nonsignificantly related to prognosis; however, the three modules were all significantly related to the prognosis in independent validation dataset 1. For example, each coding gene of module M3 showed bad prognostic performance according to non-significant Kaplan-Meier p-value. But the whole miRNA-mRNA module of M3 showed better prognostic performance. Another survival analysis was also carried out on each of the 28 non-redundant modules to test whether their expression levels were associated with PC prognosis. A total of 17 out of 28 non-redundant modules were related to the prognosis of the PC patients based on the level of gene expression. Furthermore, 13 out of 17 modules showed better prognostic performance with the whole module than any individual coding gene (Supplementary Table 4). These findings indicate that the miRNA-mRNA modules in the CMGN provide clinical guidance for cancer prognosis.
To test the reproducibility of the 10 miRNA-mRNA modules' prognostic performance, all coding genes of the given module in independent validation dataset 2 were tested using the same model and criteria as validation dataset 1. Two modules (modules M1 and M4) were excluded due to a matching of the genes to the mRNA expression profiles of less than 60% (Supplementary  Figure 2). Seven out of eight modules (M2, M3, M5, M6, M8, M9, and M10) were related with the prognosis of the PC patients. Six out of seven modules (M3, M5, M6, M8, M9, and M10) showed good prognostic performance in both validation datasets 1 and 2, indicating good reproducibility. To test whether the combination of the miRNAs and mRNAs could predict the survival of PC patients, the relationship of the PC patients' outcome and 10 miRNA-mRNA modules were further evaluated both coding genes and miRNAs using based on the same model and criteria. We observed a smaller Kaplan-Meier p-value based on the level of coding gene and miRNA expression levels compared to only the level of coding gene expression (Supplementary Figure 3). These findings indicate that the majority of miRNA-mRNA modules exhibit good reproducibility, and the involvement of miRNA expression levels may improve the prognostic performance of these modules for PC patients. Furthermore, the six good reproducible modules could serve as potential prognostic molecular biomarkers in PC.
Since the module consisted of connected genes and miRNAs in the network, there may be redundant genes/miRNAs in the module for predicting the survival of PC patients. To select the best prognostic signature, we compared the performances of all the gene combinations in each good reproducible module. A survival analysis was subsequently performed on every combination of genes in each good reproducible module using the same model as independent validation dataset 1. Considering the Kaplan-Meier p-value of all combinations in each module, one of these selected combinations was defined as the best signature. The most significant molecule combinations of the six good reproducible modules are listed in the Supplementary Tables 5-10. As expected, four out of six good reproducible modules (modules M3, M5, M9, and M10) exhibited the best signature, which could be represented by fewer molecules, for predicting the PC patients' survival. For example, module M5 could be used to separate PC patients into high and lowrisk groups using all genes and miRNAs to a greater extent than any single coding gene. In addition, a combination of two genes (ITGB4 and VCL) was associated with a better prognostic performance. Studies have shown that high levels of ITGB4 expression are significantly correlated with the hallmarks of epithelial-mesenchymal transition, high tumor grade, and the presence of lymph node metastasis, and also exhibit an independent prognostic effect (Masugi et al., 2015). Moreover, VCL has been identified as a potential novel oncogene in pancreatic adenocarcinoma (Loukopoulos et al., 2007). Taken together, four modules (modules M3, M5, M9, and M10) among the six good reproducible modules were found to play an important role for the prognosis of PC patients.

DISCUSSION
The dysregulation of miRNA expression has been widely observed in the development and progression of complex human diseases, such as cancer. In this study, DE genes and miRNAs were mapped into undirected pathway graphs embedded by miRNA as signature nodes. We obtained 105 significantly miRNA-mediated subpathways associated with PC as PC-related subpathways using a hypergeometric test. The PC-related subpathways provide biological insight for dissecting PC pathology. The key genes or miRNAs may participate in multiple pathways. Therefore, a comprehensive miRNA-gene network was built by merging common nodes and edges. Next, the topological characteristics of the network were analyzed and functional modules were detected. Finally, the functions of modules containing miRNAs were analyzed and their effect on the outcome of PC patients was evaluated. As a result, 105 subpathways were identified to be significantly associated with PC, in which more than a third of the nodes were The miRNA-mRNA module as a whole could better distinguish the two groups of patients compared to any individual coding gene within it, especially modules M3, M7, and M9. The whole module and associated coding genes are presented along the X-axis. The log value (base 10) of the overall survival rates are shown along the Y -axis.
found to be known cancer-associated genes and miRNAs. We focused on two PC-relevant subpathways belonging to the MAPK signaling pathway and cell cycle. The network analysis of the CMGN built by the PC-related subpathways revealed that the known PC-associated genes/miRNAs showed a significantly higher degree and betweenness centrality than other the nodes, indicating their functional importance. A total of 30 modules were detected in the CMGN using CFinder software, of which 10 miRNA-mRNA modules were further examined. A GO enrichment analysis revealed that the miRNA-mRNA modules were highly enriched in processes involved in the maintenance of cell homeostasis and cancer development. The survival analysis was performed on 10 miRNA-mRNA modules in two independent validation datasets. Of these, six of the modules showed good prognostic performance in both validation dataset 1 and 2. In particular, while each coding gene of modules M3, M7, and M9 was not significantly related with prognosis, the three modules were all significantly related to prognosis in independent validation dataset 1. These results revealed that the six good reproducible modules could serve as potential prognostic molecular biomarkers in PC.
The findings of our study provide a novel pipeline that can be used to identify prognostic molecular biomarkers from a comprehensive miRNA-gene network based on the identified miRNA-mediated subpathways associated with PC. However, the regulation between miRNA and mRNA is complex. The expression of miRNAs or mRNAs might be affected by other ncRNAs, including circRNAs and lncRNAs. For example, Zhao et al. (2017) demonstrated that lncRNA TUG1 could competitively sponge miR-382, thereby regulating EZH2. Their experiments further revealed that TUG1 overexpression promoted cellular proliferation and migration, and contributed to epithelial-mesenchymal transition (EMT) formation in PC cell lines . Similar studies have demonstrated that lncRNA H19 could partially promote PC cell invasion and migration by increasing HMGA2-mediated EMT through antagonizing let-7 in PC cell lines (Ma et al., 2014). Hsa_circ_0005785 was down-regulated in 20 sets of PC tissues and was inferred to interact with miR-181a and miR-181b based on the sequence analysis. MiR-181a plays an important role in regulating PC cell growth and migration and miR-181b has been shown to be related to PC cell resistance to gemcitabine . Identification of the upstream regulatory pathways of miRNAs contributes to research into the functional of miRNAs in various diseases. Information about the regulatory elements involved in the regulation of miRNA transcription is stored in recently published databases, including ENdb (Bai et al., 2020) and SEdb . Moreover, the information associated with these regulatory elements  can be integrated to construct a miRNA transcription regulatory network for a specific disease. Finally, the pipeline we proposed can work flexibly in practices. It supports other methods for identifying differentially expressed subpathways in complex diseases. Similarly, it also supports other methods for mining modules in the network formed by identified subpathways. In parallel, the pipeline is suitable for analyzing integrated networks and can be applied to other complex diseases.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

AUTHOR CONTRIBUTIONS
YL analyzed data and wrote the manuscript. YL and YC collected the datasets. XH, YJ, and ML participated in the pre-processing of the datasets. XB, CF, ML, and JnZ performed the computational analysis. YC, JgZ, JH, BA, XL, JH, QP, FW, and MX performed the literature validation. CL and QW conceived the idea for the manuscript, provided guidance and critically revised the manuscript. All authors read and approved the final version to be published.