A Prognostic Prediction Model Developed Based on Four CpG Sites and Weighted Correlation Network Analysis Identified DNAJB1 as a Novel Biomarker for Pancreatic Cancer

Background The prognosis of pancreatic cancer, which is among the solid tumors associated with high mortality, is poor. There is a need to improve the overall survival rate of patients with pancreatic cancer. Materials and Methods The Cancer Genome Atlas (TCGA) dataset with 153 samples and the International Cancer Genome Consortium (ICGC) dataset with 235 samples were used as the discovery and validation cohorts, respectively. The least absolute shrinkage and selection operator regression was used to construct the prognostic prediction model based on the DNA methylation markers. The predictive efficiency of the model was evaluated based on the calibration curve, concordance index, receiver operating characteristic curve, area under the curve, and decision curve. The xenograft model and cellular functional experiments were used to investigate the potential role of DNAJB1 in pancreatic cancer. Results A prognostic prediction model based on four CpG sites (cg00609645, cg13512069, cg23811464, and cg03502002) was developed using TCGA dataset. The model effectively predicted the overall survival rate of patients with pancreatic cancer, which was verified in the ICGC dataset. Next, a nomogram model based on the independent prognostic factors was constructed to predict the overall survival rate of patients with pancreatic cancer. The nomogram model had a higher predictive value than TCGA or ICGC datasets. The low-risk group with improved prognosis exhibited less mutational frequency and high immune infiltration. The brown module with 247 genes derived from the WGCNA analysis was significantly correlated with the prognostic prediction model, tumor grade, clinical stage, and T stage. The bioinformatic analysis indicated that DNAJB1 can serve as a novel biomarker for pancreatic cancer. DNAJB1 knockdown significantly inhibited the proliferation, migration, and invasion of pancreatic cancer cells in vivo and in vitro. Conclusion The prognostic prediction model based on four CpG sites is a new method for predicting the prognosis of patients with pancreatic cancer. The molecular characteristic analyses, including Gene Ontology, Gene Set Enrichment Analysis, mutation spectrum, and immune infiltration of the subgroups, stratified by the model provided novel insights into the initiation and development of pancreatic cancer. DNAJB1 may serve as diagnostic and prognostic biomarkers for pancreatic cancer.

Results: A prognostic prediction model based on four CpG sites (cg00609645, cg13512069, cg23811464, and cg03502002) was developed using TCGA dataset. The model effectively predicted the overall survival rate of patients with pancreatic cancer, which was verified in the ICGC dataset. Next, a nomogram model based on the independent prognostic factors was constructed to predict the overall survival rate of patients with pancreatic cancer. The nomogram model had a higher predictive value than TCGA or ICGC datasets. The low-risk group with improved prognosis exhibited less mutational frequency and high immune infiltration. The brown module with 247 genes derived from the WGCNA analysis was significantly correlated with the prognostic prediction model, tumor grade, clinical stage, and T stage. The bioinformatic analysis indicated that DNAJB1 can serve as a novel biomarker for pancreatic cancer. DNAJB1 knockdown significantly inhibited the proliferation, migration, and invasion of pancreatic cancer cells in vivo and in vitro.

INTRODUCTION
Pancreatic cancer, which is one of the gastrointestinal tract malignancies associated with high mortality, is the fourth most common cause of cancer-related deaths in the United States of America (1). Due to the specific anatomical position and malignant phenotype of pancreatic cancer, most patients exhibit insidious onset and unspecific clinical symptoms at the earlier stage of pancreatic cancer. Large proportions of patients with pancreatic cancer are diagnosed at an advanced stage along with early distant metastasis and neural or vascular invasion. Thus, patients with pancreatic cancer exhibit a low survival rate with a 5-year survival rate of less than 5% (2,3). Currently, the classical TNM staging and blood tumor markers (CA 19-9, CA 125, and CEA) are used to assess the risk level in patients with pancreatic cancer and predict the prognosis, which are not highly efficient or accurate (4,5). There is an urgent need to devise strategies to increase the overall survival rate of patients with pancreatic cancer, which can be achieved by developing a sensitive and specific risk prediction model for prognosis. The novel biomarkers derived from the risk prediction model can serve as diagnostic, therapeutic, and prognostic biomarkers for pancreatic cancer.
The initiation and progression of various cancers are reported to be regulated by epigenetic alterations. DNA methylation, an important epigenetic regulation, silences tumor suppressor genes, and upregulates oncogenic genes through hypermethylation and hypomethylation of the corresponding CpG islands in the promoter regions, respectively. Several studies have demonstrated that numerous genes with deregulated methylation status, such as KRAS, CDKN2A, TP53, CD1D, MUC4, and MUC1 play vital roles in the progression of pancreatic cancer (6)(7)(8). Moreover, specific DNA methylation signatures in the circulating DNA from pancreatic juice and plasma can be used as novel biomarkers for pancreatic cancer (9,10). Several prognostic prediction models using DNA methylation data have been proposed for prostate, gastric, colorectal, and esophageal cancers (11)(12)(13)(14). These studies indicated that the DNA methylation status is closely associated with the prognosis of multiple cancers. The development of high-throughput sequencing and construction of large cancer genome databases, such as The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) have enabled access to massive sequencing data and the corresponding clinical data. This study aimed to explore the potential prognostic values of prognostic prediction model generated based on CpG sites in pancreatic cancer.
In this study, a prognostic prediction model based on four CpG sites was established using TCGA dataset. The conclusion of this model was verified in the external ICGC dataset. Next, this study demonstrated that the nomogram model generated using the independent prognostic factors can be used as an efficient tool for prognostic prediction. Additionally, the comparisons between molecular subgroups based on the prognostic prediction model identified novel biomarkers and therapeutic targets for pancreatic cancer. The molecular characteristic analyses among subgroups may aid in elucidating the mechanisms underlying pancreatic cancer.

Downloading and Preprocessing Data
The DNA methylation, RNA sequencing (RNA-Seq; HTSeq counts type), single nucleotide variation (MuTect type) data of patients with pancreatic cancer were downloaded from TCGA database 1 . The latest clinicopathological information and clinical follow-up data of patients with pancreatic cancer in TCGA were downloaded on 13 November 2019 (15). The DNA methylation status and clinical data of patients with pancreatic cancer in the ICGC database (the Australian Pancreatic Cancer Genome Initiative, https://icgc.org) were used as the validation cohort (16). In the DNA methylation data, the CpG sites with absent values in >70% of the samples were removed and the K-nearest neighbor algorithm was used to estimate and replace the missing values. The probes from upstream 2 kb to downstream 200 bp of the transcription start site region were used for further analysis. In the RNA-Seq data, the genes with missing values in >50% of the total samples were deleted. The silent mutation and mutation in the intron region of single nucleotide variation data were deleted (17). The detailed information of the pancreatic cancer samples obtained from TCGA and ICGC databases is shown in Supplementary Tables 1, 2, respectively.

Construction of the Prognostic Prediction Model Based on CpG Sites
The Cancer Genome Atlas dataset was used as the discovery cohort. The differentially methylated CpG sites between 10 normal and 185 tumor samples were identified from TCGA database. The R package "minfi" was used to normalize the β-values of methylation data. The Mann-Whitney U test was performed to select the differentially methylated CpG sites with adjusted p-value < 0.05 and | log2 fold-change| > 2 (11,18). The samples of patients with pancreatic cancer exhibiting survival time of less than 30 days were removed. In total, 153 samples were selected to identify the survival-related CpG sites. The CpG sites with p-value < 0.05 in both Cox and log-rank tests were used for the generation of prognostic prediction model. The least absolute shrinkage and selection operator (LASSO) regression was used to construct the prognostic prediction model using the R package "glmnet" (19,20). To verify the effectiveness of the model, the ICGC dataset with 235 pancreatic cancer samples was used as the validation cohort.

Nomogram Model Development
To comprehensively utilize the clinicopathological data to increase the predictive ability of the LASSO model, the independent prognostic factors were identified based on the univariate and multivariate Cox analyses. The nomogram was generated based on the selected independent prognostic factors and used to predict the 1-, 3-, and 5-year overall survival rates of patients with pancreatic cancer. The discriminative ability of the nomogram model was evaluated based on the calibration curve, concordance index (C-index), receiver operator characteristic (ROC) curve, and area under the curve (AUC). The decision curve analysis (DCA) was used to compare the clinical benefits among these models. The R packages "rms, " "survcomp, " "timeROC, " "survival, " and "stdca.R" were used for the analysis (21)(22)(23).

Molecular Characteristic Analyses of the Prognostic Prediction Model
To further explore the mechanisms underlying the prognostic prediction model, several molecular characteristic analyses were performed using the high-risk and low-risk groups depending on the model. The immunological infiltrations of six types of immune cells were calculated using TIMER (Tumor Immune Estimation Resource, https://cistrome.shinyapps.io/timer/) (24,25). The R package "maftools" was used to perform the mutation spectrum analysis (26). The R packages "clusterProfiler" and "ggplot2" were utilized to perform and visualize the results of Gene Ontology (GO) analysis and Gene Set Enrichment Analysis (GSEA) (27,28).

Weighted Correlation Network Analysis (WGCNA) to Identify the Hub Genes Associated With the Model
The prognostic prediction model effectively predicted the prognosis of the patients with pancreatic cancer and categorized the samples into high-risk and low-risk groups. The differentially expressed genes between the high-risk and low-risk groups may play a vital role in the progression of pancreatic cancer. The differentially expressed genes between different groups were calculated and selected using the R package "DESeq2" with an adjusted p-value < 0.05 and | log2 fold-change| > 1 (29). To identify the most relevant genes of the model, weighted correlation network analysis (WGCNA) was performed according to the official guideline of the R package "WGCNA" (30,31). The parameters used in the analysis were set as follows: best soft power threshold, 4; minimum module size, 30; merge cut height, 0.25. The Cytoscape software (version 2.8.3) was used to calculate and visualize the hub genes in the gene network (32,33).
Gene Expression Profiling Interactive Analysis (GEPIA), Kaplan-Meier (KM) Plotter, and TISIDB Databases The Gene expression profiling interactive analysis (GEPIA) website provided the differentially expressed genes between 179 pancreatic tumor samples and 171 normal samples based on the integrated RNA-Seq data from TCGA and Genotype-Tissue Expression (GTEx) databases (34). The KM plotter website provided the genes associated with overall survival and relapse-free survival of patients from TCGA dataset (35). The TISIDB website was used to analyze the relationship between clinicopathological information and gene expression (36).

Cell Culture and Transfection
The four human pancreatic cancer cell lines (AsPC-1, Capan-2, MIA PaCa-2, and SW1990) and one human normal pancreatic cell line (hTERT-HPNE) used in this study were purchased from the American Type Culture Collection (ATCC). The cells were cultured following the official guidelines provided in the ATCC website at 37 • C and 5% CO 2 . The pHBLV-U6-ZsGreenpuro lentiviral RNAi expression system containing the DNAJB1 shRNA sequence (5 -GGTGCCAATGGTACCTCTTTC-3 ) were designed and provided by Hanbio Biotechnology Co. Ltd. (Shanghai, China).

Western Blotting and Immunohistochemical Assay
The western blotting analysis was performed following the methods of a previous study (37). Equal amounts (30 µg) of protein were subjected to sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) using a 10% gel. The following primary antibodies used for the western blotting analysis were purchased from Proteintech Group (Rosemont, United States): anti-DNAJB1 (Catalog number: 13174-1-AP; 1:1000): and antialpha tubulin (Catalog number: 11224-1-AP; 1:3000) antibodies. The immunohistochemical assay was performed following the methods of a previous study (38).

Cell Proliferation, Invasion, and Migration Assays
The CCK-8 and colony formation assays were used to estimate the proliferative ability of different groups. For CCK-8 assay, 2000 cells of different groups were seeded into a 96-well plate. The cells in each well were incubated with 10 µL of CCK-8 solution (Beyotime biotechnology Co. Ltd., Shanghai, China) for 60 min. The optical density of the mixture was measured at 450 mm using a microplate spectrophotometer. The colony formation assay was performed using 1000 cells of different groups seeded in a 6-well plate. The culture medium was replaced every 3 days. After the appearance of visible colonies, 4% paraformaldehyde and crystal violet were used to fix and stain the colonies. The transwell and wound healing assays were used to analyze the cellular invasion and cellular migration, respectively. These assays were performed following the methods described in a previous study (37). The 96well plates, 6-well plates, transwell system, and cell culture flask were purchased from Guangzhou Jet Bio-Filtration Co., Ltd.

Xenograft Tumor Mouse Model
The subcutaneous tumor mouse model was used to assess the tumor cell proliferative ability in vivo following the method described in a previous study (39). Twelve BALB/c nude mice (4-week-old, female) were purchased from Huafukang Biotechnology Co. Ltd. (Beijing, China). The cells (1 × 10 6 ) of different groups in 100 µL phosphate buffer solution were injected into the right axillary area of each nude mouse. The subcutaneous tumor volume was measured and recorded once a week. The tumor volume was measured as follows: volume = 0.5 × L × W 2 , where L is the long axis of the tumor and W is the short axis of the tumor.

Statistical Analysis
The statistical analyses were performed in the R software (version 3.5.3) and RStudio software. The data were analyzed by twotailed Student's t-test and one-way analysis of variance (ANOVA). The difference was considered statistically significant when the p-value was less than 0.05.

Construction of the Prognostic Prediction Model Based on Four CpG Sites
To establish the prognostic prediction model based on CpG sites, 3173 differentially methylated CpG sites were identified among 10 normal and 185 tumor samples from TCGA dataset (Supplementary Table 3). Next, 1325 prognosis-related CpG sites with p-value < 0.05 in both Cox and log-rank tests were selected for further LASSO regression analysis (Supplementary Table 4). After the LASSO regression analysis, a prognostic model based on four CpG sites, namely cg00609645, cg13512069, cg23811464, and cg03502002, was developed (Figures 1A,B). The detailed information on the four CpG sites is shown in Supplementary Table 5. Based on the four CpG site β values and the corresponding risk coefficients, each patient was assigned a risk score according to the following formula: risk score = (cg00609645 × 1.461) + (cg13512069×1.226) + (cg23811 464 × 0.539) + (cg03502002 × 0.519). As shown in Figures 1C-E, the samples from TCGA dataset were separated into high-risk and low-risk groups based on the median of the risk scores (cutoff value: 0.694). In order to improve the universality of the prognostic model, the same cutoff value was used in the ICGC dataset. The analysis revealed that the risk score was significantly associated with the overall survival of patients with pancreatic cancer [Hazard ratio (HR), 11; 95% confidence interval (CI), 5.5-21; p < 0.001] in the TCGA discovery dataset. Similarly, the risk score also significantly predicted the overall survival of patients with pancreatic cancer (HR, 2; 95% CI: 1.4-3; p < 0.001) in the ICGC validation cohort (Figures 1D-F). These results suggested that the prediction model based on four CpG sites can be an effective tool to predict the prognosis of patients with pancreatic cancer.

Construction of Nomogram Model Based on the Independent Prognosis-Related Factors
To develop the nomogram model for predicting the prognosis of patients with pancreatic cancer, the univariate and multivariate Cox analyses were performed using the risk score and other clinicopathological factors. The univariate Cox analysis based on TCGA dataset revealed that the risk score (HR, 10 and p = 0.005), site of resection (HR, 0.50; 95% CI, 0.27-0.92; and p = 0.026), and radiation therapy (HR, 0.21; 95% CI; 0.07-0.68; and p = 0.009) can serve as prognosis-associated factors. According to the general rule, the multivariate Cox analysis was performed using these prognosis-associated factors to avoid the overfitting of the multivariable Cox model. The multivariate Cox analysis revealed that the risk score (HR, 24.68; 95% CI, 7.70-79.14; and p < 0.001), tumor grade (HR, 2.33; 95% CI, 1.16-4.65; and p = 0.017), and radiation therapy (HR, 0.14; 95% CI 0.04-0.50; and p = 0.003) were independent prognosis-related factors ( Table 1). The nomogram model was constructed using these independent prognosis factors to predict the 1-, 3-, and 5-year survival rates of patients with pancreatic cancer (Figure 2A).
To compare the predictive efficiency among the nomogram model, TCGA dataset, and ICGC dataset, the AUC of ROC curve was used to assess the discriminative ability. The calibration curve of the three models exhibited satisfactory consistency between the predicted survival rate and the actual survival rate. However, the C-index of the nomogram model (C-index, 0.83; 95% CI, 0.78-0.88) was higher than that of TCGA dataset (C-index, 0.79; 95% CI, 0.73-0.85) and ICGC dataset (C-index, 0.60; 95% CI, 0.56-0.65; Figures 2E-G). Moreover, the DCA curve revealed that the predicted clinical benefits of the nomogram model were better than those of TCGA dataset and ICGC dataset (Figure 2H). These results suggested that the prognostic prediction model based on four CpG sites can serve as an effective model for predicting prognosis in patients

Molecular Characteristics of the Subgroups Based on the Prognostic Prediction Model
The Cancer Genome Atlas dataset was divided into the highrisk and low-risk groups based on the risk score obtained from the prognostic prediction model based on four CpG sites. As the model was significantly associated with the prognosis, it is important to explore the underlying molecular mechanisms. The top 10 results of GO analysis of high-risk and low-risk groups, including molecular function (MF), biological process (BP), and cellular component (CC), are shown in Figure 3A. The GO terms were enriched in several important molecular mechanisms, such as regulation of ion transmembrane transport, regulation of trans-synaptic signaling, signal release, presynapse, ion channel complex, postsynaptic membrane, ion channel activity, cation channel activity, and potassium channel activity, which indicated a close relationship between cell signaling transduction and the model. As shown in Figure 3B, the GSEA revealed that glycolysis, MYC targets, Notch signaling, base excision repair, nucleotide excision repair, and p53 signaling pathway were significantly activated, whereas pancreas beta cells, ABC transporters, calcium signaling pathway, neuroactive ligand-receptor interaction, and type II diabetes mellitus were significantly inhibited in the high-risk group. The comparative mutation spectrum analysis identified genes with different mutational frequencies between the high-risk and low-risk groups (Supplementary Table 6). The top 10 genes are shown in Figure 3C. The classical genes associated with the progression of pancreatic cancer, such as KRAS, TP53, and CDKN2A exhibited increased mutational frequency in the high-risk group. Next, the immune cell infiltration was analyzed using the TIMER website. The immune scores of CD4 T cell, CD8 T cell, and macrophage in the high-risk group were significantly lower than those in the low-risk group. This indicated the immunological enhancement of the low-risk group ( Figure 3D and Supplementary Table 7).

Hub Genes Associated With the Prognosis Model Were Identified by WGCNA
The differentially expressed genes between high-risk and lowrisk groups were calculated based on the RNA-Seq data from TCGA dataset. In total, 1861 differentially expressed genes with adjusted p-value < 0.05 and | log2 fold-change| > 1 (Supplementary Table 8 and Figure 4A) were obtained. These differentially expressed genes were used as input data for WGCNA to identify the correlations between gene co-expression modules and clinical traits. The best soft power threshold of WGCNA was set as 4 to maintain the scale-free topology and competent connectivity (Figures 4B,C). The hierarchical clustering of WGCNA was utilized to construct five gene coexpression networks ( Figure 4D). As shown in Figure 4E, the brown module was significantly correlated with the risk score (correlation coefficient = 0.6, p = 6e-16). Moreover, the brown module was significantly positively correlated with tumor grade, clinical stage, and T stage ( Figure 4E). These results suggested that the 247 genes in the brown module played a significant role in the progression of pancreatic cancer. The detailed information on the genes of brown module is provided in Supplementary Table 9. To further identify the hub genes of the brown module, the correlation between module membership and gene significance for risk score (Figure 4F) was analyzed. The top 15 hub genes were obtained using the Cytoscape software and DNAJB1 served as the hub gene of the network (Figure 4G).

DNAJB1 Was Identified as a Novel Biomarker for Pancreatic Cancer
To comprehensively analyze the role of DNAJB1 and its family members in the progression of pancreatic cancer,  and relapse-free survival were analyzed. The detailed information is provided in Figures 5G-J and Supplementary Figure 2.
The results indicated that only DNAJB1 could serve as an unfavorable prognostic factor for overall survival and relapsefree survival. In contrast, DNAJB2, DNAJB5, and DNAJB7 served as favorable prognostic factors for overall survival and relapsefree survival. These results demonstrated that DNAJB1 might serve as a novel biomarker for pancreatic cancer. The diagnostic ROC curve revealed that DNAJB1 can be used as an effective diagnostic marker, which had a diagnostic value of 4.8 and AUC of 91.6% (95% CI: 82.5-93.3%, Figure 5C). To confirm whether DNAJB1 can be used as a novel biomarker in the plasma, the plasma exosomal RNA-Seq data of 6 healthy donors and 14 patients with pancreatic carcinoma from the GSE106804 (40) and GSE100232 (41) datasets were downloaded and integrated. The principal component analysis suggested that RNA-Seq data of healthy donors and patients with pancreatic carcinoma clustered separately ( Figure 5E). The expression level of exosomal DNAJB1 was upregulated in patients with pancreatic cancer. A large cohort study is needed to further investigate its diagnostic value ( Figure 5F). In addition to the top 15 hub genes, the other genes also deserved to be investigated. The expression levels of TMPRSS4, KCNN4, GJB3, ITGB4, PLEKHN1, TRIM29, GPRC5A, and NECTIN4 were also significantly upregulated in the pancreatic cancer tissues (Figure 6A). Moreover, the overall survival analysis indicated that these genes can be used as unfavorable prognostic factors (Figure 6B). These results suggested that WGCNA can effectively select survival-related genes. The detailed roles of these genes in pancreatic cancer should be investigated in future studies.

DNAJB1 Knockdown Inhibits Malignant Phenotype of Pancreatic Cancer in vitro and in vivo
To evaluate the specific role of DNAJB1 in pancreatic cancer, the relative expression level of DNAJB1 in the four pancreatic cancer cell lines (AsPC-1, Capan-2, MIA PaCa-2, and SW1990) and a hTERT-HPNE was analyzed. The expression level of DNAJB1 in the AsPC-1 and MIA PaCa-2 cell lines was higher than that in the other cell lines (Figure 7A). Therefore, the AsPC-1 and MIA PaCa-2 cell lines were chosen for further functional assays. The efficiency of DNAJB1 knockdown was detected by western blotting (Figure 7B). The results of CCK8 and colony formation assays indicated that DNAJB1 knockdown significantly inhibited the proliferation and colony formation rate of AsPC-1 and MIA PaCa-2 cells (Figures 7C-E). The results of transwell assay revealed that the AsPC-1 and MIA PaCa-2 cells exhibited markedly decreased invasion upon DNAJB1 knockdown (Figures 7F,G). The results of wound healing assay demonstrated that the knockdown of DNAJB1 significantly decreased the migration of AsPC-1 and MIA PaCa-2 cells (Figures 7H,I). The subcutaneous xenograft model was utilized to detect the cellular proliferation ability in vivo. The group injected with DNAJB1 knockdown AsPC-1 cells exhibited significantly smaller tumoral volumes than the negative control group (Figures 7J,K). The relative expression of DNAJB1 was detected by the immunohistochemical assay ( Figure 7L). These results indicated that DNAJB1 may be a novel promoter of pancreatic cancer. Further studies are needed to elucidate the underlying molecular mechanisms.

DISCUSSION
Pancreatic cancer, a malignancy associated with high mortality, has high heterogeneity. The patients with pancreatic cancer receiving similar therapies exhibit varied clinical outcomes (42). Therefore, the development of a risk stratification model may help clinicians to design personalized treatment programs for different patients. Previous studies have proposed various risk stratification models for the diagnosis, prognosis, and recurrence of pancreatic cancer, which exhibited better efficiency than the classical TNM stage (43)(44)(45). The rapid development of sequencing techniques has enabled the access to multi-omics data and high-quality clinical information through different databases, such as TCGA, ICGC, and Gene Expression Omnibus (17,46,47). These resources provide novel insights into the initiation and progression of multiple cancers. There is an urgent need to devise strategies to increase the overall survival of patients with pancreatic cancer. The development of a prognosis prediction model based on genome data can be a useful tool for molecular and precise medicine.
Several recent studies have demonstrated that the prognosis model based on DNA methylation data can be used to predict the prognosis of patients with various cancers with satisfactory efficiency. The circulating tumor DNA methylation markers can be utilized to generate a risk model for the diagnosis and prognosis prediction of ovarian cancer, colorectal cancer, and hepatocellular carcinoma (13,48,49). Additionally, the DNA methylation markers originating from tissues can also be employed to construct a prognosis prediction model for esophageal, gastric, and prostate cancers (11,12,14). These studies demonstrated that the DNA methylation status can serve as novel biomarkers to generate the prognosis prediction model. However, there are limited studies that have reported the significance of prognosis prediction model in pancreatic cancer. In this study, a prognostic prediction model based on four CpG sites, namely cg00609645, cg13512069, cg23811464, and cg03502002, was established. The model was generated based on TCGA dataset and the conclusion was verified in the external ICGC dataset. Next, a nomogram model was constructed based on the independent prognostic factors of pancreatic cancer. Chen H, et al. used three hypomethylated genes (SULT1E1, IGF2BP3, and MAP4K4) to construct a prognostic prediction model using the AUC (1-year: 0.62, 3-year: 0.69, and 5-year: 0.69) (50). Liao X, et al. had constructed a prognostic model comprising 9 hub genes and reported that the AUC for 1-, 3-, and 5-year was 0.641, 0.623, and 0.554, respectively (51). Compared to these two models, the nomogram model exhibited better prediction ability using the AUC (1-year: 0.81, 3-year: 0.91, and 5-year: 0.89). These results suggested that the nomogram model can be employed as an effective instrument for prognosis prediction in patients with pancreatic cancer. The model can be improved with increased access to sequencing data and clinical information.
To further elucidate the molecular mechanisms underlying the prognostic model, the GO, GSEA, mutation spectrum, and immune infiltration analyses were performed on the subgroups stratified by the prognostic prediction model. The low-risk group with improved prognosis exhibited less mutational frequency and high immune cell infiltration. The analysis of several important signaling pathways in the subgroups can aid in a better molecular understanding of the prognostic model. Furthermore, the WGCNA was performed using the clinical traits and differentially expressed genes. The brown module containing 247 genes was significantly correlated with the prognostic model, tumor grade, clinical stage, and T stage. Next, DNAJB1 was identified as the hub gene of the brown gene module. These results indicated that DNAJB1 can play a vital role in pancreatic cancer. Previous studies have reported that DNAJB1 expression, which is upregulated in the tissues, cell lines, and bile of cholangiocarcinoma, can serve as a new biomarker for cholangiocarcinoma (52). DNAJB1-PRKACA gene fusion is reported to play an oncogenic promoter role in fibrolamellar hepatocellular carcinoma (53,54). In addition, several researches have demonstrated that the DNAJB1-PRKACA gene fusion can also be found in the pancreatic and biliary intraductal oncocytic papillary neoplasm (IOPN), as well as in the intraductal papillary  (57). To identify the specific role of DNAJB1 in pancreatic cancer, a systematic analysis of DNAJB family members was performed. The analysis indicated that DNAJB1 may serve as a novel biomarker for the diagnosis and prognosis of pancreatic cancer. The role of DNAJB1 in the proliferation, migration, and invasion of pancreatic cancer cells was verified in vivo and in vitro. The molecular mechanisms of DNAJB1 in pancreatic cancer must be elucidated in future studies.

CONCLUSION
A novel prognostic prediction model was established based on four CpG sites for pancreatic cancer. The molecular characteristic analyses based on the model provided new insights into the initiation and development of pancreatic cancer. The WGCNA can serve as an excellent tool to identify the genes correlated with specific clinical traits. DNAJB1 can serve as a potential diagnostic and prognostic biomarker for pancreatic cancer.

DATA AVAILABILITY STATEMENT
All datasets presented in this study are included in the article/Supplementary Material.

ETHICS STATEMENT
The animal study was reviewed and approved by the Ethics Committee of Shengjing Hospital of China Medical University.

AUTHOR CONTRIBUTIONS
LK, PL, and XT contributed to conception and design of the study. XF, TW, and ZW organized the database. LK and PL performed the statistical analysis. LK and PL wrote the first draft of the manuscript. LK, PL, BZ, and JL wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.