A Novel DNA Replication-Related Signature Predicting Recurrence After R0 Resection of Pancreatic Ductal Adenocarcinoma: Prognostic Value and Clinical Implications

The aim of any surgical resection for pancreatic ductal adenocarcinoma (PDAC) is to achieve tumor-free margins (R0). R0 margins give rise to better outcomes than do positive margins (R1). Nevertheless, postoperative morbidity after R0 resection remains high and prognostic gene signature predicting recurrence risk of patients in this subgroup is blank. Our study aimed to develop a DNA replication-related gene signature to stratify the R0-treated PDAC patients with various recurrence risks. We conducted Cox regression analysis and the LASSO algorithm on 273 DNA replication-related genes and eventually constructed a 7-gene signature. The predictive capability and clinical feasibility of this risk model were assessed in both training and external validation sets. Pathway enrichment analysis showed that the signature was closely related to cell cycle, DNA replication, and DNA repair. These findings may shed light on the identification of novel biomarkers and therapeutic targets for PDAC.


INTRODUCTION
Pancreatic ductal adenocarcinoma (PDAC) accounts for over 90% cases of pancreatic malignancies, and is characterized by an overall 5 years survival rate of less than 9% (Hackeng et al., 2016;Siegel et al., 2019). Over the past decade, its incidence has been increasing in many countries, making PDAC a major global health challenge (Kamisawa et al., 2016). Surgery is currently the only therapeutic strategy that potentially cures this devastating disease (Kamisawa et al., 2016). Complete resection with no microscopic residual disease (R0 resection) is considered the main objective of surgery (Bockhorn et al., 2014;Baldwin et al., 2016). R0 margins give rise to significantly longer recurrence-free survival (RFS) compared with R1 margins (positive resection margins). However, even if R0 resection is achieved, more than half of patients subsequently experience local recurrence or distant metastases within 2 years of surgery (Kim et al., 2017;Strobel et al., 2017;Ghaneh et al., 2019;Tummers et al., 2019). Clinical features including but not limited to tumor size and tumor grade have been associated with tumor recurrence (Abe et al., 2020;Kim et al., 2020;Yoon et al., 2020); nevertheless, their predictive ability was not always satisfactory in R0-treated patients. Effective prediction of recurrence may help to tailor adjuvant chemotherapy and postoperative surveillance in this subgroup of patients.
DNA replication represents an under-explored source of prognostic markers that could be employed to predict tumor recurrence (Trenner and Sartori, 2019). The standard regimens after surgery, radiotherapy and chemotherapy, both kill cancer cells by amplifying genomic instability and by ultimately activating cell death pathways (Slade, 2020). Faithful execution of the DNA replication program through accurate replication and timely repair of damage to DNA is essential for cancer cell propagation and genomic stability, contributing to chemoresistance and radioresistance (Pilié et al., 2019;Kanellou et al., 2020;Wengner et al., 2020). Because of the cardinal importance of DNA replication in cancer, approaches targeting this biological process may provide a hopeful adjuvant therapy to improve PDAC prognosis. For example, poly ADP-ribose polymerase (PARP) inhibitors, which induce cell death via replication stress-induced mitotic catastrophe, have demonstrated anti-tumor efficacy in several clinical trials . For these reasons, studying the role of DNA replication-related genes in cancer prognosis may assist individualized patient management.
Therefore, in the present study, we analyzed R0-resected PDAC patients using the MTAB-6134 and The Cancer Genome Atlas (TCGA) datasets. We measured the ability of expression of DNA replication-related genes to predict recurrence, and constructed a 7-gene risk model with powerful predictive ability in both training and validation datasets. This signature is closely associated with cell cycle and DNA repair, two key biological processes that are involved in response to chemotherapy and radiotherapy. These findings may lay the foundation for the development of individual treatment strategies and may potentially be implemented in clinical management.

Data Sources
The gene expression profiles and corresponding clinical information of the MTAB-6134 dataset were downloaded from the ArrayExpress database 1 . Normalized RNA-sequencing data and matched clinical data of TCGA dataset were retrieved from 1 https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-6134/ TCGA hub at UCSC Xena 2 . The MTAB-6134 dataset was adopted as the training set, and TCGA dataset was employed for external validation. Samples with follow-up time <1 month or lack of clinical information were excluded from further analyses. After careful review, 282 cases (234 cases with R0 resection and 48 cases with R1 resection) from MTAB-6134, and 124 cases (77 cases with R0 resection and 47 cases with R1 resection) from TCGA were analyzed in this study. No patients received neoadjuvant treatment in both cohorts. Clinical information regarding adjuvant chemotherapy was only provided in TCGA cohort. A total of 44 patients, who received adjuvant chemotherapy and had corresponding drug response information in TCGA cohort, were selected to investigate the association of the gene signature with a response to adjuvant treatment. All clinical data from included patients are detailed in Table 1. The DNA replicationrelated gene set (DNA_REPLICATION) was extracted from the MSigDB database 3 . Expression profiles of selected genes in tumor and normal samples were downloaded from the Ualcan website 4 .

PDAC Tissues
A total of sixty PDAC samples were harvested at the Department of General Surgery of Ruijin Hospital from April 2012 to August 2018. The follow-up lasted until February 2019. The dissected tissues were stored in liquid nitrogen. Written informed consent was obtained from all patients. This study was conducted in accordance with the Declaration of Helsinki, and the Ethics Committee of Ruijin Hospital affiliated with Shanghai Jiao Tong University approved the study.

Construction of the DNA Replication-Related Gene Signature
In the MTAB-6134 dataset, DNA replication-related genes that were significantly associated with the RFS of R0 resected patients were identified using univariate Cox regression analysis and LASSO regression analysis. To avoid overfitting, a multivariate Cox regression analysis was further applied to determine an optimal risk signature with the minimum Akaike Information Criterion value. The risk score was computed as follows: Risk score = (coefficient * expression value of gene 1) + (coefficient * expression value of gene 2)+·+(coefficient * expression value of gene X). The best cut-off value for next survival analysis was determined by X-tile software (Camp et al., 2004). Prognostic performance of this signature was evaluated by Kaplan-Meier (K-M) survival analysis, receiver operating characteristic (ROC) analysis, and calibration curves in both training and validation cohorts.

Nomogram Based on the Signature
Nomogram is a quantitative method that is widely used for individual patient survival prediction. Survival-associated factors derived from the Univariate Cox regression analysis in MTAB-6134 cohort were identified. Based on the 234 patients in MTAB-6134 cohort, a nomogram comprised prognostic indicators was established to predict the 1-, 2-, and 3 years RFS by using the "rms" R package. Next, we used K-M survival analysis to assess the prognostic potential of this nomogram. In addition, we compared the predictive accuracy of this nomogram with that of clinical factors using the time-dependent area under the curve (AUC) generated from the "timeROC" R package.

Functional and Pathway Analyses
Correlated genes of the risk score were screened using Spearman correlation analysis in MTAB-6134 dataset (| Spearman's r| ≥ 0.3) and TCGA dataset (| Spearman's r| ≥ 0.4). The correlated genes were then submitted to the Metascape database (Zhou et al., 2019) for function annotation and pathway enrichment.

RNA Extraction and Quantitative Real-Time Polymerase Chain Reaction (qRT-PCR)
The total RNA from 60 PDAC samples (Ruijin cohort) was extracted using TRIzol reagent (Invitrogen, United States) and reverse-transcribed using an Evo M-MLV RT Kit (Accurate Biology, China). Real-time PCR was performed with an ABI 7900 instrument using ChamQ SYBR qPCR Master Mix (Vazyme, China). Quantitation was performed in triplicate. mRNA expression was calculated using the 2 − CT method and normalized to glyceraldehyde-3-phosphate dehydrogenase (GAPDH). The primers for the amplified mRNAs are summarized in Supplementary Table S1.

Statistical Analysis
The statistical analyses and graphic production were conducted using R software (version 3.5.2). K-M survival curves with logrank tests were generated using the "survival" package. ROC analyses were performed using the "survivalROC" package. Parameters in univariate and multivariate Cox analyses were derived from the "survival" package and were visualized using the "forestplot" package. The nomogram composed of independent prognostic indicators was established using the "rms" R package. Time-dependent area under the curve (AUC) values were calculated using the "timeROC" package. Boxplots depicting the distribution of gene expression and risk score were derived from the "ggpubr" package. P < 0.05 was considered significant.

Development of the DNA Replication-Related Signature
Univariate Cox regression analysis identified 20 genes that were significantly associated with RFS in the MTAB-6134 dataset. The LASSO Cox regression algorithm was applied to these 20 genes to select the most useful predictive features, and 14 candidate genes were identified for further analysis (Figures 2A,B). Then, multivariate Cox regression analysis was applied and a risk prediction model comprising seven DNA replicationrelated genes was created finally ( Figure 2C). According to the expression values and corresponding coefficients of the seven genes derived from multivariate Cox regression analysis, we established a risk-score formula: Risk score = 0.420   Based on the optimal cut-off value determined by X-tile, we classified patients into high-and low-risk groups. The distribution of survival status, risk scores, and heatmap for expression levels of seven genes in training samples are shown in Figure 3A. The survival curve illustrated that patients in the low-risk group had significantly longer RFS (HR = 4.55, P < 0.0001) than did patients in the high-risk group (Figure 3B). ROC analysis showed that the risk score had moderate predictive value for short-term recurrence, with an AUC of 0.759 for 1 year RFS (Figure 3C). In addition, the 7-gene signature had a higher AUC than grade, N stage and T stage in predicting 1 year RFS (Figure 3C). The calibration curves indicated that the predicted survival probabilities using this model accorded well with the observed survival probabilities (Figure 3D). Univariate Cox regression analysis revealed that histological grade, T stage, N stage, and risk score were risk factors for RFS in MTAB-6134 cohort ( Figure 3E). After further multivariate Cox regression of prognostic factors, risk score remained an independent predictor of patient survival ( Figure 3F). Figure 3G illustrates that the risk score distributed differently with respect to the histological grade. We applied this risk signature in patients with R1 resection and found that the predictive accuracy was lower in this subgroup of patients with AUC value predicting 1 year RFS decreasing to 0.683 in the MTAB-6134 cohort (Supplementary Figure S2). Figure 4A shows the distribution of the survival status, risk scores, and expression value of the seven genes in validation samples. K-M survival curves demonstrated that patients in the high-risk group had shorter survival times ( Figure 4B). The risk score showed a satisfactory 1 year AUC of 0.757 in TCGA dataset ( Figure 4C). The 7-gene signature exhibited better predictive performance for early recurrence than several traditional clinical indicators, including grade, N stage, and T stage (Figure 4C). The calibration plot revealed optimal agreement for prediction of 1-, 2-, and 3 years survival probability in TCGA cohort ( Figure 4D). Both univariate and multivariate Cox analyses demonstrated the independent prognostic role of the risk signature for predicting tumor recurrence (Figures 4E,F). Figure 4G shows that higher histological grade correlated with higher risk score. However, the risk signature was no longer an appropriate tool for predicting early recurrence of R1-treated patients in TCGA cohort (Supplementary Figures S3A,B), indicating that this model has good specificity. Adjuvant chemotherapy is the standard treatment for PDAC after surgery. However, PDAC is refractory to the chemotherapeutic agents and currently no effective biomarkers are indicative of the response to adjuvant chemotherapy. We then investigated whether our model could stratify postoperative patients with different response to chemotherapeutic treatment in TCGA cohort. 44 patients who received adjuvant chemotherapy with related drug response information were divided into low-and high-risk group based on the medium value of risk score. Patients whose drug response is "clinical progressive disease" or "stable disease" were classified into chemotherapy-resistant group, while patients whose drug response is "complete response" or "partial response" were classified into chemotherapy-sensitive group. As shown in Supplementary Figure S3C, patients in low-risk group were more sensitive to adjuvant chemotherapy compared with patients in high-risk group. It could partly explain the lower recurrence rate of patients in low-risk group compared with high-risk group.

Validation of the Prognostic Performance in a Local Cohort
In order to improve the credibility of this signature, we subsequently validated the prognostic ability in the Ruijin cohort.  K-M curves showed that the signature effectively captured the survival differences in RFS ( Figure 5A). As illustrated in Figure 5B, the signature showed remarkable accuracy in predicting 1 year RFS as the AUC value was 0.816. These biological findings further confirmed the prognostic accuracy of the 7-gene signature.

Comprehensive Analysis of EREG Expression in PDAC
To obtain an overview of the expression profiles of seven DNA replication-related genes in PDAC, we investigated the correlation between gene expression values and clinical features in both MTAB-6134 and TCGA datasets. It is worth noting that the expression of EREG was significantly higher in grade three patients than grades 2 or 1 patients in both MTAB-6134 and TCGA cohort (P < 0.05, Figures 6A,B). This result suggests that elevated expression levels of EREG might promote tumor malignancy. Expression levels of seven genes in patients categorized by T stage and N stage were also assessed in two cohorts. No common genes were distributed differently with respect. T stage and N stage (Supplementary Figure S4). The GEPIA database further confirmed the elevated expression levels of EREG in tumor tissues ( Figure 6C). Except for tumor tissues, EREG was also found to be highly expressed in PDAC cell lines compared with HPNE cells, which are derived from normal human pancreatic duct ( Figure 6D). In addition, elevated EREG expression levels were associated with shorter RFS in TCGA validation set ( Figure 6E). These results jointly demonstrated that EREG may be a novel therapeutic target in PDAC treatment.

Nomogram Construction
To facilitate clinical application, a graphic nomogram was developed based on the 234 patients in the MTAB-6134 cohort. Independent prognostic factors, including histological grade, T stage, N stage, and risk score, which were derived from the abovementioned univariate Cox analysis in MTAB-6134 cohort, were variables in the nomogram (Figure 7A). K-M analysis showed that this nomogram effectively discriminated patients with poor outcome from patients with favorable outcome (Figure 7B). The predictive efficacy of this nomogram was confirmed using AUC plots ( Figure 7C). The risk score exhibited higher dynamic AUC value than grade, T stage, and N stage over time, suggesting that our risk model outperformed clinical indicators in terms of survival prediction.

Functional Annotation and Pathway Enrichment
Pearson correlation analysis identified 193 and 621 genes that were co-expressed with the risk score in the MTAB-6134 and TCGA cohorts, respectively. Functional analysis revealed that 193 correlated genes in the MTAB-6134 cohort were involved in pathways related to cell cycle and DNA repair, suggesting an indispensable role in tumor recurrence ( Figure 8A). Similar results were observed in TCGA cohort; 621 correlated genes in this cohort were found to be enriched in pathways associated with cell cycle, DNA replication, and DNA repair ( Figure 8B).

DISCUSSION
Despite recent advances in the development of surgical techniques and perioperative management for patients with resectable PDAC, rates of postoperative morbidity, and mortality remain high (Ikuta et al., 2019). Unfortunately, due to the high degrees of heterogeneity, clinicopathological features including age, smoking history, TNM staging system, and histological grade provide limited information for estimating recurrence risk and outcome after surgery . Considering the current situation of this devastating disease, any attempts help to predict tumor recurrence and guide the selection of reasonable treatment options after surgery should be welcomed. In this study, we systematically assessed the prognostic performance of DNA replication-related genes for predicting tumor recurrence and constructed a novel and robust signature including EREG, KCTD13, MCM3AP, MCM3, POLD2, TERF2, and TP73. The AUC values for 1 year RFS of this model were 0.759 and 0.757 in the MTAB-6134 and TCGA cohorts, respectively, indicating moderate predictive accuracy. Univariate and multivariate Cox analyses demonstrated that this signature was an independent and powerful predictor of shortterm recurrence. Pathway enrichment analysis revealed that the signature was closely associated with DNA repair, cell cycle, and DNA replication.
Advancements in gene expression profiles derived from various high throughput technologies have improved our understanding of genetic and molecular alternations in PDAC. In recent years, many prognostic gene signatures have been developed by assessing crucial biological processes in PDAC progression, including autophagy (Yue et al., 2020), cell stemness (Feng et al., 2020) and immunological pathways (Wu et al., 2020). These models provided information for predicting OS. RFS prediction models are rare, to say nothing of risk models for the subgroup of R0 resected patients. In the present study, R0 margins were found to be associated with longer OS and RFS than were R1 margins in MTAB-6134 and TCGA cohorts. This finding is consistent with findings of earlier studies and indicates that survival analysis should be accurate for subgroup analysis. R0 resection is the aim of any curative resection, and the rate of R0 resection is considered a quality indicator for adequate oncological resection (Merkow et al., 2014). Thus, recurrence prediction for patients of this subgroup may have more clinical significance. Stratifying patients with resectable disease based on the predicted recurrence risk may facilitate personalized treatment and surveillance imaging. For patients with potentially resectable disease, if recurrence risk remains high even after complete resection, neoadjuvant therapy instead of upfront surgery is recommended to avoid futile surgery (Ikuta et al., 2019). For localized but unresectable disease, applying this model to biopsies obtained through a EUS-guided needle may help to tailor individualized treatments. That is to say, if the predicted recurrence risk is low, patients should be encouraged to receive neoadjuvant chemotherapy and potential complete resection. However, for localized but unresectable patients with high recurrence risk even after neoadjuvant chemotherapy and subsequent R0 resection, palliative treatments with the aim to improve life quality and ameliorate cancer complications may be more preferable.
Compelling evidence has revealed that DNA replication is closely associated with chemotherapy resistance, and several DNA replication-related genes have been confirmed as potential therapeutic targets in PDAC (Dunlop et al., 2020;Lloyd et al., 2020). Nevertheless, it remains unclear as to how DNA replication-related genes affect patient outcome, and this phenomenon has not been well reported. It would be significant to identify DNA replication-related biomarkers to predict recurrence risk and to explore novel targets of chemotherapy. In the present study, we systematically analyzed prognostic values of DNA replication-related genes for predicting tumor recurrence and integrated seven genes into a single signature using LASSO algorithm and multivariate Cox analysis. This signature effectively distinguished patients with significantly different postoperative survival in MTAB-6134 and TCGA cohorts. ROC analyses revealed that the signature had better predictive ability than clinical indicators, suggesting that the signature may serve as a complement to the current TNM staging system.
Among the seven genes, EREG, MCM7, and TP73 were previously reported to be involved in PDAC initiation and progression. EREG encodes a 46-amino acid protein that belongs to the epidermal growth factor receptor tyrosine kinase family (Toyoda et al., 1995). EREG is up-regulated in PDAC and stimulates pancreatic cancer cell growth in vitro (Zhu et al., 2000). MCM2-7 family members interact with one another to trigger the initial step of DNA replication (Evrin et al., 2009). High MCM7 expression results in poor outcomes in patients with PDAC (Peng et al., 2016;Liao et al., 2018). TP73 is activated in response to DNA damage and regulates many downstream biological processes including cell cycle and apoptosis (Barbhuiya et al., 2019). Ex2 + 4G > A genotypes of TP73 are significant predictors for tumor response, tumor resectability and overall survival in PDAC (Dong et al., 2009). TERF2 has remarkable tumorspecific effects; however, its role in PDAC remains unknown. TERF2 is a robust predictor of patient survival in cervical cancer (Benhamou et al., 2016) and oral carcinoma (Ozden et al., 2014). The role of TERF2 in cancers is primarily to promote angiogenesis (El Maï et al., 2014;Zizza et al., 2019). KCTD13, MCMAP3, and POLG2 have been rarely reported in published studies.
Despite our remarkable findings, this study has some limitations. First, detailed data of patient therapy were not available from the MTAB-6134 dataset. Although this signature effectively distinguished patients with various recurrence risks, individual survival benefits of chemotherapy and radiotherapy in each risk group remain unclear. Second, the current study was performed on retrospective data, and should therefore be tested using prospective data. Third, there are differences with respect to the definition of R0 resection: 0 mm tumor distance from resection margin in the United States and >1 mm in many centers in Europe and Australia (Demir et al., 2018). Although this model has been validated in both European (the MTAB-6134 cohort) and American populations (TCGA cohort), care should be taken to apply this model in patients of different nationalities. Fourth, more in vivo and in vitro experiments are needed to clarify biological function of the seven genes in terms of tumor progression.
In conclusion, we first reported a practical seven-gene signature based on the DNA replication-related genes and demonstrated that this signature could serve as a powerful predictor of tumor recurrence for patients with PDAC following R0 resection. The signature may help to provide reliable guidance and improved accuracy for treatment administration and surveillance imaging after surgery. However, the considerable variability with respect to the definition of R0 may impair accuracy and the predictive efficacy of this signature needs to be investigated in prospective studies.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

AUTHOR CONTRIBUTIONS
ZF: conceptualization, methodology, software, validation, formal analysis, and writing-original draft preparation. ZF and KL: investigation and data curation. ZF and JL: resources and visualization. CP and YW: writing-review and editing and supervision. CP: project administration and funding acquisition.
HC: numerical correction and statistical review. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by grants from the National Natural Science Foundation of China (81672325 and 81802316).