Development and Validation of a 7-Gene Prognostic Signature to Improve Survival Prediction in Pancreatic Ductal Adenocarcinoma

Background: Previous prognostic signatures of pancreatic ductal adenocarcinoma (PDAC) are mainly constructed to predict the overall survival (OS), and their predictive accuracy needs to be improved. Gene signatures that efficaciously predict both OS and disease-free survival (DFS) are of great clinical significance but are rarely reported. Methods: Univariate Cox regression analysis was adopted to screen common genes that were significantly associated with both OS and DFS in three independent cohorts. Multivariate Cox regression analysis was subsequently performed on the identified genes to determine an optimal gene signature in the MTAB-6134 training cohort. The Kaplan–Meier (K-M), calibration, and receiver operating characteristic (ROC) curves were employed to assess the predictive accuracy. Biological process and pathway enrichment analyses were conducted to elucidate the biological role of this signature. Results: Multivariate Cox regression analysis determined a 7-gene signature that contained ASPH, DDX10, NR0B2, BLOC1S3, FAM83A, SLAMF6, and PPM1H. The signature had the ability to stratify PDAC patients with different OS and DFS, both in the training and validation cohorts. ROC curves confirmed the moderate predictive accuracy of this signature. Mechanically, the signature was related to multiple cancer-related pathways. Conclusion: A novel OS and DFS prediction model was constructed in PDAC with multi-cohort and cross-platform compatibility. This signature might foster individualized therapy and appropriate management of PDAC patients.


INTRODUCTION
Pancreatic ductal adenocarcinoma is an insidious and aggressive malignancy with a 5-year survival rate not exceeding 10% (Siegel et al., 2021;Garrido-Laguna and Hidalgo, 2015). Unfortunately, its incidence and mortality rates continue to increase, especially among women and people aged 50 years and over . Owing to the deep-seated location of the pancreas and the lack of available screening approaches of PDAC, most patients are diagnosed at an advanced, unresectable stage (Ducreux et al., 2015;Gillen et al., 2010), leading to a dismal prognosis (De La Cruz et al., 2014). Even after R0 resection, the most effective treatment for cure (Kamisawa et al., 2016), most patients will develop local recurrence or distant metastases despite adjuvant treatments, impairing a dramatic improvement of patient outcomes by surgical resection (Ghaneh et al., 2019;Strobel et al., 2017;Tummers et al., 2019). Therefore, novel identification of reliable biomarkers and predictive models is of significant need for PDAC patients, which will help to guide appropriate therapy and tailor postoperative surveillance in clinical management.
Accurate prediction of patient prognosis in PDAC has been the subject of many studies. Traditional clinicopathological features, including but not limited to tumor size (Ansari et al., 2017), resection margin status (Maeda et al., 2020), histological grade (Macías et al., 2018), and lymph node metastasis (Sho et al., 2015), have been investigated in previous studies. Recently, with the remarkable progress in bioinformatics and high-throughput sequencing technology, gene expression signatures considering the genetic and genomic differences of patients have emerged as a practical tool for survival assessment in human cancers (Yu et al., 2019;Supplitt et al., 2021;Doultsinos and Mills, 2021;Ahluwalia et al., 2021). In the context of PDAC, multiple prognostic gene models have been established with robust predictive performance (Luo et al., 2021;Gu et al., 2021;Xu et al., 2021a). However, most of these models are constructed to predict the overall survival (OS), and few predict disease-free survival (DFS). Given that postoperative recurrence is a feature of PDAC and it results in a poor prognosis, DFS prediction is of equivalent significance. Thus, development of a robust signature for both OS and DFS prediction is a laudable attempt.
To the best of our knowledge, only two previous gene signatures have the capability to predict both OS and DFS (Kim et al., 2019;Feng et al., 2020), and their predictive accuracy remains to be improved. In the current study, we identified a credible 7-gene signature for both OS and DFS prediction with moderate accuracy and cross-cohort compatibility. The area under the curve (AUC) values of this signature for survival prediction were no less than 0.7 in three independent cohorts. The relationship between gene signature, immune cell infiltration, and therapeutic effects was also investigated in this study.

PDAC Cohorts
Three public PDAC cohorts with both clinical data and gene expression data including , PACA-CA (N 181), and TCGA (N 141) were adopted in this study. Among them, the MTAB-6134 cohort was used as the training set, while PACA-CA and TCGA cohorts were used for external validation. Information about OS and DFS events and time was available from all of these three cohorts. Expression profiles and clinical data of MTAB-6134 cohort were downloaded from the ArrayExpress database (https://www.ebi.ac.uk/ arrayexpress/). Normalized RNA-sequencing (RNA-seq) data and clinical data of the PACA-CA cohort were retrieved and downloaded from the International Cancer Genome Consortium (ICGC, https://icgc.org/) database. Processed RNA-seq data and clinical information of the TCGA cohort were obtained from the TCGA hub at UCSC Xena (https://tcga.xenahubs.net). Samples with an OS or DFS less than one month were excluded for survival analyses. Information regarding chemotherapy was provided in PACA-CA and TCGA cohorts. Patients whose response to chemotherapy is "clinical progressive disease" or "stable disease" were defined as chemotherapy-resistant, while patients whose response to chemotherapy is "complete response" or "partial response" were defined as chemotherapy-sensitive. In addition, MTAB-6690, a cohort that contained gene expression profiles of 108 PDAC samples and 70 normal samples, was downloaded from the ArrayExpress database.

Construction of the 7-Gene Signature
To identify candidate genes for model construction, we initially applied univariate Cox regression analysis to screen genes associated with both OS and DFS through the Venn diagram (https://bioinfogp.cnb.csic.es/tools/venny/) in each cohort. Consistently, survival-related genes in these three cohorts were identified and subsequently submitted to multivariate Cox regression analysis using OS events and time in order to determine an optimal signature in the training MTAB-6134 cohort. Based on the gene expression values and corresponding coefficients, the risk score of each sample was calculated as follows: risk score (coefficient 1 * expression value of gene 1) + (coefficient 2 * expression value of gene 2) + ... + (coefficient N * expression value of gene N).

Prognostic Validation of the 7-Gene Signature
Patients in each cohort were divided into low-and high-risk groups based on the medium value of the risk score. The Kaplan-Meier (K-M) survival curves and calibration curves were used to evaluate the predictive performance of this signature. Receiver operating characteristic (ROC) curves were utilized to compare the predictive accuracy of the gene signature and clinical features.

Estimation of Tumor Immune Infiltrates
The deconvolution algorithm CIBERSORT (Chen et al., 2018) was used to estimate the relative proportions of 22 different immune cell infiltrates. The number of permutations was set to 1,000, and a p-value < 0.05 was considered as successful.

Functional Annotation and Pathway Enrichment of the 7-Gene Signature
To shed light on high-risk score-resulted unfavorable prognosis, we performed the Pearson correlation analysis to identify correlated genes with risk scores in the MTAB-6134 training cohort. According to the correlation coefficients, the top 1,000 positively and negatively correlated genes were submitted to Gene Ontology-Biological Process (GO-BP) analysis and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis on the DAVID online Web site (Huang et al., 2007), respectively.

Statistical Analysis
Statistical analysis and graphical work were finished in the R environment (version 3.5.2). Cox regression analyses were performed by the "survival" package and visualized by the "forestplot" package. K-M survival curves with log-rank tests were plotted by the "survminer" package. The ROC curves were depicted by the "survivalROC" package. Boxplots were generated from the "ggpubr" package. Calibration curves were derived from the "rms" package. p < 0.05 was considered significant.

Construction of the 7-Gene Signature
The research workflow of model construction is illustrated in Figure 1A. Univariate Cox analysis and Venn diagram identified 1,036 risky genes (hazard ratio >1) and 1,067 protective genes (hazard ratio <1) that were associated with both OS and DFS in the MTAB-6134 training cohort. These genes were further screened in the PACA-CA cohort and TCGA cohort by the same method, respectively. Eventually, 12 credible risky indicators and five protective indicators were identified. These 17 genes were incorporated in the stepwise multivariate Cox hazard ratio regression to select the best model for predicting OS of PDAC patients in the MTAB-6134 cohort. Multivariate Cox analysis resulted in an optimal 7-gene signature containing ASPH, DDX10, NR0B2, BLOC1S3, FAM83A, SLAMF6, and PPM1H ( Figure 1B). The PPI (protein-protein interaction) network was constructed using the STRING database (http:// www.string-db.org/) to investigate the interactions of these seven genes. As illustrated in Supplementary Figure S1, there is no interaction between these proteins. According to the expression values and corresponding coefficients of the seven genes derived from multivariate Cox regression analysis, we constructed a riskscore formula: risk score 0.234165*expression value of ASPH +0.284159*expression value of DDX10-0.29051*expression value of NR0B2-0.61974*expression value of BLOC1S3 + 0. The 161992*expression value of FAM83A-0.40561*expression value of SLAMF6-0.25676*expression value of PPM1H. The risk scores of PDAC patients were significantly higher than those of normal patients, indicating diagnostic potential of the signature ( Figure 1C). Moreover, the risk scores were markedly elevated in patients with a high histological grade in MTAB-6134 and TCGA cohorts, suggesting that the 7-gene signature was associated with tumor malignancy ( Figure 1D).

Prognostic Performance of the 7-Gene Signature for OS Prediction
Distribution of the risk scores, survival status, and expression level of the seven genes in three independent cohorts is FIGURE 2 | Prognostic performance of the 7-gene signature for OS prediction. (A) From top to bottom are the risk score distribution, survival status distribution, and heatmap of seven genes in three independent cohorts, respectively. (B) K-M curves estimating the OS difference between low-and high-risk groups in three independent cohorts, respectively. (C) Calibration curves of the signature in three independent cohorts. Calibration curves represent the relationship between observed (the data markers represent the mean, and the error bars represent the 95% CI) and predicted (gray line) OS using the 7-gene risk score.
Frontiers in Molecular Biosciences | www.frontiersin.org May 2021 | Volume 8 | Article 676291 4 illustrated in Figure 2A. The results showed that patients in the high-risk group had higher mortality rates. K-M survival curves demonstrated that patients in the low-risk group had a significantly longer OS in three independent cohorts ( Figure 2B). Calibration curves suggested that the predicted survival probabilities by the signature were in good agreement with the observed survival probabilities ( Figure 2C). Predictive Accuracy of the 7-Gene Signature and Clinical Predators for OS Univariate Cox regression analysis proved that the proposed gene signature and several clinical features were independent risk factors for OS in three cohorts ( Figure 3A). In order to clarify whether our signature could provide improved survival prediction, we conducted ROC analyses. As shown in Figures  3B-D, the AUC values of this signature were 0.769, 0.709, and 0.751 in MTAB-6134, PACA-CA, and TCGA cohorts, respectively, which were higher than those of clinical predictors. These findings demonstrated that the 7-gene signature outperformed clinical predictors in predicting OS.
Prognostic Performance of the 7-Gene Signature for DFS Prediction Figure 4A shows the distribution of the DFS event and time in MTAB-6134, PACA-CA, and TCGA cohorts. The results illustrated that patients in the low-risk group had remarkably lower recurrence rates and a significantly longer DFS. K-M survival curves indicated that the high-risk group had a significantly shorter DFS than the low-risk group in all of the three cohorts (p < 0.05, Figure 4B). Calibration curves indicated that the predicted DFS was in good accordance with the observed DFS in three independent cohorts ( Figure 4C).

Relationship Between the 7-Gene Signature and Response to Adjuvant Chemotherapy
Currently, adjuvant chemotherapy is administered empirically, and the individual survival benefit of this approach is still questionable in PDAC (Kleeff et al., 2016). Thus, we wondered whether the 7-gene signature could precisely predict chemotherapy sensitivity and provide references for clinical practice. As shown in Figure 6A, patients in the low-risk group had significantly higher response rates to adjuvant chemotherapy than patients in the high-risk group in the PACA-CA cohort (94 vs 70%, p < 0.001). A similar trend was seen in the TCGA cohort (55 vs 32%, p 0.001, Figure 6B). In patients who received adjuvant chemotherapy, the proposed signature could efficaciously capture the DFS differences between low-risk and high-risk groups in both cohorts ( Figures 6C,D). However, the predictive power of the model was weakened when applied to patients who had not received adjuvant chemotherapy. As shown in Figure 6E, DFS difference between high-risk and low-risk groups of this subset of patients was not significant in the PACA-CA cohort (p 0.14). In the TCGA cohort, DFS difference was also not strongly trustworthy (p 0.034, Figure 6F).

Relationship Between the 7-Gene Signature and Immune Cell Infiltration
The level of immune cell infiltration is closely related to the clinical efficiency of immunotherapy and the prognosis of PDAC patients (Zheng et al., 2013). Therefore, we explored the relationship between this signature and immune cell infiltration and inquired into the potential of our signature as a reliable predictor of immunotherapy response. In cases of the MTAB-6134 cohort, the abundance of macrophage M0, macrophage M2, and activated NK cells was significantly elevated in the high-risk patients, while CD8 + T cell infiltration was remarkably decreased in the high-risk patients ( Figure 7A). In the case of the TCGA cohort, the abundance of regulatory T cells, activated NK cells, and macrophage M0 was significantly elevated in the high-risk patients, while naïve B cells, CD8 + T cell, naïve CD4 + T cells, activated memory CD4 + T cells, and monocyte infiltration were remarkably decreased in the highrisk patients ( Figure 7B).

Biological Process and Pathway Analyses of the 7-Gene Signature
With the purpose to preliminarily illuminate how the risk signature affected patient prognosis, chemotherapeutic response, and immune cell infiltration, we performed functional annotation and pathway enrichment analyses on genes correlated with risk scores in the MTAB-6134 training cohort. For biological processes, positively correlated genes were primarily involved in cell proliferation, cell migration, and glycolysis ( Figure 8A), while negatively correlated genes were mainly related to T-cell activation, immune response, and apoptotic process ( Figure 8B). For KEGG pathway enrichment, genes with positive correlation were chiefly associated with the PI3K-AKT signaling pathway, HIF-1 signaling pathway, glycolysis, and cell cycle ( Figure 8C), while genes with negative correlation were principally enriched in primary immunodeficiency and the chemokine signaling pathway ( Figure 8D).

DISCUSSION
PDAC is a fatal disease featured with high molecular heterogeneity. The current prognosis assessment of PDAC patients is mainly dependent on the TNM (tumor, node, and metastasis) staging system. However, it is not adequate for individual survival prediction, especially in patients at the same stage   van Roessel et al., 2018). This intra-stage discrepancy is due to tumor heterogeneity (Juiz et al., 2019). As cancer treatment has entered into the area of precision medicine, overcoming molecular heterogeneity has become a hallmark in cancer research (McGranahan and Swanton, 2017). Hence, prognostic gene signatures that translate the increased understanding of tumor genetic and genomic alternations into clinical application are urgently needed. In this study, we developed and validated a 7gene signature with powerful prognostic performance for both OS and DFS prediction in PDAC. These seven genes were not overlapped to other prognostic gene signatures for PDAC.
In the process of signature construction, we initially identified and overlapped genes associated with OS and DFS in three independent large cohorts. A total of 17 genes were screened, and multivariate Cox regression analysis determined an optimal 7gene signature for OS prediction in the MTAB-6134 training cohort. In both training and external validation cohorts, its robustness was supported by the reproducibility of moderate predictive accuracy with the AUC values close to or exceeding 0.7. The superior accuracy of the 7-gene signature in both OS prediction and DFS prediction suggested that it could serve as a supplement to the existing staging system for prognosis evaluation and treatment decision. In addition to its prognostic value, the clinical implications of the 7-gene signature were also compelling.
Surgical resection combined with adjuvant chemotherapy has been the standard therapy for resectable PDAC patients (Von Hoff et al., 2013). Unfortunately, the clinical benefit response rates of chemotherapy are extremely low (Burris et al., 1997), and nonresponsive patients may experience a variety of adverse effects, including asthenia and nausea (Phua et al., 2018). Developing predictive biomarkers that can maximize survival benefits and minimize side effects is of significant importance for PDAC patients (Oshi et al., 2020). In this study, we found that the risk score was significantly related to chemotherapeutic sensitivity. In the cases who had received adjuvant chemotherapy, this signature could efficiently distinguish patients with different DFS time. We hypothesize that the 7-gene score may have a role in fostering personalized oncology to exempt PDAC patients from unnecessary cytotoxicity and heavy financial burden brought by overtreatment.
Recently, immunotherapy has drastically increased patient survival in multiple cancers, but it failed to elicit responses in the vast majority of patients with PDAC (Leinwand and Miller, 2020). The decreased number of tumor-infiltrating lymphocytes in a tumor microenvironment likely allows anticancer immunity to be overwhelmed in PDAC (Vonderheide and Bayne, 2013). In this study, CD8 + T cells were less infiltrated in high-risk patients, while M2 macrophages were more infiltrated in high-risk patients. CD8 + T cells recognize and kill the cancer cell (Farhood et al., 2019) and indicate a favorable prognosis in PDAC (Zhang et al., 2018;Carstens et al., 2017). M2 macrophages facilitate PDAC progression (Kurahara et al., 2011) and indicate an unfavorable prognosis in PDAC (McGuigan et al., 2021;Hu et al., 2016). These findings confirmed the hazardous role of this signature and suggested that it could be used to predict the immunotherapy response.
To preliminarily clarify the underlying mechanism of the risk signature-mediated poor prognosis, we investigated the biological function of this signature. Multiple oncogenic pathways that were strongly associated with tumor progression, chemotherapeutic resistance, and immune cell infiltration were enriched. Biological processes and pathways, including but not limited to the PI3K-AKT signaling pathway, HIF-1 signaling pathway, and glycolysis, were all implicated in the regulation of malignant behavior and anticancer immunity (Xu et al., 2021b;Kobayashi et al., 2020;Sun and Meng, 2020;You et al., 2020). These findings could partly explain how the risk score affected patient survival and immune cell infiltration.
Despite improved OS prediction and DFS prediction compared with previous models (Kim et al., 2019;Feng et al., 2020), this study is still based on retrospective data and presents several limitations. First, the clinical utility of the 7-gene signature in PDAC management should be reviewed and determined in more prospective studies. Second, all cohorts used in the current study were relatively small, probably because of the low surgical resection rates of PDAC. Thus, it needs to be validated in larger cohorts. Third, further in vivo and in vitro experiments are needed in order to clarify the biological roles of seven genes in PDAC tumorigenesis. Finally, due to the lack of significant data, we are unable to adequately assess the relationship between seven gene expression and clinical feature. With the development of follow-up studies, we hope to supplement them in future studies.
In conclusion, we proposed a 7-gene signature that provided improved OS prediction and DFS prediction. The clinical implications and biological relevance of this signature have been completed explored. However, the predictive efficacy of this signature needs to be tested in more larger cohorts and prospective studies.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. These data can be found here: Expression profiles and clinical data of the MTAB-6134 cohort and MTAB-6690 cohort were downloaded from the ArrayExpress database (https://www.ebi.ac.uk/arrayexpress/). Normalized RNA-sequencing (RNA-seq) data and clinical data of the PACA-CA cohort were retrieved and downloaded from the International Cancer Genome Consortium (ICGC, https://icgc.org/) database. Processed RNA-seq data, clinical information, and somatic mutation data of the TCGA cohort were obtained from the TCGA hub at UCSC Xena (https://tcga.xenahubs.net).

AUTHOR CONTRIBUTIONS
ZF and CP designed the study and wrote the manuscript. ZF, KL, HQ, and JL participated in data analysis, discussion, and language editing. HQ corrected grammatical errors of this manuscript. YW reviewed the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by grants from the National Natural Science Foundation of China (81672325 and 81802316).