The Impact of Immune Microenvironment on the Prognosis of Pancreatic Ductal Adenocarcinoma Based on Multi-Omics Analysis

Pancreatic ductal adenocarcinoma (PDAC) is a malignant tumor characterized by rapid progression, early metastasis, high recurrence, and limited responsiveness to conventional therapies. The 5-year survival rate of PDAC is extremely low (<8%), which lacks effective prognostic evaluation indicators. In this study, we used xCell to analyze infiltrating immune cells in a tumor and through the univariate and multivariate Cox analyses screened out two prognosis-related immune cells, CD4+TN and common lymphoid progenitor (CLP), which were used to construct a Cox model and figure out the risk-score. It was found that the constructed model could greatly improve the sensitivity of prognostic evaluation, that the higher the risk-score, the worse the prognosis. In addition, the risk-score could also identify molecular subtypes with poor prognosis and immunotherapy sensitivity. Through transcriptome and whole-exome sequencing analysis of PDAC dataset from The Cancer Genome Atlas (TCGA), it was found that copy number deletion and low expression of CCL19 might be crucial factors to affect the risk-score. Lastly, validation of the above findings was confirmed not only in Gene Expression Omnibus (GEO) datasets but also in our PDAC patient samples, Peking2020 cohort.


INTRODUCTION
Pancreatic ductal adenocarcinoma (PDAC) is one of the most fatal malignancies, characterized by rapid progression, early metastasis, high recurrence, limited responsiveness to conventional therapies, and a low 5-year overall survival (OS) rate (<8%) (1). As with other cancers, PDAC is considered as a complex genetic disease. The prognostic evaluation of PDAC is based on the characteristics of the tumor itself, such as TP53 deletion and KRAS mutation (2,3). Thus, only a handful have finally entered clinical applications. There is a growing realization that tumor microenvironment (TME) plays an important role in tumorigenesis and development, where infiltrating immune cells are indispensable factors affecting treatment and prognosis (4). Compared with tumor cells, immune cells as normal somatic cells in the tumor immune microenvironment have less mutation burden, have less heterogeneity within the tumor and among patients, and are easier to serve as a reliable and stable prognostic evaluation scale. For example, the prognosis of triple-negative (TN) breast cancer can be assessed by measuring the amount of lymphocytic infiltration, and the greatest survival benefit from each 10% increase in lymphocytic infiltrate can be derived (5), while high immune cell infiltration is negatively correlated with the prognosis of brain lower-grade glioma, glioblastoma multiforme, and uveal melanoma (6). These all imply that immune cell infiltration has different effects on the prognosis in different tumor backgrounds. For the prognosis of PDAC, there is no consensus on the impact of immune cell infiltration.
PDAC has always been considered as a "cold" tumor with extensive myeloid-derived suppressor cells (MDSCs; for example, tumor-associated macrophage and myeloid-derived suppressor cells) (7,8) and dense stromal tissue, which further hinders the flow of immune cells and makes it less sensitive to immunotherapy. On the other hand, this natural barrier may be more conducive to the homeostasis of local unique immune microenvironment in PDAC, making the infiltrated immune cells not prone to large fluctuations in a short time, which increases possibility and stability to assess the prognosis of PDAC by the infiltration of immune cells. In support of these inspiring discoveries, there are quite a few researchers predicting the prognosis of PDAC by analyzing immune-related genes, lncRNA and miRNA. Nevertheless, it is infiltrating immune cells that perform the final function. The prognosis assessment method based on immune-related genes [such as 4-chemokine (9) and 15-gene immune, stromal, and proliferation gene signature strategies (10)] all rely on whether chemokines or their related receptors happen to be expressed on the surface of infiltrated immune cells or not. This indirect evaluation method limits the accuracy of its evaluation to a certain extent. Similarly, lncRNA (11) and miRNA (12) often have multi-target characteristics downstream that may perform different functions under different inflammation backgrounds and lack sufficient predictive stability. Recently, there has been a rise in the molecular subtype study of PDAC combining TME and transcriptome analysis, for example, basal-like and activated stromal subtype proposed by Moffitt et al. (13) and pure basal-like and stromal activated subtypes identified by Puleo's group (14), which all suggest that the prognosis of patients is bad. On the other hand, the panel of genes applied for molecular subtype has not yet reached a consensus. All data are based on the transcriptome, which increases the cost of prediction strategy, and complex algorithms hinder its clinical application. Therefore, we urgently need to find a more innovative, simpler, and more economical evaluation model to measure the prognosis of PDAC based on infiltrating immune cells directly.
With the promotion and popularization of second-generation sequencing technology, we can obtain sequencing data from multiple omics and multiple database centers, and we have a more integrated and accurate understanding of the PDAC immune microenvironment landscape. xCell is a very popular R package, launched in 2017 by Aran's group, to infer the proportion of cell types in bulk RNA samples (15,16). In our study, we use the xCell algorithm to predict 64 types of nontumor cell in PDAC. Among them, there are 36 kinds of immune cells, of which only two are related to prognosis, CD4 + naïve T cell (CD4 + T N ) and common lymphoid progenitor (CLP), where CD4 + T N was positively correlated with prognosis, while CLP was contrasting. Then we combined them to construct Cox model, which can significantly improve the sensitivity of prognostic evaluation. The risk-score obtained by the model is negatively correlated with the prognosis. The feasibility has also been further confirmed in the Gene Expression Omnibus (GEO) database and our clinical PDAC patients, Peking2020 cohort. Lastly, by combining transcriptome and whole-exome sequencing analysis of PDAC dataset from The Cancer Genome Atlas (TCGA), it is found that CCL19 may be a crucial factor affecting the model.

Datasets and Clinical Information
PDAC level 3 expression profiles of RNA sequencing data [fragments per kilobase of transcript per million mapped reads (FPKM)], somatic mutation, copy number variation (CNV) data, and the corresponding clinical information of 143 PDAC patients were downloaded from TCGA (https://tcga-data.nci.nih.gov/tcga/). The patients who lack survival time or who had survival time of less than 30 days were removed, and only 136 PDACs were finally enrolled. Validation cohort, i.e., GSE71729, GSE102238, and GSE57495, was extracted from GEO datasets (https://www.ncbi. nlm.nih.gov/gds/). The gene expression information of normal pancreatic tissue (FPKM) was derived from Genotype-Tissue Expression (GTEx) via University of California, Santa Cruz, Xena (https://xenabrowser.net/datapages/?cohort=GTEX). All RNA sequencing data were normalized and transformed into log2(FPKM + 1); then unexpressed or extremely low-expressed genes in most of the samples (average log2(FPKM + 1) < 0.01) were filtered out. Matched clinical validation cohort, namely, Peking2020 Cohort, with 40 cases in total, originated from Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, wherein written informed consent was obtained from all patients ( Table 1).

Estimation of Tumor-Infiltrating Immune Cells
The xCell online tool (https://xcell.ucsf.edu/) was used to perform enrichment analysis of tumor-infiltrating immune cell (TIIC) level, Immune Score, Stroma Score, and Micro environment score based on gene expression data. The data analysis results of TCGA database were downloaded directly from the xCell portal website, while the GEO data needed to use the "sva" package in the R software 4.0.0 to remove the batch differences, and then we uploaded the standardized data to the xCell website for analysis. According to the median of the level of TIIC, 136 cases of PDAC patients in TCGA were divided into the high and low TIIC groups. Survival analysis in R software 4.0.0 was used to determine the cell types with prognostic value, and then the GEO datasets and Peking2020 Cohort were used for verification. Immunotherapy sensitivity was predicted through the ImmuneCellAI online perform (http://bioinfo.life.hust.edu.cn/ImmuCellAI#!/) based on gene expression data.

Development of a Risk-Score
The risk-score of each patient was calculated based on the survival-related level of TIIC multiplied by the multivariate Cox regression coefficient (see the following formula for details), and the patients were divided into the high risk-score group and low risk-score group based on the median value of risk-score. The Kaplan-Meier (KM) survival analysis was performed on the two groups using log-rank test; receiver operating characteristic (ROC) curves of 1-year, 2-year, and 3-year survival prediction were drawn; then the area under the ROC curve (AUC) value was calculated.

Analysis of Differentially Expressed Genes
Differentially expressed genes (DEGs) of the high risk-score group and low risk-score group were analyzed by R software 4.0.0 "edgeR" package. The screening threshold was set to log2 (fold change) >1 and false discovery rate (FDR) <0.05, and then we used "pheatmap" package and "plot" package in R software 4.0.0 to draw heatmaps and volcano maps, respectively.

Protein-Protein Interaction Network
Through Search Tool for the Retrieval of Interacting proteins (STRING, https://string-db.org) database, the protein-protein interaction (PPI) network of DEGs, namely, PPI, was established. The comprehensive score>0.4 was used as the criterion for the existence of interaction. Then Cytoscape software was used to rebuild the PPI network, and the Cytoscape MCODE plug-in was used to find core clusters that located densely connected areas and calculated the connectivity of network nodes.

Tumor Mutation Burden Analysis
Tumor mutation burden (TMB) was defined as the total number of somatic mutations (including single-nucleotide variants (SNVs), missense mutation, and insertion-deletion (INDELs) per million bases in the coding region of exons). Somatic mutation data were analyzed using VarScan2, and then TMB was calculated as the total number of somatic mutations/all bases, and the unit was mutations/ Mb. In this study, the "Maftools" package in R software 4.0.0 was used to calculate the TMB of each sample.

Copy Number Variation Analysis
The copy number information of PDAC from TCGA was annotated according to the position information of grch38 genome; set the normal copy change as 0, then single copy amplification as 1, double copy even multiple copy amplification as 2, single copy deletion as −1, and double copy or multiple copy deletion as −2. Then, the chi-square test was used to select the differential copy number between the two groups in the high-risk group and low-risk groups (p < 0.05); the Kruskal test was used to determine the copy number differential genes related to gene expression; and the Gene Set Enrichment Analysis (GSEA) was used for functional analysis.
We also calculated the number of positive cells from at least three high magnification fields chosen at random as well as their mean intensities. As described above, samples with <25% positive expression were scored 1, samples within the expression range of 25%-50% scored 2, samples within the expression range of 50%-75% scored 3, and samples with expression ≥75% scored 4.
The final CCL19 expression was determined by multiplying the intensity score with the positive expression value.

Statistical Analysis
Statistical analysis was performed using GraphPad Prism 8.0 and R software 4.0.0. Measurement data were expressed as mean ± standard deviation, using Student's t-test. Counting data were expressed as percentage (%), using chi-square test. Survival curve was drawn by the KM method, using log-rank test. The correlation between immune cells was evaluated by Spearman's correlation coefficient. p < 0.05 or FDR < 0.05 was considered statistically significant in all tests.

A Cox Model Constructed by Combining the Intratumoral Infiltration of CD4 + T N and Common Lymphoid Progenitor Can Improve the Sensitivity of Predicting Overall Survival in The Cancer Genome Atlas Pancreatic Ductal Adenocarcinoma Discovery Cohort
Based on the xCell algorithm, the type and level of immune cell infiltration in 136 PDAC cases were evaluated, and a total of 36 immune cells were predicted. Then KM survival analysis showed that only CD4 + naïve T cell (CD4 + T N ), CLP, and CD4 + Th2 cell (Th2) were closely related to survival, where CD4 + T N was positively correlated with survival (p < 0.001, Figure 1A), while CLP and CD4 + Th2 cell were negatively correlated with survival (p = 0.009, Figure 1B; p = 0.022, Supplementary Figure S1A). Furthermore, combined with clinical factors for univariate and multivariate analyses, only CD4 + T N and CLP could be regarded as independent risk factors, but the p-value was only slightly significant (Figures 1E, F and Table 2). However, when combining CD4 + T N and CLP to construct a proportional hazards model (Cox model), its clinical significance could be dramatically improved (p < 0.001, Table 2). Based on the individual point of risk-score calculated through the Cox model, PDAC patients were divided into the low risk-score and high riskscore groups (median set as cutoff). KM curves confirmed that the high risk-score group had much worse prognosis than the low risk-score group (p < 0.001, Figure 1C). Additionally, the riskscore could estimate OS of patients much sensitively, as the AUC was 0.713, 0.667, and 0.609 for 1-year, 2-year, and 3-year OS, respectively, and the discrimination index (C-index) was 0.706 ( Figure 1D). Furthermore, this result was also verified in GSE71729 dataset (p = 0.027, Supplementary Figures S1B, C).

Performance of the Risk-Score to Identify Molecular Subtypes and Immunotherapy Sensitivity
In succession, through the scatter plot of patient risk assessment, we found that risk-score was positively correlated with CLP level and negatively correlated with CD4 + T N level ( Figure 2A). Meanwhile, with the increase of risk-score, the survival time of patients was gradually shortened, and the death events increased. Consistent with that, the risk-score was significantly higher in the short-term survival than the long-term survival ( Figure 2B, p = 0.01). Interestingly, this risk-score also happened to be negatively correlated with the Immune Score, Stroma Score, and Microenvironment score. Previous studies had shown that molecular classification of pancreatic cancer was closely related to prognosis, but its complex algorithm has also become one of the important factors limiting its clinical application. Therefore, we were trying to explore whether the risk-score of our model could well identify high-risk molecular subtypes and replace its prognostic evaluation value. According to Moffitt's algorithm (13) for molecular classification of PDAC, we found that with the increase of the risk-score, the proportion of basal-like and activated stroma corresponding to the molecular subtype with the worst prognosis increased. Also, through ROC curve analysis, it was found that Cox model was also highly sensitive to the prediction of basal-like molecular subtype (AUC = 0.728, Figure 2C). Furthermore, it was more sensitive to predict the joint basal-like and activated stroma molecular subtype (AUC = 0.733, Figure 2C). In addition, we also performed molecular classification of PDAC according to Puleo's algorithm (14), and the results were consistent with those of the former: with the increase of risk-score, the proportion of pure basallike and stroma-activated molecular subtypes associated with the worst prognosis increased as well as high prediction sensitivity (AUC = 0.696, Figure 2C). Further studies suggested that risk-score could well predict the cytotoxicity of T cells in the microenvironment of PDAC and the sensitivity of PDAC patients to immune checkpoint inhibitor. The risk-score of the high cytotoxicity group and immunotherapy sensitive group was lower (p < 0.001 and p < 0.001, respectively; Figures 2D, E). Moreover, PDAC patients in whom molecular targeted therapy is applicable had a lower riskscore value (p = 0.004, Figure 2F). Besides that, the risk-score was positively correlated with the degree of dysplasia and tumor size; that is, the higher the degree of dysplasia, or the larger the tumor size, the higher the risk-score (p = 0.005, p = 0.045, Figures 2G, H). Meanwhile, patients with higher risk-score usually survived with tumors, and surgery to remove the tumor totally was more difficult (p = 0.019, Figure 2I).

Risk-Score for Predicting Overall Survival Was Validated in Clinical Pancreatic Ductal Adenocarcinoma Patients
In order to further verify that the Cox model we constructed could indeed assess the prognosis of PDAC patients, we selected 40 PDAC patient samples, Peking2020 cohort, and followed the above algorithm to analyze the level of intratumoral infiltration of CD4 + T N (CD3 + , CD4 + , and CD45RA + ) and CLP (CD127 + and CD135 + ) to calculate the risk-score of each case by multiplex immunofluorescence staining. As expected, the results of Peking2020 cohort were consistent with the above results. Contrary to the PDAC with high risk-score, the level of CD4 + T N was higher in the low risk-score group ( Figure 3A), while the level of CLP tended to be lower ( Figure 3B). Also, compared with the short-term survival group, the level of CD4 + T N in the long-term survival group was also higher ( Figure 3C), while the level of CLP and risk-score was in reverse (Figures 3D, E). Furthermore, we conducted KM survival analysis and found that CD4 + T N was positively correlated with PDAC patient survival (p < 0.001, Figure 3F), but CLP was negatively correlated with patient survival (p < 0.001, Figure 3G), and after the joint analysis, the significance of the negative correlation between risk-score and prognosis was further confirmed (p < 0.001, Figure 3H). In addition, the same as the results of PDAC dataset from TCGA, the larger the tumor volume, the higher the risk-score (p < 0.001, Supplementary Figure S1D). Interestingly, PDAC in the tail of the pancreas had a higher risk-score than the head of the pancreas (p = 0.0316, Supplementary Figure S1E).

Gene Expression and Function Profiled in High and Low Risk-Score Groups
The above studies confirmed the role of risk-score in the prognostic evaluation of PDAC patients, so we tried to explore its potential molecular mechanism. First of all, principal component analysis (PCA) was conducted on the transcriptome data of healthy pancreas (combined with pancreas tissue from GTEx and paracancerous tissue of PDAC dataset from TCGA) and tumor tissue (PDAC dataset from TCGA). The results showed that the mRNA expression was significantly different between normal tissues and cancer tissues, while there was no significant difference between the low risk-score and high riskscore groups (Supplementary Figures S1F, G). However, through the analysis of heatmaps and volcano maps, we found that there were still 172 DEGs between the low risk-score and high risk-score groups, among which 162 genes were downregulated and 10 genes were upregulated in the high riskscore group (Figures 4A, B). Then, GO and KEGG analysis of these 172 DEGs revealed that they were mainly concentrated in immune-related functions and pathways, for example, Complement activation by classical pathway, Immunoglobulinmediated immune response, and Cytokine-cytokine receptor interaction (Figures 4C, D). Furthermore, the 172 DEGs were analyzed for PPI. Then, the most important core interaction protein network was screened out through MCODE algorithm. It was found that CCL19 occupied one of the most core positions and interacted closely with other genes ( Figure 4E). Moreover, the function of core interaction protein was mainly related to regulation of lymphocyte activation and differentiation, chemokine-mediated signaling pathway, etc. ( Figure 4F).

Whole-Exome Sequencing Data Showed the Difference Between High and Low Risk-Score Groups
Since previous studies had shown that TMB could affect tumor immune infiltration (18,19), we further analyzed the  TMB in PDAC. It was found that PDAC was a "cold" tumor with low TMB as reported in the literature (2, 3), but compared with the low risk-score, the high risk-score group still has higher TMB, such as conventional TP53 and KRAS mutation ( Figures 5A, B), However, these mutations did not make the corresponding gene expression different at the transcriptome level, so it did not affect the expression of DEGs obtained by transcriptome analysis. Compared with TMB, CNV between patients was more constant, and the frequency of variation was higher, which was one of the key events leading to tumor development (20)(21)(22). However, there were fewer studies on the tumor immune microenvironment. Therefore, we tried to find the main CNV that could affect the value of risk-score. By analyzing the wholeexome data of CNV, it was found that compared with the low risk-score group, the high risk-score group had more characteristic chromosome amplification and deletion ( Figures 5C, D), such as 8q, 9p, 17, 18, 19q, and 20q chromosome amplification, and 2q, 6, 7, and 8p chromosome deletion. At the same time, a total of 1,593 differential CNVs related to expression were generated, mainly enriched in immune-related functions, such as activation and positive regulation of immune response, participating in innate immune response and antigen receptor-mediated signaling pathway ( Figure 5E) by GSEA, where there was a total of six intersection genes with DEGs generated by the transcriptome, that is, CA9, TNNT1, FABP4, CCL21, LY86, and CCL19 ( Figure 5F).

CCL19 Was a Potential Factor Affecting Risk-Score
Compared with those in the normal pancreas (GTEx + TCGA), CA9, TNNT1, CCL21, LY86, and CCL19 were all expressed higher in tumor tissues, except for FABP4, which was expressed lower in a tumor (p < 0.001, Figures 6A-F). In contrast with that in the low risk-score group, the expression of CA9 and TNNT1 increased in the high risk-score group, while FABP4, CCL21, LY86, and CCL19 were contrasting (p < 0.001, Figures 6G-L). Except for FABP4, their expression was positively correlated with the change of copy number. Unlike in the low risk-score group, in the high risk-score group, CA9 showed a variety of copy number changes; that is, in addition to the normal double copy number, there was also single copy number deletion, single copy number, or even multiple copy number amplification (p = 0.023, Figure 6M). TNNT1 and FABP4 mainly showed single copy number amplification (p = 0.002, p = 0.005, Figures 6N, O), while CCL21, LY86, and CCL19 mainly showed single copy deletion in the high risk-score group (p < 0.001, p < 0.001, and p = 0.001, Figures 6P-R). Furthermore, we analyzed the correlation between these six genes and risk-score, and we found that except for CA9 ( Figure 6S), the other five genes were significantly correlated with risk-score. Among them, only the expression of TNNT1 was positively correlated with risk-score (R = 0.492, p < 0.001, Figure 6T), FABP4, CCL21, LY86, and CCL19 were negatively correlated with  Figures 6U-X), while there was no statistical correlation between CA and risk-score. Furthermore, we analyzed the survival of these five genes and found that only TNNT1, CCL21, and CCL19 were closely related to survival, of which TNNT1 was negatively related to survival (p = 0.026, Figure 7A), while CCL21 (p = 0.038, Figure 7B) and CCL19 (p = 0.019, Figure 7C) were positively correlated with prognosis. Then univariate and multivariate analyses combined with clinical factors found that only CCL19 was positively correlated with patient survival (p = 0.003, Figure 7D) and could play a role as an independent prognostic factor (p = 0.005, Figure 7E). It was also verified in GSE57495 dataset concurrently (p = 0.034, Figure 7F). Similarly, in Peking2020 cohort, we also found that CCL19 expression was significantly lower in the high risk-score group ( Figure 7G). Compared with the long-termsurvival patients, CCL19 also showed a downward trend in the short-term-survival patients group (p = 0.008, Figure 7H). Further survival analysis showed that CCL19 was positively correlated with the prognosis (p < 0.001, Figure 7I), consistent with the above result, which suggested that CCL19 may be an important factor affecting CD4 + T N and CLP infiltration.

DISCUSSION
Using tumor immune microenvironment to assess tumor prognosis has attracted increasing attention, such as breast cancer, lung cancer, gastric cancer, and glioma. However, there was no consensus in PDAC. Unlike traditional prognostic evaluation model, we boldly and innovatively combined infiltrated immune cells to construct Cox model for prognostic evaluation in PDAC. In order to eliminate the bias caused by a single xCell algorithm, we used the TIMER and MCPCOUNTER algorithms to predict infiltrating immune cells. Due to the differences in the characteristic genes and calculation methods selected by different algorithms, they cannot accurately predict 36 different immune cells as finely as xCell, which impelled us to choose cell populations close to or containing CD4 + T N and/or CLP as much as possible. We found that the CD4 + T cells predicted by the TIMER algorithm and T cells predicted by MCPCOUNTER algorithm were positively correlated with the prognosis (Supplementary Figures S1H, I), which indirectly supported the reliability of our results by xCell algorithm. Furthermore, the feasibility of our model to predict prognosis has also been deeply verified in GEO datasets and our Peking2020 cohort. We found that the risk-score originated Cox model was not affected by the demographic characteristics of age, gender, and race; nor was it interfered by conventional high-risk factors for PDAC such as smoking, drinking, diabetes, chronic pancreatitis, and family history of tumors. Besides, there was no significant correlation between the TNM stage, tumor stage, surgical method, and risk-score (Supplementary Figure  S2). At the same time, as for the commonly used bio-markers of PDAC, such as CA199, CA125, and carcinoembryonic antigen (CEA), there was no significant difference among them in the expression of tumor tissues between the high risk-score and low risk-score groups, which suggested that these traditional indicators to predict prognosis had a certain lag. Based on the above, the risk-score could be used as an independent prognostic factor. Nevertheless, the risk-score had nothing to do with the anatomical site of tumor occurrence in TCGA database, but tumor in the tail of the pancreas had a higher risk-score than the head of the pancreas in our Peking2020 cohort unexpectedly, where the reasons needed to be further explored. In addition, chemotherapy was closely related to the prognosis of patients. In our study, we found that chemotherapy can significantly improve the prognosis of patients, whether in the high riskscore or low risk-score group (Supplementary Figures S3A-C).
There was no significant difference in risk-score among patients who received or did not receive chemotherapy (Supplementary Figure S3D). However, the high risk-score group always had a worse prognosis than the low risk-score group regardless of whether the patient received chemotherapy or not (Supplementary Figures S3E, F). Interestingly, our risk-score has higher prognostic prediction sensitivity in the patient without chemotherapy group than with chemotherapy group (without chemotherapy vs. with chemotherapy AUC (1 year) = 0.743 vs. 0.657, without chemotherapy vs. with chemotherapy AUC (3 years) = 0.782 vs. 565; Supplementary Figures S3G, H).
It was well known that PDAC was a "cold" tumor, although the TMB of the high risk-score group was higher than that of the low risk-score; based on the background of low TMB overall, this difference may not be sufficient to assess its sensitivity to immunotherapy. Fortunately, the risk-score still worked well to predict the sensitivity to immunotherapy, and the patients possessing higher cytotoxicity or using molecular targeted therapy all tended to have a lower risk-score. Unfortunately, due to the current lack of immunotherapy sensitivity indicators, PDAC patients seldom received immunotherapy. Therefore, although we can predict the sensitivity of immunotherapy, we failed to collect enough patients who had received immunotherapy, and the detailed impact of risk-score on the prognostic performance in immunotherapy cohort cannot be accurately assessed. The population must be further expanded, and more complete information must be collected for in-depth exploration. CNV played a fundamental role in tumorigenesis and development. It could be used not only as a prognostic indicator but also as a therapeutic target; for example, Herceptin and Iressa were developed for HER2 and EGFR copy number amplification, respectively. In our research, we analyzed the differences of CNV to explore the potential mechanism of why our model can evaluate the prognosis. Finally, CCL19 was found from 1,593 differential copy numbers that can be transcribed. Copy number deletion or low expression of CCL19 may be key to potential impact on risk-score.
Previous studies had shown that CCL19 (23,24) was one of the most significant chemokines, produced by dendritic cells (DCs), lymphocytes, and some non-lymphocytes, including tumor cells, consistent with our results. Moreover, CCL19 could specifically bind to its receptor, chemokine receptor 7 (CCR7), a class A subtype 7-span transmembrane G-protein coupled receptor, which was expressed on DCs, natural killer (NK) cells, macrophages, and lymphocytes including CD4 + T N and CLP. Therefore, we speculated that PDAC may promote the infiltration of immune cells, especially CD4 + T N and CLP, by secreting CCL19. The effect of CD4 + T N on tumor prognosis has not been unified. Some research groups had pointed out that CD4 + T N often indicated poor prognosis of breast cancer (25). However, some scholars hold the view that CD4 + T N level in smoking lung cancer is correlated with favorable prognosis (26). These contradictory conclusions may depend on which kind of cells CD4 + T N was finally differentiated into. In our study, CD4 + T N supported favorable prognosis, also possibly because CD4 + T N differentiated into CD4 + Th1 cells induced by the unique microenvironment of PDAC, which further enhanced CD8 + T cell cytotoxicity synergistically and conditioned B cells to promote the secretion of corresponding antibodies in line with our prediction in GO annotation. Although CLP may differentiate into various types of lymphocytes, it may be induced by the unique microenvironment of PDAC, whose outcome was similar to that of common myeloid progenitor (CMP) differentiated into MDSC (27), and finally differentiated into lymphoid-derived suppressor cells, which is correlated with poor prognosis.
In addition, apart from CCL19, chemokines such as CCL2, CCL3, CCL4, CCL5, CCL8, CXCL13, CCL18, and CCL21 (Supplementary Figure S4A) in the low risk-score group also increased significantly, suggesting that there were more other immune cell infiltration. Although these were not essential factors affecting our model and the prognosis, at least they reflected that the overall immune status of the low risk-score group was much better than that of the high risk-score group, on the other hand. Simultaneously, in our study, it was also found that the low risk-score group had more expression of T cellactivated co-stimulatory factors (such as ICOSLG, CD40LG, TNFRSF9, TNFRSF4, TNFSF18, TNFRSF18, ICOS, CD28, CD27, and CD40) and co-inhibitory factors (such as CTLA4, PDCDL1G2, PDCD1, VSIR, LAG3, HAVCR2, and TIGIT) (Supplementary Figures S4B, C), suggesting that PDAC TME was not a simple binary state of activation or inactivation, which may be related to the asynchronous activation and inactivation of infiltrating immune cells. At the same time, each molecule was expressed continuously rather than in isolation during the process from activation to inactivation.
Although studies around PDAC traditional molecular subtype have paid attention to the indispensable role of TME and made great efforts to predict the prognosis of pancreatic cancer, there was a limitation for these findings in clinical transformation in that the genes applied to prediction had not yet reach consensus, as well as the need for complex algorithms and transcriptome sequencing to classify them. Instead, our model was mainly based on xCell and Cox algorithms; not only that the calculation was more concise, but also the high-risk molecular subtypes can be distinguished well. What is more, besides evaluation through transcriptome sequencing data, our model can also use economical multiplex immunofluorescence staining to achieve prognostic evaluation alternately. Additionally, previous studies had also suggested that high perineural invasion was correlated with poor prognosis of PDAC (28,29), whereas high neural density tended to have a better prognosis (30). Consistent with the latter, in our model, the activation of nerve fibers mainly occurred in the low risk-score group by GSEA ( Supplementary  Figures S5A-D), which may drop a hint that more neuroimmune cell units (NICUs) (31), that is, the co-localization of nerve fibers with immune cells, may be formed in the low riskscore group. As it happened, NICUs and their interaction can inhibit tumor progression by driving tissue protection and immune regulation, which further supported the prominent position of our model from another dimension.
In future clinical work, we can also learn from this research idea by analyzing transcriptome sequencing data and then using the xCell algorithm to predict the level of CD4 + T N and CLP infiltration, or using multiplex immunofluorescence staining to evaluate the degree of CD4 + T N and CLP infiltration per ×100 magnification field and then using the algorithm of our model for reference to calculate the risk-score and establish a complete reference interval of prognostic evaluation scoring based on our model. However, due to the low detection rate of early PDAC, it was mostly in the advanced stage when the opportunity for surgery was lost, which was a huge obstacle for us in collecting the early-or late-stage samples. In TCGA PDAC datasets, there were only 11 cases with stage I and only three cases with stage IV in PDAC (136 cases in total). Similarly, there were only four cases with stage I and only one case with stage IV in our Peking2020 cohort (40 cases in total). This led to the lack of sufficient early-and late-stage data in our model for training and validation, which may limit the accuracy of our model in evaluating the prognosis of such patients and reduce the sensitivity to predict tumor stage to some degree. In the future, it was expected to further expand population data, especially the early-and late-stage patients, for verification and to promote the clinical application of our model.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of the Cancer Institute (Hospital), CAMS, and PUMC (17-168/1424). The patients/ participants provided their written informed consent to participate in this study.   (T1-T2 vs. T3-T4),  different tumor stage group (stage I, stage II vs. stage III-IV), different surgery performed type group (no whipple vs whipple), different family history of cancer group (no vs. yes), respectively. The pathologic TNM staging system was based on seventh edition of the American Joint Committee on Cancer staging system in TCGA PDAC dataset.

AUTHOR CONTRIBUTIONS
Supplementary Figure S3 | Interaction between chemotherapy on risk-score in TCGA PDAC dataset. (A-C) OS in the patients with Chemotherapy vs. without Chemotherapy in all-patient, high risk-score group and low risk-score group from TCGA PDAC dataset depicted by KM plots respectively. (D) Violin plot for analysis risk-score value between patients with Chemotherapy and without Chemotherapy group in TCGA PDAC dataset. (E-F) OS in high risk-score vs. low risk-score patients with chemotherapy or without chemotherapy from TCGA PDAC dataset depicted by KM plots, respectively. (G-H) ROC curves to depict the accuracy of risk-score in identifying poor OS of patients with chemotherapy or without chemotherapy in TCGA PDAC discovery cohort for 1-year and 3-year, respectively. KM, Kaplan Meier survival analysis; Chemotherapy, the patients with chemotherapy, no Chemotherapy, the patients without chemotherapy; ns, no statistical significance.
Supplementary Figure S5 | Differential CNV function annotation and prediction by GSEA enrichment analysis in TCGA PDAC dataset. (A) Bar chart showing the top 10 CC-GO terms of differential CNV for high risk-score vs. low risk-score. (B, C) GSEA-based GO analysis-enrichment plots of representative gene sets: axon part, neuron projection enrichment in low risk-score, respectively. (D) GSEA-based KEGG-enrichment plots of representative gene sets from activated pathway: neurotrohin signaling pathway in low risk-score. In the GSEA plots, the vertical axis represents enrichment score. The enrichment score increased with the number of enriched genes and vice versa. CNV, copy-number variation; GSEA, Gene Set Enrichment Analysis; ES, enrichment score; NES, normalized enrichment score; GO, Gene Ontology; CC, Cellular Component; KEGG, Kyoto Encyclopedia of Genes and Genomes.