Immunity-Related Gene Signature Identifies Subtypes Benefitting From Adjuvant Chemotherapy or Potentially Responding to PD1/PD-L1 Blockage in Pancreatic Cancer

Tumor microenvironment comprises of a variety of cell types, which is quite complex and involved in chemotherapy and immune checkpoint blockage resistance. In order to explore the mechanisms involved in tumor immune microenvironment in pancreatic ductal adenocarcinoma (PDAC), we first constructed an immunity-related 18-gene signature using The Cancer Genome Atlas (TCGA) PDAC project data. Then we applied the 18-gene signature to divide PDAC patients into low score and high score groups. Patients in high score group showed inferior prognosis, which was validated in another four independent cohorts, including Ruijin cohort. High score group showed significant enrichment of pathways involved in cell division and cell cycle especially in G1/S phase transition. In high score group, IHC analysis revealed higher levels of the proliferative indexes of Ki67 and PCNA than that in low score group. Prognostic analysis confirmed that patients in high score group could benefit from the gemcitabine-based adjuvant chemotherapy. In low score group, the programmed cell death 1 ligand 1(PD-L1) (+) cases showed worse prognosis but higher T cell infiltration than PD-L1(−) cases. Our immunity-related 18-gene signature could effectively predict PDAC prognosis, and it might be a practical predictive tool to identify PDAC subtype benefitting from gemcitabine-based adjuvant chemotherapy or potentially responding to PD1/PD-L1 blockade therapy.


INTRODUCTION
Pancreatic cancer is a fatal malignancy with extremely poor prognosis (Sodir et al., 2020), accounting for an estimated 57,600 new cases and 47,050 deaths annually (Siegel et al., 2020). Owing to its anatomical and pathological features, pancreatic cancer occurs occultly and grows rapidly. Most patients have lost their chance of surgery when diagnosed (Strobel et al., 2019). Although significant improvement has been achieved in pancreatic cancer treatment, the 5-year survival rate of pancreatic ductal adenocarcinoma (PDAC) is still rather low .
Immunity is the essential component of the tumor microenvironment, which plays a pivotal role in tumor initiation, progression, and metastasis (Qian and Pollard, 2010;Berraondo et al., 2016;Keren et al., 2018). Each immune subtype harbors different functions and can be used to predict the states of tumors (Mollaoglu et al., 2018). In the previous study, researchers showed that interfering with immune conditions by targeting specific molecules could suppress tumor progression and improve the effectiveness of chemotherapy (Galluzzi et al., 2015). Among the specific molecules, those mainly expressed on the cell membrane could maintain self-tolerance and modulate immune responses by triggering immunosuppressive signaling pathways, defining them as immune checkpoints (Wykes and Lewin, 2018). Novel therapy targeting immune checkpoints, such as programmed cell death 1 (PD1)/programmed cell death 1 ligand 1 (PD-L1), cytotoxic T-lymphocyte associated protein 4 (CTLA4) and T-cell membrane protein 3 (TIM3) have made breakthrough in many types of cancer treatment (Sharma and Allison, 2015;Topalian et al., 2016). However, only a fraction of patients could acquire significant effect from immune checkpoint blockade (Zappasodi et al., 2018). PDAC patients responding little to immune checkpoints blockade may be due to the insensitive immune microenvironment . In a phase II trial, the advanced PDACs could not benefit from anti-CTLA4 (Ipilimumab) treatment (Royal et al., 2010). In addition, anti-PD-L1 immunotherapy also showed limited effects in PDAC in a phase I trial (Brahmer et al., 2012). Many studies have revealed some factors involved in the sensitivity of immune checkpoint blockade, such as the expression of targeted molecules, microsatellite instability, mutation load, and immune infiltration (Catalano et al., 2019;Mandal et al., 2019). For example, the tumors with CD3(+) and CD8(+) T cell infiltration were sensitive to anti-PD-L1 and anti-CTLA4 immunotherapy (Foy et al., 2017;Wu et al., 2018). Thus, exploring the tumor immune microenvironment could help us make the personalized treatment of immune checkpoint blockade in PDAC.
The method of combining immunity-related genes with clinical characteristics has been applied to predict prognosis, recurrence and response to therapy in multiple cancers (Wu et al., 2018). In a previous study, researchers constructed a stromal immunotype to predict patients' overall survival (OS) and diseases free survival (DFS) in bladder cancer (Foy et al., 2017). In gastric cancer, a least absolute shrinkage and selection operator (LASSO) Cox regression model was established to predict patients' prognosis and identify the subgroup suitable for adjuvant chemotherapy (Cheadle et al., 2003). These studies revealed the detailed mechanisms of the tumor immune microenvironment and provided significant indications for clinical therapy.
In the present study, we aimed to develop an immunityrelated gene signature based on LASSO Cox regression to predict patients' outcomes using data from TCGA, which was further validated in another four independent cohorts from the International Cancer Genome Consortium (ICGC), the Gene Expression Omnibus (GEO), and Ruijin cohort. Furthermore, the signature could be used to identify PDAC subtype benefitting from gemcitabine-based adjuvant chemotherapy or possibly responding to anti-PD1/PD-L1 immunotherapy.

PDAC Datasets Extraction and Data Processing
Raw count data and corresponding clinical characteristics of 173 patients with PDAC were downloaded from the TCGA database 1 . ICGC CA (Canada) raw RNA sequencing dataset and corresponding clinical characteristics of 115 patients, and the ICGC AU (Australia) gene expression microarray dataset and corresponding clinical characteristics of 68 patients were downloaded from the ICGC database 2 . The GEO dataset GSE57495 and corresponding clinical characteristics of 63 patients were downloaded from GEO database 3 . RNA-sequencing data was normalized by transcript per million (TPM), and gene expression was calculated as log2 (TPM + 1). For the gene expression microarray, if one gene was detected using multiple probes, the probe with the maximum average used. Then, gene expression values were normalized by log2 transformation. To remove the batch effects of different platform in this study, the expression values of each gene were z-score transformed.

Identification of Immunity-Associated Genes
By interrogating the ImmPort database 4 , we obtained a total of 1,811 immunity-related genes. After matching with genes in the TCGA database, 1,308 immunity-related genes were used for further analysis.

Patients
A total of 101 fresh frozen primary PDAC samples were utilized as the validation cohort (Ruijin cohort), which were collected consecutively at Ruijin Hospital from April 2012 to November 2014. The inclusion and exclusion criteria were as follows: (1) Pathologically diagnosed as having pancreatic ductal adenocarcinoma (PDAC) without any other types of pancreatic cancer; (2) without other FIGURE 1 | Flowchart presenting the process of construction of immunity-related 18-gene signature, validation and clinical significance relevance in this study. GAPDH (encoding glyceraldehyde-3-phosphate dehydrogenase) was applied as an internal control. Gene expression was normalized as − CT = − (CT gene -CT GAPDH). Finally, expression values of each gene were z-score transformed (Cheadle et al., 2003).

Identification and Validation of the Immunity-Related Gene Signature
First, we used the TCGA PDAC dataset as the training cohort. Using univariate Cox analysis, immunity-related genes that were significantly associated with good or poor prognosis were identified using the "Survival" package in the R software 5 . The prognostic genes were displayed by a forest plot using "forestplot" in R. Then, LASSO Cox regression was performed to generate a prognostic signature with the immunity-related genes using the "glmnet" package in R (Liu et al., 2019). Finally, a nomogram based on 18 immunity-related genes was plotted using the "rms" package in R, and the corresponding formula was extracted using the "nomogramEx" package in R. According the risk score and survival status of every patient, optimal cutoff values were set and all patients could be assigned to a high score or low score group.

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Analyses
By comparing the differentially expressed genes (DEGs) between the high score or low score group using the "limma" package in R, we obtained the significantly changed genes between the two groups. The gene names were imported into the Metascape database, and GO and KEGG pathway analyses were performed. The significantly enriched pathways were displayed in a histogram (p < 0.01).

Gene Set Variation Analysis (GSVA) and Gene Set Enrichment Analysis (GSEA)
The GSVA analysis was performed using "GSVA" package in R. The gene sets using in GSVA analysis were downloaded from GSEA molecular database 6 . T cell immunoreaction and PD1related immunosuppressive pathways were extracted and used for GSVA analysis. As for GSEA analysis (Mootha et al., 2003;Subramanian et al., 2005), we construct a gemcitabine resistance related gene set by comparing the DEGs between the gemcitabine resistance population and main tumor cell population in GSE 36563 dataset and using genes with p value < 0.05 and | Log2 fold change | > 1. The gemcitabine resistance related gene set was shown in Supplementary Table 4.

Statistical Analysis
All statistical analyses were performed using GraphPad Prism 7.0 (GraphPad Software Inc., La Jolla, CA, United States). The Kaplan-Meier method was used to plot survival curves, and the FIGURE 3 | The distribution and Kaplan-Meier survival curves of the immunity-related 18-gene signature depends on the patients' risk score. The risk score for all patients with PDAC were plotted in ascending order and marked as low score (blue) or high score (red). The survival status of the patients is marked as dead (red) and alive (blue). The proportion of patients who died in the high score group is obviously higher compared with that in the low score group.  log-rank test was used to assess intergroup differences. Difference between two groups was assessed using Student's t-test. The χ 2 test was carried out to analyze the relationship between the 18-gene signature and clinical characteristics. P < 0.05 were considered statistically significant.

Characteristics of Patients With PDAC
The schematic flow chart of this study is shown in Figure 1.
Our study totally enrolled 520 patients diagnosed with PDAC. Among them, the TCGA (n = 173) patients with PDAC were assigned as the training cohort, and the ICGC AU (n = 68), ICGC CA (n = 115) and GSE57495 (n = 63) PDAC patients were assigned as validation cohorts. Furthermore, patients (n = 101) with PDAC from Ruijin hospital, Shanghai Jiao Tong University School of Medicine were used as another independent validation cohort. The characteristics of the patients from the TCGA database and Ruijin cohort are shown in Tables 1, 2, respectively.

Identification and Validation of Immunity-Associated Gene Signature in PDAC
We matched immunity-related genes from the ImmPort database with genes in the TCGA database and 1308 immunity-related genes (Supplementary Table 1) were obtained for further analysis. By performing univariate Cox regression analysis, 53 genes whose p values were less than 0.001 were chosen as candidates (Figure 2A). We then used the LASSO Cox regression algorithm and a total of 18 genes were identified to develop a risk score classifier ( Figure 2B). A nomogram of the 18 genes predicting the probability of overall survival is shown in Figure 2C. The formula of risk score calculation was illustrated in Supplementary Table 2. We used a receiver operating characteristic (ROC) curve ( Figure 2D) to test the effectiveness and determine the best cutoff value of the risk scores. The area under the curve (AUC) was 0.733, and 20.91 was identified as the optimal cutoff value. We divided patients into high score and low score group using the 18 immunityrelated classifier and plotted the Kaplan-Meier survival curves. We found that patients in the low score group had more favorable prognosis than patients in the high score group, both in training and validation cohorts. In TCGA database (Figure 3A), patients in the low score group had significantly longer OS than patients in the high score group (HR = 4.48 (2.93-6.86), p < 0.0001). In the other four validation cohorts (Figures 3B-E), patients in the high score group also had inferior outcomes compared with those in the low score group  We also found that the immune signature was significantly related to TNM stage and histological grade in the TCGA database (Table 1), and similar results were observed in Ruijin cohort ( Table 2).

The Immunity-Related 18-Gene Signature Predicts Patients' Response to Adjuvant Chemotherapy in PDAC
To interrogate potential signaling pathways involving in our immune signature in PDAC, we compared differentially expressed genes between high score group and low score group, and selected the genes with fold-change > 1.2 and p.adjust value < 0.05 to perform the GO and KEGG enrichment analyses (Figures 5A-C and Supplementary Figures 2A-C). Similar enriched pathways from the 3 datasets were displayed in Figures 5D,E. Cell cycle, cell division, p53 signaling pathways were significantly enriched in high score group, which indicated that it harbored higher proliferative potential. It is well known that gemcitabine exerts antitumor activity mainly by targeting G1/S phase. In GO term of G1/S phase transition, 40 genes were significantly enriched in high score group in TCGA cohort, and similar results were observed in another three independent cohorts ( Figure 6A). Then H&E staining was used to detect the proliferation related markers (Ki67 and PCNA) in 81 PDAC patients (low score group, n = 31; high score group, n = 50) in FIGURE 7 | Benefit of gemcitabine-based chemotherapy among patients in high and low score groups. (A) The gemcitabine resistance related gene set was significantly enriched in the low score group of TCGA cohort by GSEA analysis. The green curve represents enrichment score (ES). The highest point was used to represent ES value in gemcitabine resistance pathway. The ES value indicates the correlation between gene and gemcitabine resistance pathway. The black bar codes represent genes in gemcitabine resistance pathway and these genes were ordered according to their expression levels. The left end or right end genes are leading edge subset strongly contributed to ES value. The bottom numbers represent the order of expression levels in the genome from highest to lowest. (B) The 198 gemcitabine resistance-related genes were significantly attenuated in high score group of TCGA cohort (p < 0.05). Patients from the (C) TCGA database (n = 173) and (D) Ruijin cohort (n = 101) are divided into four groups (TCGA, Low score non-chemotherapy, n = 17, Low score chemotherapy, n = 36, high score non-chemotherapy, n = 45, high score chemotherapy, n = 75; Ruijin, Low score non-chemotherapy, n = 23, Low score chemotherapy, n = 15, high score non-chemotherapy, n = 34, high score chemotherapy, n = 29) and then Kaplan-Meier survival curves were plotted.
Ruijin cohort. The IHC analysis verified that higher levels of Ki67 and PCNA in high score group than that in low score group (Figures 6B,C). These results indicated that patients in high score group might be more responsive to chemotherapy targeting cell cycle (Venkatasubbarao et al., 2013).
As the first-line medicine for chemotherapy, gemcitabine has proven its effectiveness in PDAC (Fuchs et al., 2015). Cells possessing a vigorous proliferation ability are more sensitive to gemcitabine therapy (Zheng et al., 2015). Thus, we suspected that patients might be more sensitive to gemcitabine-based chemotherapy in high score group. We further constructed a gemcitabine resistance related gene set by using GSE36563 dataset (Van den Broeck et al., 2012), which contained 484 genes upregulated in gemcitabine resistance group, named as GEMCITABINE_RESISTANCE_UP. GSEA analysis showed that the gene set was significantly enriched in low score group in TCGA cohort (Figure 7A), and heatmap was further plotted to display 198 significantly attenuated genes of this gene set in high score group (Figure 7B and Supplementary Table 5). Prognosis analysis showed that the patients in high score group could benefit from chemotherapy in both the TCGA and Ruijin cohorts [ Figures 7C,D  and (B) Ruijin cohort (High score: PD-L1 high, n = 52; PD-L1 low, n = 11. Low score: PD-L1 high, n = 12; PD-L1 low, n = 26). (C) The relative RNA level of immune infiltration markers in the TCGA database, including PD-L1, CD3D, CD3E, CD3G, CD247, CD4, CD8A, CD8B. (D) Heatmap showed that higher GSVA scores of T cell immunoreaction pathways and PD1-related immunosuppressive pathways in PD-L1(+) than in PD-L1(-) cases in low score group. (E) IHC profile and corresponding scores of PD-L1, CD3 and CD8 in four groups (High score_PD-L1 high, n = 40; High score_PD-L1 low, n = 10; Low score_PD-L1 high, n = 10; Low score_PD-L1 high, n = 21). *p < 0.05; **p < 0.01; ***p < 0.001; n.s., p > 0.05.

Identification of a Lymphocyte-Infiltrated PD-L1(+) PDAC Subgroup Associated With Poor Prognosis
The immunotherapy targeting PD-L1 showed impressive antitumor activity. In this study, PD-L1 expression could successfully predict the patients' outcome in PDAC in the TCGA data ( Figure 8A), but not in the other four interdependent cohorts ( Figure 8B, Supplementary Figures 3A-C). However, we found that low expression of PD-L1 was significantly related to longer OS in low score group in TCGA and Ruijin cohorts. In TCGA cohort (Figure 8A), patients with high PD-L1 expression had poorer OS than patients with low PD-L1 expression in low score group, with an HR of 3.79 (1.14-12.59, p = 0.033). In the Ruijin cohort (Figure 8B), patients with low PD-L1 expression had favorable outcomes compared with patients with high PD-L1in low score group, with an HR against low PD-L1 expression of 2.93 (1.01-8.45, p = 0.0092). Although the results did not show significant differences, we discovered the same trend in low score group of the ICGC AU cohort and GSE 57495 cohort (Supplementary Figures 3B,C). However, in high score group, PD-L1 expression showed the reverse function in the ICGC CA and ICGC AU cohorts [ICGC CA: PD-L1 high vs. PD-L1 low, p = 0.035, HR = 0.58(0.32-1.03); ICGC AU: PD-L1 high vs. PD-L1 low, p = 0.032, HR = 0.37(0.09-1.52)]. The mRNA level of PD-L1 expression did not show significant differences between low score and high score groups (Supplementary Figure 3D). In TCGA database, the CD3D, CD3E, CD3G, CD247, CD4, CD8A, and CD8B RNA levels were the highest in Low score_PD-L1 high group ( Figure 8C). Moreover, we observed not only the activation of T cell immunoreaction pathways in low score group, but also the activation of PD1-related immunosuppressive pathways by GSVA analysis (Figure 8D). In the Ruijin cohort, the IHC result showed higher numbers of CD3(+) T cells and CD8(+) T cells in PD-L1(+) cases in low score group (Figure 8E, all p < 0.05). The above results indicated that PD-L1(+) cases in low score group displayed stronger immune infiltration and may be suitable for PD1/PD-L1 blockade immunotherapy.
We also performed prognosis of other immune checkpoints in PDAC, such as IDO1, LAG3, TIM3, and CTLA4. As shown in Figures 9A-D Figure 9B, p = 0.15, HR = 0.25(0.07-0.97)]. High level of IDO1 also tended to be related to short OS in low score group [ Figure 9A, IDO1: p = 0.074, HR = 2.82(0.84-9.50)], but did not show significant difference, which may be due to small sample size (n = 53). Because TIM3 and CTLA4 expression could significantly predict OS in low score group in TCGA database, we further performed the above analyses of TIM3 and CTLA4 in ICGC CA, ICGC AU and GSE57495 databases (Supplementary Figure 4). Although TIM3 and CTLA4 expression showed similar trends generally, but did not have the uniformity as PD-L1.

DISCUSSION
The immune microenvironment plays a pivotal role in tumor progression (Biswas, 2015;Chen and Mellman, 2017). Brooks used a 54-gene hypoxia-immune signature to identify subtype associated with prognosis and potentially responsive to targeted immunotherapies in head and neck cancer (Brooks et al., 2019). In breast cancer, Oshi et al. (2020) also generated a 4-gene score to determine the subtype which response to neoadjuvant chemotherapy and show with high expression of T cell exhaustion marker genes. The clinical significance of immune classification has been demonstrated in many diseases (Brooks et al., 2019;Li et al., 2019;Oshi et al., 2020); therefore, we attempted to explore the relationship between immunityrelated genes and the clinical significance of PDAC in this study. Kandimalla et al. (2020) constructed an immune, stromal and proliferation (ISP) gene signature to predict patient outcome in PDAC. They obtained a 15-gene signature from a 170 ISPrelated genes panel. However, we focused on the immune-related genes and acquired the 18 immunity-related gene signature from an 1,811 immunity-related genes panel. IL32 appeared both in ISP related signature and our immune signature, indicating that IL32 could be a pivotal molecule in PDAC progression. ISP signature and our 18 immunity-related gene signature could serve as independent predictors to predict OS of the PDAC patients, and our immune signature also showed stronger prediction capability than TNM stage. Furthermore, our results further indicated that patients in high score group PDACs could benefit from gemcitabine-based chemotherapy and patients in low score group may potentially response to PD1/PD-L1 blockade.
In this study, we firstly developed an 18 immunity-related gene classifier to predict patient outcome from TCGA data. PDAC patients with high immune score ≥20.91 were defined as high score group, and the others were defined as low score group. High score group showed shorter overall survival, which was validated in another four independent cohorts (a total of 347 patients). GO and KEGG pathway analysis revealed that the GO terms of cell cycle, cell and mitotic nuclear division were significantly enriched in high score group, such as G1/S phase transition. Besides, high score group also displayed higher levels of the proliferative indexes of Ki67 and PCNA and low expression of the gemcitabine resistance related genes. As is well known, gemcitabine mainly blocks cell cycle G1/S phase transition to exert anti-tumor activity (Fu et al., 2018). The above results indicated that high score group may be the candidate who benefited from gemcitabine-based chemotherapy. However, patients in low score group receiving chemotherapy showed no benefits and even worse prognosis.
Our signature construction was based on immunity; therefore, we explored the relationship between the subtypes and the response to immunotherapy. PD-L1 expressed on the surface of tumor cells could recognize and bind PD1 expressed on effector T cells, which transmit inhibitory immune signals to induce T cell apoptosis and inhibit T cell activation and proliferation (Gibney et al., 2016;Emens, 2018). However, the anti-PD1/PD-L1 agents in PDAC have limited efficacy (Lu et al., 2017;Mace et al., 2018). In addition to establishing effective combination therapy, it is also necessary to identify subtypes suitable for anti-PD1/PD-L1 immunotherapy (Topalian et al., 2015). In this study, patients with low PD-L1 expression suggested a favorable prognosis in low score group in two cohorts (TCGA and Ruijin). In the ICGC CA data, the result was different from other four datasets, perhaps because of microdissection which resulted in removal of immune component. In the ICGC AU and GSE 57495 cohorts, the lack of statistical differences might have been caused by the small sample size. However, in high score group, PD-L1 did not perform this function or even showed the reverse results. Furthermore, the RNA levels of CD3, CD4, and CD8 in low score group with high PD-L1 expression showed the highest level among the four subtypes in the TCGA database. GSVA analysis also indicated T cell immunoreaction activation and PD1-related immunosuppression in PD-L1(+) cases in low score group. These results were also supported by the IHC results in the Ruijin cohort, in which CD3(+), CD8(+) T cells displayed a distinct enrichment in PD-L1(+) low score group. The infiltration of CD3(+), CD8(+) T cells was evidence that could be used to predict a patient's response to anti-PD1/PD-L1 immunotherapy (Ribas et al., 2017;Danilova et al., 2019). In this study, the proportion of patients in low score group with high PD-L1 expression was approximately 20%. This finding was consistent with previous studies that only 10-30% of patients respond to anti-PD1/PD-L1 therapy (Page et al., 2014;Ott et al., 2019). Taken together, the patients with high PD-L1 expression in low score group might be a potential subtype suitable for anti-PD1/PD-L1 immunotherapy in PDAC.
In conclusion, by analyzing genomic data from the TCGA database, we constructed an immunity-related signature to divide PDACs into two subtypes: low score and high score groups. Patients in high score group showed inferior prognosis, but could benefit from gemcitabine-based chemotherapy. Furthermore, results also indicated that PD-L1(+) tumors in low score group might respond to PD1/PD-L1 blockade therapy.

DATA AVAILABILITY STATEMENT
Raw count data and corresponding clinical characteristics of 173 patients with PDAC were downloaded from the TCGA database (https://cancergenome.nih.gov/, Project ID: TCGA-PAAD). ICGC CA (Canada) raw RNA sequencing dataset and corresponding clinical characteristics of 115 patients, and the ICGC AU (Australia) gene expression microarray dataset and corresponding clinical characteristics of 68 patients were downloaded from the ICGC database (https://icgc.org/, Code: PACA-CA and PACA-AU). The GEO dataset GSE57495 and corresponding clinical characteristics of 63 patients were downloaded from GEO database (https://www.ncbi.nlm.nih.gov/geo/).

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethical Committee of Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
HQ, HzL, and JX developed the concept and designed the experiments. HQ wrote the manuscript. HzL, JX, FL, XT, and MS performed the experiments. WW and XL collected the patients' information. HwL, HC, CP, ZX, XD, and BS performed the surgery. LJ, ZX, and XD helped to interpret results. BS coordinated the study and corrected the manuscript.