Identification and Validation of Immune Molecular Subtypes in Pancreatic Ductal Adenocarcinoma: Implications for Prognosis and Immunotherapy

Background Pancreatic ductal adenocarcinoma (PDAC) remains treatment refractory. Immunotherapy has achieved success in the treatment of multiple malignancies. However, the efficacy of immunotherapy in PDAC is limited by a lack of promising biomarkers. In this research, we aimed to identify robust immune molecular subtypes of PDAC to facilitate prognosis prediction and patient selection for immunotherapy. Methods A training cohort of 149 PDAC samples from The Cancer Genome Atlas (TCGA) with mRNA expression data was analyzed. By means of non-negative matrix factorization (NMF), we virtually dissected the immune-related signals from bulk gene expression data. Detailed immunogenomic and survival analyses of the immune molecular subtypes were conducted to determine their biological and clinical relevance. Validation was performed in five independent datasets on a total of 615 samples. Results Approximately 31% of PDAC samples (46/149) had higher immune cell infiltration, more active immune cytolytic activity, higher activation of the interferon pathway, a higher tumor mutational burden (TMB), and fewer copy number alterations (CNAs) than the other samples (all P < 0.001). This new molecular subtype was named Immune Class, which served as an independent favorable prognostic factor for overall survival (hazard ratio, 0.56; 95% confidence interval, 0.33-0.97). Immune Class in cooperation with previously reported tumor and stroma classifications had a cumulative effect on PDAC prognostic stratification. Moreover, programmed cell death-1 (PD-1) inhibitors showed potential efficacy for Immune Class (P = 0.04). The robustness of our immune molecular subtypes was further verified in the validation cohort. Conclusions By capturing immune-related signals in the PDAC tumor microenvironment, we reveal a novel molecular subtype, Immune Class. Immune Class serves as an independent favorable prognostic factor for overall survival in PDAC patients.


INTRODUCTION
Pancreatic ductal adenocarcinoma (PDAC) is a fatal disease with a 5-year overall survival rate of approximately 9% (1,2). Surgical resection remains the only curative method, and evolving adjuvant chemotherapy regimens have shown limited efficacy in improving long-term outcomes (3). The emergence of immune checkpoint blockade therapies has shed light on the treatments of PDAC patients. However, according to recent clinical trials, only a minority of PDAC patients benefit from immunotherapy (4). Moreover, although various predictive biomarkers for immunotherapy have been developed for solid tumors, none have proven their efficacy in PDAC patients (5)(6)(7). Thus, it is necessary to develop new biomarkers for immunotherapy with particular emphasis on PDAC.
Recently, a series of other molecular subtype classifications based on high-throughput expression profiling data were developed for PDAC, with the aim of prognostic stratification. These classifications included a three-subtype classification [classical, quasimesenchymal (QM), and exocrine-like] based on microdissected samples (8) and a four-subtype classification [squamous, pancreatic progenitor, immunogenic, and aberrantly differentiated endocrine exocrine (ADEX)] based on bulk samples (9). However, given that these classifications were developed using different sources of samples and different techniques, their prognostic values need to be validated in more datasets (10,11). Moreover, these classifications were based on tumor cells rather than microenvironment compartments of PDAC. The tumor microenvironment of PDAC comprises an admixture of multiple cell types within the extracellular matrix, including cancerassociated fibroblasts (CAFs) and various kinds of immune cells (12,13). As a robust method for unsupervised class discovery, nonnegative matrix factorization (NMF) has shown the capability to detect context-dependent molecular signals from these distinct compartments (14). Moffitt et al. used NMF to virtually microdissect bulk RNA sequencing data and identify tumor subtype classification (classical and basal-like) and stroma subtype classification (normal and activated) (15). Nevertheless, few molecular classifications have focused on the immune compartments of PDAC or been correlated with the treatment efficacy of immunotherapy. Thus, further research should focus on identifying immune molecular subtypes based on the virtual microdissection of immune-related signals within the tumor microenvironment to facilitate prognostic stratification and discover effective biomarkers for immunotherapy (16).
The current research applied the NMF method to virtually dissect immune-related signals from gene expression data of PDAC samples. We identified an immune molecular subtype, Immune Class, based on the tumor immune microenvironment of PDAC. Detailed immunogenomic profiling showed that Immune Class had several characteristics, including more active immune cytolytic activity, higher immune cell infiltration, higher tumor mutational burden (TMB), and lower copy number alterations (CNAs) than Nonimmune Class. Immune Class also served as an independent favorable prognostic factor for overall survival. In addition, our immune molecular subtypes might complement the current classification systems and facilitate personalized immunotherapy. Our findings provide a comprehensive understanding of the immunological landscape in PDAC and deserve further validation in PDAC patients treated with immunotherapy.

PDAC Datasets and Samples
We analyzed the mRNA gene expression data from a cohort of 764 patients with pancreatic cancers (Figure 1). A cohort of 149 PDAC samples from The Cancer Genome Atlas (TCGA) was used as the training cohort. Public level 3 HT-seq fragments per kilobase of exon model per million mapped fragments (FPKM) data were downloaded from the TCGA data port (https://portal.gdc.cancer. gov/, accessed September 16, 2020) (17). The corresponding clinicopathological information was collected at the same time, including survival time, survival status, age, sex, TNM stage, histological grade, and etc. Only primary PDAC tumor samples were included for downstream analyses. A total of 5000 genes with the highest median expression in the samples were retained for NMF analysis. Five publicly available datasets with a total of 615 samples were further used for validation (series: GSE85916, GSE71729, GSE57495, GSE21501, and E-MTAB-17951). In these datasets, gene expression was profiled using different microarray platforms ([HG-U219] Affymetrix Human Genome U219 Array, Rosetta/Merck Human RSTA Custom Affymetrix 2.0 microarray, Agilent-014850 Whole Human Genome Microarray 4x44K G4112F, and Illumina human WG6 BeadChip v3). The gene expression data were downloaded from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) and Array Express (www.ebi. ac.uk/arraryexpress). The probe IDs were transformed to gene symbols with GEO platform files, and probes mapping to the same gene symbol were collapsed by mean expression. Samples were normalized using to each other using quantile normalization with the R package "limma" (18). The key information of these five datasets is summarized in Supplementary Table 1.

Virtual Dissection of Immune-Related Gene Expression Signals and Unsupervised Class Discovery
The tumor, stromal, and immune cell gene expression signals in the training TCGA-PDAC cohort were deconvoluted and virtually microdissected using NMF as previously described (14,15) with the R package "NMF" (19). k = 9 was selected as the number of factorization factors because it could achieve high cophenetic coefficients and provide effective deconvolution of the TCGA-PDAC cohort in terms of immune-related signals. The coefficient matrix and basis matrix are displayed in Supplementary Table 2. We applied a previously reported immune enrichment score calculated by single-sample gene set enrichment analysis (ssGSEA) (20) to obtain the immune-related NMF factor. The nine NMF factors were compared to the immune enrichment score, and the NMF factor with the highest level of immune enrichment score was subsequently referred to as the immune factor (Supplementary Figure 1A). The top-ranked genes by their loadings of the immune factor are herein referred to as exemplar genes (Supplementary Table 3). The top 100 exemplar genes of the immune factor were subjected to Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses (21)(22)(23). A false FIGURE 1 | Flow chart of the study. A total of 764 PDAC samples were included in this study. A training cohort (TCGA-PDAC) including 149 samples was virtually microdissected to identify immune molecular subtypes. Detailed immunogenomic characterization was performed between the two immune molecular subtypes. The Immune Classifier was adopted in five independent validation datasets to validate the immune molecular subtypes. discovery rate (FDR)-adjusted P-value < 0.05 was considered as the criterion for significant enrichment for GO terms and KEGG pathways. Subsequently, the top 50 exemplar genes of the immune factor were selected for unsupervised consensus clustering to divide the TCGA-PDAC cohort into two immune molecular subtypes: Immune Class and Nonimmune Class ( Figures 2C, 3). Finally, the subtypes obtained from consensus clustering were further refined with the R package "Random Forest" (24); thus, the final Immune Class and Nonimmune Class were identified. The multidimensional scaling (MDS) plot and confusion matrix are displayed in Figures 2A, B. Sets of previously reported immune-and stroma-related gene  expression signatures representing immune cell infiltration and  immune responses are summarized in Supplementary Table 4. We applied these gene sets to characterize the immune molecular subtypes using ssGSEA and nearest template prediction (NTP). ssGSEA was conducted using the R package "GSVA" (25), and NTP was conducted using an R version of the GenePattern module NTP (26,27). The Estimation of Stromal and Immune A B C FIGURE 2 | Immune molecular subtypes determined after consensus clustering and random forest refinement. (A) Consensus clustering of the TCGA-PDAC cohort using the exemplar genes was further refined using random forest as illustrated in the multidimensional scaling (MDS) plot. Purple dots indicated patients classified as Immune Class according to consensus clustering, and blue dots indicates patients classified as Nonimmune Class. (B) Heatmap of confusion matrix exhibited the correction rate of random forest classifier compared with consensus clustering. (C) Heatmap displayed the overlap between NMF factors, immune factor weight, immune enrichment score, consensus clustering using exemplar genes, and immune molecular subtypes. The expression of exemplar genes was illustrated at the bottom heatmap. cells in MAlignant Tumor tissues using Expression data (ESTIMATE) algorithm was used to calculate the immune enrichment score and the stromal enrichment score with the R package "ESTIMATE" (20). Differential analyses between the two immune molecular subtypes revealed differentially expressed genes (DEGs). Linear models were used to identify DEGs with the R package "limma" (18). An FDR < 0.05 combined with |log2(fold change)| ≥ 1.5 was set as the threshold for DEG identification. The DEGs reaching the threshold were considered the Immune Classifier. Genes whose expression was higher in Immune Class than in Nonimmune Class were considered as classifier genes of Immune Class, and vice versa. Gene set enrichment analysis (GSEA) was utilized to identify differential enrichment of the immune-related pathways, infiltrating immune cells, and immune responses (28). GSEA was performed using GSEA software version 4.1.0 from the Broad Institute, and gene sets (as gene symbols version 7.2) were downloaded from the Molecular Signatures Database (http://software.broadinstitute.org/gsea/msigdb). The normalized enrichment score (NES) was obtained by 1000 permutations. Gene sets with a p-value < 0.05 and an FDR < 0.25 were considered significantly enriched. All heatmaps were generated using the R package "pheatmap".

Correlations of the Immune Molecular Subtypes With Immunogenomic Features
CNA data generated by GISTIC2.0 were obtained from the Broad Institute GDAC FireBrower (http://firebrowse.org). Arm-level amplifications and deletions were defined by gains or loss in each chromosome. The numbers of both arm-and focal-level CNAs were compared between Immune Class and Nonimmune Class using the Wilcoxon rank-sum test. The mutation data of the TCGA-PDAC cohort were downloaded from the TCGA data portal (https://tcga-data.nci.nih.gov/tcga), and TMB was calculated as the number of nonsynonymous mutations per million bases. We used the MutSig2.0 approach (29) to identify and visualize significantly mutated genes (SMGs) with the R package "Maftools" (30), and mutations in known driver genes of PDAC and genes in the WNT/b-catenin pathway were visualized in an oncoplot. Illumina Infinium human methylation 450K array data was downloaded from UCSC Xena (http://xena.ucsc.edu/). Pathological tumor cellularity, ABSOLUTE purity, DNA hypermethylation purity, and the DNA methylation-estimated leukocyte fraction were obtained from a previous study on the genomic characterization of PDAC (31). Twenty-two subpopulations of tumor-infiltrating lymphocytes (TILs) were analyzed using the CIBERSORT algorithm (https://cibersort. stanford.edu/) in R (32). A list of immunomodulatory genes was obtained from a previous publication (33), and the mRNA expression profiles of these genes were compared between Immune Class and Nonimmune Class.

Combination of the Immune Molecular Subtypes and Other PDAC Molecular-Subtype Classifications
The correlations between our immune molecular subtypes (Immune Class and Nonimmune Class) and previously  (15). The tumor classification contained the classical and basal subtypes, whereas the stromal classification contained the activated and normal subtypes. We defined these four molecular subtype classifications in each sample from the TCGA-PDAC cohort using the published classifier genes. The distribution of the aforementioned classifications in immune molecular subtypes was compared using Fisher's exact test. A Sankey diagram was generated using the R package "networkD3". Cramer's V statistic was applied to measure the similarity between two categorical variates, herein different PDAC molecular subtype classifications. A Venn diagram comparing the classifier genes of different classifications was plotted using the R package "VennDiagram". The association between clinicopathologic characteristics and overall survival in the TCGA-PDAC cohort was analyzed using uni-and multivariate Cox proportional hazards (CoxPH) regression models. Kaplan-Meier survival analysis was employed to visualize the overall survival, and the log-rank test was used to compare differences among different curves. A forest plot was plotted using the R package "forest plot". PDAC molecular classifications in some recent studies were also reproduced in the TCGA-PDAC cohort. Hierarchal clustering was performed using the function "hclust" in R. Pancreatic adenocarcinoma molecular gradient was generated using the R package "pdacmolgrad". Consensus clustering was performed using the R package "ConsensusClusterPlus". Mutation signatures were downloaded from COSMIC (https://cancer.sanger.ac.uk/signatures/) and identified using the R package "deconstructSigs".

Validation of the Immune Molecular Subtypes in Independent External Datasets
The expression of a customized 795-gene NanoString panel in 32 patients receiving sequential cytotoxic T lymphocyte-associated antigen-4 (CTLA-4) inhibitors and programmed cell death 1 (PD-1) inhibitors was profiled in a previous study (34). Subclass mapping was performed via a bioinformatic approach to identify common subtypes between independent cohorts (35). The similarity of the expression of these genes between patients in the TCGA-PDAC cohort and immune checkpoint blockade responders was evaluated using subclass mapping in the GenePattern SubMap module. The Immune Classifier genes were used to predict immune molecular subtypes in five independent external validation datasets using NTP. Immunerelated gene signatures (Supplementary Table 4) further validated and characterized the presence of immune molecular subtypes in these validation datasets. Treatment response to immunotherapy was also predicted in the validation datasets using SubMap.

Statistical Analysis
All statistical analyses were conducted in R software (version 4.0.1) (http://www.r-project.org). Correlations between continuous variables and immune molecular subtypes were analyzed using Student's t-test and the Wilcoxon rank-sum test for normally distributed and nonnormally distributed data, respectively. Correlations between categorical variates and immune molecular subtypes were analyzed using the chi-square test or Fisher's exact test. Survival analysis, including CoxPH regression, Kaplan-Meier survival analysis and log-rank tests, was performed using the R packages "survival" and "survminer". A two-sided P value < 0.05 was considered statistically significant.

Identification of the Immune Molecular Subtypes Through Virtual Microdissection
With the aim of virtually microdissecting immune-related signals from bulk gene expression data, we performed unsupervised NMF analysis of 149 PDAC samples in the TCGA cohort (training cohort, Figure 1). Among the different expression patterns determined by NMF, one was correlated with a previously reported immune enrichment score reflecting the presence of infiltrating immune cells in tumor tissues (Supplementary Figure 1A) (20). Thus, this expression pattern was regarded as the immune NMF factor, and the top-ranked genes with the highest weight contributing to the immune NMF factor were considered as exemplar genes. Enrichment analyses of GO terms and KEGG pathways on exemplar genes provided additional evidence of immune-related functions and signaling (Supplementary Figures 1B, C and Supplementary  Table 5). For example, enriched biological processes included the response to interferon-gamma (IFN-g) and positive regulation of T cell activation. By utilizing consensus clustering on exemplar genes and random forest refinement (Figures 2A, B), immune molecular subtypes were identified and further referred to as "Immune Class" and "Nonimmune Class" ( Figure 2C). Immune Class accounted for 30.8% (46/149) of the training cohort and exhibited higher expression of exemplar genes and higher immune enrichment score than Nonimmune Class (Figures 2, 3). ssGSEA revealed significant enrichment of a series of gene sets associated with innate and adaptive immune cell subpopulations, including B cells, cytotoxic T cells, and natural killer cells (NK cells), in Immune Class (all P < 0.0001) (Figure 3). Significant enrichment of tumor-suppressing Th1 cells, not tumor-promoting Th2 cells (P = 0.38), was also observed in Immune Class (P = 1.5e-07). Similarly, we found enrichment of a proinflammatory M1 macrophage signature (P = 6.3e-04) rather than an antiinflammatory M2 macrophage signature (P = 0.61) in Immune Class. A six-gene IFN-g signature that was reported to induce programmed death ligand 1 (PD-L1) expression and predict the therapeutic efficacy of the PD-1 inhibitor pembrolizumab in head and neck squamous cell carcinoma (34) was also significantly enriched in Immune Class (P = 1.6e-04). Other signatures significantly enriched in Immune Class included tertiary lymphoid structure, immune cytolytic activity, WNT/b-catenin and transforming growth factor (TGF-b) pathway, and stromal enrichment score (all P < 0.0001). Class comparison analysis revealed 95 genes that were significantly overexpressed in Immune Class and 5 genes that were significantly overexpressed in Nonimmune Class (Supplementary Table 6). The Immune Classifier was further built based on the expression of this set of 100 genes (Supplementary Table 7). The Immune Classifier was mainly composed of immune-related genes, for example, B cell surface markers such as CD19, membrane spanning 4-domains A1 (MS4A1, CD20), CD79A and CD79B, and T cell surface markers such as CD2, CD3D, CD3E, and CD5. Several immunoglobulin genes were also overexpressed in Immune Class and included in the Immune Classifier, such as the Fc fragment of IgE receptor II (FCER2), Fc receptor-like 1/2/3 (FCRL1/2/3) and joining chain of multimeric IgA and IgM (JCHAIN). Furthermore, chemokine receptor and ligand genes, such as C-X-C motif chemokine ligand 12/13 (CXCL12/13), C-C motif chemokine ligand 19/21 (CCL19/ 21), and C-C motif chemokine receptor 4/7 (CCR4/7) were presented in the Immune Classifier. Granzyme genes (granzyme K/E, GZMK and GZME) were also overexpressed in Immune Class, indicating the cytotoxic activity of T cells and NK cells. Similarly, GSEA was employed to analyze the enrichment of immune cells, IFN-a and IFN-g responses, tumor necrosis factor-a (TNF-a) signaling, Janus kinase/signal transducer and activator of transcription (JAK/STAT) signaling, and WNT/b-catenin signaling (all p < 0.05 and FDR < 0.25, Supplementary Figure 2 and Supplementary Table 8).
In conclusion, by performing virtual microdissection in PDAC, we identified an immune molecular subtype named Immune Class and demonstrated the potential of Immune Class to capture signatures of immune cell infiltration, innate and adaptive immune responses, and immune-related pathways such as interferon signaling and WNT/b-catenin signaling.

Correlations Between the Immune Molecular Subtypes and Immunogenomic Characteristics
Several previous studies have correlated certain immunogenomic characteristics with immune cell infiltration and the antitumor immune response. In a recent study, the number of major histocompatibility complex (MHC) Class I-associated neoantigens and driver gene mutations reflected the cytolytic activity of local immune infiltration (35). In particular, a higher neoantigen load and more abundant CD8+ T cell infiltration stratified pancreatic cancer patients who survived longer survival and might guide the application of immunotherapies (36). It was also demonstrated that TMB and CNAs were associated with CD8+ T cell infiltration and immune cytolytic activity and served as independent predictive factors for the immune checkpoint blockade response (35,37,38). To further demonstrate the biological relevance of Immune Class, we carried out detailed immunogenomic profiling including CNAs, TMB, tumor neoantigens, TILs, etc. Immune cytolytic activity was higher in Immune Class than in Nonimmune Class (P = 4.2e-13), as were the immune enrichment score (P = 1.8e-14) and the methylationestimated leukocyte fraction (P = 9.3e-14) (Figure 3). For CNAs, it is worth noting that patients classified as Immune Class had relatively fewer both arm-level amplifications and deletions. In particular, Immune Class had a median of 0 arm-level amplifications (range 0-18) and 2 arm-level deletions (range 0-21) versus a median of 3 arm-level amplifications (range 0-23) and 7 armlevel deletions (range 0-22) in Nonimmune Class (P = 0.00012, and P = 1.9e-06 respectively, Figure 4A). It was also demonstrated that Immune Class harbored a median of 0 focal amplifications (range 0-5) and a median of 0 focal deletions (range 0 to 24), both of which were lower in Immune Class than in Nonimmune Class with a median of 0 focal amplifications (range 0-22) and a median of 9 focal deletions (range 0-25) (P = 6.8e-09, and P = 9.4e-09 respectively, Figure 4B). Notably, TMB (P = 8.7e-05, Figure 4C) but not neoantigen count (P = 0.37, Figure 4D) was higher in Immune Class than in Nonimmune Class. These results demonstrate that patients within Immune Class have several immunogenomic characteristics, such as higher immune cytolytic activity, a higher leukocyte fraction, a higher TMB, and fewer arm-and focal-level CNAs than patients within Nonimmune Class.
We next sought to correlate the immune molecular subtypes with TIL subpopulations, immunomodulatory gene expression, and mutations in known driver genes of PDAC and genes in the WNT/b-catenin pathway. After deploying the CIBERSORT approach, macrophages accounted for the highest proportion of infiltrating immune cells in PDAC (Supplementary Figure 3B). Immune Class exhibited higher infiltration of memory B cells, CD8 + T cells, gd T cells, activating NK cells, and activating dendritic cells (all P < 0.001, Figure 4E), which are critical in the adaptive and innate immune responses. In contrast, Nonimmune Class was enriched in M0 and M2 macrophages, yet no significant difference in M1 macrophages was observed. Moreover, we analyzed the expression of immunomodulatory genes, including both immunostimulatory and immunoinhibitory molecules that were critical for immunotherapy by supporting the immune response (39). The expression of immunomodulatory genes varied between immune molecular subtypes (Supplementary Figure 3A). The expression of immune checkpoint molecules such as CTLA4 and CD274 (PD-L1), as well as IFNG in IFN-g signaling, was higher in Immune Class. Furthermore, the immune molecular subtypes were correlated with mutations in known driver genes of PDAC and genes in the WNT/b-catenin pathway ( Figure 5). Immune Class had significantly fewer mutations in WNT/b-catenin pathway genes than Nonimmune Class (3/ (45×12) versus 19/(101×12), P = 0.079). Additionally, there were significantly fewer mutations in SMAD4 in Immune Class than in Nonimmune Class (7/45 versus 33/101, P = 0.032). Nevertheless, there was no difference in the mutation rates between Immune Class and Nonimmune Class in terms of other PDAC driver genes, such as KRAS, TP53, and CDKN2A. Altogether, these findings imply that immune molecular subtypes showed differences in TIL subpopulations, immunomodulatory gene expression, and certain oncogenic pathway mutations.

Correlations of the Immune Molecular Subtypes With Clinicopathological Characteristics and Survival Analyses
The clinicopathological characteristics of the TCGA-PDAC cohort were summarized and compared between Immune Class and Nonimmune Class (Supplementary Table 9).
Immune molecular subtypes were not associated with most clinicopathological characteristics, including age, gender, lymph node invasion, and distant metastasis. Nonetheless, Immune Class was more likely to be classified as stage T1/T2 (7/46 versus 14/103) and less likely to be classified as stage T3 (35/46 versus 89/103) (P = 0.035). Tumor purity was correlated with immune and stromal cell infiltration as well as immune cytolytic activity. Tumor purity could also confound the interpretation of genomic profiling and classifications based on bulk tumor samples (40,41). Low tumor purity was associated with Bailey's ADEX and immunogenic subtypes and might also serve as a prognostic factor (31,42,43). Thus, we collected pathologist-reviewed tumor cellularity data and adopted different tumor purity estimation methods in silico (Supplementary Table 10). ABSOLUTE purity, evaluated by the whole-exome sequencing algorithm, ranged from 9.0% to 89.0% (median, 33.5%) in the whole cohort. The ABSOLUTE purity of Immune Class [median (range), 18.5% (9.0%-70.0%) was significantly lower than that of Nonimmune Class [median (range), 38% (10%-89%)] ( Figure 6A, P = 6.4e-10) (44). Tumor purity was also estimated using DNA methylation profiles and ranged from 13.5% to 68.1% (median, 40.2%) in the whole cohort and was strongly correlated with ABSOLUTE purity (Spearman's r = 0.87, p < 1e-15, Figure 6B) (31). A binary purity classification based on regional copy number burden indicated that Immune Class was more likely to be classified as low purity ( Figure 6C, P < 0.001) (31). In summary, Immune Class had a lower tumor grade and lower tumor purity than Nonimmune Class.
We next sought to explore the prognostic values of the immune molecular subtypes along with other clinicopathological characteristics ( Table 1). In univariable Cox regression analyses, immune molecular subtypes, together with age, lymph node invasion status, and histological grade, were significantly associated with overall survival. Immune Class was a favorable prognostic factor, with a hazard ratio (HR) of 0.56 [95% confidence interval (95% CI) 0.33-0.95, P = 0.033]. The median survival time of Immune Class was 34.8 months (95% CI = 16.4-not reached), which was longer than that of Nonimmune Class (17.9 months, 95% CI = 15.1-21.4). Kaplan-Meier curves also showed that Immune Class was associated with better overall survival ( Figure 6D). Moreover, the HR of age was 1.02 (95% CI = 1.01-1.05, P = 0.036), and the HR of lymph node invasion was 1.78 (95% CI = 1.02-3.09, P = 0.008). In addition, the HR of poor versus moderate histological grade was 1.73 (95% CI = 1.09-2.75, P = 0.02). These four prognostic factors were presented in a forest plot ( Figure 6E) and subsequently examined using multivariable Cox regression analysis. Older age [HR (95% CI) = 1.03 (1.01-1.05), P = 0.026] remained an independent unfavorable prognostic factor, whereas Immune Class remained an independent favorable prognostic factor for overall survival [HR (95% CI) = 0.56 (0.33-0.97), P = 0.037] ( Table 1). Additionally, various metrics of tumor purity and immune infiltration, including ABSOLUTE/methylation purity, the immune enrichment score, and the methylation-estimated leukocyte fraction, were not prognostic ( Table 1). Our results indicated that Immune Class could serve as an independent prognostic factor in PDAC.
To further explore the relationship with other transcriptomebased PDAC classifications, we included two another PDAC classifications, pancreatic adenocarcinoma molecular gradient (PAMG) (45) and Dijk's 4-tier classification (46). The PAMG was a summary of all previous epithelial molecular classification of PDAC, while Dijk's 4-tier classification intend to build a unifying transcriptome-based classifications. We reproduced these two tumor epithelial classifications in the TCGA-PDAC cohort. Immune Class had lower molecular gradient compared to Nonimmune Class (t-test, P = 5.1e-5, Figure 3). As for Dijk's 4-tier classification, we found that the Immune Class had a higher proportion of secretory subtypes compared to Nonimmune Class (Fisher's exact P = 0.002, Figure 3). Several PDAC classifications based on genome and methylome were also built recently, including mutation signature subtypes (47), homologous recombination deficiency (HRD) (48), and methylation clusters (49). We also reproduced mutation signature subtypes and methylation clusters in the TCGA-PDAC cohort. The mutation signature subtypes used NMF and hierarchical clustering to define four major subtypes. Nevertheless, we failed to discover correlation between the immune molecular subtypes and mutation signature subtypes (Fisher's exact P = 0.91, Figure 3). We observed a higher proportion of Methylation Cluster2 (Methylation low / IFNsignature high ) in the Immune Class (Fisher's exact P = 3.2e-7, Figure 3), which was consistent with our findings that Immune Class had higher enrichment of IFN-a and IFN-g signaling. In conclusion, these results highlighted the potential mechanisms of DNA methylation in modulating tumor immune microenvironment. And the correlation between immune molecular subtypes and alterations in genome and methylome needs further research.

Combination of the Immune Molecular Subtypes With PDAC Tumor and Stroma Classifications for Prognostic Stratification
Four molecular classifications of PDAC based on gene expression profiles that were biologically and clinically relevant in different sets of patients showed concordance to some extent (8,9,15). We evaluated the correlations of the immune molecular subtypes with these classifications and further explored the integration of immune molecular subtypes with tumor and stroma classifications in prognostic stratification.   Figure 4A).
In conclusion, it was demonstrated that Immune Class was correlated with a higher proportion of the QM/ADEX subtypes, immunogenic subtype, and normal stroma subtype. The cumulative effect of different classifications based on the tumor epithelium, stromal, and immune cells on prognostic stratification was next explored (10). In univariable Cox regression analyses, Moffitt's stroma classification, instead of Moffitt's tumor, Bailey's and Collison's classifications, had prognostic value in the TCGA-PDAC cohort (Supplementary Table 12). The normal stroma subtype was associated with significantly longer overall survival than the other stroma subtypes, with an HR of 0.46 (95% CI = 0.24-0.93, P = 0.03) (Supplementary Figure 4B). Integration of Moffitt's tumor/stroma classification in survival analyses did not hamper the prognostic value of the immune molecular subtypes ( Figure 7G and Supplementary Figures 4C, D). Patients within Immune Class and the classical tumor subtype had the longest median survival time of 34.8 months, whereas patients within Nonimmune Class and the basal tumor subtype had the shortest median survival time of 12.9 months (log-rank P = 0.009, Figure 7G). Since the difference between the activated stroma subtype and the absent stroma subtype was not significant for overall survival (HR [95% CI], 0.85 [0.52-1.41], P = 0.54), we combined these two stroma subtypes into other stroma subtypes and compared them with the normal stroma subtype. After integrating the immune molecular subtypes and Moffitt's stroma subtypes, we found that patients within Immune Class and the normal stroma subtype had the best survival rate, whereas patients within the Nonimmune Class and the other stroma subtypes had the worst survival rate (P = 0.012, Supplementary Figure 4C). Finally, we combined the immune molecular subtypes with tumor and stroma classifications for prognostic stratification. Patients within Immune Class and the classical stroma and basal tumor subtypes had the best overall survival rate (P = 0.024, Supplementary Figure 4D). Together, these results showed that the combination of immune molecular subtypes with tumor and/(or) stromal subtypes achieved a cumulative effect on PDAC prognosis prediction.

Validation in Independent Datasets
The presence of immune molecular subtypes was further evaluated in five independent datasets using NTP analyses with the 100 gene-expression-based Immune Classifier (n = 615, Figure 1 and Supplementary Table 1). Gene expression profiling of the validation datasets was conducted with different microarray platforms (Illumina, Affymetrix, or Agilent Gene chip systems) and in different types of tissue material (flash frozen or formalin fixed paraffin embedded).
The proportion of patients classified as Immune Class showed consistency among the validation datasets, with an average of 36.4% (range 30.0%-42.8%) ( Figure 8A and Supplementary  Figures 6-9). Patients in validation cohort GSE57495 were allocated to Immune Class at a higher frequency of 42.8%, potentially due to the different microarray platforms used (Custom Affymetrix 2.0 microarray). Overall, the immune molecular subtypes were successfully reproduced in the validation datasets regardless of the platform and type of tumor tissue used.

Exploration of Potential Immunotherapy Response
The ability of the immune molecular subtypes to predict immunotherapy response was explored using subclass mapping analysis. We assessed the similarity of immune-related gene expression profiles between the TCGA-PDAC cohort and a cohort of 32 melanoma patients receiving sequential CTLA-4 and PD-1 inhibitors ( Figures 8B, C) (37,50). Our results showed similarities between patients within Immune Class and melanoma patients responding to PD-1 checkpoint inhibitors (Bonferroni corrected p-value = 0.04). The similarity of immune-related gene expression profiles between Immune Class and immunotherapy responders was also shown in the validation cohorts ( Supplementary Figures 6-9).
To further explore this similarity, we included a cohort of 65 patients with non-small cell lung cancer (NSCLC), head and neck squamous cell carcinoma (HNSCC) and melanoma who were treated with PD-1 inhibitors (51). Significant similarity between Immune Class and PD-1 inhibitor responder was observed in the TCGA-PDAC cohort and the E-MTAB-17951 validation cohort (Supplementary Figures 5A,  B, E, F). Thus, we showed the discrepant responses of immunotherapy in two immune molecular subtypes, which needs to be strengthened in PDAC patients receiving immune checkpoint inhibitors.

DISCUSSION
Immunotherapy, especially immune checkpoint inhibitors, has emerged as a new era of cancer treatment. Nevertheless, immune checkpoint inhibitors could only benefit a minority of PDAC patients (12). The limited clinical benefit of immune checkpoint inhibitors achieved in PDAC patients necessitates the identification of suitable PDAC patients. Deep understanding of the tumor immune microenvironment was also necessary in identifying such patients. In the current study, the NMF method was applied to deconvolute the gene expression profiles and identify immune molecular subtypes. We then discovered a robust immune molecular subtype, Immune Class, which comprised 30.8% of the cohort. Detailed immunogenomic profiling was conducted, and a comprehensive description of the tumor-, stromal-, immune-compartments was provided. The presence of Immune Class reflected an active immune response and correlated with current immunotherapy biomarkers.
In this study, we provided an immune molecular classification that, similar to current PDAC molecular classifications, has prognostic value in PDAC. Immune Class was an independent favorable prognostic factor, as confirmed in both the training and validation cohorts. Furthermore, in-depth survival analyses confirmed that integration of the immune molecular subtypes with Moffitt's tumor and stroma classifications had a cumulative effect on prognosis prediction. According to Moffitt et al., the tumor and stroma classifications were similarly based on virtual microdissection of the tumor epithelium and stromal components with the NMF method (15). These findings suggest the complex interplay among the tumor, stromal and immune compartments and support combination therapeutic strategies targeting the tumor microenvironment. Upon comparison to a melanoma cohort, we found that our Immune Class was associated with melanoma patients responding to PD-1 inhibitors, suggesting its potential immunotherapy efficacy. Successful reproduction in five independent datasets suggested the robustness of the immune molecular subtypes. Liu et al. also identified immune classification of PDAC, but used a method of consensus clustering rather than NMF (52). NMF can separate tumor, stromal and immune gene expression from transcriptomic data to deconvolute contextdependent signals. Moreover, compared to their study, we used twice the sample size and conducted a more comprehensive analysis including other current immunotherapy biomarkers, such as TMB and neoantigen count. Our findings indicate the prognostic value of our classification, but further validation in PDAC patients receiving immune checkpoint blockade therapies is required.
Given that the tumor microenvironment of PDAC is comprised of an admixture of abundant stromal cells and immune cells, it is critical to consider tumor purity when interpreting the genomic and transcriptional profiles. Because of the modest concordance among intraplatform tumor purity estimates (40,53), we compared the gold-standard pathologist-reviewed tumor cellularity with bioinformatic estimates, including ABSOLUTE purity, DNA methylation-estimated purity, and copy number-estimated purity. Nevertheless, in our study, DNA methylation-estimated tumor purity and ABSOLUTE purity were well correlated. Generally, Immune Class had lower tumor purity than Nonimmune Class, probably due to higher immune cell infiltration in Immune Class. Regarding prognosis prediction, the immune molecular subtypes rather than tumor purity served as a prognostic factor.
Multiple biomarkers of response to immunotherapy have been developed and include four main categories (1): antigens eliciting T cell responses, such as TMB, CNAs, and neoantigen counts (2); mechanisms of immune evasion, such as CTLA-4 and PD-L1 expression and certain oncogenic pathways (3); markers of immune infiltration, such as CD8+ T cell infiltration; and (4) host factors (54). In our study, both TMB and CNAs were associated with the immune molecular subtypes. Although PDAC has a relatively a lower TMB than other solid tumors (5), there was still a tendency for a higher TMB in Immune Class. We also concluded that patients within Immune Class had relatively lower both broadand focal-level CNAs. These findings highlight increased genomic stability in Immune Class and the role of aneuploidy in regulating immune response. Nevertheless, an association between neoantigen counts and immune molecular subtypes was not identified, which might be explained by the fact that neoantigen quality, rather than neoantigen quantity, is responsible for the CD8+ T cell-mediated immune response (55). In addition, the expression of immunomodulatory genes was compared between immune molecular subtypes to infer the potential immune evasion mechanisms. The expression of immune checkpoint molecules, such as PD-L1 and CTLA-4, was higher in Immune Class. Other immunostimulatory or immunosuppressive genes, including inducible T Cell costimulator (ICOS), 2,3-dioxygenase 1 (IDO1), and selectin P (SELP), was also differentially expressed (13,56,57).
The molecular characteristics of Immune Class also included elevated immune cytolytic activity, IFN-g signaling upregulation, and increased immune cell infiltration. Immune cytolytic activity, defined as the geometric mean of GZMA and perforin 1 (PRF1) expression, is associated with resistance to and relapse following immunomodulatory therapies (35). A six-gene IFN-g signature that can be used to predict the response to pembrolizumab in melanoma patients was also significantly enriched in Immune Class. In previous research, IFN signaling was considered an important inducer of the innate and adaptive responses and served as a new therapeutic approach in pancreatic cancer. The upregulation of IFN signaling promoted (B) SubMap analysis was used to evaluate the immune molecular subtypes in the TCGA-PDAC cohort and four groups of melanoma patients (pre-treatment CTLA-4 inhibitor responders and non-responders, pretreatment PD-1 inhibitor responders and non-responders). Similarity between these two cohorts were illustrated as Bonferroni-corrected P-values. (C) SubMap analysis was used to evaluate the immune molecular subtypes in the TCGA-PDAC cohort and four groups of melanoma patients (pre-treatment CTLA-4 inhibitor responders and non-responders, pre-treatment PD-1 inhibitor responders and non-responders). Similarity between these two cohorts were illustrated as nominal P-values. PD-1, programmed cell death-1; CTLA-4, cytotoxic T lymphocyte-associated antigen-4; B/P metagene, B cell/plasma cell metagene; M/D metagene, monocyte/dendritic cell metagene; IFN-g, interferon-g; TBRS, transforming growth factor-b response signature; ECM, extracellular matrix.
PD-L1 expression, facilitated recruitment of CD8+ T cells and induced immunogenic cell death (34,58). In addition, we also observed significant enrichment of both adaptive and innate immune cell subpopulations using ssGSEA and CIBERSORT. Cytolytic T cells and NK cells, together with a T cell inflamed signature, indicating the upregulation of cellular immunity, were enriched in Immune Class. Similarly, B cells and plasma cells, together with a B cell/plasma cell metagene, implying the upregulation of humoral immunity, were also enriched. Interestingly, the majority of TILs in PDAC were macrophages, indicating the potential of targeting tumor-associated macrophages. Immune Class had significantly more infiltration of proinflammatory M1 macrophages, whereas Nonimmune Class had more infiltration of anti-inflammatory M2 macrophages.
These findings indicate upregulation of innate immune response in Immune Class.
In the current study, the enrichment of stromal signatures and upregulation of the TGF-b and WNT/b-catenin pathways were detected in Immune Class. Immune Class also had fewer mutations in the WNT/b-catenin pathway. It is well established that intrinsic tumor activation of the TGF-b pathway plays a role in the suppression of CD8+ T cell recruitment and function as well as the proliferation and activation of CAFs (59,60). The TGF-b pathway might also lead to chemotherapy and immunotherapy resistance. In addition, our findings demonstrated that patients within Immune Class had a significantly lower frequency of mutations in SMAD4. The loss of SMAD4 was previously reported to regulate the cell cycle and promote tumor proliferation and indicated poor survival in PDAC patients (61,62). Crosstalk between the WNT/b-catenin pathway and TGF-b/ SMAD4 pathway in the tumor immune microenvironment was thus implied. In conclusion, all these findings suggested that the immunotherapy response in PDAC was modulated by a combination of tumor-intrinsic mechanisms (e.g., TMB, CNAs, immunomodulatory gene expression, and certain oncogenic pathways) and tumor-extrinsic mechanisms (e.g., TILs).
In summary, our study revealed robust immune molecular subtypes in PDAC that achieved better performance in capturing immune components than previous classifications. Immune molecular subtypes correlated with currently used immunotherapy biomarkers, which confirmed the reliability of our classification. The cumulative effect of tumor, immune, and stroma classifications on prognosis prediction was confirmed. Nevertheless, our findings still require further validation in large cohorts of early-stage and metastatic PDAC patients. Additionally, further investigation should be performed in PDAC patients receiving immune checkpoint blockade therapies, to demonstrate its potential value in the immunotherapy response.

AUTHOR CONTRIBUTIONS
RL was involved in methodology, software, formal analysis, visualization, data curation, and writing -original draft. YH was involved in software and writing -review and editing. HZ was involved in validation and data curation. JW was involved in validation and methodology. XL was involved in data curation and resources. HL was involved in validation and investigation. HW was involved in funding acquisition, methodology, project administration, and writing -review and editing. ZL was involved in supervision, conceptualization, funding acquisition, and project administration RL and HW have accessed verified the underlying data. All authors contributed to the article and approved the submitted version.