S100A2 Is a Prognostic Biomarker Involved in Immune Infiltration and Predict Immunotherapy Response in Pancreatic Cancer

Pancreatic cancer (PC) is a highly fatal and aggressive disease with its incidence and mortality quite discouraging. It is of great significance to construct an effective prognostic signature of PC and find the novel biomarker for the optimization of the clinical decision-making. Due to the crucial role of immunity in tumor development, a prognostic model based on nine immune-related genes was constructed, which was proved to be effective in The Cancer Genome Atlas (TCGA) training set, TCGA testing set, TCGA entire set, GSE78229 set, and GSE62452 set. Furthermore, S100A2 (S100 Calcium Binding Protein A2) was identified as the gene occupying the most paramount position in risk model. Gene set enrichment analysis (GSEA), ESTIMATE and CIBERSORT algorithm revealed that S100A2 was closely associated with the immune status in PC microenvironment, mainly related to lower proportion of CD8+T cells and activated NK cells and higher proportion of M0 macrophages. Meanwhile, patients with high S100A2 expression might get more benefit from immunotherapy according to immunophenoscore algorithm. Afterwards, our independent cohort was also used to demonstrate S100A2 was an unfavorable marker of PC, as well as its remarkably positive correlation with the expression of PD-L1. In conclusion, our results demonstrate S100A2 might be responsible for the preservation of immune-suppressive status in PC microenvironment, which was identified with significant potentiality in predicting prognosis and immunotherapy response in PC patients.


INTRODUCTION
Pancreatic cancer (PC) is one of the most aggressive malignancies, with a five-year survival rate of only 10% in the United States (1). According to the latest epidemiological data, there are 495,773 new cases and 466,003 deaths of PC worldwide in 2020, making ratio of incidence and mortality close to 1:1 (2). In addition to the lack of sensitive screening methods and the rapid progression of PC, the dismal prognosis of this disease is largely attributable to the lack of valid risk prediction models and biomarkers in PC development (3). Therefore, it is of great significance to construct an effective prognostic signature of PC and find the novel biomarker for the optimization of the clinical decision-making.
Tumor microenvironment (TME), a concept developed from Paget's "seed and soil" theory, is regarded as both a cause and consequence of tumorigenesis, which is demonstrated to provide a permissive environment for tumor initiation and progression (4,5). In addition to fibroblasts endothelial cells, stromal cells, blood vessels and secreted factors, the TME comprises innate and adaptive immune cells, which have a profound impact on tumor development (6,7). In recent years, the vital role of immune cells in the occurrence and progression of PC is gradually revealed (8)(9)(10). For example, Yamamoto et al. identified NBR1-mediated selective macroautophagy/autophagy of MHC-I hindered cancer cell recognition and clearance by CD8+ T cells in PC (11), and granulin secretion by metastasis-associated macrophages activates resident hepatic stellate cells into myofibroblasts, resulting in a fibrotic microenvironment that sustains m e t a s t a t i c P C g r o w t h ( 1 2 ) . M e a n w h i l e , a l t h o u g h immunotherapy is almost ineffective for PC (13,14), PC patients who exhibited high effector T-cell infiltration in tumor had longer overall survival (15,16), implying that valuing immune heterogeneity and remodeling the immune microenvironment may hold promise for PC treatment. Therefore, we considered a prognostic model based on immune-related genes (IRGs) to better predict the prognosis of PC patients and optimize the clinical decision-making. Furthermore, the most paramount gene and its potential mechanisms were further explored, as well as its ability to predict patients' response to immunotherapy.
In the present study, we constructed a prognostic model based on nine IRGs and the corresponding nomogram, which were proved to be an independent risk factor and was validated in the training set, testing set, entire set, GSE78229 set and GSE62452 set. S100A2 (S100 Calcium Binding Protein A2), a highly conserved elongation factor (EF)-hand calcium-binding protein, was identified as the gene occupying the most paramount position in the risk signature. GSEA, ESTIMATE and CIBERSORT algorithm revealed that S100A2 was closely associated with the immune status in the PC microenvironment, mainly related to lower proportion of CD8+T cells and activated NK cells and higher proportion of M0 macrophages. Meanwhile, the results of immunophenoscore (IPS) algorithm proved that patients with high S100A2 expression might get more benefit from immunotherapy. Afterwards, our own independent cohort (PUMCH cohort) was also utilized to demonstrate S100A2 was an unfavorable marker of PC, as well as its remarkably positive correlation with the expression of PD-L1.

Datasets Sources and Processing
Immune-related genes were extracted and integrated from the ImmPort database (https://immport.niaid.nih.gov; ≤March 1, 2021) (17). Gene expression profile, clinical information, and mutation profile of the patients were downloaded from The Cancer Genome Atlas (TCGA) dataset (https://portal.gdc. cancer.gov/; ≤March 1, 2021). Samples with inadequate clinical information and follow-up period less than 30 days were excluded. Finally, 166 cases with corresponding gene expression profiles and clinical information were included in the study ( Table 1, detailed in Table S1). Gene IDs was converted to gene symbol using a GFF3 file, which was downloaded from GENCODE (https://www.gencodegenes.org/). The gene expression data was converted to TPM (Transcripts Per Kilobase Million), and log2(TPM + 0.01) was used throughout the analysis unless otherwise noted. The samples of tumor tissues in TCGA set were randomly divided into to a training set and a testing set by a ratio of 7:3 using "sample" function of R software.

Construction and Validation of a Risk Signature Associated With Survival of PC Patients
Limma package was applied to screen differentially expressed genes (DEGs) in GSE15471, GSE28735, and GSE62165 datasets respectively (27). |Fold Change| >1.5 and false discovery rate (FDR) <0.05 were set as the cutoffs for the DEGs. The intersection of DEGs were selected as candidate genes. Univariate Cox regression was used to identify genes that were significantly associated with overall survival (OS) of PC patients in the training set (P <0.01). Subsequently, Least absolute shrinkage and selection operator (LASSO) regression analysis was further used to screen out the optimal gene combination for constructing the risk signature. According to the regression coefficient-weighted pseudogene expression, the risk signature was established as follows: Risk score = (expr gene1 × Coef gene1 ) + (expr gene2 × Coef gene2 ) + … + (expr genen × Coef genen ). The efficiency and independence of the risk signature were assessed by Kaplan-Meier (K-M) curve, time-dependent receiver operating characteristic (ROC) curve and survival point diagram in both the internal validation set (training set, testing set, and entire set) and the external validation set (GSE78229 set and GSE62452 set). Copy number variation information of the nine genes was extracted from the cBioportal database (http:// www.cbioportal.org/) (28), and protein expression in normal and tumor tissues was obtained from the Human Protein Atlas (HPA) database (https://www.proteinatlas.org/).
Meanwhile, in order to make the prediction model more accurate, the clinicopathological information was also incorporated with the riskscore to establish a nomogram, which was based on the results of the univariate and multivariate analysis by using the 'rms' package in R language. The C-index, calibration curve and time-dependent ROC curve of 1-, 1.5-, and 2-year were applied to evaluate the predictive effectiveness of the nomogram.

Differential Gene Analysis, Co-Expression Network Construction and Functional Enrichments Analysis Between S100A2 High and Low Expression Group
The pan-cancer expression analysis of S100A2 was performed through the GEPIA2 database (29). edgeR package was used to perform DEGs analysis between S100A2 high and low expression group, in which |Log2FC| >2 and FDR <0.001 were considered statistically significant (30). The pheatmap package, tidyverse package, and ggrepel package were utilized to create the heatmap and the volcano plot in R language. Approximately 50 genes with the most significant differences were shown in the heatmap, and those genes with their P values <1 × 10 -20 and |logFC| >4 were labeled in the volcano plot. Afterwards, the co-expression network was constructed and visualized with STRING database and Cytoscape. The minimum required interaction score was set to be high confidence (0.700) and disconnected nodes were hidden in the network, therefore not all genes were represented. To further elucidate the mechanism of S100A2 in the development of PC, we performed GSEA analysis of the DEGs (31). The ALL ontology of the DEGs was analyzed by Gene Ontology (GO) (32), while pathway enrichment was analyzed by the Kyoto Encyclopedia of Genes and Genomes (KEGG) (33).
The number of random sample permutations was set at 1,000, and NOM p-value <0.05 and FDR q-value <0.25 were set as the significance threshold.

Estimation of Tumor Infiltrating Immune Cells
CIBERSORT algorithm could calculate the ratios of infiltrating immune cells from tissue transcriptional profiles by a deconvolution algorithm (34). Based on the expression profiles of patients in the TCGA and GSE71729 datasets, we calculated the relative abundance of 22 types of tumor infiltrating immune cells in each patient. Meanwhile, stromal, immune, and estimate scores were outputted respectively by the R package 'estimate' (35).

Tumor Mutation Burden Analysis
The mutation profile was acquired from TCGA data portal (https://portal.gdc.cancer.gov/; ≤March 1, 2021). Somatic variants data of patients were analyzed and visualized by maftools package in R language (36). Then the tumor mutation burden (TMB) of each patient was calculated and analyzed by TCGA mutations package.

Prediction of the Patients' Response to Immunotherapy
Immunophenoscore (IPS) was a scoring scheme for the quantification of tumor immunogenicity, which was verified to positively correlated to the probability to respond to immunotherapy (37). The Cancer Immunome Atlas (https:// tcia.at/) characterized the intratumoral immune landscapes and the cancer antigenomes from 20 solid cancers (37). The IPS data of PC patients was extracted for the following analysis, including the scores for anti-PD-1/PD-L1 treatment and anti-CTLA-4 treatment. Meanwhile, the correlation between S100A2 and immune checkpoints was also investigated in TCGA entire set, including PD-1, PD-L1, and CTLA-4.

Clinical Specimens
A total of 65 patients with primary PDAC who underwent surgical resection at the Peking Union Medical College Hospital (PUMCH) were included in this study (PUMCH cohort, April 2019-November 2020). TNM staging was evaluated according to the 8th edition of the American Joint Committee on Cancer (AJCC) staging system for PC (38). Sequential sections of each patient were used for following studies. Written informed consent were obtained from all the patients enrolled in this study. This project was approved by the Ethics Committee of Peking Union Medical College Hospital.

Statistical Analysis
All the statistical analyses and visualization were performed using Rstudio (version 4.1.0) and GraphPad Prism 8 (version 8.0.1), including DEGs analysis, univariate and multivariate Cox regression analysis, LASSO regression analysis, correlation analysis, clinicopathological factor analysis, ROC curve analysis, and K-M survival analysis. A two-sided P <0.05 was considered as statistically significant unless otherwise noted.

Nine Immune-Related Genes Were Screened Out For Constructing A Risk Signature
The flowchart of the whole analysis was illustrated in Figure S1. A total of 1,793 IRGs were integrated from the ImmPort database (Table S4)  Then we intersected the three differential gene sets, and finally obtained 86 common DEGs ( Figure 1D). Subsequently, univariate Cox regression analysis of 86 candidate genes was applied in TCGA training set (n = 116) to identify prognosisrelated genes (P <0.01), resulting in 26 genes with Hazard Ratio (HR) >1 and one gene with HR <1 (Table S5). LASSO regression analysis was further performed on the prognosis-related genes in order to avoid overfitting problems and construct the risk signature, and nine genes (AREG, CXCL10, MET, OAS1, PI3, PLAU, S100A14, S100A2, and SPP1) were finally screened out according to the optimal lambda value ( Figures 1E, F, log (lambda.min) = −2.554188). At the same time, the copy number variation and the protein expression status of these nine genes were also explored through the cBioportal database and the HPA database ( Figures S2, S3).

CONSTRUCTION OF A RISK SIGNATURE FOR PREDICTING SURVIVAL RATE OF PC
Base on the expression level of nine IRGs and the regression coefficient derived from LASSO regression model, we designed a risk-score formula for PC patients' survival prediction in training set. The risk score for each patient was calculated as follows: Risk score = (0.0356 × expression level of AREG) + (0.0651 × expression level of CXCL10) + (0.1030 × expression level of MET) + (0.0269 × expression level of OAS1) + (0.0002 × expression level of PI3) + (0.0129 × expression level of PLAU) + (0.0455 × expression level of S100A14) + (0.0519 × expression level of S100A2) + (0.0404 × expression level of SPP1). Then the patients in the training set were divided into high-risk group (n = 58) and low-risk group (n = 58) according to the median cut-off value of the risk scores.
To evaluate the competitive performance of the nine immunerelated genes signature, Kaplan-Meier (K-M) curve analysis and time-dependent receiver operating characteristic (ROC) curve analysis were applied ( Figure 2A). As shown in the Kaplan-Meier curves, patients in the high-risk group suffered worse prognosis than the patients in the low-risk group ( Figure 2B Figure 2B), proving a high prognostic value for survival prediction in the training set. Compared with the low-risk group, the expressions of S100A2, AREG, CXCL10, MET, OAS1, PI3, PLAU, S100A14, and SPP1 increased in the high-risk group. Consistent with this, the number of deaths increased with the risk scores rising ( Figure 2B).

Effectiveness and Independence Validation of the Risk Signature for the Survival Prediction
We next performed internal validation of the risk signature in testing set (n = 50) and the entire set (n = 166), and external validation in GSE78229 dataset (n = 49) and GSE62452 dataset (n = 66). By calculating the risk scores for each patient based on the above-mentioned formula, the patients in these datasets were divided into high-risk group and low risk group using the same criteria. Consistent with the results in the training set, patients in the high-risk group had significantly lower overall survival (OS) than those in the low-risk group ( Figures 2C, D Figures 2E, F). Meanwhile, the expressions of the nine hub IRGs increased significantly and the number of deaths was remarkably higher in the high-risk group, which was consistent with the results of the training set ( Figures 2E, F).
Afterwards, we intended to investigate whether the survival prediction based on the risk signature was independent of other clinical factors ( Table 1). Univariate Cox regression analysis and multivariate Cox regression analysis were conducted on these factors in the training set, testing set and entire set respectively. And the results showed that the risk signature was independent of other clinical factors, including age, gender, AJCC_stage, grade, T stage and N stage ( Figures S4A-F, P <0.05 in all dataset for risk score). The prognostic value of the risk signature was also explored in different cohorts stratified by age, gender, tumor grade and T stage (Figures S5A-L, P <0.05 in all subgroups). Regardless of the subgroup, patients in the highrisk group suffered significantly poorer prognosis than those in the low-risk group, further confirming that this risk signature was an independent prognostic factor for PC.   Table 1). First, we performed univariate Cox regression analyses on all the factors in training set, and then factors with P <0.2 were included in the multivariate analysis ( Figures 3A, B). Concomitantly, we reconfirmed that risk score was an independent prognostic factor in this process. Finally, risk score, age, T stage and N stages were incorporated into the construction of nomogram for predicting 1-, 1.5-, and 2-year survival rate of PC. In the nomogram, the patients' 1-, 1.5-, and 2-year survival rates were estimated by the total points obtained by adding up the point of each factor ( Figure 3C). The C-index of the training set, the testing set and entire set were 0.718, 0.686, and 0.708 respectively, indicating the excellent performance of the nomogram. Subsequently, time-dependent ROC curve and calibration plot were applied to further evaluate the effectiveness of the nomogram. The AUCs of ROC curves for predicting 1-, 1.5-, and 2-year survival were 0.764, 0761, and 0.807 in the training set ( Figure 3D), 0.785, 0.692, and 0.723 in the test set ( Figure 3E), and 0.767, 0.732, and 0.777 in the entire set, respectively ( Figure 3F). In addition, the calibration plot showed good agreement between the predicted and actual outcome of 1-year, 1.5-year, and 2-year OS of the nomogram in training set ( Figures S6A-C), testing set ( Figures S6D-F) and entire set ( Figures S6G-I).

S100A2 Is Highly Expressed and Correlates With Unfavorable Prognosis in PC
In the DEGs analysis between the high and low risk groups, the increased expression of S100A2 occupied the most significant position ( Figure 4A, FDR = 5.55 × 10 −36 , log 2 FC = 4.36). Furthermore, due to its high proportion in the risk signature, we tended to consider that S100A2 occupied the core position in the risk signature. A pan-cancer analysis of S100A2 was performed, showing that PC experienced one of the most remarkably increase of S100A2 expression among all types of cancer ( Figure S7). To be specific, a joint analysis of TCGA and GTEx databases confirmed that the expression of S100A2 in PC tissues was significantly higher than that in normal tissues ( Figure 4B, P <0.001). Meanwhile, TCGA entire set was divided into S100A2 high and low expression groups based on S100A2 median expression. The Kaplan-Meier analysis elucidated that PC patients with S100A2 high expression suffered a poor prognosis than those with S100A2 low expression ( Figure 4C, P <0.01). Concomitantly, the association between S100A2 expression and patients' clinicopathological information was further investigated.
Notably, the expression of S100A2 was significantly increased along with the progression of tumor grade, AJCC_stage, age and T stage (Figures 4D-I).
In order to further verify the above findings, we conducted clustering on the single-cell dataset CRA001160 and explored the predominant expression cells of S100A2 (24,25). It was found that S100A2 was mainly expressed by cancer cells in PC tissues ( Figure 5A). Subsequently, the significantly high expression of S100A2 in tumor cells was confirmed by qRT-PCR in pancreatic normal cell line (HPNE) and pancreatic cancer cell lines (AsPC-1, BxPC-3, Capan-1, CFPAC-1, MIA PaCa-2, PATU 8988, PANC-1, SW1990, and T3M4) ( Figure 5B). Meanwhile, PUMCH cohort (n = 65) was utilized to further validate that high expression of S100A2 was associated with poor prognosis in PC ( Table 3). Comprehensive analysis of S100A2 immunohistochemical scores and clinicopathologic information revealed that tumor with high S100A2 expression experienced higher T stage and poorer differentiation ( Figures 5C-E). Collectively, these results indicated that high S100A2 expression in PC patients was correlated with unfavorable prognosis.

S100A2 Predicts the Infiltration of Immune Cells Into PC Microenvironment
Next, in order to investigate the in-depth mechanism of S100A2 leading to poor prognosis of PC, DEGs analysis was performed between the S100A2 high expression group (n = 83) and S100A2 low expression group (n = 83) in TCGA entire set ( Figure 6A). As predicted, S100A2 was the gene with the most significant difference between the two groups, supporting the accuracy of the analysis. Then the co-expression network was constructed and visualized with STRING database and Cytoscape ( Figure 6B).
To further elucidate the mechanism of S100A2, GSEA analysis was conducted on DEGs, in which P <0.05 and q <0.25 was considered statistically significant. Five representative pathways for the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Gene Ontology (GO) analyses were presented respectively ( Figures 6C, D). Collectively, it was uncovered that part of the pathways of DEGs enrichment were associated with immune response and associated signaling pathways. Therefore, CIBERSORT algorithm was applied to detect the proportions of 22 kinds of immune cells in TCGA entire set ( Figure 6E). The results showed that relatively higher proportion of M0 macrophages cells and a lower proportion of resting memory CD4+ T cells were found in the S100A2 high expression group compared with the low expression group ( Figure 6F). To further verify this conclusion, the number of macrophages and CD4+ T cells in the single cell dataset CRA001160 was statistically analyzed, which were divided into S100A2 high-expression group, S100A2 moderate-expression group and S100A2 low-expression group. Consistent with the previous results, with the increase of S100A2 expression, the proportion of macrophages gradually increased while that of CD4+T cells declined (Figures S8A-C) and immunohistochemical images also support these findings, in which patients with high S100A2 expression exhibited higher CD68 expression and lower CD4 expression (Figures S8D, E).
Moreover, in order to prove the universality of the results, GSE71729 dataset (n = 125) was also included for following B C E D A FIGURE 5 | Validation of high expression of S100A2 in PC cancer cells and its association with poor prognosis. (A) The results of clustering and S100A2 expression distribution in single cell dataset CRA001160. (B) The expression difference of S100A2 between normal and pancreatic cancer cell lines detected by qRT-PCR. The difference between each PC cell line and HPNE was analyzed. (C) Representative images of low and high expression of S100A2 in PUMCH cohort (n = 65). (D-E) Correlation between S100A expression and T stage and differentiation status in PUMCH cohort. *P < 0.05; **P < 0.01; ***P < 0.001.
analysis. It was discovered that the expression of S100A2 had a significant positive correlation with M0 macrophages and activated dendritic cells, while a remarkable negative correlation with CD8+ T cells and activated NK cells ( Figure 6G). In addition, ESTIMATE package was also used to score the immune microenvironment, which revealed that the immune score of the group with high S100A2 expression was significantly lower than that of the group with low S100A2 expression ( Figure 6H).

S100A2 Is Associated With Patients' TMB and Response to Immunotherapy
The mutation profiles of each PC patients were analyzed and visualized ( Figure S9). For the TCGA dataset, the ten genes with the highest mutation rate were KRAS, TP53, SMAD4, CDKN2A, TTN, MUC16, RNF43, GNAS, ARID1A, and PCDH15 ( Figure 7A). Meanwhile, we calculated the tumor mutation burden (TMB) of each sample and found that the TMB was higher in the group with high S100A2 expression ( Figure 7B, P <0.05). Combined with the fact that patients with high TMB suffered a worse prognosis ( Figure 7C, P <0.05), it was hypothesized that the effect of S100A2 on the progression of PC might result from a higher TMB.
IPS is a machine learning-based scoring system, which was able to predict patients' response to immunotherapy including anti-PD-1/PD-L1 and anti-CTLA-4 treatment (37). Combined analysis of the expression S100A2 and IPS score proved that patients with high S100A2 expression had a relative high probability to respond to anti-PD-1/PD-L1 treatment and anti-CTLA-4 treatment ( Figures 7D, E, P <0.05). These results indicated that patients with high S100A2 expression are more suitable for immunotherapy such as anti-PD-1/PD-L1 treatment and anti-CTLA-4 treatment.

The Expression of S100A2 Was Positively Correlated With PD-L1 in PC Cells
In addition, it was discovered the expression of S100A2 in tumor tissues was remarkably positively correlated with the expression of PD-L1 ( Figure 8A, P = 0.001, r = 0.25) and CTLA-4 ( Figure 8A, P <0.01, r = 0.23), especially PD-L1. It might partly explain why samples with high expression of S100A2 experienced fewer CD8+ and CD4+ T cell infiltration, as well as better therapeutic effect on anti-PD1/PD-L1 therapy and anti-CTLA-4 therapy.
Since the relationship between S100A2 and PD-L1 was the most remarkable, PUMCH cohort (n = 65) was used to further demonstrate the positive correlation between S100A2 and PD-L1 ( Figure 8B and Table 3). There was a significantly increased expression of PD-L1 in patients with high expression of S100A2 according to the immunohistochemical analysis of sequential sections staining S100A2 and PD-L1 ( Figure 8C, P <0.001). Meanwhile, the expression profiles of 51 pancreatic cancer cell lines in Cancer Cell Line Encyclopedia (CCLE) database also supported above results ( Figure 8D, P <0.05).

DISCUSSION
PC is one of the leading causes of cancer-related death worldwide, which is expected to become the second most common cause of cancer-related death by 2030 after lung cancer (40). There are a number of crucial reasons for this dismal status, and one of them is the lack of effective risk prediction models and biomarkers, which hinders individualized treatment of PC. Herein, due to the critical role of tumor microenvironment in the carcinogenesis and progression of PC (41,42), we explored an IRGs-based predictive model to evaluate the prognosis of PC patients. Nine prognosis-specific IRGs were identified by a series of bioinformatics analysis: S100A2, AREG, CXCL10, MET, OAS1, PI3, PLAU, S100A14, and SPP1. Among them, AREG, CXCL10, MET, PLAU, S100A14, and SPP1 have been reported to be involved in the carcinogenesis and progression of PC (43)(44)(45)(46)(47)(48), implying that our risk signature has considerable prognostic B C E F G H D A FIGURE 6 | Differential gene analysis, co-expression network construction and functional enrichments analysis between S100A2 high and low expression groups, as well as the correlation analysis of the expression of S100A2 and immune cell infiltration. (A) Heatmap of top 50 DEGs in PC between S100A2 high and low expression groups. (B) Co-expression network of DEGs constructed and visualized with STRING database and Cytoscape. (C, D) Gene Set Enrichment Analysis between S100A2 high and low expression groups. The representative 5 KEGG enrichments (C) and GO enrichments (D) were displayed respectively. (E) The abundance ratio of the 22 types of immune cells in TCGA entire set. (F) Differential immune cell type abundance between S100A2 high and low expression groups. (G) Correlation analysis between the expression of S100A2 and the proportion of immune cells in GSE71729 dataset. Immune cell types with P < 0.05 were shown. (H) Differences in immune scores between high and low S100A2 expression groups. *P < 0.05; **P < 0.01. value. The remaining three genes, including S100A2, OAS1, and PI3, have not been well documented for their participation in PC development. Since S100A2 occupied the most paramount position in the risk signature-based DEGs analysis, and S100A2 accounted for a relatively high proportion in the risk signature, we tended to consider that S100A2 occupied the core position in the risk signature. Therefore, we gave special attention to S100A2 in the following exploration. The association between S100A2 expression and the relative probabilities to respond to immunotherapy, including anti-PD-1/PD-L1 therapy and anti-CTLA-4 therapy. *P < 0.05. S100A2 is an important member of the S100 protein family, which is a group of highly conserved elongation factor (EF)-hand calcium-binding proteins (49,50). Aberrant expression of S100A2 affects a range of cellular physiological functions, such as calcium homeostasis, enzyme activities and protein phosphorylation (51,52). Notably, the role of S100A2 in tumors appears to be dual (53). Li et al. have reported that S100A2 activated the PI3K/AKT signaling pathway and upregulated GLUT1 expression in colorectal cancer, which induced glycolytic reprogramming and consequently increased tumor proliferation (54). Conversely, S100A2 was also identified to be one of the crucial tumor suppressor genes involved in the lung carcinogenesis (55). And our results supported its deteriorating effect in PC development. Previous clinical studies have proved S100A2 to be an independent poor prognostic factor and an indicator of less benefit to pancreatectomy for PC (56,57). However, the underlying mechanism by which S100A2 promotes the progression of PC has not been fully revealed, which is also the main content of this study, especially the relationship between S100A2 and the tumor immune microenvironment. GSEA analysis revealed that the high expression of S100A2 was closely associated with the tumor immune microenvironment and corresponding pathways, (TNF) signaling pathways, and also weakened adaptive immune response, which have been widely reported to participate in tumor progression (58)(59)(60)(61)(62). Therefore, CIBERSORT algorithm was applied to further elucidate the abundance ratios of 22 types of immune cells in each PC patients from TCGA entire set. It was found that compared with S100A2 low expression patients, S100A2 high expression patients experienced significantly higher proportions of M0 macrophages cells and activated dendritic cells, as well as remarkable lower proportions of CD8+ T cells, resting memory CD4+ T cells and activated NK cells. Among them, CT8+T cells, the immune cell with the most prominent tumor killing ability (63,64), were significantly reduced in S100A2 high expression group, which partially explained the poor prognosis of patients with high S100A2 expression. Meanwhile, NK cells, another major tumor killer cells (65,66), showed a similar trend in S100A2 high expression group. In addition, M0 macrophages have been demonstrated to be associated with worse prognosis of PC (67), but some other researches reached the opposite conclusion (68). In our analysis, the high expression of S100A2 was associated with the increase of M0 macrophages, but whether this is related to the mechanism of S100A2 leading to PC progression remained to be explored. It was also possible that the increase of M0 macrophages is a precursor to the increase of tumor-associated macrophages (TAMs), and the immune profiles reflected in the TCGA and GSE71729 datasets were both prior to the differentiation of M0 macrophages. In addition, it was worth noting that the expression of S100A2 was positively correlated with the activation of dendritic cells, which played a pivotal role in anti-tumor immunity (69). For this phenomenon, we suspected that it might be due to the negative feedback effect caused by the decrease and functional deficiency of T cells.
In recent years, immunotherapy has been proved to be one of the most promising therapies for cancer therapy and has made a profound progress in prolonging the survival time of patients with of various types of tumors (70,71). However, the immunotherapy is almost ineffective for pancreatic cancer (72,73). Promisingly, a small subset of patients who exhibited high effector T-cell infiltration in tumor had longer overall survival (15,16), implying that immunotherapy still had certain application value for PC patients.
Since we have previously explored the role of S100A2 in predicting tumor immune microenvironment, we wondered whether S100A2 has any predictive effect in predicting the efficacy of immunotherapy for PC. In the past years, studies have revealed that tumor mutation burden is positively related to the efficacy of immunotherapy (74,75). Specifically, the more TMB a tumor has, the more neoantigens it is also likely to form and T-cells released by immune checkpoint inhibitors are more likely to recognize the neoantigens and thus attack the tumor cell. Therefore, we explored the relationship between the expression level of S100A2 and TMB. The results showed that patients with high S100A2 expression had higher TMB, which indirectly indicated that patients with high S100A2 expression might have better therapeutic effect on immunotherapy. Apart from that, according to the IPS algorithm (37), it was estimated that patients with high expression of S100A2 displayed relatively significant anti-PD1/PD-L1 and anti-CTLA-4 therapeutic effects. Moreover, the expression of S100A2 was remarkably positively correlated with the expression of PD-L1 and CTLA-4, especially with the expression of PD-L1. It has been reported that PD-L1 was able to inhibit the activation of T cells by binding to PD-1 receptor on the surface of T cells (76). In our study, we found that the expression of PD-L1 was significantly increased in patients with high S100A2 expression, suggesting that patients with high S100A2 expression may have fewer T cells infiltration in tumor microenvironment. Meanwhile, the results obtained by CIBERSORT algorithm also showed that patients with high S100A2 expression had fewer CD8+ T cells, which was exactly consistent with the previous speculation. To further verify the correlation between the expression of S100A2 and PD-L1, immunohistochemistry was performed on sequential sections of PUMCH cohort (n = 65) for S100A2 and PD-L1 respectively. According to comprehensive analysis of immunohistochemical scores, it was confirmed that patients with high S100A2 expression had higher PD-L1 expression in tumor tissues. In addition, expression profile of S100A2 and PD-L1 in all pancreatic cell lines was integrated from the CCLE database, and similar results were obtained. Regarding the co-expression of S100A2 and PD-L1, studies have shown that overexpression of S100A2 in A549 lung cancer cells enhanced Akt phosphorylation (77). Meanwhile, numerous studies have revealed that Akt activation could increase the expression of PD-L1 (78,79). On this basis, we hypothesized that the co-expression of S100A2 and PD-L1 in pancreatic cancer might be based on the activation of the S100A2-Akt-PD-L1 signaling pathway.
In spite of the positive results, several limitations in our study should also be acknowledged. Firstly, due to the extremely poor prognosis of PC, the survival time of patients rarely exceeds three years, which may bring some imprecise results when we want to predict long-term prognosis. Besides, IPS algorithm is applied to mimic patients' response to immunotherapy. Although the prediction of immunotherapy efficacy by IPS algorithm has been verified in several independent datasets, it still cannot completely replace the actual therapeutic effect.
In summary, a risk signature consisting of nine immunerelated genes was constructed through a series of bioinformatics analysis, which was validated in TCGA training set, TCGA testing set, TCGA entire set, GSE78229 set and GSE62452 set. Subsequently, a nomogram was also developed to establish a more accurate prognostic prediction model for PC. Furthermore, S100A2 was identified as the gene occupying the core position in risk model, which was demonstrated to be significantly associated with the progression of tumor grade, AJCC_stage, age and T stage. Mechanically, GSEA, ESTIMATE and CIBERSORT algorithm analysis revealed that the deteriorating effect of S100A2 was associated with dysfunctional tumor immune microenvironment, mainly related to lower proportion of CD8+T cells and activated NK cells and higher proportion of M0 macrophages. Meanwhile, the results of IPS algorithm revealed that patients with high expression of S100A2 might get more benefit from immunotherapy. Finally, our independent cohort was applied to demonstrate a remarkably positive correlation between the expression of S100A2 and PD-L1, as well as the positive relationship between S100A2 expression and unfavorable prognosis of PC patients. Our findings demonstrate S100A2 might be responsible for the preservation of immune-suppressive status in PC microenvironment, which contributes to accurate assessment of the prognosis of PC patients and optimization of the clinical decision-making.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
Written informed consent was obtained from all the patients enrolled in this study. This project was approved by the Ethics Committee of Peking Union Medical College Hospital.  The proportion of macrophages and CD4+ T cells in high S100A2 expression group, moderate S100A2 expression group and low S100A2 expression group. (D, E) Representative images of S100A2 and CD68/CD4 expression by immunohistochemistry in HPA database.

AUTHOR CONTRIBUTIONS
Supplementary Figure 9 | The mutation profiles of patients in TCGA dataset.