Establishment of a Macrophage Phenotypic Switch Related Prognostic Signature in Patients With Pancreatic Cancer

Introduction Macrophage phenotype switch plays a vital role in the progression of malignancies. We aimed to build a prognostic signature by exploring the expression pattern of macrophage phenotypic switch related genes (MRGs) in the Cancer Genome Atlas (TCGA)—pancreatic adenocarcinoma (PAAD), Genotype-Tissue Expression (GTEx)-Pancreas, and Gene Expression Omnibus (GEO) databases. Methods We identified the differentially expressed genes between the PAAD and normal tissues. We used single factor Cox proportional risk regression analysis, Least Absolute Shrinkage and Selection Operator (LASSO) analysis, and multivariate Cox proportional hazard regression analysis to establish the prognosis risk score by the MRGs. The relationships between the risk score and immune landscape, “key driver” mutations and clinicopathological factors were also analyzed. Gene-set enrichment analysis (GSEA) analysis was also performed. Results We detected 198 differentially expressed MRGs. The risk score was constructed based on 9 genes (KIF23, BIN1, LAPTM4A, ERAP2, ATP8B2, FAM118A, RGS16, ELMO1, RAPGEFL1). The median overall survival time of patients in the low-risk group was significantly longer than that of patients in the high-risk group (P < 0.001). The prognostic value of the risk score was validated in GSE62452 dataset. The prognostic performance of nomogram based on risk score was superior to that of TNM stage. And GSEA analysis also showed that the risk score was closely related with P53 signaling pathway, pancreatic cancer and T cell receptor signaling pathway. qRT-PCR assay showed that the expressions of the 9 MRGs in PDAC cell lines were higher than those in human pancreatic ductal epithelium cell line. Conclusions The nine gene risk score could be used as an independent prognostic index for PAAD patients. Further studies validating the prognostic value of the risk score are warranted.


INTRODUCTION
Pancreatic ductal adenocarcinoma (PDAC), with an estimated 5year overall survival rate less than 10%, is the fourth leading cause of cancer-related mortality in the world (1). And its incidence is still increasing these years due to lifestyle change and improved medical detection technology. Majority of pancreatic cancer patients were usually at advanced stages at their initial diagnosis. The lack of effective systematic therapies and useful prognostic indexes deteriorates the dismal prognosis of pancreatic cancer patients (2).
With the development of high-throughput technologies, molecular characterization may shed light on newer therapeutic targets (3). Therefore, it is essential to identify molecular prognostic factors of pancreatic cancer which aid in rational stratification of patients according to the clinical prognosis as well as in providing potentially therapeutic targets (4).
It has long been held that the dismal therapeutic effects in pancreatic cancer can largely be attributed to the complex tumor microenvironment (TME). Macrophages are one of the most abundant immune cells in PDAC tumor microenvironment (5). According to their polarization states, macrophages are roughly categorized into two types: classically activated type 1 (M1 macrophages), and alternatively activated type 2 (M2 macrophages) (6). M1 macrophages, characterized by the expression of the inducible-type nitric oxide synthase (iNOS), are pro-infl ammatory and develop in response to lipopolysaccharides (LPS) or interferon-g (IFN-g). M2 macrophages, or anti-inflammatory macrophages, develop in response to interleukin (IL)-4, IL-13 or glucocorticoids, and are characterized by the secretion of anti-inflammatory mediators, including transforming growth factor-b1 (TGF-b1) and IL-10 to promote extracellular matrix remodeling and angiogenesis (7). M2 macrophages exert pro-tumor functions, whereas M1 macrophages exert anti-tumor functions (8). The macrophage phenotypic switch related gene (MRGs) may provide us with indepth information on the prognosis of PAAD patients (9).
In the present study, we aimed to build a prognostic model via thorough investigation of the cancer genome atlas (TCGA) database, Genotype-Tissue Expression (GTEx) database and Gene Expression Omnibus (GEO) database. We hoped that our prognostic risk score could aid in the prognostic prediction as well as the treatment strategy design.

Data Collection
We collected the transcriptome profiles of pancreatic adenocarcinoma (PAAD) available in the TCGA database (https://portal.gdc.cancer.gov/) and Genotype-Tissue Expression (GTEx) GTEx-Pancreas datasets (https:// xenabrowser.net/) on July 5 th , 2020. Our study included the expression profile of 171 normal samples and 177 PAAD samples. The clinicopathological information including gender, age, tumor grade, T classification, N classification, M classification, TNM stage, follow-up time, and survival status of the patients from TCGA-PAAD was also retrieved (10). We excluded samples with follow-up time shorter than 30 days and samples with missing clinicopathological information. For validation, gene expression data and clinical data of 70 PAAD patients in GSE62452 was downloaded from Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). The flow of the study was shown in Figure S1.

Gene Set Selection
Two macrophage phenotype switch related gene sets (GSE5099_ CLASSICAL_M1_VS_ALTERNATIVE_M2_MACROPHAGE_ UP, GSE5099_CLASSICAL_M1_VS_ALTERNATIVE_M2_ MACROPHAGE_DN) (11) were selected from the Molecular Signatures Database v7.1 (MSigDB) (12) (https://www.gseamsigdb.org/gsea/msigdb/index.jsp). No overlapped genes were detected and a total of 382 genes were retrieved ( Table S1). The expression data of the 382 MRGs was extracted. Since the data was retrieved from public datasets and we followed the respective publishing guidelines, no ethics approval was needed.

Screening of Differentially Expressed MRGs
We collected the mean expression data of 382 MRGs comprising 177 PAAD and 171 non-tumor samples. The mean expression values were then normalized by log 2 transformation. The differentially expressed MRGs between tumor and normal samples were identified using the Wilcoxon signed-rank test in R (version 4.0.2, https://www.r-project.org/) with a threshold of | log(foldchange)| > 1 and a false discovery rate (FDR) < 0.05.

Identification of the Prognostic Signatures
We built the prognostic signatures in three steps: 1) we conducted univariate Cox regression analysis of the MRGs by "survival" in R. Genes with P < 0.05 were chosen for Lasso Cox regression analysis. 2) The Lasso Cox regression analysis (13) which was implemented with "glmnet" and "survival" packages in R was utilized to remove highly correlated genes and to prevent over-fitting. 3) Risk scores were calculated as the sum of each gene's expression levels multiplying the regression coefficient in the multivariate Cox regression model. Patients were dichotomized into high-risk group and low-risk group by median value of risk score. The Kaplan-Meier method was utilized to compare the survival outcome between high-risk group and low-risk group. Time dependent receiver operating characteristic (ROC) curves by R package "survivalROC" (14) was also used to determine the efficacy of the prognostic model.
We developed a prognostic nomogram predicting OS based on the Cox proportional hazard regression model by the "rms" package in R. A concordance index (C-index) was calculated to evaluate the performance of the nomogram.

Immune Phenotypes
Using R package "GSVA," single-sample gene set enrichment analysis (ssGSEA) was conducted to quantify the activity or enrichment levels of immune cells, functions, or pathways (15,16). The comparison of immune cell distribution between patients at high-risk group and patients at low-risk group was performed.

Mutation of the Four Key Drivers
The mutations of the four most prominent key drivers in pancreatic cancer progression, namely KRAS, TP53, CDKN2A, and SMAD4 (17), were downloaded. And the relationships between them and MRGs based risk score were gauged.

Cell Lines and Culture
Human pancreatic ductal epithelium cell line HPDE6-C7 was obtained from Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences (Shanghai, China). PDAC cell lines (MIA PaCa-2 and Capan-1) were acquired from the American Type Culture Collection (ATCC, Manassas, VA, USA). HPDE6-C7 was cultured in Dulbecco's modified Eagle's medium (DMEM) (Invitrogen, Carlsbad, CA, USA) added with 10% fetal bovine serum (FBS) (Invitrogen). MIA PaCa-2 was cultured in DMEM added with 10% FBS and 2.5% horse serum. Capan-1 was cultured in Iscove's modified Dulbecco's medium added with 20% FBS. All cells were cultured in a humidified 5% CO 2 incubator at a temperature of 37°C.

Statistical Analysis
All statistical analyses were conducted using R software (version 4.0.2), GraphPad Prism 7 (San Diego, CA, USA), and SPSS 20.0 (Chicago, IL, USA). The median OS was compared using the Kaplan-Meier method with log-rank test. The distributions of clinicopathological parameters between the high-risk and lowrisk groups were compared using chi-square tests, T test, or oneway analysis. Two-sided P value less than 0.05 was considered as statistically significant.

Survival−Related MRGs and the Prognostic Signature
The univariate Cox regression analysis and identified 59 MRGs that were significantly related to overall survival in TCGA-PAAD cohort ( Figure 1B). The Lasso regression analysis excluded 40 genes that may be highly correlated with other genes (Figures 1C, D). The rest 19 genes (KIF23, BIN1, GBP4, LAPTM4A, ERAP2, SLC22A18AS, CDK6, KIF20B, RNFT2, TES, ATP8B2, FAM118A, ARID5A, RGS16, TNFRSF21, UBASH3B, GPRIN1, ELMO1, RAPGEFL1) were then submitted to a multivariate Cox proportional hazards model, resulting in 9 candidate genes (KIF23, BIN1, LAPTM4A, ERAP2, ATP8B2, FAM118A, RGS16, ELMO1, RAPGEFL1) as significant predictors of prognosis ( Table 1) The median overall survival time of patients in the high-risk group was significantly lower than that in the low-risk group ( Figure 1E, P < 0.001). Expression profile of the included 9 MRGs (Figure 2A), distribution of patients ( Figure 2B), individual survival status ( Figure 2C) stratified by risk score were shown in Figure 2. Risk score remained to be an independent prognostic indicator in multivariate analysis (HR=3.129, 95% CI = 1.950-5.022, P < 0.001, Table 2, Figures 3A, B). Then we assessed the prediction efficiency of the risk score by drawing the time dependent ROC curve, which revealed that the risk score could predict the 1-year overall survival (AUC=0.765, Figure 3C), 3-year overall survival (AUC =0.793, Figure 3D), and 5-year overall survival (AUC =0.776, Figure 3E) for PAAD patients effectively. And the AUC of risk score was larger than those of the other enrolled clinocopathological factors.

Validation of the Risk Signature
We verified the robustness of our MRG based risk score with the patients from GSE62452 dataset. We calculated the risk score for patients and divided them into high-risk group and low-risk group in test dataset with the same formula derived from the training cohort as the TCGA-PAAD database. Patients in lowrisk group had obviously better overall survival (P < 0.001, Figure 4A). After adjusting for clinicopathological features, risk score remained to be an independent prognostic indicator in multivariate analysis (HR=2.385, 95% CI = 1.209-4.707, P = 0.012, Table 3). Then we assessed the prediction efficiency of the risk signature by drawing the time dependent ROC curve, which   revealed that the risk score could predict the 1-year overall survival rate (AUC = 0.617, Figure 4B), 3-year overall survival rate (AUC = 0.878, Figure 4C), and 5-year overall survival rate (AUC = 0.761, Figure 4D) for GSE62452 dataset effectively.

Construction of the Nomogram
The 166 patients with complete clinical information from the TCGA-PAAD dataset were used to establish a prognostic nomogram based on the stepwise Cox regression model ( Figure 3F). The MRGs based risk score, T classification, and N classification were finally included into the stepwise Cox regression model for the nomogram. The C-index of the nomogram was 0.723, which was higher than that of TNM stage itself as 0.535. Thus, the nomogram was regarded to be superior to the TNM stage in terms of predicting overall survival of PAAD.

Relationship Between MRGs Risk Score and "Key Driver" Mutations
Via the analysis of the relationship between the mutations of the four "key drivers" of pancreatic cancer and risk score, we found that patients with higher risk score were enriched for KRAS mutations (P<0.001, Figure 7A) and TP53 mutations (P < 0.001, Figure 7B).

Relationship Between MRGs Risk Score and Clinicopathological Features
Relationship between MRGs prognostic index and clinicopathological features were subsequently analyzed ( Figure 8). Significant higher risk scores were in patients with higher tumor grade (P < 0.001). In addition, the expression level of BIN1 was significantly related with advanced M (P < 0.001). The expression level of ELMO1 was significantly related with higher tumor grade (P = 0.008). The expression level of ERAP2 was significantly related with advanced N (P = 0.027). The expression level of KIF23 was significantly related with higher tumor grade (P < 0.001). The expression level of LPTM4A was significantly related with younger age (P=0.038) and higher tumor grade (P=0.024). The expression level of RGS16 was significantly related with higher tumor grade (P=0.015).

Gene Set Enrichment Analysis
GSEA compared the high-and low risk groups stratified by the risk score ( Figure 9). KEGG pathways enriched in the high-risk group included KEGG P53 signaling pathway, KEGG cell cycle, KEGG base excision repair, KEGG pancreatic cancer, and KEGG mismatch repair. KEGG pathways enriched in the low-risk group included KEGG glycine serine and threonine metabolism, KEGG chemokine signaling pathway, KEGG complement and coagulation cascades, KEGG cytokine cytokine receptor interaction, and KEGG T cell receptor signaling pathway.

DISCUSSION
The present study, to the best of our knowledge, was the first study constructing a prognostic index based on MRGs via analyzing the TCGA, GTEx, and GEO datasets. We identified nine prognostic MRGs. We evaluated the prognostic value of the nine-gene signature and constructed a nomogram. Our results suggested the promising value of tumor-associated macrophage in pancreatic cancer.
Due to its simplicity and practicality, TNM staging system remains to be the most widely used cancer staging system. It acts as the base for the prognosis prediction and the selection of therapeutic options (18). Meanwhile, diverse clinical outcomes may be observed in patients at the same TNM stage. Along with the development of precision medicine, it is increasingly recognized that the molecular biomarkers provide important prognostic information and clinical implication. With the development of large-scale gene sequence databases, some molecular prognostic features have been proposed. In the 8 th AJCC staging manual for breast cancer (19), the gene panel have been introduced into the staging system, which partly reflects the increasing recognization of the value of molecular prognostic factors. Immune dysregulation is pivotal in cancer progression. To date, researches on the immune dysregulation have largely focused on the T cell compartment (20, 21). Other immune subsets including macrophages, which constitute the largest portion of immune cells in TME, may also contribute to the cancer progression (22). In our study, we discovered that high risk score was associated with increased parainflammation and APC co inhibition. Parainflammation, which was initially described by Pribluda et al., referred to the moderately increased inflammatory cytokines which might promote the proliferation, angiogenesis, invasion, and migration. Parainflammation is thus considered as a promotion factor in carcinogenesis (23,24). Further, increased APC co-inhibition, along with the increase of MRGs based risk score, partly reflects  the defected neoantigen recognition, presentation, and anti tumor effects. Meanwhile, positive relationship between risk score and MHC class I indicates increased recognition by cytotoxic T cells (25). The effects may be compensated by the complicated immune network involving parainflammation and APC co-inhibition. The negative relationship between MRGs risk signature and B cells also suggests defected immune response. Of note, the positive relationship between risk score and type I interferon (IFN) response while the negative relationship between risk score and type II IFN response are intriguing. We speculated that the complicated biological function of the two types of IFN responses might accelerate the tumor development (26,27) in patients with higher MRGs based risk score, which called for more researches in the future. Immune checkpoints play a crucial role in carcinogenesis by promoting tumor immunosuppressive effects (28). Tumors can protect themselves from being attacked by stimulating immune checkpoint targets. In our study, immune checkpoints expressing in the tumor tissues as PD-L1 were upregulated in the group with higher MRGs based risk score. PD-L1, also known as CD274, is one of the ligands that binds to programmed death-1 (PD-1) on T cells and attenuates the immune response by downregulating the activity of antitumor T cells. In the PDAC microenvironment, PD-L1 is highly expressed in cancer cells, facilitating immune escape and cancer progression (29). Blockade of PD-L1 has gained survival benefits in various cancers including the NSCLC, myeloma, and kidney cancer. However, a previous study indicated that targeting PD-L1 for PDAC therapy was unsuccessful, as the response rate was < 3.1% (28,29). It was suggested that the macrophage phenotypic switch might compensate the effects by blocking of PD-L1 (30,31). The positive relationship between MRGs risk score and IDO1 also referred to defected anti-tumor immune response (32). The decreased CD8A expression indicated defected CD8 T cell infiltration and cytolytic effects (33). Our results further revealed the potential influence of macrophage phenotypic switch on immune checkpoints and immune escape. There are four major driver genes for pancreatic cancer, including KRAS, CDKN2A, TP53, and SMAD4 (17). KRAS mutations occurred in the early stage of pancreatic cancer. As a classic tumor suppressor gene, TP53 has mutation in many tumor types. In our study, we found that the mutation rates of KRAS and TP53 in the high-risk group was significantly higher than that in the low-risk group, which suggested that macrophage phenotype switch might be related to somatic mutations (34). Besides, Bishehsari et al. reported that KRAS mutations induced a protumorigenic phenotype in macrophages (35). These findings were consistent with our results.
The strength of our study came as it shifted from the T cell compartment to the macrophage compartment. We explored the relationship between the 9 MRGs risk score and immune landscape, "key driver" mutations and so on. And some basic experiments were performed. Admittedly, there were some limitations in this study. As a retrospective study, though we validated the nine MRGs risk score using training/testing sets from the public databases, it was limited by the selection bias. Secondly, the external validation of our risk signature in GEO datasets strengthened the robustness of our results. We should remind that more external validation was needed. Thirdly, we tested the expression of the MRGs in the cell line experiments. The biological processes and molecular mechanisms of the nine MRGs should be in depth evaluated in more in-vivo and invitro studies.

CONCLUSIONS
Our study suggested that nine macrophage phenotypic switch related gene signature had satisfactory prognostic ability. The nine-gene signature may provide us with a novel metric for prognosis prediction and more potential treatment targets for pancreatic cancer.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
M-xL and H-yW conceived and performed the study. C-hY, Z-lM, BJ, and LL guided the analysis. LZ polished the manuscript. D-rX guided the research process. All authors contributed to the article and approved the submitted version.