A Novel Computational Framework for Predicting the Survival of Cancer Patients With PD-1/PD-L1 Checkpoint Blockade Therapy

Background Immune checkpoint inhibitors (ICIs) induce durable responses, but only a minority of patients achieve clinical benefits. The development of gene expression profiling of tumor transcriptomes has enabled identifying prognostic gene expression signatures and patient selection with targeted therapies. Methods Immune exclusion score (IES) was built by elastic net-penalized Cox proportional hazards (PHs) model in the discovery cohort and validated via four independent cohorts. The survival differences between the two groups were compared using Kaplan-Meier analysis. Both GO and KEGG analyses were performed for functional annotation. CIBERSORTx was also performed to estimate the relative proportion of immune-cell types. Results A fifteen-genes immune exclusion score (IES) was developed in the discovery cohort of 65 patients treated with anti-PD-(L)1 therapy. The ROC efficiencies of 1- and 3- year prognosis were 0.842 and 0.82, respectively. Patients with low IES showed a longer PFS (p=0.003) and better response rate (ORR: 43.8% vs 18.2%, p=0.03). We found that patients with low IES enriched with high expression of immune eliminated cell genes, such as CD8+ T cells, CD4+ T cells, NK cells and B cells. IES was positively correlated with other immune exclusion signatures. Furthermore, IES was successfully validated in four independent cohorts (Riaz’s SKCM, Liu’s SKCM, Nathanson’s SKCM and Braun’s ccRCC, n = 367). IES was also negatively correlated with T cell–inflamed signature and independent of TMB. Conclusions This novel IES model encompassing immune-related biomarkers might serve as a promising tool for the prognostic prediction of immunotherapy.


INTRODUCTION
Tumor cells acquire numerous genomic alterations, deriving "non-self" neoantigens that the immune system can recognize. Although an immune response is noticed in patients with cancer, this response is usually ineffective to tumor elimination (1)(2)(3)(4). One of the reasons is the mechanism of immune escape, including profound local immune suppression, induction of dysfunction, tolerance in T-cell signaling, and evasion of immune destruction by the expression of endogenous "immune checkpoints" that generally lead to immune responses after antigen activation (5). These discoveries have increased cancer understanding and developed immunotherapy treatments such as immune checkpoint inhibitors (ICIs) (6). To date, ICIs therapy like anti-PD-1 has been successful for treating many cancers, particularly malignant melanoma, non-small cell lung cancer, and bladder cancer, among others (7)(8)(9)(10).
One challenge of the ICIs immunotherapy is the limited proportion of responders, which leads to the urgent need to find predictive biomarkers to identify responders from nonresponders (11). Emerging data suggest that patients overexpressing PD-L1 in tumors by IHC have improved clinical outcomes under anti-PD-1 immunotherapy (12). Although PD-L1 IHC seems predictive in lung cancer, it might not be suitable for many other cancers. The microenvironment of tumors has been recognized as a complex system, as the immune response is affected by many different mechanisms besides PD-L1 (13). Besides, IHC-based detection of PD-L1 as a predictive biomarker is confounded by multiple issues, many still unresolved so far, such as variable detection antibodies and cutoff values and the biomarker's stability and staining of tumor versus immune cells (12)(13)(14).
The development of gene expression profiling within tumors has enabled identifying prognostic gene expression signatures and patient selection (15,16). Recently reported studies had assessed the association of immune-related gene expression in patients with various solid tumors who received immunotherapy. For instance, a genome-wide analysis of melanoma patients treated with recombinant IL2 revealed a signature predictive of clinical response from pretreatment biopsies (14). Moreover, an IFNinflammatory immune gene expression signature is associated with both enhanced overall response rates (ORRs) and progression-free survival (PFS) in patients with melanoma who received pembrolizumab, which is subsequently being investigated in other malignancies (17). Other examples include an eight-gene signature reflecting pre-existing immunity, the T-effector/IFN-g signature, explored in a phase II trial of non-small cell lung carcinoma (NSCLC) (18). Although these studies revealed the intrinsic association between pre-existing immunity and the benefit of ICI therapy, the limitations still exist. On the one hand, these studies failed to consider the functional status of pre-existing immunity, which might affect the outcomes of ICI therapy to a large extent. On the other hand, the selection of those signature genes was mainly based on prior knowledge rather than data exploration, which might lead to insufficient application coverage of these signatures.
Previous research revealed two distinct mechanisms of immune escape in tumor (5;Joyce et al., 2015). One is T cell dysfunction, and the other is T cell exclusion. Approaches that measure immune functional signature based on the gene expression profile were developed to explore the correlation with clinical response of immunotherapy, such as Tumor Immune Dysfunction and Exclusion (TIDE) (19), which identified factors which underlie mechanisms of tumor immune evasion.
So far, most published prognostic gene expression signatures have been explored from the perspective of immune activation and elimination (20,21). However, as another essential character of the tumor microenvironment, immune-exclusive signatures play a suppression role and are rarely researched. Predicting clinical benefit to ICI therapy requires an understanding of how tumors escape the immune system.
In this study, we evaluated the immune-related gene expression profiles in patients with advanced NSCLC, skin cutaneous melanoma (SKCM), and also head and neck squamous cell carcinoma (HNSCC). We are supposed to find an immune signature that can explain the immune-exclusive statement of tumor samples and can predict response to anti-PD-1 checkpoint inhibitor independently of cancer type.

Differentially Expressed Genes
The software R package DESeq2 (V.1.30.1) was used to calculate the fold-change of transcripts and to screen for differentially expressed genes (DEGs) (27) in the RNA-seq data. A fold-change larger than two and an adjusted p-value less than 0.05 were set as the cutoff values for screening significant DEGs. Cluster analysis and heatmap generation were performed by the R package and ComplexHeatmap (V.3.12) (25).

Estimation of Immune-Cell Type Fractions
Cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORT), which is a deconvolution algorithm that can characterize the cell proportion of complex tissues, based on LM22, a normalized gene expression profiles (GEPs) (23,24,29). In this study, CIBERSORT (https://cibersort. stanford.edu/) and leucocyte signature matrix 22 (LM22) were used to quantify the proportions of immune-cell types HNSCC samples from the TCGA data. Normalized gene expression data were analyzed by the CIBERSORT algorithm, running 1000 permutations. The CIBERSORT p-value reflects the statistical significance of the results, and a threshold less than 0.05 is recommended. Finally, samples with CIBERSORT p-values less than 0.05 were included in correlation analyzes between genes and immune-cell types.

Immune Gene Signatures
Twenty-three independent gene signatures tracking different cell types (e.g., CD8 T cells, NK cells, and Macrophage) and microenvironment (e.g., cytolytic and dysfunction signatures) were evaluated. Correlation coefficients between IES and different gene signatures were calculated. An unsupervised analysis was performed to cluster correlation coefficients with similar values together. The correlation coefficients of each signature can be found in Supplementary Table S2.

Statistical Analysis
Categorical variables were evaluated with Fisher's exact tests. Correlation analysis was assessed by Pearson coefficient. Multivariable Cox proportional hazards models were built with gene expressions as covariables. Stepwise regression was used to determine the most informative variables included in multiple (linear) regression models. ROC analysis was done using the cenROC package in R. Significance of overall survival (OS) and progression-free survival (PFS) was determined via Kaplan-Meier analysis with log-rank analysis. The hazard ratio was calculated by the cox function of the survival package in R. All statistical analysis was performed in the R statistical environment version 3.6.1. All tests were two-tailed and a p-value < 0.05 was considered significant.

Immune−Related Gene Expression in the Prognosis of Immunotherapy
To explore the immune-related genes that are related to the prognosis of immunotherapy, we introduced a cohort of 65 patients with advanced NSCLC (n=35), HNSCC (n=5), and SKCM (n=25) from Prat et al. (26). Patients were treated with anti-PD-1 monotherapy, and the expression profile of 730 immune-related genes on this cohort was collected. We compared the relationship between clinical characteristics and immune-related gene expression.

Prognosis and Clinical Response Prediction With Gene Expression Profile
Fifteen gene expression profiles according to patient prognosis are presented in Figure 1D. All patients were divided into two risk groups according to predict scores based on the regression equation of 15 gene expression profiles. The cutoff of low and high risk was based on the median value of the predicted score (cutoff = -0.133). The result of hierarchical cluster analysis was similar to the predicted-score grouping ( Figure 1D). Patients in the low-risk group were enriched in expression genes of cancersuppressing inflammation such as NFKB2 and CCR5. In contrast, patients in the high-risk group were enriched in expression genes of cancer-promoting inflammation such as ITGB1 and CD46.
To understand the relationship between clinical prognosis and 15 risk genes, we analyzed the correlation between progression-free survival (PFS) and individual gene expression ( Figure S2). Patients were divided into two groups according to the median expression level. We found that only IL1RAPL2 expression showed a significant correlation in clinical prognosis ( Figure S2). The expression of the remaining 14 genes tended to predict prognosis to various extents, though it did not reach significance. When merging the expression of 15 genes, patients in the high-risk group had 2.48-fold higher risk of death compared with patients in the low-risk group (hazard ratio [HR]: 2.48, 95% confidence interval [CI]: 1.34, 4.59, p=0.003, Figure 2A; Table S2). We applied R package cenROC to analysis the ROC efficiencies and found that the area under the ROC curve (AUC) of 1-and 3-year prognosis were 0.842 and 0.82, respectively ( Figure 2B). Similar results were observed by cancer types of non-squamous NSCLC and SKCM ( Figure S3). Cancer types of squamous NSCLC and HNSCC did not reach significance due to the small candidate size ( Figure S3).
Based on the survival analysis results, we evaluated the predicted score and made a prediction model of clinical response in patients. First, we analyzed the predictive performance of every single gene by ROC curve (Figure S4). The objective response's largest AUC was 0.637 of PLA2G6 gene. None of the gene expression was significantly related to the response of anti-PD-1 therapy, including PD-1 (Figures S5, S6). However, the predictive risk score value from 15 immune-related gene expression levels was observed to increase significantly. This combination predictive model had an AUC of 0.731, higher than the AUC of PD-L1 expression (AUC=0.625, Figure S7). Patients without clinical response had a higher value of predicted score ( Figure 2C). In particular, only 6 of 33 patients in the highrisk group had an objective response after anti-PD-1 therapy   Figure 2D). Together, the expression profile of 15 filtered risk genes was correlated with the prognosis and clinical response of patients treated with anti-PD-1 therapy.

Pathway and Gene Ontology Analysis
Revealed the Difference in Immune Activities Between Two Risk Groups To identify the inner differences in the tumor microenvironment between two risk groups divided by the predicted score of 15 genes expression, an unsupervised analysis of 730 immune-related genes and risk classification was performed in Figure 3A. We observed that cluster 2, which mainly consisted of patients with a low-risk score, was enriched with a large number of highly expressed immune eliminated cell genes, such as CD8+ T cells (PRF1, CD8A, CD8B, GZMM and FLT3LG), CD4+ T cells (IL26 and IL17A), NK cells (SPN, BCL2 and NCR1) and B cells (BLK and CD19). Then the analysis of differential expression genes was performed between two risk groups, and the expression of 62 genes (of 730) was identified as statistically altered (p <0.05) ( Figures 3B, S8; Table S3). The majority of differentially expressed genes displayed decreased expression in the high-risk group. The greatest downregulation of differential expression gene was MS4A1 (2.64 folds), while the expression of ARG1 and S100A7 in the high-risk group upregulated 2.52 folds and 2.66 folds, respectively. To better understand the differential expression genes discovered above, we performed pathway enrichment analysis of all differential expression genes between two risk groups by computing their KEGG term and biological process associations. Our analysis generated a total of 19 KEGG terms with a significant p-value (p < 0.05, Benjamini-Hochberg corrected) ( Figure 3C). Among these KEGG terms, 'Cytokine-cytokine receptor interaction' attracted the highest number of differential expression genes, 16 of which seven were discovered among the top twelve differentially expressed transcripts (Table S4). After gene ontology analysis, we found ten biological process terms that highly correlated with the differential expression genes (p < 0.001, Benjamini-Hochberg corrected) ( Figure 3D). Most terms were immune-related receptor activity (i.e., T cell, CCR chemokine and CXCR chemokine) and cytokine activity.
Immune cells are essential components of the tumor microenvironment and closely correlate with immunotherapy responses. We used the CIBERSORT software to assess the abundances of 22 different immune-cell types in 65 patients (Table S5). Similar to the findings of KEGG pathways, naïve B cells, CD4 resting-memory T cells, and activated natural killer (NK) cells account for more enormous proportions of the infiltrating immune cells in the low-risk group than in the high-risk group ( Figure 3E).

The 15-Gene Expression Profile Is Correlated With Immune-Excluded Microenvironment Characteristics
Path enrichment analysis was also applied to understand the key pathways and biological processes involved in the 15-gene expression profile. Significant pathways and GO terms with two or more enriched genes were selected. The results showed that all of the selected pathways and GO terms were highly relevant to the tumor immune response, such as cytokine signaling, interleukins signaling and lymphocyte activation ( Figures 3F, G). These critical steps of immune response explained the expression of 15 risk genes that could predictively evaluate the response to immunotherapy.
To further explore the association between the predictive risk score and various immune-related signatures, Pearson correlation analysis was performed to calculate the pairwise correlations among 24 signatures in 65 patients ( Figure S9). The result of hierarchical clustering revealed that our predictive risk score was positively correlated with M2 macrophage signature, cancer-associated fibroblast (CAF) signature and tumor exclusion signature, while negatively correlated with immune-elimination-related signatures such as cytotoxic T lymphocytes (CTL) signature, MHC-II signature and interferon-gamma (IFN-g) signature. This observation helps us understand that patients with high predicted risk scores received worse clinical outcomes could be explained by the immune exclusion status in their tumor microenvironment to some extends. We also found that the predictive risk score was positively correlated with the expression of classical immune checkpoints, such as PD1, PD-L1, and CTLA4 ( Figure 3H). Thus, we termed our 15-gene risk score as "Immune Exclusion Score" (IES).

Validation of IES Score in Multiple ICIs Therapy Cohorts
To validate the robustness and eligibility of the IES, we collected three more cohorts that underwent ICIs treatments. All of the patients in the three cohorts were with advanced melanoma, including Liu cohort (n=121), Riaz cohort (n=41) and Nathanson cohort (n=24) (31)(32)(33). IES scores were calculated among patients in three cohorts and applied to predict the clinical response of immunotherapy (Tables S6-S8). ROC analysis showed that the ROC efficiencies of 1-prognosis were 0.547 and 0.622 in Liu and Nathanson cohort, respectively (1year prognosis AUC was unevaluable in Riaz cohort). While the ROC efficiencies of 3-year prognosis were 0.622, 0.558 and 0.836 in Riaz, Liu and Nathanson cohort, respectively ( Figures 4A-C). Patients were divided into IES high and low groups according to the Youden index and the threshold value were -0.0006, -0.0001 and -0.0015 in Riaz, Liu and Nathanson cohort, respectively. We observed a better survival rate and more extended survival advantage in patients within the IES low group in all of the three cohorts ( Figures 4D-F). In the Riaz and Nathanson cohorts, the cutoff value was close to the median value of IES among patients. We found that patients with high IES were correlated with worse clinical response in three cohorts, while the objective response rates of the low IES group among three cohorts were 31.8%, 43.9% and 60%, respectively ( Figure S10). These results confirmed the predictive performance of clinical outcomes of IES in immunotherapy.
Since the cancer type of all three cohorts used for verification above was melanoma, we introduced another immunotherapy cohort of clear-cell renal cell carcinoma(ccRCC) from Braun et al. (34). The ccRCC cohort contains 181 patients treated with anti-PD-1 therapy. Previously studies revealed that traditional biomarkers of immunotherapy such as PD-L1 and TMB didn't show the ability to distinguish the clinical outcome in ccRCC and the microenvironment of CD8+ T-cell infiltration was related to poor prognosis of immunotherapy. After calculating the IES of ccRCC patients, survival analysis and ROC analysis were applied to evaluate the performance of IES. Intriguingly, our results indicated that patients with high IES, whose tumor microenvironment had the feature of immune exclusion, were correlated with longer PFS (p=0.02, IES cutoff=-0.0976) ( Figure  S11; Table S9). This result was consistent with the findings in ccRCC, although different from the majority of understanding in the tumor microenvironment. Previous studies proved that PBRM1 mutation was promoting factor of ccRCC ICIs therapy. Thus, we applied IES in the PBRM1-mut subgroup of the ccRCC cohort ( Figure S11). The findings in the datasets above showed that patients with high IES scores showed a worse prognosis of OS in the PBRM1-mut subgroup, which indicated that IES could further filter patients with worse clinical outcomes and prognosis ICIs therapy who acquired PBRM1 mutations.

Comparison Between IES and Other Biomarkers of Immunotherapy
Recently, the expression of genes related to cytolytic immune activity was associated with clinical response to ICIs in certain tumors (35,36). A previous study discovered a T cell-inflamed 18-gene expression profile (GEP) shown to predict response to anti-PD-1 therapy (37). To compare the performance on clinical response to anti-PD-1 therapy between GEP and IES, T cellinflamed GEP was assessed in all patients from the Prat cohort, Liu cohort, Riaz cohort and Nathanson cohort ( Figure S12). We found that higher T cell-inflamed GEP scores were also positively associated with response and prognosis in Prat cohort, Liu cohort, Riaz cohort and Nathanson cohort, showing that the T cell-activated tumor environment also affects response in addition to IES. However, significance was not demonstrated in the subgroups by cancer types in Prat cohort when using GEP as a predictor. We also found that a higher proportion of responders were enriched in the "low-risk" group by IES, compared with GEP. We next evaluated the correlation between GEP and IES among those patients. Negative correlations were found between GEP and IES, and 57.1% of shared patients were selected as low risk by both GEP and IES ( Figure 5A). So IES could be a necessary complement to GEP.
Furthermore, we evaluated the relationship between TMB and IES in Liu and Riaz cohort. IES score was found independent of TMB in the discovery and validation datasets with correlations of -0.11 and 0.011, respectively ( Figure 5B). This result indicated that the IES score could be applied independently or jointly with TMB in predicting the response of ICIs therapy.

DISCUSSION
The immune checkpoint inhibitors (ICIs) have made remarkable progress in the clinical treatment of tumors in the past decade (38)(39)(40). However, immune escape mechanisms and immune resistance to ICIs therapy have not been well-studied. Here, we discovered and developed an immune exclusion-related 15-gene risk score termed "Immune Exclusion Score" (IES) to predict the clinical response and prognosis to anti-PD-1 therapy. Limited clinical outcomes and prognosis to anti-PD-1 treatment occurred in patients with high IES risk scores in the discovery and validation datasets. Besides, IES was also found as a necessary complement to T cell-inflamed GEP signature and independent to TMB, reflective of the relationship of GEP and TMB to IES. These observations suggest that using the IES immune exclusion biomarker may help identify patients who are responsive to anti-PD-1 therapies and explain the mechanism of immune exclusion in the tumor microenvironment.
The differential gene expression and pathway analysis were evaluated between two risk groups divided by the IES scores. Our results showed that patients with low IES score enriched with highly expressed immune eliminated cell genes of CD8+ T cells, CD4+ T cells, NK cells and B cells. Most of the enriched pathways were related to immune-cell membrane receptor activity and cytokine activity. We also observed that naïve B cells, CD4 resting-memory T cells, and activated natural killer (NK) cells infiltrated more enormous proportions in the low IES group than in the high IES group.
Our study results may explain the low response rates and the limited efficacy of ICIs for patients with high IES scores. The IES was positively correlated with immune exclusion signatures such as M2 macrophage signature and CAF signature. In contrast, IES was negatively correlated with immune-elimination-related signatures such as CTL signature, MHC-II signature and IFN-g signature. In addition, the IES score was positively correlated with the high expression of immune checkpoints like PD1, PD-L1 and CTLA4, which indicated the status of immune suppression among IES-high patients.
The limitations of this study still exist. Firstly, the development and validation of IES were conducted on four retrospective cohorts. Prospective clinical studies are needed to verify the clinical efficacy of IES as a predictive biomarker for immunotherapy. Secondly, the prognostic model is built based only on gene expression data. A model involving more types of data, especially pathological images might be able to improve the prediction accuracy (50,51). Thirdly, due to the limited size of patients/cohorts, IES was confirmed in a small number of cancer types, such as NSCLC, melanoma, ccRCC and NPC. Data on more cancer types are needed to prove the broad applicability of IES. Besides, all of the cohorts introduced in this study were treated with ICI monotherapy. Combined treatment approaches with ICIs, such as ICI combined with chemotherapy, demonstrated a superior clinical response in recent trials. A more significant implication will illustrate if IES is successfully validated in the combination of ICIs and chemotherapy.

CONCLUSIONS
Our data demonstrate that IES can be used to categorize tumors into different subgroups that exhibit distinct patterns of potentially recognizable biology to enhance clinical response. Although the utility of IES, T cell-inflamed GEP and TMB, as well as other emerging agnostic biomarkers, need further validated for response prediction to various immunotherapy approaches, including combination therapies, these findings provide the possibility for further exploring the utility of these biomarkers as guides for clinical precision cancer immunotherapy.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
XS, HJ, ZG, and ML contributed to research design. Data analysis was carried out by HJ, JW, JX, and XL. JY and XD developed the algorithm and refined the prediction model. ND, HL, and TG collected clinical information that was used in the study and critically revised the manuscript. XS, HJ, and ND wrote the manuscript. All authors provided feedback and approved the final version.

ACKNOWLEDGMENTS
We thank all patients and researchers involved in this study. Objective response rate (ORR) between high-and low-risk patients in the three cohorts. ORR is defined as having a partial response (PR) or a complete response (CR); NOR is defined as having no PR or CR.
Supplementary Figure 11 | The predictive value of IES score in ccRCC cohort. (A) ROC curves and AUC values, (B) Objective response rate (ORR), and (C) Kaplan-Meier survival curves of PFS between high-and low-risk patients in the total ccRCC cohort, PBRM1-mut subset, and PBRM1-wt subset. ORR is defined as having a partial response (PR) or a complete response (CR); NOR is defined as having no PR or CR.
Supplementary Figure 12 | The predictive value of GEP score in Prat, Liu, Riaz, and Nathanson cohorts. (A) Objective response rate (ORR) and (B) Kaplan-Meier survival curves of PFS or OS between high-and low-risk patients in the four cohorts. ORR is defined as having a partial response (PR) or a complete response (CR); NOR is defined as having no PR or CR.