Abstract
Recently, tumor immunotherapy based on immune checkpoint inhibitors (ICI) has been introduced and widely adopted for various tumor types. Nevertheless, tumor immunotherapy has a few drawbacks, including significant uncertainty of outcome, the possibility of severe immune-related adverse events for patients receiving such treatments, and the lack of effective biomarkers to determine the ICI treatments’ responsiveness. DNA methylation profiles were recently identified as an indicator of the tumor immune microenvironment. They serve as a potential hot spot for predicting responses to ICI treatment for their stability and convenience of measurement by liquid biopsy. We demonstrated the possibility of DNA methylation profiles as a predictor for responses to the ICI treatments at the pan-cancer level by analyzing DNA methylation profiles considered responsive and non-responsive to the treatments. An SVM model was built based on this differential analysis in the pan-cancer levels. The performance of the model was then assessed both at the pan-cancer level and in specific tumor types. It was also compared to the existing gene expression profile-based method. DNA methylation profiles were shown to be predictable for the responses to the ICI treatments in the TCGA cases in pan-cancer levels. The proposed SVM model was shown to have high performance in pan-cancer and specific cancer types. This performance was comparable to that of gene expression profile-based one. The combination of the two models had even higher performance, indicating the potential complementarity of the DNA methylation and gene expression profiles in the prediction of ICI treatment responses.
Introduction
Cancer immunotherapy based on immune checkpoint inhibitors (ICI), such as antibody-mediated interventions targeting cytotoxic T lymphocyte antigen-4 (CTLA-4), programmed death receptor-1 (PD-1) on T lymphocytes, and the principal ligand (PD-L1), has recently caught much attention in the field of cancer therapy for its high efficiency for reversing the tumor-induced immunosuppression and yielding a durable clinical response for a wide range of tumor types. ICIs are now used as single agents or combined with chemotherapies as first or second lines of treatment for about 50 cancer types (). There are more than 3,000 active clinical trials evaluating ICIs by now, representing about 2/3 of all oncology trials ().
A major limitation of tumor immunotherapy, especially those based on the ICI, lies in that only a fraction of cancer patients could respond to the therapy (, ), and severe immune-related adverse events (irAEs) are frequently seen in some patients undergoing the ICI therapy (). These adverse events are mainly due to the inhibition of immune checkpoints that reinforce the normal physiological barriers against autoimmunity, leading to various local and systemic autoimmunity ().
Therefore, the development of the biomarkers evaluating the responsiveness of a patient to the ICI therapy is key for the application of immune therapy in a wider range. Several biomarkers have been proposed, including the tumor mutation burden (TMB) (), the neoantigens (, ), the overexpression of targeting genes such as PD-L1 (, ), and the amount and composition of the tumor-infiltrating immune cells (, ). However, the predictive powers of these biomarkers are still not applaudable with unwanted tumor type specificity. For example, TMB failed to predict the responsiveness to PD-L1 ICI treatment in non-small-cell lung cancer (NSCLC) (, ). The objective response rate to anti-PD-L1 treatment of colorectal cancer (CRC) patients with high microsatellite instability (MSI) values was also only about 40 to 70% ().
Other than biomarkers mentioned above, DNA methylation could potentially be a rich source of biomarkers for ICI responsiveness (). Besides its role in tumorigenesis by regulating gene expression (, ) and promoting somatic mutations and structural variations (, ), DNA methylation profiles have long been recognized as indicators for the status of tumor immune microenvironment as well. For instance, demethylation of transcription start sites (TSSs) of key effector genes, such as Interferon Gamma (IFNG), Granzyme B (GZMB), C-C Motif Chemokine Receptor 7 (CCR7), and Transcription Factor 7 (TCF7), indicates the stimulation of naive CD8+ T cells (). Genome-wide DNA-methylation landscape defines specialization of regulatory T cells (). Clonal expansion of T cells from naive T cells to effective T cells is associated with distinct DNA methylation landscapes (). The tumor immune infiltration analysis based on DNA methylation profiles has been successful in a variety of tumor types (). It has also been pointed out recently in an NSCLC cohort that the global DNA methylation loss is tightly related to the poor outcome of ICI therapy ().
We first showed the potential of DNA methylation profiles in predicting the responsiveness to ICI therapy and then selected a combination of methylation sites with prediction power. Next, we built a machine learning model to predict immune therapy responsiveness based on the selected methylation features. The model has high prediction accuracy at both pan-cancer level and tumor type specific level. The performance was validated in an independent cohort and is comparable to that of previously reported models based on gene expression profiles. We also showed that the combination of DNA methylation and gene expression profiles overtops models based on single types of biomarkers, indicating the possibility to improve the prediction accuracy by combining different types of biomarkers.
Materials and Methods
Datasets
We downloaded the DNA methylation data measured by Illumina Infinium HumanMethylation450 BeadChip (β values), together with the gene expression data measured by RNA-seq (read counts), the somatic mutation data (MC3 public version), and the overall survival (OS) information of 32 tumor types from the Cancer Genome Atlas [TCGA, downloaded from the GDC data portal (portal.gdc.cancer.gov) in October 2020]. There were 7,131 cases in total. The name, abbreviations, and number of cases of each tumor type are listed in Supplementary Table S1.
The tumor mutation burden (TMB) was calculated as described before with some small modifications (). In each case, mutations annotated as “Frame Shift Ins”, “Nonsense Mutation”, “Frame Shift Del”, and “Splice Site” were defined as truncation mutations, while those annotated as “Missense Mutation”, “In Frame Del”, “In Frame Ins”, and “Nonstop Mutation” were defined as non-truncation mutations. TMB value was then calculated as
The distribution of calculated TMBs is shown in Supplementary Figure S1.
The microsatellite instability (MSI) state for each case was calculated by the MSIpred package as stable or unstable ().
The cases that simultaneously satisfy the following two criteria are defined as responsive to ICI treatment (positive), and the rest were defined as non-responsive to the treatment (negative). First, the TMB value of the case should be higher than the upper quartile of TMB values in all cases ignoring tumor types. Second, the transforming growth factor β (TGF-β) score [defined in () as the weighted average of normalized expression of genes in the TGF-β signaling pathway summarized in (), taking the regularity direction between genes as weights] should be smaller than the median of the values in all cases ignoring tumor types. It has been shown that both the TMB and the TGF-β score were fine and independent measurements of the responsiveness to ICI treatment (–), and the combined measurement has also been successfully used in other pan-cancer studies (). It is also a practical consideration to define the responsiveness indirectly since there is no large pan-cancer ICI treatment cohort with DNA methylation levels measured for building the models to the best of our knowledge. The number of cases labeled as positive in each tumor type is shown in Supplementary Figure S2. It was noticeable that this definition led to severe class imbalance, with 9.86% cases labeled as positive. This issue was solved by the random oversampling scheme in the model-building step described below.
The NSCLC validation cohort data were retrieved from literature () and (). The clinical information was downloaded from the Supplementary Tables in corresponding literature. The DNA methylation profiles were downloaded from the associated datasets (GSE119144 and GSE126043) in Gene Expression Omnibus (GEO). Missing values were imputed as described in the raw pieces of literature (Supplementary Table S2).
Differential Analysis
To perform differential analysis of methylation data, each β value was first transformed into M-value by the transformation, . Then differential analysis was carried out using limma package as common practices (). Probes with absolute logFC > 1 and adjusted p-value less than 0.01 were determined as differential methylated probes (DMP).
RNA-seq read counts for each gene were directly inputted into the DESeq2 package for differential expression analysis (). Genes with log2 fold change greater than 1 and adjusted p-value less than 0.01 were determined as differentially expressed.
Predictive Model Building
Feature Selection
There are two notable features of DNA methylation data that complicate the analysis. First, the data are high dimensional, with the number of probes (features) amounting to 480K in Illumina Infinium HumanMethylation450 BeadChip. Second, there was also strong collinearity among different probe groups. These two points made most popular ad hoc feature selection methods inefficient, which hindered the prediction model building process. Here we describe a feature selection process based on prior knowledge. The basic idea behind this process was that cases in most tumor types could be grouped into two clusters based on the immune infiltration analysis from DNA methylation profiles (). The two clusters differed in the compositions of immune cells and potentially in the responsiveness to immune therapies. In particular, we applied the analysis pipeline described by Hyunchul et al. to the 32 TCGA tumor types separately (). In each tumor type, cases were clustered using PAM clustering into two groups, and the group with longer mean survival time was set as cluster 2. The best number of clusters were inspected using a combination of 13 indicators with R package “NbClust,” and an optimal value 2 was observed in 20 out of 32 tumor types (). For the remaining types, cases were also clearly separated into two groups by visual inspection.
To evaluate the potential relationship between the immune infiltration–based clustering of cases and the responsiveness of the ICI treatment, we selected three indicators of the responsiveness: the overall survival time (OS), the TMB, and the expression level of PD-L1 (CD274) gene. The three indicators were then subjected to appropriate tests for calculating the p-value of differences between the two clusters for each tumor type (log rank test for OS, Mann Whitney U test for TMB, and directly taken from the differential expression results for PD-L1 expression).
For each indicator, differentially methylated probes (DMPs) between the two clusters with significance together with the directions of the differences were extracted in each tumor type. DMPs with the same direction in half or more such tumor types were selected as features. We also added the 1152 probes in the signature used in the immune infiltration analysis with positive β value in at least one case () to make the final selected feature set.
Model Building
We evaluated a series of commonly used machine learning algorithms [logistic regression with L1 regularization (LR), support vector machine classifier (SVM), random forest (RF), and k-nearest neighbor classifier (kNN)]. The best parameter combinations for each model were selected by performing a grid search in their spaces. Each parameter combination was evaluated using 5-fold cross-validation. For the SVM model, we used the radial basis function as the measurement of the inner product. The scale parameter of the basis function was taken as the mean variance of all features, and only the multiplier λ was tuned for the tune of the scale parameter led to few improvements of the model performance (data not shown). In the model training step, a random oversampling step was added to deal with the severe class imbalance (). This scheme resampled the positive training cases with replacement to the number of negative ones and then used these balanced training data for model training. The performance of the best models was evaluated by averaging the results of tests on 100 times’ random 80 vs 20% train test splits. The F1 score, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUC, if decision scores exist) were calculated as the measurements of the model performance in each test. These measurements were reported to be stable when the classes were severely imbalanced (). The feature selection and model building processes were illustrated in Figure 1.
Figure 1
Results
The Responsiveness of Tumor Cases to the Immune Checkpoint Inhibitor Was Highly Related to the Methylation Level of a Small Set of Methylation Sites
This study was focused on predicting the responsiveness of immune checkpoint inhibitor (ICI) at the pan-cancer level. To achieve this goal, the 7,131 TCGA cases from 32 tumor types with both DNA methylation (measured by Illumina Infinium HumanMethylation450 BeadChip, about 480K probes measured in total) and gene expression (measured by RNA-seq) data were downloaded (in October 2020). Cases with high tumor mutation burden (TMB, see the distributions of TMBs in Supplementary Figure S1) and low TGF-β expression were defined as responsive to ICI (positive) and the others as without responsiveness (negative) as previously suggested (Methods, see Supplementary Figure S2A for the number of cases marked as positive in each tumor type) (–, ) since large pan-cancer tumor immunotherapy cohort with DNA methylation level measured was lacking. A series of well-known biomarkers, such as the concentration of CD8+ T cells, the expression level of PD-1 and CTLA4 genes, were significantly separated between positive and negative cases (all with p < 0.01, Supplementary Figures S2B–D), elucidated the rationality of our classification. It was reported that the clustering based on immune infiltration analysis using DNA methylation data in most tumor types grouped cases into two clusters, and the clusters were shown to be related to the outcome of immunotherapy (). Based on this background knowledge, we designed an efficient feature selection scheme to select 1,495 methylation probes from the total 480K candidate methylation probes, which distinguished cases with ICI responsiveness and those without (see Methods for detail). In the first step of the feature selection, the analysis was restricted to tumor types with more than 5 cases marked as positive (13 types in total). Tumor cases were clustered into two clusters based on the immune infiltration profiles derived by the DNA methylation profiles. The size of the two clusters for each used tumor type is shown in Supplementary Table S1. Then we tested differences between the two clusters in each tumor type for three indicators for the responsiveness of ICI treatment, overall survival (OS), tumor mutation burden (TMB), and the PD-L1 gene expression level. After applying the Benjamini-Hochberg FDR correction scheme, we obtained 6, 7, and 7 tumor types with significant differences of these indicators, respectively (FDR<0.01, see Supplementary Figures S3A–C for each indicator Supplementary Figure S3D for the overlap of tumor types selected for the three indicators). For each indicator, we selected those methylation probes which differentially methylated with the same direction in at least half of these tumor types with significant differences in the indicators between groups. We got 890, 862, and 534 feature probes for the indicators OS, TMB, and PD-L1 expression, respectively (Supplementary Figure S3E). So far, 11 tumor types and more than 85% of positive cases are covered (Supplementary Figures S2A and S3D). We then merged the feature probes selected for each indicator to obtain 1,495 final probes (the selected features are listed in Supplementary Table S3). The 1,152 probes in the signature used in the immune infiltration analysis () were added and yielded a final feature set with 2,546 probes.
The efficiency of the 2,546 features was revealed by the distinct methylation profiles between cases with ICI responsiveness and those without by visual inspection of the methylation pattern of all cases clustered by the methylation pattern (Figure 2A). This pattern was also illustrated by the mutual dependence between the responsiveness of neighboring cases along the dendrogram of the hierarchical clustering (p < 10-52, χ2 test). Moreover, compared to randomly selected, equal-sized control feature sets, clustering based on methylation pattern of selected probes is much better at separating cases with and without ICI responsiveness. In 100 times of such comparisons, the selected features outperformed 92 and 99 random controls when the separation was measured using F1 score and Matthews correlation coefficient (MCC) score, comparing the responsiveness and cluster labels when clustering the cases into two clusters with the methylation patterns (Figure 2B). Also, a large portion of selected probes were under differential methylation between cases with and without responsiveness. When performing the differential methylation analysis among all 32 tumor types, there were 8.76% of selected probes with differential methylation, compared with 5.22 to 7.62% in the 100 randomly selected control features under the threshold of adjusted p-value less than 0.01 (Figure 2C). At last, the GO and KEGG enrichment analysis of genes in which the selected probes located showed enrichment of terms related to immune activity and immunotherapy outcome, such as “T cell activation”, “adaptive immune response”, and “regulation of lymphocyte activation” in the GO enrichment analysis, and “primary immunodeficiency” and “Th1 and Th2 cell differentiation” in KEGG enrichment analysis (Figure 2D).
Figure 2
The Responsiveness of Cases to ICI Treatment Could Be Predicted by the Selected Probes
After the prediction potentials for the selected methylation sites for the ICI treatment responsiveness were elucidated, a prediction model was built and tested from various aspects. Here we tested a series of commonly used machine learning models, including the support vector machine (SVM), logistic regression with L1 regularization (LR), the random forest (RF), and k-nearest neighbor classifier (kNN). The hyper-parameters of each model were tuned by 5-fold cross-validation. In each model training, the positive cases were oversampled to the size of negative cases to deal with the severe class imbalance (). The performances of the optimized models were assessed by randomly splitting the raw data into 80% training and 20% test datasets for 100 times. For comparison, we also added a naive predictor based on the clustering results derived from the immune infiltration analysis based on the DNA methylation profiles. In each tumor type, the naive predictor declared the cluster of cases with higher average TMB as positive and the other cluster as negative.
Among the four assessed models, SVM got the highest performance. The fine-tuned SVM model in the 5-fold cross validation (with λ = 13) outperformed the other three models (p = 0.041, 3.83×10-28 and 7.33×10-32 for LR, RF, and kNN, respective, paired t test) when the performance was measured using F1 score, and Matthews correlation coefficient (MCC, p = 0.011, 1.30×10-25 and 8.11×10-33, Figure 3A). All the machine learning–based models significantly outperformed the naive predictor in both measurements (all with p < 10-30, Figure 3A).
Figure 3
The prediction power of the model cannot be achieved by randomly selected probes. To show this, we randomly selected methylation probe sets of the same size as the selected probe set for 100 times. The performances were measured by repeatedly training models on 80% randomly chosen samples and tested on the other 20% 100 times along all the regularization parameter λ values of the SVM model searched in the cross-validation step (1 to 20). The performances of models based on the selected probes were consistently and significantly higher than those of random controls when the performance was measured by F1 score, while the same conclusion held under most λs (17 out of 20) when the performance was measured using MCC score (Figure 3B).
The probes selected by the four indicators were all important to the model performance. This was shown by retraining and evaluating the SVM model with probes selected by only one indicator. The performances were all significantly decreased compared to the full model no matter measured by the F1 score or the MCC score (Figures 3A, C, all with p < 10-4, paired t tests).
To exclude the possibility that the model performances were due to overfitting, we tested our prediction model with 100 randomly permuted datasets and expected a sharp shrink of model performances in these permutated datasets. We trained and evaluated SVM models with the same super-parameters in each permutated dataset as described above. The performances (measured by both F1 and MCC scores) of models in these permutated datasets were only slightly higher than the performance measurements calculated by comparing the responsiveness and its random permutation (Figure 3D).
At last, we assessed the separation of a series of well-known biomarkers for ICI responsiveness between groups predicted as positive and negative independent from those used to label the cases. The comparisons were made under the test sets of the 100 randomly trained models in the model evaluation step. First, the PD-1 and CTLA4 gene expression levels were significantly higher in cases predicted as positive than those predicted as negative (both with p < 10-32, Mann Whitney U test, Figures 3E, F). Second, there were also outstandingly more CD8+ T cells in cases in the positive cases than negative ones (p < 10-52, t test, Figure 3G). Last, there were higher proportions of cases with microsatellite instability (MSI) () in those positive cases than in those negative ones (p < 10-52, t test, Figure 3H).
In conclusion, the fine-tuned SVM model based on the selected probes came up with high performances in prediction of the ICI responsiveness in this pan-cancer cohort. The prediction accuracy is remarkably better than randomly selected probe sets of the same size, and we confirmed the improvement cannot be achieved by overfitting.
The Prediction Model Based on Methylation Data Was Comparable and Complementary to That Based on Gene Expression Profiles
It was reported previously that the ICI responsiveness could be predicted by the gene expression levels of genes indicating the immune state of the cases in the pan-cancer level (). Here we compared the performances of the model based on DNA methylation levels with that of models based on gene expression level (). The gene expression level–based model was built exactly the same as reported before (), except for taking the advantage of random oversampling to account for the class imbalance. The accuracy of the native built model was consistently higher than that originally reported [the mean and 95% confident interval MCC score under the 100 random test splits were 0.445 (0.371, 0.501) and 0.296(0.287, 0.306), for the native built one and originally reported, respectively] ().
The performance of the methylation-based model was competitive with that based on gene expression levels. When assessing the performances using 100 times random splits during the model evaluation step, the methylation-based model got better performances when measured by F1 score (with mean F1 score 0.4971 and 0.4844, p = 2.82×10-4 Wilcoxon signed rank test, Figure 4A). The performances were not notably different as measured by MCC score (with mean MCC score 0.4516 and 0.4446, p = 0.06, Figure 4B), while lower when measured using AUC score (with mean AUC score 0.8914 and 0.8949 p = 0.01, Figure 4C). The comparability between the performances of two models was also indicated by the closely located ROC curves between the two models (Figure 4D).
Figure 4
The genes in which the selected methylation sites located were then compared with those selected in the gene expression–based model. An enormous distinction between the two gene sets was observed. Only 189 out of 2,023 genes selected in the methylation model were found in the genes selected in the expression-based model (2,614 in total). The two gene sets also enriched few common GO and KEGG terms (Figures 2D and 4E). These observations indicated that the prediction power of DNA methylation and gene expression profiles could be complementary, and combining the two profiles would further enhance the prediction power.
To validate our assumption of complementarity of DNA methylation and gene expression profile, we further developed an SVM model based on the combination of the methylation levels of the selected methylation sites and the gene expression levels in the gene expression–based models. The best super-parameter λ was selected using 5-fold cross-validation. Performances of the selected model were again assessed in a 100 times random split of the cases into 80% training and 20% testing sets.
The combined model surpassed both the methylation-based and gene expression–based ones under all performance measure [all with p < 10-10 for measurement F1, MCC, and AUC scores (Figures 4A–C), Wilcoxon signed rank test, with mean scores 0.5773, 0.5340, and 0.9327, respectively]. The high performance of the combined model was also validated by the ROC curves (Figure 4D).
These observations have implicated a scheme to enhance the predictive models of ICI responsiveness by assembling the multi-omics data.
The Methylation-Based Model Accurately Predicts ICI Responsiveness at Specific Tumor Type Level
To evaluate the performance of the methylation-based prediction model in each tumor type, we tested the model in tumor types with more than 5%, and 20 cases marked as positive (10 types in total: SKCM, BLCA, UCEC, CESC, COAD, LUAD, LIHC, STAD, LUSC, HNSC, ordered by the proportion of cases marked as position, Figure 5A).
Figure 5
One important characteristic of responsiveness to ICI treatment was that the effective rate differed from tumor type to tumor type (). As in the TCGA dataset, the proportion of cases marked as positive varied largely, from 37% in SKCM to zero in five other types (Figure 5A and Supplementary Table S1). So, we first investigated that whether the methylation-based model could predict this variation. We calculated the proportion of cases in each tumor type predicted as positive and compared the numbers with the true proportions in the 100 times random test sets. Although due to the high false positive rate, which was the common characteristic of such kind of models (, ), the predicted proportion of cases marked as positive were always higher than the ground truths (Figure 5A), the two correlated tightly, with average Spearman’s correlation coefficient 0.74 [with 95% confidence interval (0.46,0.93) among the 100 random test sets, Figure 5A, embedded panel].
Next, we assessed the performances of the model in each tumor type using F1 and MCC scores. We compared the performances of the model based on the selected probes in the 100 randomly split train and test datasets with the performances of the models based on randomly selected probes of the same number, under the same hyper-parameter λ = 13, which was optimal for the selected probes-based model (Figure 5B, upper panel). A paired t test was used to compare the performance between the two types of models. As expected, in 4 out of 10 tumor types, the performance of model based on selected probes was notably higher than that of model based on randomly selected probes (with significant level p < 0.1, Figure 5B, lower panel). These tumor types included SKCM and HNSC, which were commonly admitted as tightly related to the immune checkpoint evasion and may benefit from the ICI treatment (Figure 5B, lower panel) (, ). On the other hand, only 2 and 3 tumor types were tested with significantly lower F1 and MCC scores than the random ones (p < 0.1) with the paired t test. This result further illustrated the high performance of the pan-cancer prediction model in tumor type level.
At last, we tested the performance of the methylation-based prediction model in an independent validation cohort. The cohort was taken from two newly published research on non-small-cell lung carcinoma (NSCLC) patients accepting anti-PD-1/PD-L1 treatments with clinical responses measured (, ). There were 60 and 18 cases included in the two datasets, with 14 and 6 being identified as responsive to the treatments, respectively. It was worth noticing that this cohort was not included in the TCGA cohort used for model building, and mutation burden was tested as a poor predictor for treatment responsiveness (). This was the only publically available tumor immunotherapy cohort with DNA methylation levels measured to the best of our knowledge. The methylation levels in this cohort were measured using Illumina Infinium HumanMethylation850 BeadChip. We extracted the probes in the feature set existing in this chip and retrained the model in TCGA data. No significant drop of the performance of the model was observed (with average F1 and MCC score 0.4949 and 0.4503, with standard deviation 0.0288 and 0.0308). After applying the retrained model in this cohort, it got F1 = 0.4255, MCC = 0.1899, and AUC = 0.6742. The performance was also indicated in the ROC curve (Figure 5C). The progression-free survival time (PFS) was also significantly prolonged for cases predicted as positive compared with the negative ones (Figure 5D), though the p-value (p=0.06, log rank test) was not so significant due to the limited positive cases (only 20 cases predicted as positives). The performance of the model in this totally independent cohort demonstrated its efficiency at both pan-cancer level and for specific tumor type. It also indicated that the information the model caught was indeed the responsiveness itself other than its indicators such as TMB since the model retained its performance when TMB was not predictable to the responsiveness ().
Discussion
In this work, we first proposed the potential of the DNA methylation profiles to predict case responsiveness to the immunotherapy using the immune checkpoint inhibitors. Then we designed a feature selection scheme to extract the methylation sites with the prediction power based on the commonly used Illumina Infinium HumanMethylation450 BeadChip measurements of the methylation levels in the pan-cancer level on 32 types of TCGA data. Next, we built a machine learning prediction model for the responsiveness using the methylation levels of these selected sites. The performance of this model was shown both at pan-cancer level and for specific tumor types. The model performance was also compared with that of the existing pan-cancer model based on the gene expression profiles and proved to be comparable and complementary to that model. The combination of the two models was shown to perform better than the single ones. At last, the performance of the model was further shown using a cohort of NSCLC patients. Neither the patients nor the tumor type was involved in the model-building process.
The uncertainty of the outcome and possibility of severe immune-related adverse events were the major issues for the immunotherapy based on the ICIs (–). There has been a large number of biomarkers for the prediction of the responsiveness both at genome level and at transcriptome level, such as the tumor mutation burden (), the microsatellite instability (), the neoantigens (, ), the PD-L1 expression (, ), and the tumor immune microenvironments based on the gene expression profiles (, ). But the discussion of such biomarkers based on epigenetic signals were far less discussed. The tight relationship between the DNA methylation profiles and the responsiveness to the ICI treatment was only recently shown in separated tumor types (, ). Although the close correlation between the DNA methylation profile and the tumor immune microenvironments in the pan-cancer level has been introduced recently (), there are no direct, systematic discussion of the prediction power to the cases’ responsiveness at this level to the best of our knowledge. Our conclusion of the high performance of the methylation level–based model both at the pan-cancer level and for specific tumor types further illustrated the close relationship between the DNA methylation profiles and the tumor immunotherapy. It also directly offered a framework for outcome prediction of cases that received the ICI treatment. Moreover, it shows the important role of the epigenetic markers in the tumor immunotherapy. On one hand, we should also acknowledge that despite the model performance was applaudable, it still got a high false positive rate, just like other pan-cancer models did (). This major issue would be improved by the ensemble of multiple types of biomarkers, and we also showed the power of this kind of ensemble by integrating the methylation level and gene expression profiles. On the other hand, other frameworks such as the anomaly detection may help in tumor types with small proportions of cases responding to ICI treatment. The integration of these supervised and unsupervised frameworks may further improve the model performance. Finally, the performance of the model would certainly be improved with the appearance of large tumor immunotherapy cohorts with direct measurements of the responsiveness to the treatments.
The models involved in these studies were all commonly used ones with limited simplicity. We admitted that the model performance would be further improved if more complicated models such as deep neural networks were applied. We did not apply the deep neural networks because the main goal of this study was to deduce the feasibility of the DNA methylation profiles in the prediction of responsiveness of patients in ICI treatments and to introduce the model-building framework. The SVM model, whose performance was high, comparable with and complementary to that of the gene expression–based ones, was suitable enough for these goals. The more powerful models will be discussed in follow-up studies when more high-quality immunotherapy cohorts are available.
We should also acknowledge that the indirect definition of the responsiveness to the ICI treatments introduced irremediable bias of the model. This, together with limited samples in the NSCLC cohort, led to the degradation of the model performance. Unfortunately, there were few currently available tumor immunotherapy cohorts with responsiveness annotated in the pan-cancer level. It is unpractical to build such models with directly defined responsiveness now.
Compared with the state-of-the-art biomarkers, the epigenetic markers such as DNA methylation profiles, histone modifications, chromatin structure, accessibility, and the nucleosome positioning come up with a lot of advantages, such as the low patient invasiveness. For many of epigenetic markers can be measured in liquid biopsies and body fluids (), contain rich information of life habits and conditions of patients (), and reveal the origin and evolution of a given disease they carried (). Their close correlation with the tumor immune microenvironment and importance in the tumor immunotherapy have received more and more attention recently. A series of epigenetic biomarkers for immunocompetent phenotypes have also been established (). The thorough study of the roles of these markers will certainly mark a new dawn in the tumor immunotherapy.
In conclusion, the DNA methylation profiles were predictable to the responsiveness to the ICI treatments. The built SVM model were well-performed both at pan-cancer level and for specific tumor types. The performance of our model was comparable with and complementary to that of the gene expression–based model.
Funding
This work was supported by grants from the Collaborative Innovation Major Project of Zhengzhou (Grant No. 20XTZX08017); the China’s National Key R&D Program (Grant Nos. 2018ZX10305-409-006, 2018ZX10305-409-005); the UK-China Collaboration Fund to tackle AMR (Innovate UK (TS/S00887X/1) and Ministry of Science and Technology (Grant No. 2018YFE0102100); National Natural Science Foundation of China (Grant No. 81702966).
Publisher’s Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Statements
Data availability statement
The TCGA datasets analyzed during the current study are available in the GDC portal, https://portal.gdc.cancer.gov/. The DNA methylation profiles in the NSCLC cohort was downloaded from Gene Expression Omnibus (GEO) database under the accession number GSE119144 and GSE126043.
Author contributions
Study concept and design: HG, LH, and YS. Acquisition, analysis, or interpretation of data: BX, ML, LY, MG, and YR. Drafting of the manuscript: BX, RW, HG, and LH. Study supervision: HG and LH. All authors contributed to the article and approved the submitted version.
Acknowledgments
The authors would like to thank the TCGA, GEO database for the availability of the data. We would also thank Prof. Zhihua Zhang from the Beijing Institute of Genomics, Chinese Academy of Sciences for providing the computational resources and Dr. Xiaomeng Gao for dealing with the figures. We acknowledge the valuable support of Lu Yang, Manjiao Liu, Yiran Zheng, Huijuan Qin, Chao Song, Guowei Dong and Huanxi Cui from Simcere Diagnostics.
Conflict of interest
Authors BX, LY, MG, YR and HG were employed by company Jiangsu Simcere Diagnostics Co., Ltd. and Nanjing Simcere Medical Laboratory Science Co., Ltd.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2021.796647/full#supplementary-material
Supplementary Figure S1The tumor mutation burden (TMB, log transformed) calculated for each case involved in this study. The map between abbreviations and associated tumor types could be found in Table S1.
Supplementary Figure S2The rationality of defining the ICI treatment responsiveness using TMB and TGF-β indirectly. (A) The number of cases defined as positive and negative in each tumor type. (B–D) The differences of biomarkers independent with those in the definition of responsiveness between cases defined as positive and negative. In each condition, 100 randomly cases were plotted as points.
Supplementary Figure S3Differential analysis of the three indicators to the responsiveness of ICI treatment. (A) The survival curves of cases in the two clusters of the tumor types with significant differences in the overall survival times. (B) The distributions of TMBs of cases in the two clusters of the tumor types with significant differences in the TMB values. (C) The volcano plot showing the expression differences of CD274 gene between the two clusters in all tumor types. Those tumor types with significant differences were marked as red. (D) The numbers of tumor types detected as significant in each indicator and their overlap. (E) The number of features extracted for each indicator and their overlaps.
Supplementary Table S1The abbreviation, names, number of cases, number of cases in each cluster of immune infiltration–based clustering, and number of cases predicted as positive or negative of tumor types involved in this study. The cluster in each tumor type with longer average overall survival time was defined as cluster 2.
Supplementary Table S2The clinical information and predicted ICI treatment responsiveness of cases in the NSCLC cohort.
Supplementary Table S3The features selected by each indicator.
Abbreviations
ICI, immune checkpoint inhibitors; CTLA-4, cytotoxic T lymphocyte antigen-4; PD-1, programmed death receptor-1; PD-L1, the principal ligand; irAEs, immune-related adverse events; TMB, tumor mutation burden; NSCLC, non-small-cell lung cancer; CRC, colorectal cancer; MSI, microsatellite instability; TSSs, transcription start sites; IFNG, Interferon Gamma; GZMB, Granzyme B; CCR7, C-C Motif Chemokine Receptor 7; TCF7, Transcription Factor 7; OS, overall survival; TCGA, the Cancer Genome Atlas; TGF-β, transforming growth factor β; GEO, Gene Expression Omnibus; DMP, differential methylated probes; MCC, Matthews correlation coefficient; AUC, area under the receiver operating characteristic curve.
References
1
RobertC. A Decade of Immune-Checkpoint Inhibitors in Cancer Therapy. Nat Commun (2020) 11(1):3801. doi: 10.1038/s41467-020-17670-y
2
JiaXYVanessaMHJunT. Immuno-Oncology Drug Development Goes Global. Nat Rev Drug Discov (2019) 18(12):899–900. doi: 10.1038/d41573-019-00167-9
3
GaronEBRizviNAHuiRLeighlNBalmanoukianASEderJPet al. Pembrolizumab for the Treatment of Non-Small-Cell Lung Cancer. N Engl J Med (2015) 372(21):2018–28. doi: 10.1056/NEJMoa1501824
4
RobertCLongGVBradyBDutriauxCMaioMMortierLet al. Nivolumab in Previously Untreated Melanoma Without BRAF Mutation. N Engl J Med (2015) 372(4):320–30. doi: 10.1056/NEJMoa1412082
5
Ramos-CasalsMBrahmerJRCallahanMKFlores-ChávezAKeeganNKhamashtaMAet al. Immune-Related Adverse Events of Checkpoint Inhibitors. Nat Rev Dis Primers (2020) 6(1):38. doi: 10.1038/s41572-020-0160-6
6
ZhuJZhangTLiJLinJLiangWHuangWet al. Association Between Tumor Mutation Burden (TMB) and Outcomes of Cancer Patients Treated With PD-1/PD-L1 Inhibitions: A Meta-Analysis. Front Pharmacol (2019) 10:673. doi: 10.3389/fphar.2019.00673
7
McGranahanNFurnessAJRosenthalRRamskovSLyngaaRSainiSKet al. Clonal Neoantigens Elicit T Cell Immunoreactivity and Sensitivity to Immune Checkpoint Blockade. Science (2016) 351(6280):1463–9. doi: 10.1126/science.aaf1490
8
JiangTShiTZhangHHuJSongYWeiJet al. Tumor Neoantigens: From Basic Research to Clinical Applications. J Hematol Oncol (2019) 12(1):93. doi: 10.1186/s13045-019-0787-5
9
GandhiLRodríguez-AbreuDGadgeelSEstebanEFelipEDe AngelisFet al. Pembrolizumab Plus Chemotherapy in Metastatic Non-Small-Cell Lung Cancer. N Engl J Med (2018) 378(22):2078–92. doi: 10.1056/NEJMoa1801005
10
RouquetteITaranchon-ClermontEGilhodesJBluthgenMVPerallonRChalabreysseLet al. Immune Biomarkers in Thymic Epithelial Tumors: Expression Patterns, Prognostic Value and Comparison of Diagnostic Tests for PD-L1. Biomark Res (2019) 7:28. doi: 10.1186/s40364-019-0177-8
11
ChenDSMellmanI. Elements of Cancer Immunity and the Cancer-Immune Set Point. Nature (2017) 541(7637):321–30. doi: 10.1038/nature21349
12
LlosaNJCruiseMTamAWicksECHechenbleiknerEMTaubeJMet al. The Vigorous Immune Microenvironment of Microsatellite Instable Colon Cancer Is Balanced by Multiple Counter-Inhibitory Checkpoints. Cancer Discov (2015) 5(1):43–51. doi: 10.1158/2159-8290.Cd-14-0863
13
JungHKimHSKimJYSunJMAhnJSAhnMJet al. DNA Methylation Loss Promotes Immune Evasion of Tumours With High Mutation and Copy Number Load. Nat Commun (2019) 10(1):4278. doi: 10.1038/s41467-019-12159-9
14
ShollLMHirschFRHwangDBotlingJLopez-RiosFBubendorfLet al. The Promises and Challenges of Tumor Mutation Burden as an Immunotherapy Biomarker: A Perspective From the International Association for the Study of Lung Cancer Pathology Committee. J Thorac Oncol (2020) 15(9):1409–24. doi: 10.1016/j.jtho.2020.05.019
15
AsaokaYIjichiHKoikeK. PD-1 Blockade in Tumors With Mismatch-Repair Deficiency. N Engl J Med (2015) 373(20):1979. doi: 10.1056/NEJMc1510353
16
KerachianMAJavadmaneshAAzghandiMMojtabanezhad ShariatpanahiAYassiMShams DavodlyEet al. Crosstalk Between DNA Methylation and Gene Expression in Colorectal Cancer, a Potential Plasma Biomarker for Tracing This Tumor. Sci Rep (2020) 10(1):2813. doi: 10.1038/s41598-020-59690-0
17
LeeCJEvansJKimKChaeHKimS. Determining the Effect of DNA Methylation on Gene Expression in Cancer Cells. Methods Mol Biol (2014) 1101:161–78. doi: 10.1007/978-1-62703-721-1_9
18
WajedSALairdPWDeMeesterTR. DNA Methylation: An Alternative Pathway to Cancer. Ann Surg (2001) 234(1):10–20. doi: 10.1097/00000658-200107000-00003
19
KulisMEstellerM. DNA Methylation and Cancer. Adv Genet (2010) 70:27–56. doi: 10.1016/b978-0-12-380866-0.60002-2
20
HoggSJBeavisPADawsonMAJohnstoneRW. Targeting the Epigenetic Regulation of Antitumour Immunity. Nat Rev Drug Discov (2020) 19(11):776–800. doi: 10.1038/s41573-020-0077-5
21
DelacherMImbuschCDWeichenhanDBreilingAHotz-WagenblattATrägerUet al. Genome-Wide DNA-Methylation Landscape Defines Specialization of Regulatory T Cells in Tissues. Nat Immunol (2017) 18(10):1160–72. doi: 10.1038/ni.3799
22
ChakravarthyAFurnessAJoshiKGhoraniEFordKWardMJet al. Pan-Cancer Deconvolution of Tumour Composition Using DNA Methylation. Nat Commun (2018) 9(1):3220. doi: 10.1038/s41467-018-05570-1
23
MeléndezBVan CampenhoutCRoriveSRemmelinkMSalmonID’HaeneN. Methods of Measurement for Tumor Mutational Burden in Tumor Tissue. Transl Lung Cancer Res (2018) 7(6):661–67. doi: 10.21037/tlcr.2018.08.02
24
WangCLiangC. MSIpred: A Python Package for Tumor Microsatellite Instability Classification From Tumor Mutation Annotation Data Using a Support Vector Machine. Sci Rep (2018) 8(1):17546. doi: 10.1038/s41598-018-35682-z
25
TeschendorffAEGomezSArenasAEl-AshryDSchmidtMGehrmannMet al. Improved Prognostic Classification of Breast Cancer Defined by Antagonistic Activation Patterns of Immune Response Pathway Modules. BMC Cancer (2010) 10:604. doi: 10.1186/1471-2407-10-604
26
Sanchez-VegaFMinaMArmeniaJChatilaWKLunaALaKCet al. Oncogenic Signaling Pathways in the Cancer Genome Atlas. Cell (2018) 173(2):321–37.e10. doi: 10.1016/j.cell.2018.03.035
27
MariathasanSTurleySJNicklesDCastiglioniAYuenKWangYet al. Tgfβ Attenuates Tumour Response to PD-L1 Blockade by Contributing to Exclusion of T Cells. Nature (2018) 554(7693):544–48. doi: 10.1038/nature25501
28
WangXLiM. Correlate Tumor Mutation Burden With Immune Signatures in Human Cancers. BMC Immunol (2019) 20(1):4. doi: 10.1186/s12865-018-0285-5
29
CharoentongPFinotelloFAngelovaMMayerCEfremovaMRiederDet al. Pan-Cancer Immunogenomic Analyses Reveal Genotype-Immunophenotype Relationships and Predictors of Response to Checkpoint Blockade. Cell Rep (2017) 18(1):248–62. doi: 10.1016/j.celrep.2016.12.019
30
FujiiTNaingARolfoCHajjarJ. Biomarkers of Response to Immune Checkpoint Blockade in Cancer Treatment. Crit Rev Oncol Hematol (2018) 130:108–20. doi: 10.1016/j.critrevonc.2018.07.010
31
ShieldsBDMahmoudFTaylorEMByrumSDSenguptaDKossBet al. Indicators of Responsiveness to Immune Checkpoint Inhibitors. Sci Rep (2017) 7(1):807. doi: 10.1038/s41598-017-01000-2
32
PolanoMChiericiMDal BoMGentiliniDDi CintioFBabociLet al. A Pan-Cancer Approach to Predict Responsiveness to Immune Checkpoint Inhibitors by Machine Learning. Cancers (Basel) (2019) 11(10):1562. doi: 10.3390/cancers11101562
33
ChoJWHongMHHaSJKimYJChoBCLeeIet al. Genome-Wide Identification of Differentially Methylated Promoters and Enhancers Associated With Response to Anti-PD-1 Therapy in Non-Small Cell Lung Cancer. Exp Mol Med (2020) 52(9):1550–63. doi: 10.1038/s12276-020-00493-8
34
RitchieMEPhipsonBWuDHuYLawCWShiWet al. Limma Powers Differential Expression Analyses for RNA-Sequencing and Microarray Studies. Nucleic Acids Res (2015) 43(7):e47. doi: 10.1093/nar/gkv007
35
LoveMIHuberWAndersS. Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data With Deseq2. Genome Biol (2014) 15(12):550. doi: 10.1186/s13059-014-0550-8
36
MalikaCNadiaGV´eroniqueBAzamN. NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. J Stat Software (2014) 061(6):1–36. doi: 10.18637/jss.v061.i06
37
GuoHXLiYJShangJGuMYHuangYYGongB. Learning From Class-Imbalanced Data: Review of Methods and Applications. Expert Syst Appl (2017) 73:220–39. doi: 10.1016/j.eswa.2016.12.035
38
RáczABajuszDHébergerK. Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics. Molecules (2019) 24(15):2811. doi: 10.3390/molecules24152811
39
MurphyKMZhangSGeigerTHafezMJBacherJBergKDet al. Comparison of the Microsatellite Instability Analysis System and the Bethesda Panel for the Determination of Microsatellite Instability in Colorectal Cancers. J Mol Diagn (2006) 8(3):305–11. doi: 10.2353/jmoldx.2006.050092
40
AzourySCStraughanDMShuklaV. Immune Checkpoint Inhibitors for Cancer Therapy: Clinical Efficacy and Safety. Curr Cancer Drug Targets (2015) 15(6):452–62. doi: 10.2174/156800961506150805145120
41
WieswegMMairingerFReisHGoetzMWalterRFHHagerTet al. Machine Learning-Based Predictors for Immune Checkpoint Inhibitor Therapy of Non-Small-Cell Lung Cancer. Ann Oncol (2019) 30(4):655–57. doi: 10.1093/annonc/mdz049
42
ButnerJDElganainyDWangCXWangZChenSHEsnaolaNFet al. Mathematical Prediction of Clinical Outcomes in Advanced Cancer Patients Treated With Checkpoint Inhibitor Immunotherapy. Sci Adv (2020) 6(18):eaay6298. doi: 10.1126/sciadv.aay6298
43
WuCEYangCKPengMTHuangPWLinYFChengCYet al. Immune Checkpoint Inhibitors for Advanced Melanoma: Experience at a Single Institution in Taiwan. Front Oncol (2020) 10:905. doi: 10.3389/fonc.2020.00905
44
PestanaRCBecnelMRubinMLTormanDKCrespoJPhanJet al. Response Rates and Survival to Systemic Therapy After Immune Checkpoint Inhibitor Failure in Recurrent/Metastatic Head and Neck Squamous Cell Carcinoma. Oral Oncol (2020) 101:104523. doi: 10.1016/j.oraloncology.2019.104523
45
HuangJWangL. Cell-Free Dna Methylation Profiling Analysis-Technologies and Bioinformatics. Cancers (Basel) (2019) 11(11):1741. doi: 10.3390/cancers11111741
46
LimUSongMA. Dietary and Lifestyle Factors of DNA Methylation. Methods Mol Biol (2012) 863:359–76. doi: 10.1007/978-1-61779-612-8_23
47
XiaDLeonAJCabaneroMPughTJTsaoMSRathPet al. Minimalist Approaches to Cancer Tissue-of-Origin Classification by DNA Methylation. Mod Pathol (2020) 33(10):1874–88. doi: 10.1038/s41379-020-0547-7
48
XiaoQNobreAPiñeiroPBerciano-GuerreroMAlbaECoboMet al. Genetic and Epigenetic Biomarkers of Immune Checkpoint Blockade Response. J Clin Med (2020) 9(1):286. doi: 10.3390/jcm9010286
Summary
Keywords
DNA methylation, predictive modeling, immunotherapy, immune checkpoint inhibitors, support vector machine
Citation
Xu B, Lu M, Yan L, Ge M, Ren Y, Wang R, Shu Y, Hou L and Guo H (2021) A Pan-Cancer Analysis of Predictive Methylation Signatures of Response to Cancer Immunotherapy. Front. Immunol. 12:796647. doi: 10.3389/fimmu.2021.796647
Received
17 October 2021
Accepted
18 November 2021
Published
09 December 2021
Volume
12 - 2021
Edited by
Dawei Chen, Shandong Cancer Hospital, China
Reviewed by
Lu Zhou, Fudan University, China; Mei Liu, Chinese Academy of Medical Sciences and Peking Union Medical College, China; Liuluan ZHU, Capital Medical University, China
Updates
Copyright
© 2021 Xu, Lu, Yan, Ge, Ren, Wang, Shu, Hou and Guo.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Ru Wang, wangru@ty.sus.edu.cn; Yongqian Shu, shuyongqian@csco.org.cn; Lin Hou, houl@tsinghua.edu.cn; Hao Guo, h.guo@foxmail.com
†These authors have contributed equally to this work and share first authorship
This article was submitted to Cancer Immunity and Immunotherapy, a section of the journal Frontiers in Immunology
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.