Donor whole blood DNA methylation is not a strong predictor of acute graft versus host disease in unrelated donor allogeneic haematopoietic cell transplantation

Allogeneic hematopoietic cell transplantation (HCT) is used to treat many blood-based disorders and malignancies; however, it can also result in serious adverse events, such as the development of acute graft-versus-host disease (aGVHD). This study aimed to develop a donor-specific epigenetic classifier to reduce the incidence of aGVHD by improving donor selection. Genome-wide DNA methylation was assessed in a discovery cohort of 288 HCT donors selected based on recipient aGVHD outcome; this cohort consisted of 144 cases with aGVHD grades III-IV and 144 controls with no aGVHD. We applied a machine learning algorithm to identify CpG sites predictive of aGVHD. Receiver operating characteristic (ROC) curve analysis of these sites resulted in a classifier with an encouraging area under the ROC curve (AUC) of 0.91. To test this classifier, we used an independent validation cohort (n = 288) selected using the same criteria as the discovery cohort. Attempts to validate the classifier failed, with the AUC falling to 0.51. These results indicate that donor DNA methylation may not be a suitable predictor of aGVHD in an HCT setting involving unrelated donors, despite the initial promising results in the discovery cohort. Our work highlights the importance of independent validation of machine learning classifiers, particularly when developing classifiers intended for clinical use.


Introduction
In the past six decades, allogeneic hematopoietic cell transplantation (HCT) has become a cornerstone of treatment for haematological malignancies and is still often considered the only curative option (Duarte et al., 2019). Despite advances in the precision of HLA matching in unrelated donor selection and supportive care leading to ongoing improvements in HCT outcomes, severe graft versus host disease (GVHD) regularly occurs, increasing the risk of morbidity and mortality (McDonald-Hyman et al., 2015). Acute GVHD (aGVHD) occurs when the donor immune cells attack healthy tissue in the graft recipient, causing a range of inflammatory lesions which primarily affect the skin and digestive organs. Typically, aGVHD occurs within 100 days of transplant. While the incidence has decreased in the last decade due to better HLA matching of donors, aGVHD still affects ~30-50% of allogeneic HCT recipients (Al-Kadhimi et al., 2014), making the prevention of aGVHD an important area of research.
DNA methylation is a stable modification of the DNA which can influence gene expression without altering the underlying genetic sequence. DNA methylation has an emerging role in precision medicine due to the environmental and developmental exposures it can capture. Several factors associated with the development of aGVHD are also known to influence the epigenome, including age (Hannum et al., 2013; Horvath, 2013), sex (Yousefi et al., 2015) and viral infections (Birdwell et al., 2014). Despite the relative infancy of the field, DNA methylation classifiers predictive of clinical outcome are now being used in the clinic, notably in oncology to guide treatment of brain tumours (Capper et al., 2018; Koelsche et al., 2021). The development of machine learning algorithms and increasing size of datasets has also allowed improvement in the development of such classifiers for early diagnosis and determining subtypes of disease (Maros et al., 2020).
In 2015, we published a pilot study investigating DNA methylation as a potential classifier of aGVHD in HCT of HLA matched sibling pairs (Paul et al., 2015). In that study, we assessed DNA methylation in a cohort of 85 HCT donors selected based on recipient outcome, identifying 31 DNA methylation markers associated with aGVHD severity in graft recipients. In internal cross-validation these markers showed strong predictive performance (AUC = 0.98), indicating the potential utility of DNA methylation in improving donor selection in sibling HCT. The purpose of the current study was to investigate if DNA methylation is also predictive of outcome in HLA matched unrelated donor-recipient pairs, which constitute a much greater proportion of HCTs. To do this, we assessed genome-wide DNA methylation of 576 individuals recruited from the Center for International Blood and Marrow Transplant Research (CIBMTR). The scale and quality of annotation of the CIBMTR donor collection allowed us to use stringent selection criteria to minimise confounding and increase our power to detect methylation differences which were predictive of the development of aGVHD following HCT.

Study population
The discovery study cohort consisted of 288 HLA-A, -B, -C and -DRB1 matched, unrelated donor transplants reported to the CIBMTR that had pre-transplant donor peripheral blood samples available through the CIBMTR Research Repository. Patients received a transplant between 2002 and 2017 for acute lymphoblastic leukemia (ALL), acute myelogenous leukemia (AML) and myelodysplastic syndromes (MDS) using T-cell replete peripheral blood stem cell grafts, myeloablative conditioning and tacrolimus with methotrexate or mycophenolate mofetil based GVHD prophylaxis. The population was selected as a case-control cohort with 144 cases that developed grade III-IV aGVHD and 144 controls with no aGVHD. Cases and controls were matched for sex, age, disease and GVHD prophylaxis. Donors were all self-reported as Caucasian.
The validation cohort (n = 288) was selected using the same criteria. Using a previously described method (Tsai and Bell, 2015), power calculations for the discovery study using the EPIC array for genome-wide methylation measurement were performed with genome-wide significance set at 1 × 10⁻⁶. Sample groups of 140 donors matched to recipients with grade III-IV aGVHD, and 140 donors matched to recipients with no aGVHD, would give us 88% power to detect a methylation difference of 10% between the groups, and 100% power to detect methylation differences of 25%. Several additional samples for each group were profiled to ensure adequate power even if samples were removed during quality control.

Samples
Genomic DNA was extracted from whole blood samples obtained from CIBMTR using the QIAamp DNA Blood Mini Kit (Qiagen) at the UCL Pathology Department (discovery study) and the UCL Genomics facility (validation study). The quality and concentration of DNA were assessed using NanoDrop and Qubit (Thermo Fisher).

Genome-wide DNA methylation profiling
For each sample, 500 ng high-quality DNA was bisulphite converted using the EZ DNA methylation kit (Zymo Research), using alternative incubation conditions recommended for Illumina methylation arrays. Methylation was subsequently analysed using the Infinium MethylationEPIC array (Illumina) measuring CpG methylation at >850,000 sites across the genome. Array preparation was performed at the UCL Genomics facility using standard operating procedures. Discovery and validation cohorts were processed independently at different timepoints, but within each cohort batches were minimised by distributing comparison groups evenly across BeadChips and position on BeadChip.

Analysis overview
All analyses were performed in R version 3.6. Samples remaining following quality control (n = 282 for the discovery cohort and 288 for the validation cohort) were normalised using SWAN, then problematic probes were removed, including those with a detection p-value > 0.01, probes with a beadcount <3 in more than 5% of samples (Pidsley et al., 2013), non-cg probes, probes containing any common SNPs in dbSNP (Zhou et al., 2017) and probes mapping to the X or Y chromosomes. Singular value decomposition (SVD) (Teschendorff et al., 2009) and principal components analysis (PCA) were used to assess batch effects in the data, which were subsequently adjusted for using ComBat (Johnson et al., 2007). Cell composition was estimated and adjusted for using the Houseman method (Houseman et al., 2012) as implemented in ChAMP (Morris et al., 2014; Tian et al., 2017), estimating cell proportions using the Reinius reference dataset (Reinius et al., 2012). Differentially methylated positions (DMPs) were assessed using a linear model in Limma (Smyth, 2004).
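The probe filters listed above can be illustrated with a minimal sketch. This is Python written for illustration only; the study itself used R, and the `Probe` record, its field names and the `passes_filters` helper are hypothetical. The sketch also assumes that a detection p-value above 0.01 in any single sample is enough to drop a probe, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Probe:
    """Hypothetical per-probe record; field names are illustrative only."""
    name: str                 # array probe ID, e.g. "cg00000029"
    chrom: str                # chromosome the probe maps to
    has_common_snp: bool      # True if the probe overlaps a common dbSNP variant
    detection_p: List[float]  # one detection p-value per sample
    bead_count: List[int]     # one bead count per sample


def passes_filters(probe, max_det_p=0.01, min_beads=3, max_low_bead_frac=0.05):
    """Return True if the probe survives all filters described in the text."""
    # Keep only CpG ("cg") probes.
    if not probe.name.startswith("cg"):
        return False
    # Drop probes on the sex chromosomes.
    if probe.chrom in {"X", "Y", "chrX", "chrY"}:
        return False
    # Drop probes overlapping common SNPs.
    if probe.has_common_snp:
        return False
    # Assumption: a failed detection p-value in ANY sample removes the probe.
    if any(p > max_det_p for p in probe.detection_p):
        return False
    # Drop probes with a bead count < 3 in more than 5% of samples.
    low = sum(1 for b in probe.bead_count if b < min_beads)
    if low / len(probe.bead_count) > max_low_bead_frac:
        return False
    return True


def filter_probes(probes):
    """Return the names of probes that pass every filter."""
    return [p.name for p in probes if passes_filters(p)]
```

Applying every filter to one record at a time keeps the thresholds in a single place, so alternative cut-offs can be tested without touching the rest of a pipeline.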
Machine learning analysis was performed using the random forest method (Breiman, 2001) as implemented in the randomForest package. Instead of using all CpG sites as input for the random forest analysis, a subset of 10,000 CpG sites was selected through feature selection.
A supervised approach was used, where DMPs were identified in the discovery cohort using a linear model and the top 10,000 ranked probes were used as input for the random forest analysis. An alternative unsupervised approach was also carried out where the top 10,000 probes with the largest overall beta variance across all samples in the discovery cohort were used as input for the random forest analysis. In both cases, the classifiers were then tested on matched probe sets from the validation cohort, and sensitivity and specificity of the classifiers were calculated.
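As a rough illustration of the unsupervised feature selection step and of restricting the validation data to the same probes, the following Python sketch ranks probes by the variance of their beta values. It is not the authors' R code. The dictionary layout, the function names and the use of population variance are assumptions made for this example.

```python
from statistics import pvariance


def top_variable_probes(beta, k):
    """Return the k probe IDs with the largest beta-value variance.

    `beta` maps probe ID -> list of beta values, one per discovery sample.
    Population variance is an assumption; the text does not say which
    variance estimator was used.
    """
    ranked = sorted(beta, key=lambda probe: pvariance(beta[probe]), reverse=True)
    return ranked[:k]


def restrict_to_probes(beta, probes):
    """Subset another cohort to exactly the probes chosen in discovery.

    Raises KeyError if any selected probe is missing, because a trained
    classifier needs every one of its input features at prediction time.
    """
    missing = [p for p in probes if p not in beta]
    if missing:
        raise KeyError(f"{len(missing)} selected probes absent from this cohort")
    return {p: beta[p] for p in probes}
```

The key design point is that the probe list is derived from the discovery cohort only and then applied unchanged to the validation cohort.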
Following these analyses, we performed machine learning on the supervised dataset which had performed more robustly in random forest analysis, using Support Vector Machines (SVM), Gradient Boosting Machines (GBM), k-Nearest Neighbours (KNN), Multilayer perceptrons (MLP) and Logistic Regression (LR). For each model, we explored a range of hyperparameters through a grid search approach. Each experiment was executed 40 times with different random seeds, resulting in training over 2,300 models in total.

Data availability
The participants involved in the study had been recruited under different consents which require different levels of data access. According to consent given, the corresponding data are being made available in a three-tiered data access approach: 1. Processed data (beta matrix) for all individuals (n = 570) are available from the open access 'Gene Expression Omnibus' under accession number GSE196696.

Results
Cohort and dataset characteristics of a donor-based methylation resource for aGVHD investigation
Unrelated donor-recipient pairs undergoing HCT were selected from the CIBMTR Research Repository, based on the aGVHD outcomes in recipients (Figure 1). Blood-based DNA methylation from donors was assessed using the Illumina EPIC arrays. Methylation differences were assessed, and random forest analysis was used to test for the presence of a classifier of aGVHD outcome.
Unrelated donor-recipient pairs were selected by CIBMTR using stringent criteria as described in methods, resulting in 282 individuals in the discovery cohort following initial data quality control, and 288 individuals in the validation cohort. The resulting cohorts were well matched for characteristics that can influence DNA methylation profile, including age and sex (as shown in Table 1).
The discovery cohort was well matched for disease, with no significant difference in proportion of AML, ALL and MDS between comparison groups (p = 0.339). Median recipient ages for the no/severe aGVHD groups were 45 (range 19-76) and 47 (range 18-72), respectively. There was no significant difference for recipient sex (p = 0.716) or ethnicity (p = 0.113) across comparison groups. Donors were well matched across comparison groups for sex (p = 0.585); however, there was a difference in median age (p = 0.003), though this was not apparent when individuals were stratified into age brackets (p = 0.090). There were no significant differences across comparison groups for donor/recipient ABO type, blood type, Rh factor, CMV status or sex match.
The validation cohort had a significant difference in proportions of these diseases across comparison groups (p = 0.02). The median recipient ages for the no/severe aGVHD groups in the validation cohort were 49 (range 20-75) and 50 (range 19-71), respectively. There was no significant difference in the recipient age distribution across comparison groups (p = 0.998). There was no difference in recipient sex across the comparison groups (41% female recipients, p = 1.0). Donors were well matched across comparison groups for sex (p = 0.063) and median age (p = 0.076). There were no significant differences in ethnicity, donor/recipient ABO type, blood type, Rh factor, CMV status or sex match across groups. There were differences in conditioning regimen across comparison groups (p < 0.001).
Following sample removal, quality control plots showed that the 282 individuals remaining in the discovery dataset and 288 individuals remaining in the validation dataset had very high quality methylation profiles (Supplementary Figure S1). Following probe filtering, 661,114 probes remained in the discovery dataset. Singular Value Decomposition (SVD) and principal components analysis (PCA) indicated that estimated 'cell composition', 'Slide/BeadChip' and 'Array' batch effects were having the largest impact on the data (Supplementary Figures S2, S3), which were subsequently adjusted for using ChAMP cell composition correction and ComBat adjustment respectively. Cell type proportions were estimated for each group using the DNA methylation profiles and were found to be well balanced in each cohort with no significant difference between groups (Supplementary Table S1).
We have created an extremely well phenotyped and highly curated methylation dataset which has been developed with careful consideration of technical, biological and clinical confounders, with extensive matched clinical data. This methylation dataset provides a unique resource for the investigation of HCT donor DNA methylation, and will be beneficial to the wider research community as a 'healthy' cohort.

Significant aGVHD-associated differential methylation is not detectable in donor whole blood
No CpG sites passed a false discovery rate adjusted p-value significance threshold of 0.05 during DMP analysis when comparing the 'no aGVHD' group to the 'severe aGVHD' group. As the main batch and confounding effects of slide, array and cell composition had been previously adjusted in the dataset, no additional covariates were included during linear regression. This lack of significant differentially methylated positions indicates that individual CpG sites were not a strong classifier of aGVHD outcome in donor whole blood samples.
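To make the false discovery rate threshold concrete, the sketch below implements the Benjamini-Hochberg adjustment in Python. Whether the study used this exact procedure is an assumption, since the text only says "false discovery rate adjusted"; the function name is illustrative.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is the FDR procedure; the source text does not name one.
    """
    m = len(pvals)
    # Indices of p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def any_significant(pvals, alpha=0.05):
    """True if at least one adjusted p-value falls at or below alpha."""
    return any(q <= alpha for q in benjamini_hochberg(pvals))
```

Under this assumption, and if all 661,114 filtered probes were tested, a single probe would need a raw p-value below 0.05/661,114 (roughly 7.6 × 10⁻⁸) to pass on its own, which shows how demanding the threshold is at array scale.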

Random forest classifier failed to validate in an independent cohort
Random forest analysis was performed on two sets of probes. The unsupervised analysis used the top-ranked 10,000 most variable probes, which all had a beta variance of >33% across all samples. The supervised analysis used the top 10,000 probes resulting from the linear model DMP analysis; although none passed statistical significance, these were considered sites with putative methylation differences. Random forest analysis was run with 500 trees, with 100 variables tested at each split for both analysis approaches.
The high variability classifier showed very poor performance, with an out-of-bag (OOB) estimate of error rate of 45.39% and area under the curve (AUC) of 0.516 during internal cross-validation of the discovery dataset (Figure 2). The differential methylation dataset produced an initially promising classifier with an OOB estimate of error rate of 14.89% and an AUC of 0.913 (Figure 3).
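For readers less familiar with these metrics, the following Python sketch shows one standard way to compute an AUC from classifier scores (the rank-based, Mann-Whitney formulation) and how an OOB error rate maps to a count of misclassified samples. The function names are illustrative. The counts of 42 and 128 misclassified donors are inferred here from the reported percentages and the 282 discovery samples; they are not stated in the text.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (Mann-Whitney formulation).
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def error_rate_percent(n_misclassified, n_samples):
    """Error rate as a percentage rounded to two decimals."""
    return round(100.0 * n_misclassified / n_samples, 2)


# Inferred (not reported) counts that reproduce the published OOB error rates
# for the 282 discovery samples.
supervised_oob = error_rate_percent(42, 282)     # differential methylation probes
unsupervised_oob = error_rate_percent(128, 282)  # most variable probes
```

An AUC of 0.5 corresponds to scores that rank cases above controls no more often than chance.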
FIGURE 1
Study design. Unrelated donor-recipient pairs were selected based on the outcome of recipients following HCT. DNA methylation levels were assessed in donors associated with no (Grade 0) or severe (Grades 3-4) aGVHD in recipients. Donor-recipient pairs were HLA matched, and comparison groups were matched for sex, age, disease and GVHD prophylaxis. Feature selection reduced the number of probes in the discovery dataset to 10,000 for input to random forest analyses, and this classifier was subsequently tested in the validation cohort following pre-processing of data and refinement to the same set of probes.
During validation analysis, the matched CpG sites used as input to the original random forest training analysis were extracted from the validation dataset, as all probes present in training analyses are required as input for validation. Validation analyses indicated that the differential methylation classifier had a sensitivity of 90.97% but a specificity of just 6.25%, and an AUC of 0.508. This is driven by an over-prediction of the 'severe aGVHD' group in the independent validation cohort, resulting in many false positive predictions. The unsupervised differential variability classifier also had an extremely poor performance in the validation cohort, with a sensitivity of just 50%, a specificity of 51.39% and an AUC of 0.523. As such, neither of these approaches yielded a useful classifier. Additional machine learning analyses applying a range of machine learning methods (SVM, GBM, KNN, MLP and LR) to the supervised dataset found a slight improvement in measures of AUC; however, even the best models from an optimised selection of over 2,300 had an AUC of 0.60-0.61, a marginal improvement that is insufficient for clinical translation (Figure 4).
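The validation sensitivities and specificities can be tied back to confusion-matrix counts with a short worked example. The Python sketch below assumes the validation cohort contained 144 severe aGVHD cases and 144 controls; this mirrors the discovery design but is not stated explicitly for the validation cohort. The counts are inferred from the reported percentages rather than taken from the source.

```python
def sensitivity(tp, fn):
    """Fraction of true cases (severe aGVHD) predicted as cases."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true controls (no aGVHD) predicted as controls."""
    return tn / (tn + fp)


def summarise(counts):
    """Percent sensitivity and specificity plus balanced accuracy."""
    sens = sensitivity(counts["tp"], counts["fn"])
    spec = specificity(counts["tn"], counts["fp"])
    return {
        "sensitivity_pct": round(100 * sens, 2),
        "specificity_pct": round(100 * spec, 2),
        "balanced_accuracy": round((sens + spec) / 2, 4),
    }


# Assumption: 144 cases and 144 controls in the validation cohort.
# Counts are inferred so that they reproduce the reported percentages.
supervised = {"tp": 131, "fn": 13, "tn": 9, "fp": 135}
unsupervised = {"tp": 72, "fn": 72, "tn": 74, "fp": 70}

supervised_summary = summarise(supervised)
unsupervised_summary = summarise(unsupervised)
```

Under these assumed counts, the supervised classifier labels 266 of 288 validation donors as 'severe aGVHD', so its high sensitivity reflects near-uniform positive prediction and not real discrimination.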
Through extensive analyses we have concluded that DNA methylation in donor whole blood is not a strong predictor of aGVHD outcome in recipients during unrelated HCT. These findings also demonstrate the importance of independent validation of methylation-based classifiers, particularly when using machine learning approaches.

Discussion
Recently developed predictors of aGVHD using clinical variables have had modest success with an AUC of ~0.6 (Lee et al., 2018); however, this indicated that biological markers of gene expression, such as epigenetic markers, could provide additional insight to improve prediction of aGVHD. This was also supported by the recent finding that hypermethylation of the TP53 gene in HCT recipients was found to correlate with relapse of myelodysplastic syndromes following transplantation, indicating recipient-based DNA methylation could be predictive of outcomes during HCT (Wang et al., 2021). As DNA methylation levels reflect both the underlying genetic sequence and factors known to be associated with aGVHD development (such as donor age, sex and cytomegalovirus serostatus), we hypothesised they would be a strong candidate for classifier identification. Our initial study focused on sibling donor-recipient pairs, in which a DNA methylation classifier of aGVHD development was identified in the blood of donors (Paul et al., 2015). In the current study, we tested if DNA methylation as measured by EPIC arrays is also predictive of aGVHD in unrelated donor-recipient pairs and found that it is not.
There are several potential technical and biological reasons that a robust classifier of aGVHD was not identified in this study. Firstly, while the study performed was shown to have power to detect larger methylation differences of >10%, the relatively small sample size of the discovery cohort (n = 280) and validation cohort (n = 288) may have limited our ability to detect more subtle methylation differences. In the future, larger scale studies may provide increased power to detect such differences.
FIGURE 2
ROC curve of classifier performance of the unsupervised Random Forest Classifier. Plot (A) shows the performance of the variable probe based (unsupervised approach) classifier which used the top 10,000 most variable CpG sites as input, during internal cross validation on the training dataset. Plot (B) shows the performance of the variability based classifier on the independent validation cohort, with an AUC of 0.523, a sensitivity of 50.0% and a very poor specificity of 51.4%.
FIGURE 3
ROC curve of classifier performance of the supervised Random Forest Classifier. The figure shows the performance of the differential methylation (supervised approach) classifier which used the top 10,000 most differentially methylated CpG sites as input, during internal cross validation on the training dataset (blue line). The performance of the differential methylation classifier on the independent validation cohort is indicated by the orange line, which had an AUC of 0.508, a sensitivity of 90.97% with a very poor specificity of 6.25%. While initially this differential methylation-based classifier appeared encouraging with the discovery cohort, the classifier did not perform well during validation analyses.
Secondly, the tissue we investigated was peripheral blood of donors which was intended to act as a surrogate tissue reflecting outcome. DNA methylation profiles are known to be highly cell type specific (Ji et al., 2010), and while blood based DNA methylation may reflect certain exposures and factors associated with aGVHD development, it is possible that a specific cellular subtype which is not present in the whole blood of donors is responsible for the development of aGVHD and as such would not be reflected in the methylation profile. Another possibility is that the specific cell type which is causing aGVHD could be present in whole blood, but in small proportions, making the signal significantly diluted by other more prominent cell types. Indeed, in the current analysis, cell composition was the biggest driver of variation in the data, and though this was balanced overall between the comparison groups and adjusted for in the data analysis, it could have been a confounding factor in the study, or subtle methylation effects could have been lost during adjustment. In the future, methylation analysis of individual cell types isolated from stem cell grafts may provide more insight into DNA methylation differences driving the development of aGVHD. While this approach would provide a more refined methylation measurement, it would be a significantly less practical approach for a clinical test, limiting the utility for optimising donor selection as usually these cells would only be collected once a donor is committed.
A classifier of aGVHD development was identified in our previously published work, which investigated donor DNA methylation from sibling HCT. A potential reason a similar biomarker was not identified in this cohort is that it could have been specific to sibling transplants, which generally have a lower incidence of aGVHD which may be driven more by extrinsic factors which influence DNA methylation, while aGVHD following HCT from an unrelated donor may be driven more by genetic factors. There may also be an issue of 'epigenetic compatibility', with donors and recipients varying in epigenetic profile inciting the initiation of aGVHD in certain individuals, without this being driven by a specifically differentially methylated gene or pathway. This would explain why a classifier was not identified in the current study, as the epigenetic marks conferring risk of aGVHD would be different for each individual. In the future, studies investigating the DNA methylation of both donors and recipients during HCT could provide more insight into this possibility. This should be considered with the caveat that previous studies have assessed donor and recipient DNA (Rodriguez et al., 2013) and this revealed several key problems with comparing donor and recipient DNA for aGVHD prediction. Notably, HCT recipients are often being treated for blood-based malignancies which have enormous impacts on the epigenome (Blecua et al., 2020), as well as dramatically altering cell composition. In addition, many recipients have already been exposed to therapeutics which can dramatically alter the epigenome. Finally, as demonstrated by Rodriguez et al., following HCT, recipients retain the methylation patterns of the donor as well as their own, resulting in cellular chimerism. The combination of these factors makes it difficult to extract meaningful signal when comparing the methylation patterns of donors and recipients during HCT. Even with access to the substantial cohorts we have used in this study, it would be immensely difficult to identify a suitably homogeneous population (with the same diagnosis, stage of disease and treatment history) with adequate power to identify subtle methylation differences in immune cells with clinical utility. Although the genetic sequences of both donor and recipient are taken into account during HLA matching, we concluded that due to the dynamic nature of the epigenome, and the confounding factors listed above, this is not an appropriate approach to take when developing an epigenetic classifier of outcome in HCT.
FIGURE 4
ROC curves of classifiers developed using additional machine learning methods. The additional machine learning methods applied to the supervised dataset were Support Vector Machines (SVM), Gradient Boosting Machines (GBM), k-Nearest Neighbours (KNN), Multi-layer perceptrons (MLP) and Logistic Regression (LR). For each model, we explored a range of hyperparameters through a grid search approach. Each experiment was executed 40 times with different random seeds, resulting in training over 2,300 models in total. The ROC curves illustrate the best performing models which reached a maximum validation AUC of 0.6 for the LR method. Plot (A) shows the performance of these models in the discovery cohort while plot (B) shows the performance in the validation cohort.
When considering the clinical context of the development of aGVHD, it is likely the end result of a complicated clinical setting with multiple donor and recipient factors affecting the outcome. If the epigenetic pattern was highly predictive, it might imply that the occurrence of severe aGVHD is preordained just by donor factors, which seems biologically unlikely.
On a technical level, this study has also demonstrated the importance of careful development and testing of analysis pipelines for methylation studies, in particular when applying complex machine learning methods to datasets. Our initial findings indicated a robust classifier might be present within the dataset, a finding which was amplified when data was preprocessed as a single batch with subsequent splitting of the dataset and internal cross validation. While our validation dataset was of exceptionally high quality and donors included were matched to a very high degree with the discovery cohort, the classifier was not validated even with extensive optimisation and testing of alternate pipeline settings. This demonstrates that even with the identification of a promising and robust classifier in a well-designed study, independent validation is critical (Ransohoff, 2004), and such validation datasets need to be generated completely independently with unique individuals and pre-processed separately to the training/discovery cohort. This also better mimics the experimental realities of clinical classifier use, making any findings that do stand up to the validation process more robust and clinically useful.
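One general mechanism by which internal cross-validation can overstate performance is selecting features on the full dataset before splitting it. The Python simulation below is a hypothetical illustration of that mechanism on pure noise; it is not a reconstruction of the study's pipeline. The sample size, feature count, nearest-centroid classifier and all function names are assumptions chosen to keep the example small.

```python
import random


def select_features(x, y, k):
    """Rank features by absolute difference in class means; keep the top k."""
    scores = []
    for f in range(len(x[0])):
        a = [row[f] for row, lab in zip(x, y) if lab == 0]
        b = [row[f] for row, lab in zip(x, y) if lab == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)[:k]


def centroid_predict(train_x, train_y, test_x, feats):
    """Nearest-centroid classifier restricted to the selected features."""
    cents = {}
    for c in (0, 1):
        rows = [row for row, lab in zip(train_x, train_y) if lab == c]
        cents[c] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    preds = []
    for row in test_x:
        dist = {c: sum((row[f] - cents[c][j]) ** 2 for j, f in enumerate(feats))
                for c in (0, 1)}
        preds.append(0 if dist[0] <= dist[1] else 1)
    return preds


def cv_accuracy(x, y, k, folds, leaky):
    """k-fold accuracy; `leaky` selects features once on ALL samples."""
    n = len(x)
    global_feats = select_features(x, y, k) if leaky else None
    correct = 0
    for fold in range(folds):
        test_idx = [i for i in range(n) if i % folds == fold]
        train_idx = [i for i in range(n) if i % folds != fold]
        tx = [x[i] for i in train_idx]
        ty = [y[i] for i in train_idx]
        # Honest variant: features are chosen from the training fold only.
        feats = global_feats if leaky else select_features(tx, ty, k)
        preds = centroid_predict(tx, ty, [x[i] for i in test_idx], feats)
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / n


def simulate(seed=0, n=40, n_feat=1000, k=20, folds=5):
    """Pure-noise data: labels carry no information about any feature."""
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 1.0) for _ in range(n_feat)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    return cv_accuracy(x, y, k, folds, True), cv_accuracy(x, y, k, folds, False)
```

Calling `simulate()` returns the leaky and honest accuracy estimates for the same noise data. Because the labels are unrelated to the features, any accuracy well above 0.5 in the leaky variant is an artefact of selection, which is why a separately generated and separately pre-processed validation cohort is the more trustworthy test.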

Conclusion
In this study, we performed the definitive investigation of donor-derived blood-based DNA methylation as a classifier of aGVHD outcome in HCT and found that donor DNA methylation as assessed by methylation arrays is not a strong candidate for prediction of aGVHD. It is possible that other methylation signals exist which might improve our understanding of the development of aGVHD in these cohorts, which we plan to investigate in the future. We have also highlighted the importance of study design and well-designed independent validation of methylation differences, especially when applying machine learning approaches.

TABLE 1
Discovery and validation cohort characteristics. Characteristics of adult patients undergoing first allogeneic PB HCT for acute leukemia or MDS from an 8/8 HLA-matched unrelated donor between 2000 and 2016 with available donor blood samples, as reported to the CIBMTR. Restricted to Caucasian donors, myeloablative preparative regimens, no ATG/Campath and patients surviving >100 days with no aGVHD or those that developed grades III-IV aGVHD at any time post-HCT. Donors were matched between comparison groups based on sex and age by decade.
