A Model of Minor Histocompatibility Antigens in Allogeneic Hematopoietic Cell Transplantation

Minor histocompatibility antigens (mHAg) composed of peptides presented by HLA molecules can cause immune responses involved in graft-versus-host disease (GVHD) and graft-versus-leukemia effects after allogeneic hematopoietic cell transplantation (HCT). The current study was designed to identify individual graft-versus-host genomic mismatches associated with altered risks of acute or chronic GVHD or relapse after HCT between HLA-genotypically identical siblings. Our results demonstrate that in allogeneic HCT between a pair of HLA-identical siblings, a mHAg manifests as a set of peptides originating from annotated proteins and non-annotated open reading frames, which i) are encoded by a group of highly associated recipient genomic mismatches, ii) bind to HLA allotypes in the recipient, and iii) evoke a donor immune response. Attribution of the immune response and consequent clinical outcomes to individual peptide components within this set will likely differ from patient to patient according to their HLA types.


INTRODUCTION
Acute and chronic GVHD and recurrent or progressive malignancy (i.e., "relapse") represent major outcomes that determine the success of allogeneic hematopoietic cell transplantation (HCT). Acute and chronic GVHD reflect immune-mediated injury mediated by donor cells in recipient tissues, and to some extent, relapse represents a lack of immune-mediated attack on malignant cells that remain viable in the recipient after the pretransplant conditioning regimen. Donor T cells can recognize both major and minor histocompatibility antigens (mHAgs) in HLA-mismatched recipients, but in HLA-matched sibling recipients, donor T cells recognize only mHAgs (1,2). Minor histocompatibility antigens originate from source proteins or peptides that are processed through the MHC class I and class II antigen processing pathways for binding to major histocompatibility complex (MHC) molecules and presentation to T cells (1,2). DNA sequence and structural variation between siblings generates mHAg mismatching between recipients and donors. Amino acid differences resulting from single nucleotide polymorphisms (SNP), insertions and deletions (indels) and other types of variants represent well-recognized mechanisms that generate mHAgs (1)(2)(3)(4).
The past decade has seen rapid progress in the use of high throughput methods and computational biology to identify mHAgs in humans. Analysis of HLA-identical sibling pairs has estimated the proportion of nonsynonymous peptides that bind to HLA-class I molecules and the proportion of HLA-class I peptides that are polymorphic (5), and simulations with publicly available data have estimated the numbers of polymorphic peptides with strong binding to HLA-class I molecules in sibling and unrelated donor-recipient pairs (6). Others have identified mHAgs by screening patient-derived T cell clones against a panel of sequenced targets expressing common HLA alleles (7). A proteogenomic approach identified mHAgs expressed by hematopoietic tissues but not by other tissues as potential targets for graft-versus-leukemia effects (8). Although the diversity of peptides bound to class I and II HLA-molecules did not correlate with outcomes after HCT (9), the extent of genome-wide mismatching between siblings has been correlated with the risk of severe acute GVHD (10) and chronic GVHD (11). More sophisticated computational algorithms have been designed to predict the overall in vivo alloreactive T cell response (12), but their accuracy and utility have not yet been evaluated, and only limited progress has been made toward implicating individual mHAgs in outcomes after HCT in clinical studies (13).
The current study was designed to identify individual graftversus-host genomic mismatches associated with altered risks of acute or chronic GVHD or relapse after allogeneic HCT between HLA-identical siblings. We initially focused on genomic variants that could produce peptides predicted to bind HLA-A*02:01, the most frequent MHC allele in our study cohort. Associations observed in HLA-A*02:01-postive recipients but not in HLA-A02-negative recipients in a discovery cohort were tested for replication in an independent cohort of HLA-A*02:01-positive recipients. A separate MHC-agnostic analysis was used to identify recipient mismatch associations without taking HLA restriction into consideration.

Study Population
All recipient and donor blood samples were collected before HCT according to research protocols approved by the Fred Hutchinson Cancer Research Center (FHCRC) Institutional Review Board (IRB) or the National Marrow Donor Program (NMDP). Project-specific IRB approval was obtained for the use of clinical data and research biospecimens.
The FHCRC cohort included 1868 HLA-A, B, C, DRB1, DQA1, DQB1, DPA1, DPB1-matched donor-recipient sibling pairs with mostly European ancestry (~83%) who had a first allogeneic HCT with marrow or growth factor-mobilized blood cells at the FHCRC and Seattle Cancer Care Alliance from 1990 through 2011. Recipients treated with T-cell depleting antibodies or high-dose cyclophosphamide after HCT and those with syngeneic or cord blood donors were excluded. A single prior autologous HCT was allowed. Conditioning regimens were categorized as myeloablative or nonmyeloablative according to the intensity of chemotherapy and total body irradiation. Indications for HCT included hematological malignancy or myelodysplasia. Donors and recipient 4-digit typing of HLA-A, B, C, DRB1, DQB1, DPA1 and DPB1 alleles was determined as described previously (10). We used a cohort from the Center for International Blood and Marrow Transplant Research (CIBMTR), a research collaboration between the NMDP and the Medical College of Wisconsin to test replication of HLA-A*02:01 mHAg GWAS associations discovered in the FHCRC cohort. The CIBMTR cohort consisted of 838 HLA-A, B, C, DRB1, DQB1-matched sibling pairs with self-identified European ancestry and at least 1 HLA-A*02:01 allele. Patients in this cohort had HCT from 2008 through 2018.
Sample Preparation, Genotyping, Imputation and QA/QC Details regarding preparation of genomic DNA samples from donors and recipients for the FHCRC cohort and the related DNA amplification, genotyping platforms, hybridization, genotyping, imputation algorithms, quality control and quality assurance have been described previously (10). Targeted sequencing of variants selected for replication using the CIMBTR sample pairs was done by the Genomics & Bioinformatics shared resource at FHCRC using the AmpliSeq PCR workflow on the MiSeq platform (Illumina). Joint genotyping was done using the GATK pipeline 4.1.8.1 following the best practice workflow (14)(15)(16). Briefly, picard 2.18 (17) was used to process Fastq and Bam files to generate analysis-ready reads. BWA 0.7 (18) was used to map paired-end reads to reference genome hg19, and GATK 4.1.8 was used to call variants for each sample (BaseRecalibrator, ApplyBQSR and HaplotypeCaller in GVCF mode) followed by joint-calling (GenomicsDBImport and GenotypeGVCFs). Quality assurance, quality control and variant filtering followed recommendations for targeted sequencing from the GATK website (19)(20)(21). We further removed variants with high allelic ratios, ancestry outlier samples identified by clustering in the space of the first two principal components, male samples with high X chromosome heterozygosity and sample pairs whose observed identity-bydescent relationship did not match the expected full-sibling relationship. Whenever possible, HLA-A*02:01 replication testing with CIBMTR samples was done with 2 different variants representing each discovery in case the primary assay failed.
Variants were analyzed in 2 categories according to whether they are predicted to encode differences in the amino acid sequence of an annotated protein in the recipient when compared to the donor, hereafter termed "coding" variants, or not (hereafter termed "noncoding" variants). The coding category included variants whose alleles were predicted to have missense, inframe insertion or deletion, frameshift, start lost, stop gained or lost, stop retained, incomplete terminal codon, or protein altering effect on the overlapping transcript as predicted by the Ensembl Variant Effect Predictor (VEP) (26). The annotations were extracted from the Annotation Explorer application hosted on Biodata Catalyst (https://biodatacatalyst.nhlbi.nih.gov/) (27).
For each variant, recipient mismatching was evaluated separately for the major allele and the minor allele. For a variant with major and minor alleles "a" and "b", pairs with "bb" donors and "ab" or "aa" recipients were categorized as mismatched for the major allele, and pairs with "aa" donors and "ab" or "bb" recipients were excluded to avoid confounding by mismatching for the minor allele. Similarly, pairs with "aa" donors and "ab" or "bb" recipients were categorized as mismatched for the minor allele, and pairs with "bb" donors and "aa" or "ab" recipients were excluded to avoid confounding by mismatching for the major allele. Recipient allele mismatch associations (RAMAs) for each outcome were based on cause-specific hazard ratio analysis using Cox regression comparing pairs with recipient mismatching versus those without mismatching for a given major or minor allele, treating death as a competing risk for all endpoints and relapse as a competing risk for acute and chronic GVHD.
In the discovery phase of the HLA-A*02:01 analysis, RAMAs were tested by proportional hazards analysis in the FHCRC cohort using 824 HLA-A*02:01-positive donor-recipient pairs with at least 1 HLA-A*02:01 allele and in 929 HLA-A02 supertype-negative donor-recipient pairs who did not have HLA-A*02:01 or other HLA-A02 supertype alleles (A*02:XX, 68:02, 68:15, 68:57, 68:28 or 69:01) (28). Coding RAMAs with likelihood ratio p-values ≤.01 in HLA-A*02:01-positive pairs and p-values ≤.01 for A*02:01 interaction in a combined HLA-A*02:01-positive and HLA-A02negative analysis were identified as discoveries that could be attributed to HLA-A*02:01-specific mHAgs. The number of noncoding RAMAs was much larger than the number of coding RAMAs. Therefore, the threshold p-values for likelihood ratio tests and interaction tests used to identify noncoding discovery candidates were set at ≤10 -4 . Discovery RAMAs were tested in the CIBMTR replication cohort with Bonferroni adjustments for multiple comparisons. For Bonferroni adjustment, acute GVHD, chronic GVHD and relapse were treated as separate analyses, with grade 2-4, 2b-4, 3-4 acute GVHD and stage 2-4 gut GVHD considered as a single category. In addition, discoveries with expected hazard ratios >1.0 for acute and chronic GVHD and <1.0 for relapse were analyzed separately from discoveries with unexpected hazard ratios <1.0 for acute and chronic GVHD and >1.0 for relapse.
In a separate MHC-agnostic analysis, we screened for RAMAs independent of whether the donor and recipient pairs had any specific HLA allotype. For this purpose, the FHCRC cohort was randomized at a 3:2 ratio into discovery and replication cohorts with 1125 and 743 pairs, respectively. We initially identified coding RAMAs with likelihood ratio p-values <1.0 x 10 -3 and noncoding RAMA with p-values <1.0 x 10 -5 as discovery candidates. Because the number of candidates was too large to tolerate Bonferroni adjustment for multiple comparisons, we used p-values <1.0 x 10 -4 for coding RAMAs and 1.0 x 10 -6 for noncoding RAMAs as more stringent thresholds to prespecify discovery candidates for replication testing.
Identification of Missense Proxies, Linkage Disequilibrium Groups, and Peptides Encoded by Variant Alleles, and Prediction of Processed Peptide Binding to HLA Allotypes European ancestry linkage disequilibrium (LD) groups of genomic variants with pairwise r 2 ≥0.70 were identified by the LDmatrix tool (ldlink.nci.nih.gov) (29). The LDproxy tool (ldlink.nci.nih.gov) was used to identify external missense proxy variants with r 2 ≥0.70 in the European population to be included in linkage groups for analysis of HLA-A*02:01 RAMAs and to identify external proxy variants predicted to encode non-annotated open reading frames for analysis of MHC-agnostic RAMAs. When necessary, the LDproxy tool was also used to identify proxies with r 2 ≥0.95 that could be used as backup assays for the primary variant being tested for replication in the CIBMTR cohort. Peptides up to 13 residues upstream and downstream from the amino acid encoded by genomic variants (i.e., source peptides) were recovered from Haplosaurus (ensembl.org) (30), the dbSNP database (ncbi.nlm.nih.gov/snp) (31) or from manual reading of the UCSC genome browser (genome-euro.ucsc.edu) (32). Some source peptides were shorter than 27 residues due to start or stop codons. The source peptides were then analyzed for predicted binding of 8 to 14-mer processed peptides to selected MHC class I allotypes by NetMHCpan4.1 at the default setting (33). The binding predictions are based on both the estimated binding affinity percentile rank and on publicly available data from mass spectrometry analysis of peptides eluted from HLA molecules. Processed peptides predicted to bind HLA-A*02:01 but not containing the variant residue were culled from further analysis. Table 1 summarizes demographic, clinical and transplant characteristics of patients in the study cohorts. Patients in the CIBMTR cohort were older than those in the FHCRC cohorts, fewer had chronic myeloid leukemia and low-risk diseases, and more received non-myeloablative conditioning regimens and growth factor-mobilized blood cell grafts. These differences reflect more recent HCT in the CIBMTR cohort than in the FHCRC cohorts. Figure 1 shows the cumulative incidence frequencies of acute GVHD, chronic GVHD and relapse in the study cohorts. The cumulative incidence frequencies of grades 2-4 and grades 3-4 acute GVHD were higher in the FHCRC cohort (0.64 and 0.17, respectively) than in the CIBMTR cohort (0.36 and 0.10, respectively). The cumulative incidence frequencies of chronic GVHD and relapse were similar in the 2 cohorts ( Figure 1).

HLA-A*02:01-Positive Recipient Allele Mismatch Association Discoveries
The initial discovery analysis identified 604 variants associated with an outcome in the HLA-A*02:01-positive cohort and having a statistical interaction with the absence of HLA-A*02:01 in the combined HLA-A*02:01-positive and negative cohorts, indicating that these associations could be attributed to HLA-A*02:01-specific mHAgs ( Figure 2 and Table S1). Among the 604 discoveries, 197 were coding variants and 407 were non-coding variants. To characterize noncoding variants more accurately according to the presence or absence of any associated coding variants, we identified 82 external missense proxies with r 2 ≥0.70 as additional RAMA discoveries. Of the 686 variants, 527 were combined into 93 linkage groups with pairwise r 2 ≥0.70, and 159 did not have r 2 ≥0.70 for association with any other variant in the data set. The 93 linkage groups contain between 2 and 105 members. Among the 686 RAMAs, 279 involved a variant that encodes at least 1 peptide, yielding a total of 284 unique source peptides ( Table S2).

Prediction of Peptides That Bind HLA-A*02:01
To validate the use of NetMHCpan4.1 (33) to identify source peptides that generate processed peptides predicted to bind to HLA-A*02:01, we tested its performance with a set of 49 previously identified genomic variants reported to encode mHAgs presented by HLA-A*02:01 (4,8). The NetMHCpan4.1 algorithm predicted HLA-A*02:01 binding for all but 3 of the source peptides tested (Table S3). In 4 cases, the predicted peptide differed from the reported peptide, but the NetMHCpan4.1 algorithm predicted alternatives that approximated the reported peptide. Although this analysis could not determine the false-positive rate, the results show that the NetMHCpan4.1 algorithm has a low false-negative rate in predicting the binding of processed peptides to HLA-A*02:01.
Of the 284 unique source peptides identified in our discovery analysis, 102 have at least 1 processed peptide that i) contains 8 to 14 residues, ii) includes the amino acid encoded by the genomic variant, and iii) is predicted to bind to HLA-A*02:01 ( Figure 3 and Table S4). These 102 source peptides produce a total of 188 unique processed 8 to 14-mer peptides that are predicted to bind to HLA-A*02:01. Of these, 47 have a 9-mer HLA-A*02:01-binding pattern or sequence logo characterized by having leucine, isoleucine, or methionine at position 2 and valine, leucine, isoleucine, or methionine at position 9 (34,35).

Replication Testing of HLA-A*02:01 Discoveries
For purposes of replication, 372 RAMAs were removed from the list of 686 discoveries and external proxies, so that no linkage group contained more than 2 variants, and 137 assay proxies with r 2 >0.95 were added to enable backup testing of singleton RAMAs that did not fit within any linkage group. A total of 49 RAMAs were not evaluated due to lack of an appropriate assay or because the results did not pass QC after testing ( Figure 4 and Table S5). Grade 2b GVHD could not be ascertained in the CIBMTR cohort because gut staging was not available. Due to differences in the cumulative incidence frequencies of acute GVHD in the discovery and replication cohorts, we tested grades 2-4 GVHD in the CIBMTR cohort for replication of grades 2b-4 GVHD discoveries in the FHCRC cohort. We also tested replication for grades 3-4 GVHD, chronic GVHD and relapse, and excluded 117 RAMAs for grade 2-4 GVHD and stage 2-4 gut GVHD from replication testing. Of the 285 RAMAs tested for replication, 121 represented backup assays within a linkage group, leaving 164 independent genomic signals (IGS). Among the 164 IGS, 83 (51%) had hazard ratios >1.0 as expected for acute and chronic GVHD or <1.0 as expected for relapse, while the remainder had unexpected hazard ratios <1.0 for acute or chronic GVHD or >1.0 for relapse. Each IGS was characterized according to whether any member of its linkage group encoded a source peptide, whether any processed peptides were predicted to bind to HLA-A*02:01, and whether any processed peptides had an HLA-A*02:01 sequence logo as defined above. Of the 164 IGS, 133 contained at least one coding RAMA. Among the 133, 57 contained at least one RAMA that was predicted to produce processed peptides that bind to HLA-A*02:01, and 30 of the 57 contained at least one RAMA that was predicted to produce a processed peptide with an HLA-A*02:01-binding sequence logo.
Among the 30 IGS associated with at least 1 HLA-A*02:01 sequence logo peptide, 16 had hazard ratios >1.0 as expected for acute and chronic GVHD and <1.0 as expected for relapse. These IGS with expected hazard ratios were prespecified to be evaluated formally for replication with statistical correction for multiple comparisons. These pre-specified IGS all had p-values >.05 for replication ( Table 2). We also evaluated the 14 IGS that were associated with at least 1 HLA-A*02:01 sequence logo peptide but had unexpected hazard ratios. Likewise, these IGS all had p-values >.05 for replication ( Table 2).
One IGS result that was not pre-specified drew scrutiny because the replication p-values for the primary and backup assays were 0.001 and 0.004, respectively. The replication hazard ratios for the association of recipient minor allele mismatching of rs62572859 and rs12554984 with the risk of grade 3-4 acute GVHD were 0.12 [95% confidence interval (CI), 0.02-0.84)] and 0.14 (95% CI, 0.02-0.99), respectively. In addition to the implausibility of the direction, the hazard ratios reflect a result based on an expectation of only 7 events in a few dozen patients (10% incidence of grade 3-4 acute GVHD, 9% probability of recipient allele mismatching for these variants, and~800 genotyped pairs).
Previously Identified HLA-A*02:01-Associated Minor Histocompatibility Antigens Are Not Associated With GVHD or Relapse After HCT With the test for statistical interaction as a criterion, the 604 discovery RAMAs and 82 external missense proxies in Table S1 included only 2 of the 49 previously identified variants that produce HLA-A*02:01-associated mHAgs (rs9051-G and rs9051-C) in Table S3. Very few of these 49 HLA-A*02:01-associated mHAgs have been tested for association with HCT outcomes. Therefore, we evaluated these alleles as candidates for association with all 6 HCT outcomes. For this analysis, the FHCRC HLA-A*02:01-positive cohort was randomized at a 3:2 ratio into discovery and replication cohorts with 486 and 338 donor and recipient pairs, respectively. Associations with p-values ≤.05 in the discovery cohort were tested in the replication cohort. The discovery analysis identified 17 RAMAs, 6 with expected hazard ratios and 11 with unexpected hazard ratios (Tables S6). These RAMAs all had p-values >.05 for replication ( Table 3). Table S7 summarizes result for the association of the 49 RAMAs with the 6 outcomes in the combined FHCRC discovery and replication cohorts to enable future meta-analysis with results from other cohorts.

HLA-A*02:01-Associated mHAg Source Peptides Produce Processed Peptides Predicted to Bind to Other HLA-A and B Allotypes
Minor histocompatibility antigens originate from source proteins or peptides that are processed through the MHC class I and class II antigen processing pathways for binding to MHC molecules (1, 2). To address the possibility that the source peptides that produce HLA-A*02:01-associated mHAgs could also produce processed peptides predicted to bind to other HLA allotypes, we used the NetMHCpan4.1 algorithm at the default setting to identify 9-mer peptides that include the variant amino acid residue and were identified as strong binders having a predicted affinity <500 nM for binding to the same HLAallotypes listed above. Twenty-six (53%) of the 49 tested source peptides produced processed peptides that met these stringent specifications ( Figure 5B and Table S9). Elution scores ranged from 0.33 to 0.96, indicating a high likelihood that these peptides have been detected in mass spectrometry studies. Binding score percentile ranks ranged from 0.01 to 1.03, indicating strong binding to the respective HLA allotype, and predicted binding affinities ranged from to 6.5 to 383 nM (Table S9). At least one processed 9-mer peptide was predicted to have strong binding to HLA-  Figure 5B). Four processed peptides were predicted to bind to 2 of the 10 tested HLA allotypes, 2 were predicted to bind to 3 of the 10 tested allotypes, and 1 was predicted to bind 4 of the 10 tested allotypes. No processed 9-mer peptides that include the

Discovery and Replication of MHC-Agnostic RAMAs
The test for statistical interaction with HLA-A*02:01 as a criterion for discovery eliminated many possible mHAgs from consideration. To address this limitation, we used an MHCagnostic approach to identify mHAgs associated with outcomes in the FHCRC cohort (10). The initial screen identified 802 variants, 403 with expected hazard ratios and 399 with unexpected hazard ratios ( Table 4 and Table S10). Pruning for linkage disequilibrium (LD) yielded 285 IGS, 142 with expected hazard ratios and 143 with unexpected hazard ratios. By using FIGURE 4 | Replication testing of recipient allele mismatch associations (RAMAs) identified in the discovery cohort. Testing in the replication cohort was limited to no more than 2 members of each linkage group constituting an independent genomic signal (IGS). Discovery hazard ratios >1.0 for acute and chronic GVHD and <1.0 for relapse were categorized as expected, while discovery hazard ratios <1.0 for acute and chronic GVHD and >1.0 for relapse were categorized as unexpected. Each linkage group was characterized according to whether any variant in the group codes a source protein, is predicted to produce a processed peptide that binds to HLA-A*02:01 or to generate a peptide with an HLA-A*02:01 sequence logo.
10-fold smaller p-value thresholds, we prespecified 28 signals to be tested for replication in categories defined according to whether the discovery hazard ratio was expected or not and whether the IGS included coding variants or not. None of the 17 prespecified signals with unexpected hazard ratios met criteria for replication (Table S11). Of the 11 prespecified signals with expected hazard ratios, 1 signal met criteria for replication with Bonferroni adjustment for multiple comparisons, and 1 other signal came close to meeting criteria (Table S11). Table S12 summarizes result for the association of the 802 MHC-agnostic RAMAs with the 6 outcomes in the combined FH discovery and replication cohorts to enable future metanalysis with results from other cohorts. The prespecified signal that met criteria for replication contained 2 variants, rs56040842 and rs61851500. The replication hazard ratios for the association of recipient minor allele mismatching of rs56040842 and rs61851500 with the risk of grade 2b-4 acute GVHD were 3.38 (95% CI, 1.9-6.1; p = .0006) and 2.71 (95% CI, 1.5-4.8; p = .004), respectively. These hazard ratios reflect results based on an expectation of only 9 events in a score of patients (40% incidence of grade 2b-4 acute GVHD, 3% probability of recipient allele mismatching for these variants, and~7 50 genotyped pairs). According to LDproxy, the signal defined by recipient minor allele mismatching of rs56040842 and rs61851500 contains only 1 other variant in LD with pairwise r 2 ≥ 0.7 (rs61851533), and all 3 variants are in a PRKG1 intron. Because intron sequences can encode non-annotated proteins (43)(44)(45), we used the ExPASy Translate tool (web.expasy.org/ translate) to identify open reading frames that include the variant allele and encode an amino acid sequence that differs from the major allele. With this process we identified 4 such open reading frames ( Table S13), none of which map to an annotated coding region as determined by the BLATP tool (ensembl.org). We used NetMHCpan4.1 to determine whether any of the potential source peptides are predicted to produce processed 8-11mer peptides with elution scores >0.05 and predicted percentile ranks ≤2.0 for binding to highly prevalent HLA-A and B allotypes as  Figure 6A and Table S14).
One other variant that came close to meeting criteria for replication was rs17811627. The signal containing this variant    (45), we used the process described above to identify 30 open reading frames that include the minor allele and encode an amino acid sequence that differs from the major allele (Table S15), none of which map to an annotated coding region as determined by the BLATP tool. NetMHCpan4.1 screening identified a total of 151 8-11mer peptides each with predicted binding to at least 1 of the 12 HLA-A and B allotypes tested ( Figure 6B and Table S16). Each tested HLA-allotype was predicted to bind between 13 and 28 peptides, and 11 of the 151 peptides were predicted to bind between 3 and 5 of the HLAallotypes tested.

DISCUSSION
Our initial approach toward identification of mHAgs associated with outcomes after HCT was based on the simplifying assumption that mHAgs presented by HLA-A*02:01 could be efficiently identified by their association with clinical outcomes in recipients with at least 1 HLA-A*02:01 allele but not in other recipients. Our results show that this assumption is belied by LD and by promiscuous binding of HLA-A*02:01 peptide ligands to other HLA allotypes within and across HLA-A and B supertypes. We used rigorous peptide binding predictions to prespecify the most plausible discovery associations to test for replication. Even so, with 1 possible exception, none of the HLA-A*02:01 discovery associations was replicated in an independent cohort, whether prespecified or not.
We cannot exclude the possibility that differences between the FHCRC cohort and the CIBMTR cohort contribute to the lack of replication. On the other hand, previous studies have not identified clinical variables that interact with the association between recipient HLA-mismatching and the risks of GVHD. Regardless of the underlying disease, year of transplant, intensity of the conditioning regimen and other clinical variables, recipient HLA-mismatching is consistently associated with an increased risk of GVHD. Therefore, we have no reason to expect that clinical variables interact with the effects of recipient mHAg mismatching.
Other factors more likely explain our inability to identify RAMAs with acute GVHD, chronic GVHD or relapse after allogeneic HCT. First, the effect size of any GVHD associations is not likely to exceed 2.0. In one large study, the hazard ratios for the association of recipient HLA mismatching with grade 2-4 acute GVHD and chronic GVHD after related HCT at our center were 1.7 (95% CI, 1.49-2.03) and 1.24 (95% CI, 0.95-1.60), respectively (46). The increased risk of GVHD can be explained by the cumulative effect of T cell responses against recipient-specific HLA-epitopes and the distinct repertoires of peptides presented by the mismatched HLA allotypes in the recipient. Hence, the incremental effect of any given recipient mHAg mismatch must be smaller than the effect of an HLA mismatch. Second, the probability of recipient allele mismatching averages around 10% and is combined with event frequencies of 10-40%. With such small sample sizes, confidence  intervals are wide. Third, although in silico tools have some ability to predict peptide binding to specific HLA-allotypes, binding per se does not necessarily predict an immune response or an altered clinical outcome even if an immune response were to occur. Whether the risks of acute and chronic GVHD and relapse reflect a few mismatches that have large effects or many mismatches each having a small effect is not known, although our results suggest that the latter may be more plausible than the former, making it unlikely that mHAg matching could be used for donor selection or risk stratification. We did not expect to find a high proportion of discovery RAMAs with hazard ratios <1.0 for acute or chronic GVHD or >1.0 for relapse. No mechanism is available to explain how an MHC class-1presented mHAg could prevent immune responses against a wide variety of other class 1-presented mHAgs. It is conceivable that members of a linkage group tagged by a genomic variant could encode class II mHAgs that selectively trigger Treg cells. If so, the clinical outcomes could depend on the overall balance between activation and regulation of alloactivated donor T cells, but we have no evidence to support this speculation.
Beyond the unexpected hazard ratio, the HLA-A*02:01 results for rs62572859 and rs12554984 are difficult to explain. Although the minor allele of rs62572859 encodes a peptide, it is not predicted to bind HLA-A*02:01, and rs12554984 is an intron variant with no open reading frames that include the variant allele. Forty-five other variants have pairwise r 2 values >0.70 for association with rs62572859 in Europeans, but none of these is a coding variant. We cannot exclude the possibility that this association is explained by non-annotated open reading frames but given the unexpected hazard ratio and the low numbers of informative patients, we caution that this result should be considered as a statistical outlier until proven otherwise.
Our MHC-agnostic analysis does not allow any definitive conclusions regarding the association of individual genomic variants with the risk of acute GVHD. Results for the signal tagged by rs56040842 are based on low numbers of informative patients, raising concern that it represents a statistical outlier, even though this association was replicated in an independent cohort. PRKG1 protein is expressed in the skin, stomach and colon, tissues that are targets of acute GVHD. Results of the NetMHCpan4.1 analysis suggest that peptides corresponding to intronic open reading frames are associated with a variety of HLA-A and B allotypes, and the BLATP results exclude the possibility that these peptides can be attributed to annotated proteins. Results for the signal tagged by rs17811627 are based on larger numbers of informative patients, but the p-values did not meet the prespecified Bonferroni-adjusted threshold of statistical significance. The predicted HLA-A and B-binding peptides cannot be attributed to annotated proteins, but no evidence is available to demonstrate that any non-annotated open reading frames encoded within the intergenic region tagged by rs17811627 are expressed in GVHD target tissues.
Our results beg the question of how mHAgs should be defined. Under in vitro conditions, an individual mHAg can be identified as a complex composed of a defined-peptide and a defined-MHC allotype, which evokes an immune response. Efforts are in progress to determine whether T cells that recognize selected mHAgs could be used to eliminate malignant cells in HCT recipients (47)(48)(49)(50)(51)(52). From the perspective of allogeneic HCT between a pair of HLA-identical siblings, a mHAg manifests as a set of peptides originating from annotated proteins and nonannotated open reading frames, which i) are encoded by a group of recipient genomic mismatches in high LD, ii) bind to HLA allotypes in the recipient, and iii) evoke a donor immune response. Attribution of the immune response and clinical outcomes to individual components within a peptide x HLA allotype matrix as large and complex as the one tagged by rs17811627 would require extensive in vitro studies, and the results would likely differ from patient to patient according to their HLA types. Our results also support the possibility that peptides from annotated proteins and non-annotated open reading frames (44,45,53,54) both contribute to immune-mediated outcomes after HCT between HLAidentical siblings.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: BioSample, phs001918.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Fred Hutchinson Cancer Research Center. Written informed consent to participate in this study was provided by the participants or their legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
JH, PM, DL, and BS developed the study concept and design. DL and BS managed data acquisition, variant imputation, informatics analyses, quality control and statistical analyses. DL and XZ provided HLA imputation and genomic mismatch calculations. DJ and BH provided a database of variant annotations. BN and DG assisted with identification of source proteins. SS provided samples for the CIBMTR cohort. CS and FW provided genotyping of CIBMTR samples. PM, DL, and BS interpreted results and wrote the manuscript. All authors other than JH revised draft manuscripts and approved the final version of the manuscript. JH died on July 31, 2019.