In silico Derivation of HLA-Specific Alloreactivity Potential from Whole Exome Sequencing of Stem-Cell Transplant Donors and Recipients: Understanding the Quantitative Immunobiology of Allogeneic Transplantation

Donor T-cell mediated graft versus host (GVH) effects may result from the aggregate alloreactivity to minor histocompatibility antigens (mHA) presented by the human leukocyte antigen (HLA) molecules in each donor–recipient pair undergoing stem-cell transplantation (SCT). Whole exome sequencing has previously demonstrated a large number of non-synonymous single nucleotide polymorphisms (SNP) present in recipients but absent in their HLA-matched SCT donors (GVH direction). The nucleotide sequence flanking each of these SNPs was obtained and the amino acid sequence determined. All the possible nonameric peptides incorporating the variant amino acid resulting from these SNPs were interrogated in silico for their likelihood to be presented by the HLA class I molecules using the Immune Epitope Database stabilized matrix method (SMM) and NetMHCpan algorithms. The SMM algorithm predicted that a median of 18,396 peptides weakly bound HLA class I molecules in individual SCT recipients, and 2,254 peptides displayed strong binding. A similar library of presented peptides was identified when the data were interrogated using the NetMHCpan algorithm. The bioinformatic algorithm presented here demonstrates that there may be a high level of mHA variation in HLA-matched individuals, constituting an HLA-specific alloreactivity potential.


INTRODUCTION
Graft versus host disease (GVHD) is a major impediment to achieving optimal outcomes in patients undergoing allogeneic stem-cell transplantation (SCT) from human leukocyte antigen (HLA) identical related and unrelated donors (URD) (1)(2)(3). Further, it remains unclear why, with only relatively minor variation in GVHD prophylaxis, some patients with HLA-matched donors develop severe GVHD, whilst others with HLA-mismatched donors may not experience any (4)(5)(6). In HLA-matched donor-recipient pairs (DRP), a major contributor to GVHD occurrence is the set of peptides encoded by loci outside the major histocompatibility complex (MHC) locus on chromosome 6. These peptides, functionally defined as minor histocompatibility antigens (mHA), are presented by specific HLA molecules and are responsible for both the clinically beneficial graft versus tumor responses and the deleterious GVHD (7)(8)(9)(10). As of 2012, around 49 mHA recognized by CD4+ or CD8+ T lymphocytes have been described (11). Further complicating this problem is the HLA specificity of various mHA, and the heterogeneity observed in the HLA distribution in various populations across the world (12,13). Therefore, in order to understand the biology and role of mHA in generating GVHD, it is critical to quantify the extent of genetic variation between individuals.
Exploring genetic variation outside the MHC locus is also important to understand why, with relatively simple adjustments to the treatment protocols, patients successfully engraft when they receive grafts from HLA-mismatched donors. This is true for both URD umbilical cord blood transplantation and related haploidentical SCT (6). Moreover, completely HLA-mismatched solid organ transplants result in successful engraftment, albeit with low-level life-long immunosuppression. Furthermore, organs such as the kidney and heart are prone to rejection when transplanted; yet, these organs are seldom targeted in GVHD, even in its chronic form, which affects nearly all organ systems. This makes it imperative to understand the role of mHA in generating alloreactivity, and the extent to which the magnitude of genetic variation outside the MHC locus contributes to allograft complications, such as GVHD or graft rejection.
To examine these quantitative relationships, whole exome sequencing of SCT donors and recipients was performed to measure the antigenic variability existing between them (14). A large number of single nucleotide polymorphisms (SNP) were identified between donors and recipients. These differences were classified as possessing either a GVH vector (polymorphisms present at loci in the recipient and absent in the donor) or a host versus graft (HVG) vector (present in the donor and absent in the recipient). The large number of SNPs in the exome, termed alloreactivity potential, suggests that in all individuals undergoing SCT, there is a very high probability of there being peptides that may function as mHA. However, given the observed frequency of GVHD, it seems that not all of these SNPs lead to immunogenic peptides that yield clinically relevant mHA responses. This may be because, for HLA class I molecules on an antigen-presenting cell to present a peptide to an effector T lymphocyte, the endogenous protein must first be cleaved by the proteasome, and the resulting peptides must then bind HLA class I molecules to be presented. This would initiate either an immune response or result in tolerance, depending on the cellular and cytokine milieu at the time of antigen presentation (15).
It is possible to determine the genetic variation between SCT recipients and donors, and to then bioinformatically determine the amino acid sequence of peptides resulting from SNPs encountered in their exomes. Further, bioinformatic techniques have been developed to determine which peptide antigens may be presented by specific HLA molecules. The Immune Epitope Database (IEDB; http://www.iedb.org) has characterized hundreds of thousands of peptides that can bind several hundred MHC complexes. From this large dataset, researchers have developed tools to predict peptide-HLA binding probabilities (16). Initially, matrix-based methods such as the stabilized matrix method (SMM) (17) were developed to determine binding affinities. More recently, neural network-based algorithms such as NetMHC can use binding information from neighboring residues to predict dissociation constants between HLA molecules and putative mHA (18). Finally, "pan-specific" algorithms have been developed that are able to predict peptide binding to HLA alleles for which only limited experimental binding data are available (19).
In this paper, the putative mHA in HLA-matched DRP and the in silico determined HLA class I binding affinity of these peptides are explored utilizing a bioinformatic approach based on exome sequencing of donors and recipients of SCT. The algorithm developed lays a framework for future analysis of large SCT patient cohorts, and defines a personalized HLA-specific alloreactivity potential. The alloreactivity potential concept is analogous to the idea of potential energy in physics, i.e., the stored energy in a system. Thus, HLA-specific alloreactivity potential would give an estimate of the likelihood that GVHD or graft rejection may develop in an HLA-matched DRP in the absence of immunosuppression. Our work demonstrates that the number of potentially immunogenic peptides varies considerably across HLA-matched related (MR) donors and URD, constituting a large alloreactivity potential.

WHOLE EXOME SEQUENCING
Patients with recurrent hematological malignancies enrolled in a Virginia Commonwealth University Institutional Review Board approved protocol (ClinicalTrials.gov identifier: NCT00709592) were included in this study. To identify all the potentially immunogenic differences that exist in an SCT DRP, whole exome sequencing was performed on previously cryopreserved DNA from the donors and recipients enrolled in this study as previously described (14).
Of the nine DRP examined, four involved HLA-A, -B, -C, and -DRB1 matched related donors (MRD) and five involved URD. Histocompatibility testing was performed using high-resolution typing for both HLA class I (Table 1) and HLA class II loci (not shown). The whole exome sequence of individual donors and recipients was compared both within pairs and to a reference genome to identify all the SNPs, which were subsequently characterized as either synonymous or non-synonymous. Next, all the non-synonymous SNP (nsSNP) present in the recipient but absent in the donor were identified, and designated as possessing a graft versus host (GVH) vector (nsSNP GVH).
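The GVH/HVG vector assignment described above amounts to a set difference between recipient and donor variant calls. The following is a minimal sketch of that idea only; the `Variant` record, the function names, and the example calls are hypothetical, and the sketch ignores zygosity, the comparison to a reference genome, and the synonymous/non-synonymous filter used in the study.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """A single nucleotide variant call (hypothetical minimal record)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def gvh_variants(recipient: Set[Variant], donor: Set[Variant]) -> Set[Variant]:
    """Variants present in the recipient and absent in the donor (GVH vector)."""
    return recipient - donor


def hvg_variants(recipient: Set[Variant], donor: Set[Variant]) -> Set[Variant]:
    """Variants present in the donor and absent in the recipient (HVG vector)."""
    return donor - recipient


# Made-up calls for one donor-recipient pair; not data from the study.
recipient_calls = {Variant("chr1", 1000, "A", "G"), Variant("chr2", 2000, "C", "T")}
donor_calls = {Variant("chr2", 2000, "C", "T"), Variant("chr3", 3000, "G", "A")}

gvh = gvh_variants(recipient_calls, donor_calls)
hvg = hvg_variants(recipient_calls, donor_calls)
print(sorted(gvh), sorted(hvg))
```

Variants shared by donor and recipient drop out of both sets, which is why only pair-specific differences contribute to the alloreactivity potential.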

DERIVING HLA-SPECIFIC ALLOREACTIVITY POTENTIAL
To derive the amino acid sequence of the oligopeptides, i.e., potential mHA, resulting from these nsSNPs and their binding affinity to the relevant HLA in each DRP, a bioinformatics pipeline was developed. This pipeline has the following components: (1) determine nsSNP GVH between the exomes of transplant donors and recipients; (2) generate putative immunogenic peptides in silico from these genomic differences; and (3) analyze the binding affinity of these polymorphic peptides to the HLA in that individual (Figure 1). This third step estimates the likelihood of these peptides to be presented by the six patient-specific HLA class I molecules to determine candidate mHA. A complete description of this bioinformatic pipeline follows.

CREATION OF PEPTIDE LIBRARIES
All the nsSNP GVH for each DRP were exported as variant call format (VCF) files to the ANNOVAR software package (20). Next, using the dbSNP 130 database and hg18 genome coordinates of the nsSNP GVH, amino acid sequences of the putative peptides were generated using the "seq_padding" option of the "annotate_variation" function in ANNOVAR. Endogenous peptides are presented by HLA class I molecules, and the average length of peptides binding HLA class I is 9 amino acids. Therefore, for each polymorphism, ANNOVAR returned 8 amino acids on either side of the nsSNP GVH-encoded amino acid, resulting in a 17-mer peptide. This effectively generated nine nonamers from each nsSNP GVH-encoded polymorphism; thus, the resulting peptides would have the polymorphic amino acid at positions 1 to 9, from the C- to the N-terminal position (Figure 1).
Frontiers in Immunology | Alloimmunity and Transplantation
FIGURE 1 | Starting with donor and recipient whole exome sequence data, non-synonymous SNP with a GVH vector (nsSNP GVH) were identified, and peptide fragments generated using the ANNOVAR software package. These peptides, together with HLA data (Table 1), were then analyzed with IEDB SMM and NetMHCpan algorithms separately. Individual DRP binding data were then analyzed and candidate mHAs cataloged.
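The enumeration of nonamers from a padded 17-mer can be sketched as a sliding window restricted to windows that contain the variant residue. This is an illustrative sketch, not the authors' code: the function name `nonamers`, its parameters, and the example sequence are assumptions. The `variant_index` parameter also covers variants near a protein terminus, where fewer flanking residues (and therefore fewer nonamers) are available.

```python
from typing import List, Tuple


def nonamers(padded: str, variant_index: int, k: int = 9) -> List[Tuple[int, str]]:
    """Return (position_of_variant_in_peptide, peptide) for every k-mer window
    of `padded` that contains the residue at `variant_index` (0-based).
    Positions are 1-based, counted from the N-terminus of each peptide."""
    first_start = max(0, variant_index - k + 1)
    last_start = min(variant_index, len(padded) - k)
    return [
        (variant_index - start + 1, padded[start:start + k])
        for start in range(first_start, last_start + 1)
    ]


# Made-up 17-mer: 8 flanking residues, the variant residue (K, index 8), 8 more.
padded_17mer = "ACDEFGHIKLMNPQRST"
for position, peptide in nonamers(padded_17mer, variant_index=8):
    print(position, peptide)
```

For a full 17-mer the first window places the variant at position 9 and the last at position 1, giving nine peptides per polymorphism.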

IN SILICO VARIANT PEPTIDE-HLA BINDING AFFINITY DETERMINATION
The 17-mer peptides generated by ANNOVAR resulting from the nsSNP GVH were analyzed by the IEDB-MHC I-peptide binding prediction tools version 2.9.1, downloaded from http://tools.immuneepitope.org/analyze/html_mhcibinding20090901B/download_mhc_I_binding.html. Nine oligopeptides were created for each 17-mer peptide using a 9-mer sliding window. The binding affinity of each of these 9-mers to the patient-specific HLA-A, HLA-B, and HLA-C (Table 1) was determined by running each 9-mer independently through the IEDB-MHC I prediction software. The output of this iterative process included variables such as the gene name and coordinates, the polymorphic peptide sequence, and the IC50 value calculated via the SMM algorithm (a partial example of output is given in Table S1 in Supplementary Material). IC50 values in nanomolar (nM) represent the concentration of the test peptide that will displace 50% of a standard peptide from the HLA molecule in question. The lower the IC50 for a peptide, the stronger the binding affinity of that peptide for the HLA in question. The cutoff in our analysis to classify a putative peptide as being presented by HLA is an IC50 of <500 nM (intermediate affinity binding; http://tools.immuneepitope.org/mhci/help/). Those peptides that bound to HLA with an IC50 of <50 nM were designated strongly presented (high affinity binding).
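The two IC50 cutoffs can be applied as simple threshold counts. The sketch below assumes predicted IC50 values are already available as numbers in nM; the function name and the example values are hypothetical and are not output from the IEDB tools.

```python
from typing import Iterable, Tuple

PRESENTED_CUTOFF_NM = 500.0  # intermediate-affinity cutoff stated in the text
STRONG_CUTOFF_NM = 50.0      # high-affinity cutoff stated in the text


def count_presented(ic50_values_nm: Iterable[float]) -> Tuple[int, int]:
    """Return (n_presented, n_strongly_presented) for predicted IC50 values in nM.
    Strongly presented peptides are a subset of presented peptides."""
    values = list(ic50_values_nm)
    presented = sum(1 for v in values if v < PRESENTED_CUTOFF_NM)
    strong = sum(1 for v in values if v < STRONG_CUTOFF_NM)
    return presented, strong


# Made-up predictions (nM) chosen to sit on either side of both cutoffs.
example_ic50 = [12.0, 49.9, 50.0, 499.0, 500.0, 3200.0]
print(count_presented(example_ic50))
```

Both cutoffs are strict inequalities, so a peptide predicted at exactly 50 nM counts as presented but not as strongly presented.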
To validate the findings from the SMM algorithm, the ANNOVAR-generated 17-mer peptide libraries were next interrogated using the NetMHCpan software (http://www.cbs.dtu.dk/services/NetMHCpan/). To accomplish this, two software programs were developed to analyze the peptide data and query NetMHCpan remotely. The first program sequentially sent packets of 30 protein sequences to NetMHCpan. The protein sequences were sent in order by patient and HLA, and a sliding 9-mer window was selected to interrogate HLA binding, similar to the SMM IEDB algorithm. NetMHCpan then returned HTML results, which were stored on the local server. The second program examined the returned HTML results and organized them in a comma-separated-value (.csv) file, which could then be opened in Microsoft Excel for further analysis.
Results from the SMM IEDB algorithm and NetMHCpan were compared in each DRP by HLA locus and polymorphic peptide. Specifically, HLA locus and polymorphic peptide were combined to make a single variable within each patient dataset, allowing for the removal of duplicate peptides and the identification of unique polymorphic peptides found by one or both methods. Presented and strongly presented polymorphic peptides were compared between the two methods, and then combined to get a comprehensive list of unique polymorphic peptide-HLA complexes for each patient.
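The comparison of the two prediction sets can be sketched with a composite (HLA, peptide) key, which removes duplicates and makes the shared and combined lists simple set operations. The record layout, function name, allele labels, and values below are assumptions for illustration, not the authors' programs or data.

```python
from typing import Iterable, Set, Tuple

Prediction = Tuple[str, str, float]  # (HLA allele, peptide, predicted IC50 in nM)
Key = Tuple[str, str]                # composite key: (HLA allele, peptide)


def presented_keys(predictions: Iterable[Prediction], cutoff_nm: float = 500.0) -> Set[Key]:
    """Unique (HLA, peptide) keys with predicted IC50 below the cutoff.
    Building a set drops duplicates, e.g. from splice variants."""
    return {(hla, pep) for hla, pep, ic50 in predictions if ic50 < cutoff_nm}


# Made-up predictions for one patient; the first SMM row is duplicated on purpose.
smm = [
    ("HLA-A*02:01", "ACDEFGHIK", 40.0),
    ("HLA-A*02:01", "ACDEFGHIK", 40.0),
    ("HLA-B*07:02", "CDEFGHIKL", 300.0),
    ("HLA-C*07:01", "DEFGHIKLM", 900.0),
]
netmhcpan = [
    ("HLA-A*02:01", "ACDEFGHIK", 120.0),
    ("HLA-C*07:01", "DEFGHIKLM", 450.0),
]

smm_set = presented_keys(smm)
net_set = presented_keys(netmhcpan)
shared = smm_set & net_set    # presented according to both methods
combined = smm_set | net_set  # presented according to at least one method
print(len(shared), len(combined))
```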

DERIVING HLA-SPECIFIC ALLOREACTIVITY POTENTIAL
Given the large number of peptides strongly binding HLA identified in each DRP, the area under the curve for the IC50 of the strongly binding peptides was determined to summarize the data. The peptide-HLA IC50s were plotted in ascending order (descending order of affinity). First, the non-linear distribution function of the peptides up to an IC50 of 100 nM was computed (a polynomial function of the second order). To obtain the area under the curve depicting the peptide-HLA complexes and their corresponding dissociation constants, the definite integral of the curve was determined. The definite integral is, by definition, the area of the x-y plane bounded by the curve:

$$\mathrm{AUC} = \int_{a}^{b} f(x)\,dx$$

where f(x) denotes the function of the curve and a and b are the bounds on the x-axis, i.e., the lowest value of the IC50 recorded and the cutoff chosen.
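One way to carry out this area-under-the-curve step is to fit a second-order polynomial by least squares and integrate it analytically. The sketch below is an assumption-laden illustration: it places the peptide rank on the x-axis and the IC50 on the y-axis (one reading of the nM•Peptide units reported for the AUC), whereas the text describes the bounds in terms of IC50, so the authors' exact axis convention and fitting procedure may differ. All function names and the example values are hypothetical.

```python
from typing import List, Sequence


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares fit of y = c0 + c1*x + c2*x**2 via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                 # sums of x**k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # sums of y*x**k
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]    # augmented 3x4 system
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def definite_integral(coeffs: Sequence[float], a: float, b: float) -> float:
    """Integrate the polynomial sum(c_k * x**k) from a to b analytically."""
    def antiderivative(x: float) -> float:
        return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return antiderivative(b) - antiderivative(a)


def auc_up_to_cutoff(ic50_values_nm: Sequence[float], cutoff_nm: float = 100.0) -> float:
    """Rank IC50 values at or below the cutoff in ascending order, fit a quadratic
    to IC50 versus rank, and integrate it over the observed rank range."""
    values = sorted(v for v in ic50_values_nm if v <= cutoff_nm)
    if len(values) < 3:
        raise ValueError("need at least three values to fit a quadratic")
    ranks = list(range(1, len(values) + 1))
    coeffs = fit_quadratic(ranks, values)
    return definite_integral(coeffs, float(ranks[0]), float(ranks[-1]))


# Made-up IC50 values (nM); the 500 nM entry is excluded by the 100 nM cutoff.
print(auc_up_to_cutoff([36.0, 1.0, 25.0, 4.0, 16.0, 9.0, 500.0]))
```

Integrating the fitted polynomial in closed form avoids numerical quadrature and makes the dependence of the AUC on the fitted coefficients explicit.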

TISSUE EXPRESSION OF POLYMORPHIC PEPTIDES
Relative gene (and protein) expression level is a critical factor contributing to HLA class I presentation of a peptide derived from the gene (21). To investigate the tissue-specific expression of genes incorporating presented peptides, the Illumina Body Map resource from the European Bioinformatics Institute (http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-513/) was used to correlate presented peptides from the peptide library with relative gene expression in the different tissues represented in this resource.

CREATION OF POLYMORPHIC PEPTIDES
Whole exomes of nine SCT DRP were sequenced, identifying an average of 6,445 nsSNP between donors and recipients. To determine the nsSNP that would be associated with possible mHA, peptide sequences were generated that incorporated the polymorphic amino acid at each position, 1 to 9, in nonameric peptides using the ANNOVAR software. Theoretically, this could yield nine different peptides for each SNP (Figure 1). However, an nsSNP near either the 5′ or 3′ end of a gene (N or C terminus of a protein) would lead to fewer peptides. The ANNOVAR output yielded on average 486,463 potential peptides encoded by nsSNPs and presented by the six HLA molecules in these patients (range: 366,426-1,043,514 peptides/DRP). This output was generally greater than the calculated possibilities since it also included peptides resulting from splice variants of the various proteins bearing SNP-encoded amino acids. In all, these peptides constituted the total pool of variant peptides that may be immunogenic in a DRP (Figure 2).

HLA-SPECIFIC ALLOREACTIVITY POTENTIAL
The 9-mer peptides bearing the polymorphic amino acid with a GVH vector were then analyzed for their binding affinities to the individual HLA class I molecules in each patient to determine the variant peptides potentially presented to the donor T-cells. The IEDB-SMM HLA class I binding prediction algorithm was utilized to calculate the binding affinity of the peptide output from ANNOVAR, and to rank putative mHA for their ability to be presented by individual HLA. After filtering for splice variants and duplicate peptide representation in the dataset, a median of 18,396 (range: 1,926-72,294) peptides were identified that bound HLA-A, -B, and -C with an intermediate affinity (IC50 < 500 nM) in the nine DRP, and were designated as presented. Further, a median of 2,254 (range: 177-21,548) peptides were predicted to bind MHC class I with a high affinity (IC50 < 50 nM) and were designated as strongly presented (Figure 2). When separated by donor type (MRD, n = 4, versus URD, n = 5), the HLA-matched unrelated DRPs had a significantly higher number of both presented and strongly presented peptides as determined by the IEDB SMM algorithm (P = 0.016; Mann-Whitney U test) (Figure 3). The difference in the number of presented peptides between unrelated and related donors corroborated the large alloreactivity potential identified earlier in these donor types by whole exome sequencing (14). To summarize the mass of information regarding the numerous HLA-binding peptides and their binding affinities, the peptides were ranked according to their binding affinity, that is, the IC50 values, and the distribution of their binding affinities was determined (Figure 4). For the analysis reported here, this operation was performed without filtering duplicate peptide-HLA complexes resulting from splice variants. Area under the curve (AUC; nM•Peptide) for each DRP was then computed for peptides with an IC50 up to 100 nM.
Once again, marked differences were observed in the calculated AUC between MRD and URD (Table 2). This summarized measure hypothetically represents an HLA-specific alloreactivity potential for each unique DRP, and may be considered as an example of the cumulative mHA differences observed between the HLA-matched donors and recipients.
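For groups as small as four and five, an exact Mann-Whitney U test can be computed by enumerating every possible assignment of ranks. The sketch below is illustrative only: the peptide counts are made up, ties are not handled, and the text does not state whether an exact or asymptotic form of the test was used. It does show that the smallest two-sided exact P attainable with n = 4 and n = 5 is 2/126, approximately 0.016, which would be consistent with the reported value if the two donor groups did not overlap.

```python
from itertools import combinations
from typing import Sequence


def exact_mann_whitney_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided exact P for the Mann-Whitney U test (no ties), obtained by
    enumerating every way of assigning len(x) of the pooled ranks to group x."""
    pooled = sorted(list(x) + list(y))
    if len(set(pooled)) != len(pooled):
        raise ValueError("ties are not handled in this sketch")
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2

    def u_from_rank_sum(rank_sum: float) -> float:
        return rank_sum - n1 * (n1 + 1) / 2

    observed = abs(u_from_rank_sum(sum(rank[v] for v in x)) - mean_u)
    extreme = total = 0
    for combo in combinations(range(1, n1 + n2 + 1), n1):
        total += 1
        if abs(u_from_rank_sum(sum(combo)) - mean_u) >= observed:
            extreme += 1
    return extreme / total


# Made-up presented-peptide counts with complete separation between groups.
mrd_counts = [2000, 5000, 9000, 12000]
urd_counts = [18000, 25000, 31000, 47000, 70000]
p_value = exact_mann_whitney_p(mrd_counts, urd_counts)
print(round(p_value, 3))
```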
In a further analysis, when the reciprocal of the IC50 (a more direct numerical reflection of the binding affinity) was plotted for each peptide, a Power law distribution was observed, analogous to the T-cell clonal frequency distribution previously reported (Figure S1 in Supplementary Material) (23).
FIGURE 4 | Lower IC50 values correspond to greater binding affinity between putative peptide and relevant HLA. IC50 distribution is non-linear and described as a polynomial function of the second order, forming a continuum. Marked difference observed between MRD and URD (see Table 2) for the AUC calculated from these plots.

VERIFYING HLA BINDING AFFINITY OF THE VARIANT PEPTIDE LIBRARY IN UNIQUE DRP
To confirm the IEDB-SMM algorithm findings, a second peptide-HLA binding affinity prediction tool, NetMHCpan, was used to interrogate the variant peptide libraries from the unique DRP, and its output was compared with that of the IEDB SMM. NetMHCpan yielded a median of 3,962 peptides categorized as presented and 989 peptides as strongly presented in the nine DRP studied (MRD versus URD, P = 0.063 and 0.11, respectively, Mann-Whitney U test) (Table 3). The IEDB-SMM and NetMHCpan datasets were then combined, and the unique peptide-HLA complexes predicted to be presented by both algorithms were determined (shared peptides). The median number of shared unique peptides presented/DRP was 2,065 (range: 417-4,881) (Table 3). A representative data table depicting peptide sequences and respective IC50 values for binding to a single HLA locus in a patient, predicted by both algorithms, is given in Table S1 in Supplementary Material. Plotting the IC50 of unique presented peptide-HLA complexes derived utilizing both algorithms demonstrated not only a very large number of complexes identified by both algorithms, but also that a large proportion of these complexes were categorized as strongly presented (Figure 5). Furthermore, a weak but significant correlation was identified between the IC50 predictions of the two algorithms in the shared peptide-HLA complex datasets (N = 9, median Pearson's correlation coefficient R = 0.62, P < 0.01). Additionally, when the distribution of peptides presented on the three class I HLA loci was examined, no discernible preference for particular HLA loci was observed in terms of likelihood of peptide presentation (Figure S2A,B in Supplementary Material), except for a possible HLA-C dominance in URD recipients in the SMM algorithm.
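The per-patient correlation between the two algorithms can be sketched by restricting both prediction sets to the shared (HLA, peptide) complexes and computing Pearson's coefficient on the paired IC50 values. The dictionaries, labels, and function names below are hypothetical, and the values are constructed to be perfectly linear so the result is easy to check; they are not study data.

```python
from math import sqrt
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (HLA allele, peptide)


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson's correlation coefficient for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def shared_correlation(smm: Dict[Key, float], netmhcpan: Dict[Key, float]) -> float:
    """Correlate IC50 predictions over complexes present in both datasets."""
    shared = sorted(set(smm) & set(netmhcpan))
    return pearson_r([smm[k] for k in shared], [netmhcpan[k] for k in shared])


# Made-up IC50 predictions (nM) for one patient.
smm_ic50 = {("A", "p1"): 10.0, ("A", "p2"): 20.0, ("B", "p3"): 30.0, ("C", "p4"): 400.0}
net_ic50 = {("A", "p1"): 25.0, ("A", "p2"): 45.0, ("B", "p3"): 65.0, ("B", "p9"): 100.0}
print(shared_correlation(smm_ic50, net_ic50))
```

In the study, one coefficient per DRP was summarized as a median across the nine pairs.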

TISSUE DISTRIBUTION OF PEPTIDES
For a peptide to be relevant in terms of its contribution to GVHD risk, in addition to its potential for presentation on the relevant HLA in a specific DRP, the relevant protein needs to be expressed in the tissues. When the putative mHA (presented peptides, IC50 < 500 nM) were cataloged according to the tissue-specific expression of the genes they were derived from, most organ systems had genes with potential mHA (Figure 6). Further, although several antigens are expressed in organs such as colon, liver, and lungs, which are frequent target organs for GVHD, a large number of genes bearing potentially antigenic peptides are also expressed in other organ systems, such as the kidney and adipose tissue, that are seldom targeted by GVHD (Table S2 in Supplementary Material).

DISCUSSION
Allogeneic SCT represents a unique model system to study donor T-cell responses to neo-antigens encountered in the recipient. However, clinical transplantation is characterized by a vast repertoire of variant antigens, which in theory would result in a complex expansion of the T-cell repertoire (24,25). The findings reported here provide a direct estimate of the antigenic variation that may be encountered by the donor cytotoxic T-cell (CTL) populations following SCT. Starting from nsSNPs in the exomes of donors and recipients, the reported analysis determined the resulting variant nonameric peptides and gave an in silico estimate of the binding affinity (reflected by the IC50) of these peptides to the relevant HLA in the transplant recipients. The existence of this very large library of immunogenic peptides in HLA-matched DRP immediately raises the question as to why only some and not all the patients develop GVHD. If all the peptides in this large library of potential mHA were presented to non-tolerant T-cells, then GVHD would potentially develop in all SCT patients, particularly with URD, where the magnitude of immunogenic peptides is considerably larger than with MRD. Supporting this notion is the observation that development of extensive chronic GVHD in patients is relatively common when conventional immunosuppressive regimens are used. Further, our findings offer a possible explanation for why most patients develop GVHD, despite having HLA identical donors, and do so more frequently when the donors are unrelated (26,27). Alternatively, the large magnitude of mHA between HLA-matched donors also gives an insight into why patients undergoing HLA-mismatched transplants such as haploidentical or mismatched URD transplants have clinical outcomes that are not dramatically different from those of HLA-MR donors, that is, if appropriate GVHD prophylaxis is used in the first few weeks of the transplant (28,29). This paradox may be understood if one considers the mHA as the targets for GVHD and HLA as the mediators of this phenomenon. Thus, if the number of targets is relatively similar in HLA-matched and haploidentical related donor transplant recipients, and in HLA-matched and -mismatched URD transplant recipients, the difference introduced by HLA mismatching is overcome by adjustments in the GVHD prophylaxis regimens. One may postulate that even though thousands of immunogenic peptides are present, the conditions at the time of transplantation determine the eventual outcome following transplant, that is, whether tolerance will develop or GVHD ensue following the initial interaction between recipient mHA-HLA complexes and donor T-cells. As an example, adding the proteasome inhibitor bortezomib to the conditioning regimen inhibits peptide generation and consequently diminishes antigen presentation to donor T-cells in the very first weeks of the transplant, reducing the risk of GVHD in URD SCT (6). If the model outlined above is correct, then the enormous magnitude of immunogenic peptides constituting the HLA-specific alloreactivity potential will constitute an antigenic "pressure" upon the non-tolerant donor T-cells when first encountered, influencing the evolving T-cell repertoire following SCT.
FIGURE 5 | Unique peptide-HLA complexes (GVH vector) with IC50 < 500 nM predicted by both SMM and NetMHCpan. Scatter plots depict the IC50 for unique polymorphic peptide-HLA complexes predicted by the two different algorithms studied. Each circle corresponds to a unique peptide-HLA complex, with color depicting specific HLA. A large number of patient-HLA-specific strong-binding peptides identified by both programs, using SNP data derived from exome sequencing. Only shared peptide-HLA complexes predicted to have an IC50 < 500 nM by both algorithms included.
This antigenic pressure may be mitigated by agents that influence either antigen presentation (e.g., bortezomib) or the T-cell response (e.g., anti-thymocyte globulin, calcineurin inhibitors, mycophenolate mofetil, post-transplant cyclophosphamide). An observation from this dataset that supports this hypothesis is that the frequency distribution of the binding affinities of the peptides to the HLA molecules follows the Power law (Figure S1 in Supplementary Material). This frequency distribution is similar to the T-cell clonal frequency distribution observed when T-cell clonality is measured using high-throughput T-cell receptor β sequencing (23). This suggests that the T-cell repertoire and clonal frequency emerging after SCT may be proportional to the antigenic peptide-HLA binding affinities. Thus, peptides strongly bound to the HLA will elicit a strong T-cell clonal response if they engage a T-cell receptor and appropriate co-stimulation is provided. Since the peptide antigen binding affinities form a continuum, rather than discrete clusters of high and low affinity, the T-cell repertoire frequency similarly forms a continuum, described by the Power law. Another conclusion to be considered from the non-discrete distribution of peptide-HLA binding affinity is that other non-recipient derived antigens, such as pathogen-associated peptides, may also lie on this continuum. This may result in cross-reactivity between autologous antigens and pathogen-associated peptides (30). A manifestation of this in the transplant setting is the triggering of GVHD or graft rejection events by viral infections, such as cytomegalovirus or human herpesvirus 6 infections (31,32).
Can these findings be used to develop a clinically relevant model for allogeneic SCT? One possible explanation of the variant outcomes following SCT is that post-transplant emergent T-cell clones either develop tolerance to the many antigens encountered or fail to do so, depending on the milieu encountered in the host. Early interventions, such as administration of anti-thymocyte globulin (33), bortezomib, or post-transplant cyclophosphamide, have a large impact on late post-transplant outcomes. Similar tolerance induction is observed following cellular interventions such as regulatory T-cell infusion and conditioning that upregulates NK-T-cells at the time of SCT (34). This suggests that if a large antigenic pressure from the HLA-specific alloreactivity potential exists in all patients, then tissue injury and cytokine milieu at the time of SCT may be influential in determining the development of GVHD. Thus, if there is tissue injury following SCT, even if it is sub-clinical, and multiple antigens are presented, then in the absence of adequate immunosuppression, the T-cell repertoire that develops results in GVHD. On the other hand, if tissue injury is minimized and there is adequate immunosuppression when the initial T-cell antigen-presenting cell interactions take place, peripheral (or central) tolerance would emerge. Following that, depending on the presence or absence of thymic tissue, T-cell clones developing from infused stem cells may perpetuate this process based on the prevailing T-cell population and target-tissue antigen presentation, perhaps influenced by the state of tissue injury (Figure 7). In such a model, inflammation provoked by the acute GVHD initiated by infused donor-derived T-cells reacting to recipient antigens is perpetuated in the form of "auto-reactivity" by the T-cells developing from infused stem cells in the absence of normal thymic processing. This concept may not be novel in itself; however, our model provides a biologically plausible explanation reconciling mHA differences observed in HLA-matched DRP.
FIGURE 7 | Whole exome sequencing identifies all the nsSNP with a GVH vector, yielding a putative alloreactivity potential (AP), which may be a function (f) of the cumulative influence of these polymorphisms. This is represented as a series, listing the sequence of polymorphic exome loci. Substituting individual nsSNP GVH in the equation by peptide-HLA binding affinity (reciprocal of IC50) × relative expression level of the gene bearing the nsSNP GVH (for each HLA molecule) yields the HLA-specific alloreactivity potential; in this series, Re is the relative expression of the protein with the nsSNP GVH and resulting peptides. The expression $Re_{p1} \times (1/\mathrm{IC50}_{p1\text{-HLA-A1}})$ for each specific peptide-HLA complex hypothetically represents the T-cell clone-specific AP. Multiple peptides constituting this series then drive a proportional oligoclonal T-cell expansion in GVHD, as many different mHA are presented by the HLA in an individual, the final distribution conforming to the Power law. Since T-cell clonal expansion in response to presented antigens may be influenced by factors such as tissue injury, cytokine milieu, and immunosuppression intensity, the GVHD likelihood and its phenotype may in turn be determined not only by the ubiquitous mHA but also by the tissue volume and its state (inflammation/injury), and most importantly by the time at which organ injury/inflammation occurs relative to T-cell infusion.
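The series described in the figure legend, in which each peptide-HLA complex contributes its relative expression multiplied by the reciprocal of its IC50, can be written as a simple sum. The sketch below is only one reading of that legend: the function name, the input layout, and the example values are assumptions, and the model itself is presented in the paper as hypothetical.

```python
from typing import Iterable, Tuple


def hla_specific_ap(complexes: Iterable[Tuple[float, float]]) -> float:
    """Sum of Re * (1 / IC50) over peptide-HLA complexes, where each item is
    (relative_expression, ic50_nm). Each term is the hypothetical contribution
    of one peptide-HLA complex to the HLA-specific alloreactivity potential."""
    return sum(rel_expr * (1.0 / ic50_nm) for rel_expr, ic50_nm in complexes)


# Made-up complexes: (relative expression, predicted IC50 in nM).
example_complexes = [(1.0, 50.0), (2.0, 100.0)]
print(hla_specific_ap(example_complexes))
```

Under this reading, a weakly expressed strong binder and a highly expressed weaker binder can contribute equally, which is the trade-off between expression level and binding affinity that the Discussion raises.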
When the variant peptides are correlated with tissue protein expression levels in our dataset, the immunogenic peptides appear to be uniformly distributed in the major organ systems of the body. This raises the following question: why do solid organ transplant recipients develop rejection, while GVHD does not commonly affect most such organs, such as the kidney and heart? The data presented in this paper suggest a possible answer to this question if the above quantitative model of immunobiology of transplantation is considered. Hypothetically, in the days following SCT, when the infused donor T-cells encounter widespread variant immunogenic recipient antigens in inflamed tissues with a large tissue interface for T-cell antigen-presenting cell interaction, i.e., skin, GI mucosa, liver, and lungs, there is a corresponding polyclonal T-cell allo-immune response, which may result in GVHD affecting the targeted organs. In contrast, the relatively smaller tissue interface in the absence of direct injury, in organs such as the heart and kidney, does not trigger an immunogenic response in the face of an ongoing, competing oligoclonal T-cell response elicited by the larger organ systems with injury. When solid organ transplantation is performed, tissue injury in the transplanted organ resulting from the transplant procedure, even if sub-clinical, serves as the injury stimulus triggering graft rejection. Based on these data, a theoretical model has been proposed to investigate the notion of alloreactivity potential and its relationship with GVHD onset and propagation over time as in a "chaotic dynamical system" (35).
A potential therapeutic application of this analysis would be the ability to "titrate" the intensity of immunosuppressive therapy in the peri-transplant period based on the magnitude of the HLA-specific alloreactivity potential. This study supports the need for intensive immunosuppression in patients undergoing URD allogeneic SCT, making this algorithm a useful analysis for treatment planning (36). For example, if a patient has a high number of predicted mHA and these are over-represented in lung tissue, therapies can be specifically tailored for that patient and symptoms of lung GVHD treated more promptly. Alternatively, large-scale protein expression studies by Ponten et al. concluded that most proteins are expressed in most tissues, although in varying quantities (37). This raises the question of which parameter plays a larger role in peptide presentation by HLA class I molecules: the absolute molar amount of protein expressed in a tissue, or the binding affinity for a particular peptide; in theory, it may be a combination of the two (Figure 7).
As with any in silico work, this work can only be considered preliminary, and the peptide-HLA class I combinations predicted in our work will need experimental verification. Acknowledging this limitation, it should be noted that the accuracy of these algorithms has been reviewed and they have been found to be useful predictors of HLA presentation. A similarly large number of peptides binding HLA in EBV-transformed B cell lines have been identified when directly characterizing the "ligandome" presented by these cells (38). Further, in a vaccinia virus challenge mouse model, the NetMHC algorithm was able to predict epitopes responsible for 95% of the CTL response with an IC50 threshold of <500 nM (39). Similarly, Armistead et al. found that with an IC50 threshold of <500 nM, all peptides predicted by the SMM-IEDB algorithm bound HLA-A*0201 in their assays (40). To put our data in context, a dataset of all known nsSNPs that had been deposited in NCBI's dbSNP database is presented in Figure 2 and is labeled as all possible mHA in human beings (22,41). In light of these findings, it is not at all surprising that we find a large library of immunogenic mHA in each DRP, and there may exist a similar alloreactivity potential mediated by HLA class II.
In conclusion, the findings reported here demonstrate that whole exome sequencing, followed by in silico peptide generation and HLA binding affinity determination, reveals a large and previously unmeasured HLA-specific alloreactivity potential. This potential is predictably larger in patients undergoing URD SCT and mirrors the previously described T-cell clonal frequency distribution. We posit that these methodologies may be used to develop mathematical models to better understand the immunopathology of SCT from both HLA-matched and mismatched donors and may in the future allow more precise titration of the immunosuppression intensity in individual transplant recipients.