Original Research ARTICLE
HLA-A, -B, -C, -DRB1, -DQB1, and -DPB1 Allele and Haplotype Frequencies of 28,927 Saudi Stem Cell Donors Typed by Next-Generation Sequencing
- 1Saudi Stem Cells Donor Registry, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, Ministry of National Guard Health Affairs, Riyadh, Saudi Arabia
- 2Department of Physiology, Istanbul Medical Faculty, Istanbul University, Istanbul, Turkey
- 3Department of Oncology, King Abdulaziz Medical City - Ministry of National Guard Health Affairs, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
- 4ZKRD Zentrales Knochenmarkspender–Register für die Bundesrepublik Deutschland, Ulm, Germany
- 5Department of Pathology and Laboratory Medicine, King Abdulaziz Medical City - Ministry of National Guard Health Affairs, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
Human leukocyte antigen (HLA) allele and haplotype frequency distribution varies widely between different ethnicities and geographical areas. Matching for HLA alleles is essential for successful related and unrelated stem cell transplantation. Among the Saudi population, data on HLA alleles and haplotypes are limited. A cross-sectional study was performed on 28,927 bone marrow donors. The most frequent HLA alleles were HLA-A*02:01:01G (20.2%), A*24:02:01G (7.5%); B*51:01:01G (19.0%), B*50:01:01G (12.3%); C*06:02:01G (16.7%), C*07:02:01G (12.2%); DRB1*07:01:01 (15.7%), DRB1*03:01:01G (13.3%); DQB1*02:01:01G (29.9%), DQB1*03:02:01G (13.2%); and DPB1*04:01:01G (35.2%), DPB1*02:01:02G (21.8%). The most frequent HLA-A~C~B~DRB1~DQB1 haplotypes were A*02:01:01G~C*06:02:01G~B*50:01:01G~DRB1*07:01:01G~DQB1*02:01:01G (1.9%) and A*02:05:01G~C*06:02:01G~B*50:01:01G~DRB1*07:01:01G~DQB1*02:01:01G (1.6%). The most frequent HLA-A~C~B~DRB1~DQB1~DPB1 haplotypes were A*02:01:01G~C*15:02:01G~B*51:01:01G~DRB1*04:02~DQB1*03:02:01G~DPB1*04:01:01G (1%) and A*02:01:01G~C*07:02:01G~B*07:02:01G~DRB1*15:01:01G~DQB1*06:02:01G~DPB1*04:01:01G (0.9%). Based on these haplotype frequencies, we provide forecasts for the fraction of patients with full matching and single mismatched donors for 3 to 6 loci depending on the registry size. With one million donors, about 50% of the patients would find an 8/8 match and 90% a 7/8 match. These data are essential for registry planning, finding unrelated stem cell donors, population genetic studies, and HLA disease associations.
The human leukocyte antigen (HLA) system is encoded by the most polymorphic genes known, located on the short arm of chromosome 6 (1). The HLA system is divided into class I (HLA-A, -B, and -C) and class II (HLA-DR, -DQ, and -DP) loci. Recent advances in sequencing technologies have helped to uncover a vast number of new HLA alleles. So far, 27,589 different HLA allele sequences have been reported based on the latest IPD-IMGT/HLA Database 3.41.0 (2). The HLA system plays an essential role in activating the immune system through the presentation of processed antigens to CD4+ and CD8+ T cells. HLA is also a key player in the success of organ and stem cell transplantation. Mismatch in both HLA class I and II alleles between donor and recipient is a major cause of organ rejection, graft failure, and graft-vs.-host disease following hematopoietic stem cell transplantation (3). In addition, certain HLA alleles were found to be associated with autoimmune diseases, such as ankylosing spondylitis, type 1 diabetes, Behçet's disease, celiac disease, and rheumatoid arthritis (4).
Saudi society has a very high rate (more than 70%) of consanguineous marriages (5). Combined with large family sizes, this makes the chances of finding an HLA-matched relative very high. However, a proportion (30–40%) of patients cannot find a matched donor (6). Therefore, the Saudi Stem Cell Donor Registry (SSCDR) was launched in 2011 in Riyadh, Saudi Arabia. SSCDR is a national organization that manages unrelated stem cell donors and cord blood units as a source for allogeneic stem cell transplantation. The SSCDR was established to provide stem cells for patients without a matching family donor. Today, SSCDR has more than 75,000 registered donors who are willing to donate their stem cells for any patient in need worldwide. Currently, SSCDR works closely with 14 donor recruitment centers in Saudi Arabia and has facilitated more than 67 successful stem cell transplants, 40 for national patients and 27 for international patients, including patients from Germany, the United Kingdom, the United States, Spain, Italy, Turkey, Sweden, Norway, India, and Australia. The goal of any stem cell registry is to reflect the HLA allele polymorphism of the population it serves. This can be challenging in countries such as Saudi Arabia, where HLA diversity is high because a large number of ethnic groups from different countries have settled over the years in the holy cities of Mecca and Madinah.
The aim of this study was to evaluate the HLA class I and class II alleles and haplotypes from the existing database of registered Saudi stem cell donors. This study is essential for stem cell transplantation programs as the level of matching between donor and recipient is very important for the outcome of hematopoietic stem cell transplantation.
Materials and Methods
A cross-sectional study was performed on 28,927 Saudi volunteer bone marrow donors registered in the database of the SSCDR and typed for HLA-A, -B, -C, -DRB1, -DQB1, and -DPB1 at a high-resolution level from 2013 to 2017. All donors included in the analysis fulfilled the eligibility criteria and were unrelated Saudi Arabs, 49% males and 51% females, 18–60 years old with an average age of 31, from different regions within the country, including Riyadh, Qasim, Dammam, Assire, Albaha, Jizan, Jeddah, and Madinah. These regions were not linked directly to each subject at that time; therefore, this cohort was not classified or grouped geographically.
HLA Class I and HLA Class II Typing
This project was approved by the local IRB at King Abdulaziz Medical City, National Guard Health Affairs, Riyadh, Saudi Arabia. All donors were asked to sign a formal written consent. Blood samples (six drops on filter paper) were collected. Total genomic DNA was extracted using QIAamp96 DNA Blood Mini Kit according to the manufacturer's instructions (Qiagen, Hilden, Germany). HLA-A, -B, -C, -DRB1, -DQB1, and -DPB1 genotyping was carried out by next-generation sequencing (NGS) at the laboratories of Histogenetics (Ossining, NY). Briefly, HLA alleles were typed at high resolution (class I: exons 2 and 3; class II: exon 2) (7) on the Illumina platform (San Diego, CA). The sequencing templates (DNA libraries) for Illumina were prepared according to the manufacturer's protocols, involving two rounds of amplification. For sequence data analysis, MiSeq generates FastQ files that are used by the in-house NGSAutoTyper software to perform the final typing. HistoS software is used for further review and analysis, and the data are then transferred for final reporting to HistoTyper software. For certain typing results, we gave alternative combinations with G codes. We then selected the most probable combination based on frequency.
Homozygous typing results were verified by the following method. For class I, exons 2 and 3 are repeated with alternative primers complementary to sequences in the middle of exon 2 and exon 3 to generate amplicons spanning mid exon 2–intron 2–exon 3 (bridge amplification). In addition, known HLA-B–C linkages are checked. If the bridge amplicon or B–C association confirms the homozygosity, then results are reported. If the results indicate heterozygous typing, a 1-kb amplicon spanning the whole exon 2–intron 2–exon 3 is repeated with PacBio sequencing technology. For class II, DRB1 is analyzed with one generic and two group-specific amplifications. DQB1 and DPB1 are analyzed with one generic and one group-specific amplification. In addition, known DRB1–DRB3/4/5 and DRB1–DQB1 linkages are checked. When the base occurrence is more than 300 and a known linkage is present, homozygous results are accepted. In the absence of the latter, sequencing is repeated by a long-range amplicon spanning exon 2–intron 2–exon 3 using PacBio technology.
Criteria for Common and Well-Documented (CWD) Alleles
In this study, the CWD alleles were defined according to CIWD 3.0.0 inclusion criteria (8). Common alleles are those with a frequency of ≥1 in 10,000, and well-documented alleles are those with ≥5 occurrences. Intermediate alleles (>1 in 100,000) were not calculated because the sample size was too small.
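As a rough illustration of how these two thresholds behave at this sample size, the sketch below classifies an allele from its observed count. It assumes that frequencies are computed per gene copy (2 × 28,927 = 57,854 chromosomes); the function name `classify_allele` is hypothetical, and this is not the CIWD authors' implementation.

```python
N_DONORS = 28_927
N_CHROMOSOMES = 2 * N_DONORS  # assumption: frequencies are per gene copy

COMMON_MIN_FREQ = 1 / 10_000  # "common" threshold quoted in the text
WD_MIN_COUNT = 5              # "well-documented" threshold quoted in the text


def classify_allele(count: int, n_chromosomes: int = N_CHROMOSOMES) -> str:
    """Return 'C', 'WD', or 'not CWD' for an observed allele count."""
    if count / n_chromosomes >= COMMON_MIN_FREQ:
        return "C"
    if count >= WD_MIN_COUNT:
        return "WD"
    return "not CWD"


for copies in (4, 5, 6):
    print(copies, classify_allele(copies))
```

Under these assumptions the two categories sit very close together in a cohort of this size: six copies already reach the 1-in-10,000 frequency, so only alleles seen exactly five times fall into the well-documented class alone.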
The population analyses and genetic diversity measures were calculated using Arlequin 3.5 software (9). In particular, allele frequencies and three, five, and six locus haplotype frequencies were determined by the expectation-maximization (EM) algorithm (10).
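The idea behind EM haplotype estimation can be sketched on a toy data set. The code below is a minimal illustration, not Arlequin's implementation: it enumerates every phase compatible with each unphased genotype, weights the phases by the current haplotype frequencies under Hardy-Weinberg proportions (E-step), and re-estimates the frequencies from the weighted counts (M-step). The function names and the example genotypes are hypothetical.

```python
from collections import Counter
from itertools import product


def compatible_pairs(genotype):
    """List the distinct haplotype pairs consistent with an unphased genotype.

    `genotype` is a tuple of loci; each locus is a tuple of two alleles.
    """
    first = genotype[0]
    pairs = set()
    # Keep the first locus fixed and try both orientations of every other locus.
    for flips in product([False, True], repeat=len(genotype) - 1):
        hap1, hap2 = [first[0]], [first[1]]
        for (x, y), flip in zip(genotype[1:], flips):
            if flip:
                x, y = y, x
            hap1.append(x)
            hap2.append(y)
        pairs.add(tuple(sorted((tuple(hap1), tuple(hap2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=50):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted({h for pl in pair_lists for pair in pl for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = Counter()
        for pl in pair_lists:
            # E-step: probability of each phase under Hardy-Weinberg proportions.
            weights = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2)
                       for h1, h2 in pl]
            total = sum(weights)
            for (h1, h2), w in zip(pl, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts become the new frequencies.
        freq = {h: counts[h] / n_chrom for h in haplotypes}
    return freq


# Hypothetical two-locus data: six phase-known homozygotes and one
# double heterozygote whose phase has to be inferred.
example = (
    [(("A1", "A1"), ("B1", "B1"))] * 3
    + [(("A2", "A2"), ("B2", "B2"))] * 3
    + [(("A1", "A2"), ("B1", "B2"))]
)
freqs = em_haplotype_frequencies(example)
for hap, f in sorted(freqs.items()):
    print(hap, round(f, 4))
```

In this toy case the double heterozygote is resolved almost entirely into the two haplotypes already supported by the homozygotes. The number of candidate phases doubles with each additional heterozygous locus, which is one reason six-locus estimates are less well supported than two-locus estimates from the same sample.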
A pairwise linkage disequilibrium (LD) test was performed with a Markov chain algorithm using estimated haplotypes for each individual obtained with the Excoffier–Laval–Balding (ELB) algorithm. Frequencies of common haplotypes were similar with both methods, and those with significant pairwise LD are shown in the tables. Deviation from Hardy-Weinberg equilibrium was tested with a Markov chain algorithm. A selective neutrality test could not be performed due to software limitations on the number of alleles and haplotypes. Computed two-locus (HLA-B-C and HLA-DRB1-DQB1) associations were listed as common (observed more than 100 times) or rare (observed <100 times).
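The quantities compared in a Hardy-Weinberg test can be illustrated with a short sketch. This is not the Markov chain exact test run in Arlequin; it only computes the observed heterozygosity at one locus and the value expected under Hardy-Weinberg proportions, 1 − Σ p_i², from the same genotypes. The function name and the toy genotypes are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    `genotypes` is a list of (allele, allele) tuples, one per individual.
    """
    n = len(genotypes)
    observed = sum(a != b for a, b in genotypes) / n
    allele_counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n
    expected = 1.0 - sum((c / total) ** 2 for c in allele_counts.values())
    return observed, expected


# Toy locus with two equally frequent alleles and no heterozygotes.
obs, exp = heterozygosity([("a", "a"), ("a", "a"), ("b", "b"), ("b", "b")])
print(obs, exp)
```

An observed value below the expected one, as in this toy example, corresponds to an excess of homozygotes.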
The probability of finding a completely matching and single mismatched donor for 3–6 loci, depending on the registry size, was calculated based on the formula first shown in 1989 (11) and refined in 2014 (12). As is a common practice today (13), matching was based on the antigen recognition domain (exons 2 and 3 for class I and exon 2 only for class II).
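A simplified form of this kind of calculation can be sketched as follows. It assumes that a patient with phenotype frequency f finds at least one match among n independently drawn donors with probability 1 − (1 − f)^n, and that the overall rate is the frequency-weighted average over patient phenotypes. The refinements of the cited methods are not reproduced here, and the function name and toy frequencies are hypothetical.

```python
def match_probability(phenotype_freqs, registry_size):
    """Fraction of patients with at least one fully matched donor.

    Assumes patients and donors come from the same population and that
    donors are drawn independently.
    """
    return sum(f * (1.0 - (1.0 - f) ** registry_size) for f in phenotype_freqs)


# Hypothetical population: one frequent phenotype and many rare ones.
toy_freqs = [0.1] + [0.9 / 9000] * 9000
for n in (1_000, 100_000, 1_000_000):
    print(n, round(match_probability(toy_freqs, n), 3))
```

The curve rises quickly while frequent phenotypes are being covered and then flattens, because each additional donor mostly duplicates phenotypes already present in the registry.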
Data used in this analysis are available in the Allele Frequencies Net Database (http://allelefrequencies.net/hla6006a.asp?hla_population=3685) and on a public website (https://www.ihiw18.org/).
HLA Allele Frequencies
A total of 103 HLA-A alleles, 154 HLA-B alleles, 76 HLA-C alleles, 96 HLA-DRB1 alleles, 37 HLA-DQB1 alleles, and 48 HLA-DPB1 alleles were found in our cohort. Table 1 shows the allele frequency for HLA-A, -B, -C, -DRB1, -DQB1, and -DPB1. For HLA class I, the following alleles were the most frequent alleles in HLA-A, -B, and -C, respectively: A*02:01:01G (20.2%), A*24:02:01G (7.5%), A*01:01:01G (7.1%), A*68:01:01G (6.3%), and A*03:01:01G (5.9%); B*51:01:01G (19.0%), B*50:01:01G (12.3%), B*08:01:01G (6.9%), B*07:02:01G (5.0%), and B*53:01:01G (3.9%); C*06:02:01G (16.7%), C*07:02:01G (12.2%), C*04:01:01G (12.2%), C*15:02:01G (10.7%), and C*07:01:01G (9.7%). For HLA class II, the following alleles were the most frequent alleles in HLA-DRB1, -DQB1, and -DPB1, respectively: DRB1*07:01:01 (15.7%), DRB1*03:01:01G (13.3%), DRB1*13:02:01G (6.7%), DRB1*15:01:01 (6.3%), and DRB1*13:01:01G (6.2%); DQB1*02:01:01G (29.9%), DQB1*03:02:01G (13.2%), DQB1*03:01:01G (12.1%), DQB1*05:01:01G (9.9%), and DQB1*06:03:01G (6.9%); DPB1*04:01:01G (35.2%), DPB1*02:01:02G (21.8%), DPB1*03:01:01G (11.6%), DPB1*04:02:01G (6.5%), and DPB1*17:01:01G (5.4%). All subjects were typed successfully. Four new alleles were identified in this Saudi cohort (HLA-A*02:433, HLA-A*02:434, HLA-C*14:02:13, and HLA-DRB1*14:145); the data are available in the Allele Frequencies Net Database and have been published (14–16). Table 2 shows the results of the Hardy-Weinberg equilibrium analysis. At all six loci, the observed heterozygosity in this large cohort was significantly lower than expected by chance.
Table 1. HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DQB1, and HLA-DPB1 allele frequencies in the Saudi Stem Cell Donor Registry.
An overview of the frequency distribution of all six loci is shown in Table 3 and Figures 1A,B. HLA-DQB1 and HLA-DPB1 are clearly much less polymorphic than the other loci, in particular because each has a small number of very frequent alleles. HLA-A and HLA-DRB1 carry a very similar amount of information that is higher than in HLA-C but lower than in HLA-B, which is by far the most polymorphic of the six loci. As is well known, a rare allele has to be seen at least three times to be significantly more frequent (p < 0.05, Poisson distribution) than an unobserved one. We have therefore marked this frequency in Figure 1A and the corresponding cumulative frequency in Figure 1B. Figures 2A,B present two analogous plots for the haplotypes of 2–6 locus combinations frequently considered in donor–patient matching, and other combinations are shown in Supplementary Tables 1–3 and Supplementary Figures 1A,B, 2A,B. These figures demonstrate that our study covers more than 95% of all 2-locus haplotypes of the Saudi population; however, this fraction diminishes to about 80% for 4 and 5 loci and 70% for 6 loci.
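One way to read the three-copy threshold is through the "rule of three": under a Poisson model, an allele that was not observed at all can still have an expected count of up to about −ln(0.05) ≈ 3 copies at the 95% confidence level. The sketch below computes that bound and the frequency corresponding to three copies, assuming 2 × 28,927 gene copies in the sample; the variable names are hypothetical.

```python
import math

N_CHROMOSOMES = 2 * 28_927  # assumption: two gene copies per donor
ALPHA = 0.05

# Largest Poisson mean still compatible (at level ALPHA) with seeing zero
# copies: solve exp(-mean) = ALPHA for the mean.
max_mean_if_unobserved = -math.log(ALPHA)

# Frequency that corresponds to three copies in the sample.
threshold_frequency = 3 / N_CHROMOSOMES

print(round(max_mean_if_unobserved, 3))
print(f"{threshold_frequency:.2e}")
```

Under this reading, alleles seen once or twice cannot be distinguished reliably from alleles that happen to be absent from the sample, which is why only those seen at least three times are counted toward the covered fraction.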
Figure 1. (A) The alleles of each locus are ordered by descending frequency on the x-axis, and the curve shows their frequency. Both axes are in logarithmic scale. The dotted horizontal line shows the frequency corresponding to three copies in the sample. (B) For each number (n) on the x-axis in logarithmic scale, the curve depicts the cumulative frequency of the set of the n most frequent alleles of each locus. The closer a curve is to the left and top, the more homogeneous a population is for this locus. The dots on each curve mark the cumulative frequency of all alleles seen at least three times in the sample.
Figure 2. (A) The haplotypes of each set of loci are ordered by descending frequency on the x-axis, and the curve shows their frequency. Both axes are in logarithmic scale. The dotted horizontal line shows the frequency corresponding to three copies in the sample. (B) For each number n on the x-axis in logarithmic scale, the curve depicts the cumulative frequency of the set of the n most frequent haplotypes of each locus combination. The dots on each curve mark the cumulative frequency of all haplotypes seen at least three times in the sample.
The pairwise linkage disequilibrium (LD) parameters (frequency, observed, expected, D, D', and p values) for each possible pair of two HLA alleles are given in Supplementary Table 4. Most alleles at HLA-A and HLA-B are in strong LD with HLA-C. The relative strength of LD between two HLA loci was calculated based on the pairwise LD parameters for all the allelic pairs. Common and rare associations between HLA-DRB1-DQB1 and HLA-B-C in the Saudi population are presented in Tables 4 and 5, respectively. CWD alleles are presented in Table 6. CWD alleles represent 60% of the allelic distribution in the Saudi population.
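For readers less familiar with these parameters, the sketch below computes D and Lewontin's normalized D' for a single allele pair from the two allele frequencies and the frequency of the haplotype carrying both. It is a textbook calculation, not the Arlequin procedure used in this study; the function name is hypothetical, and D' is returned with the sign of D.

```python
def linkage_disequilibrium(p_a: float, p_b: float, p_ab: float):
    """Return (D, D') for one pair of alleles at two loci.

    p_a, p_b: frequencies of the two alleles; p_ab: frequency of the
    haplotype carrying both.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


# Toy values: the two alleles always travel together.
print(linkage_disequilibrium(0.5, 0.5, 0.5))
# Toy values: the alleles are statistically independent.
print(linkage_disequilibrium(0.5, 0.5, 0.25))
```

The "expected" column in such tables corresponds to the product p_a × p_b, that is, the haplotype frequency that would be seen if the two alleles were independent.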
Table 6. Number of common (C) and well-documented (WD) alleles (CWD) in 28,927 individuals from the Saudi Stem Cell Donor Registry.
A total of 3,430 HLA-A-C-B haplotypes, 1,443 HLA-DRB1-DQB1-DPB1 haplotypes, 10,665 HLA-A-C-B-DRB1-DQB1 haplotypes, and 16,394 HLA-A-B-C-DRB1-DQB1-DPB1 haplotypes were estimated. Seven HLA-A~B~DRB1 haplotypes with a frequency of more than 1% are shown in Supplementary Table 5. Fourteen HLA-A~C~B haplotypes with a frequency of more than 1% are shown in Supplementary Table 6. Twenty-four HLA-DRB1~DQB1~DPB1 haplotypes with a frequency of more than 1% are shown in Supplementary Table 7. Seven HLA-A~C~B~DRB1~DQB1 haplotypes are shown in Supplementary Table 8. Thirteen HLA-A~C~B~DRB1~DQB1~DPB1 haplotypes with a frequency >0.5% are shown in Table 7. Only one HLA-A~C~B~DRB1~DQB1~DPB1 haplotype had a frequency of >1% in our population: HLA-A*02:01:01G~C*15:02:01G~B*51:01:01G~DRB1*04:02~DQB1*03:02:01G~DPB1*04:01:01G. A full list of all HLA-A~C~B~DRB1~DQB1 and HLA-A~C~B~DRB1~DQB1~DPB1 haplotypes (observed ≥3 times) is shown in Supplementary Table 9.
Table 7. The most frequent HLA-A~C~B~DRB1~DQB1~DPB1 haplotypes in the Saudi Stem Cell Donor Registry (out of 3,588 haplotypes observed ≥3 times).
For haplotypic diversity, the mean expected heterozygosity was 1.382 (±0.186). In haplotype-level computations, gene diversity and average gene diversity over loci were found to be 0.999 and 0.892 (±0.476), respectively.
Applications for Matching
The 6-locus haplotype frequencies were mapped to P-codes (implicitly ignoring the rare intron-based null alleles), and then, protein-level phenotype frequencies were derived for the purpose of the matching extrapolations. Figure 3 shows that, for example, with one million donors, only about 50% of the patients would find an 8/8 match, but already 90% would get a 7/8 match. Overall, registry sizes required for identical rates of full matches for 3–6 loci are roughly in proportions of 1:2:2:10, and with one mismatch accepted, this ratio is 1:4:10:40.
Figure 3. Registry Size and Donor Identification Rates. For each hypothetical registry size, the curves show the fraction of patients finding a fully matched and single mismatched donor when considering 3, 4, 5, and 6 loci.
This is the first report on HLA-A, -B, -C, -DRB1, -DQB1, and -DPB1 alleles and haplotype frequency in a large cohort of Saudis. Our data present 28,927 Saudi individuals from different regions within Saudi Arabia. The allele and haplotype frequencies are similar to what has been reported previously from our donors (17, 18). Seven HLA-A alleles show frequencies higher than 5.0%, including A*02:01:01G, A*24:02:01G, A*01:01:01G, A*68:01:01G, A*03:01:01G, A*31:01:02G, and A*26:01:01G. Those alleles represent 57.6% of the allelic diversity observed at this locus.
For HLA-B, only four alleles show a frequency >5%. These alleles are B*51:01:01G, B*50:01:01G, B*08:01:01G, and B*07:02:01G. These four alleles account for 43.4% of HLA-B diversity in this cohort. HLA-B*50:01 is frequently seen in Arab populations. In Saudi Arabia, B*50:01:01 accounts for almost all B*50 alleles (19). In this cohort, HLA-B*50:01 is the most common B*50 allele (seen in 7,170 individuals), and only 17 individuals carry the B*50:02 allele. Moreover, B*50:01:01G is also frequently seen in Caucasians, North Africans, and West-South Asians (20). HLA-B*15 exhibits the highest number of alleles, with 33 different alleles detected in this cohort; however, B*15 alleles are not common in Saudis. B*51 is the second most polymorphic group with 12 different alleles, of which only B*51:01 is common. This has implications for finding a matched donor for individuals carrying the rare B*51 or B*15 alleles.
For HLA-C, only five alleles show a frequency of >5%. The most frequent HLA-C allele is C*06:02:01G, followed by C*07:02:01G, C*04:01:01G, C*15:02:01G, and C*07:01:01G. These five alleles account for 61.4% of HLA-C diversity in this cohort. HLA-C*07 exhibits the highest number of alleles; 13 different alleles are detected in this cohort, with C*07:02 and C*07:01 as the most common. In a study comparing serology with molecular typing, we found the same common HLA-C alleles, including HLA-C*15, which was not detected by serology (21).
For HLA-DRB1, seven alleles show a frequency of >5%. The most frequent HLA-DRB1 alleles are DRB1*07:01:01G, DRB1*03:01:01G, DRB1*13:02:01G, DRB1*15:01:01G, DRB1*13:01:01G, DRB1*04:03:01G, and DRB1*04:05. These seven alleles account for 59.6% of HLA-DRB1 diversity in this cohort. HLA-DRB1*11 and HLA-DRB1*14 exhibit the highest number of alleles, with 16 different alleles detected for each of them, followed by 15 alleles for HLA-DRB1*13 and 13 alleles for HLA-DRB1*04. Both HLA-DRB1*07:01:01G and DRB1*03:01:01G are common in the Central and Eastern regions of Saudi Arabia as previously published (17, 18). In addition, both alleles are frequent in Arabs from Tunisia and Jordan (20). HLA-DRB1*13:02:01G is also frequent in some African populations, such as Nigerian and Libyan (20).
For HLA-DQB1, seven alleles show a frequency of >5%. These alleles include DQB1*02:01:01, DQB1*03:02:01, DQB1*03:01:01, DQB1*05:01:01, DQB1*06:03:01, DQB1*06:02:01G, and DQB1*05:02:01. These seven alleles account for 83.9% of HLA-DQB1 diversity in this cohort. HLA-DQB1*06 and HLA-DQB1*03 exhibit the highest number of alleles; 16 different alleles are detected for DQB1*06 and 10 alleles for HLA-DQB1*03. HLA-DQB1*02:01:01 and DQB1*03:01:01 are frequent in Arabs, and HLA-DQB1*03:02:01 is frequent in Arabs, Chinese, Indians, and Caucasians (20).
For HLA-DPB1, only five alleles show a frequency of >5%. The most frequent HLA-DPB1 alleles are DPB1*04:01:01G, DPB1*02:01:02G, DPB1*03:01:01, DPB1*04:02:01, and DPB1*17:01:01. These five alleles account for 80.4% of HLA-DPB1 diversity in this cohort. The top three DPB1 alleles are all frequent in Asians, especially Chinese, and in Caucasians (20). No single DPB1 type shows diversity; this might be a reflection of the DPB1 nomenclature, which, unlike that of the other HLA genes, does not depend on exon 2. Polymorphisms within DPB1 may not be resolved at the allele level using the current NGS solutions, in which an average fragment size of 200 bp may not resolve cis/trans polymorphisms. This issue, however, can be resolved by third-generation long-read technology (22).
Hardy-Weinberg equilibrium analysis shows an excess of homozygotes in this large cohort. This phenomenon was also observed previously by our group (17, 18) and others (23) and might be explained by the high rate of consanguineous marriages in the Saudi population (5). We observed different allele and haplotype frequencies between the Central and Eastern provinces of Saudi Arabia (18, 24). Thus, it is important to study different regions of Saudi Arabia independently. However, as with other populations, Saudis are not restricted to their area of origin, and many move to the Central, Eastern, and Western provinces seeking jobs.
The most common HLA-A~C~B~DRB1~DQB1 haplotype seen in this study is A*02:01:01G~C*06:02:01G~B*50:01:01G~DRB1*07:01:01G~DQB1*02:01:01G. This haplotype is not common in other populations; however, it is seen at low frequency in American Hispanics, Indian Tamil Nadu, and Colombia cord blood (20). The second most common haplotype is A*02:05:01G~C*06:02:01G~B*50:01:01G~DRB1*07:01:01G~DQB1*02:01:01G. This haplotype differs from the previous haplotype by the HLA-A*02:05 allele and seems to have a similar distribution (20). The common HLA-A~C~B~DRB1~DQB1~DPB1 haplotypes are also of interest; the top three haplotypes account for 3%, and all of them are A*02:01:01G-based haplotypes, thus reflecting the high frequency of A*02:01:01G, which accounts for 1/5 of the population. These three haplotypes are A*02:01:01G~C*15:02:01G~B*51:01:01G~DRB1*04:02~DQB1*03:02:01G~DPB1*04:01:01G, A*02:01:01G~C*07:02:01G~B*07:02:01G~DRB1*15:01:01G~DQB1*06:02:01G~DPB1*04:01:01G, and A*02:01:01G~C*06:02:01G~B*50:01:01G~DRB1*07:01:01G~DQB1*02:01:01G~DPB1*04:01:01G. In addition, this result shows that there is strong LD between these alleles with a minimum of crossover events, a phenomenon worth investigating further in the Saudi population. Two-locus LD studies reveal that there is a higher frequency of DPB1 association with DRB1 and DQB1, thus reflecting the observation of the extended haplotype across all class I and II gene loci.
HLA null alleles are rare in this population. The following null alleles are found: HLA-C*06:49N (two individuals), B*37:03N (one individual), A*24:312N (one individual), and A*30:76N (one individual) (Table 1).
Here, we show the number of CWD alleles, applying the CIWD catalog version 3.0 (8). CWD alleles can be calculated, but intermediate alleles cannot, as they require a much larger sample size (frequency >1 in 100,000). CWD alleles for all loci in our Saudi cohort are in the range of 60%. HLA-B has the highest number of common and the highest number of well-documented alleles, and HLA-DQB1 has the lowest numbers for both categories. Our raw data are available through version 3.0 of the CWD catalogs (8).
In any statistical analysis, detail and precision are competing properties, which is reflected in our study by the changing coverage of the gene pool with the significantly positive alleles and haplotypes. Although we are getting more than 99.5% for all individual loci and 97–99% for all two-locus combinations, this is gradually decreasing to 72% for 6-locus haplotypes. We compare that to the German population (12) in which a sample of about 30,000 individuals would cover a very similar part of the polymorphism now well-described based on a sample of several millions.
The different spread between the full match and the one-mismatch curves in Figure 3 is primarily due to the strong linkage disequilibrium between HLA-B and -C as well as between HLA-DRB1 and -DQB1. As a consequence, single mismatches are much less likely than double mismatches. Basically, with a registry size of one million, 10/10 matches might be found for about half of the Arab patients, while 60% of patients would find a 9/10 donor among only 100,000 donors. The main caveat in those theoretical calculations is that algorithms and formulae deriving haplotype frequencies from phenotypes and then applying those frequencies back to diploid individuals are all based on the concept of a Hardy-Weinberg equilibrium (HWE), which our population does not fulfill due to regional subpopulations and non-random mating. On the other hand, it is probably the best extrapolation that can be made on the basis of today's data, and one of the major developments of our century in most regions of the globe will be the erosion of deviations from HWE in all regions and at all scales.
In conclusion, the results of this study present information that can be used as a tool to identify a hematopoietic stem cell unrelated donor recruitment and selection strategy as well as a helpful tool for population genetic studies and HLA disease associations. Furthermore, knowledge of population-specific allele and haplotype frequency provides hypothetical estimation of the chances of finding matched donors in the registry. There are limitations to our study as we could not stratify our subjects geographically as we have people moving routinely between regions. However, this will be looked at carefully in the future at the time of new donor registration. This may be achieved by asking the donors about the place of birth of both parents and grandparents.
Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.
The studies involving human participants were reviewed and approved by the Institutional Review Board (IRB), National Guard Health Affairs, Riyadh, Saudi Arabia. The patients/participants provided their written informed consent to participate in this study.
AH and DJ: hypothesis and research question. DJ, AA, and AH: research proposal. FU, CM, and AH: data analysis. AH, DJ, CM, and AA: discussing the data. DJ, CM, and AH: writing the manuscript. CM: providing the figures. DJ, FU, AA, CM, and AH: final paper review. All authors: contributed to the writing and analysis of the manuscript.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The handling editor declared a past collaboration with the authors.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2020.544768/full#supplementary-material
3. Dehn J, Arora M, Spellman S, Setterholm M, Horowitz M, Confer D, et al. Unrelated donor hematopoietic cell transplantation: factors associated with a better HLA match. Biol Blood Marrow Transplant. (2008) 14:1334–40. doi: 10.1016/j.bbmt.2008.09.009
6. Jawdat DM, Al Saleh S, Sutton P, Al Anazi H, Shubaili A, Tamim H, et al. Chances of finding an HLA-matched sibling: the Saudi experience. Biol Blood Marrow Transplant. (2009) 15:1342–4. doi: 10.1016/j.bbmt.2009.06.013
8. Hurley CK, Kempenich J, Wadsworth K, Sauter J, Hofmann JA, Hofmann D, et al. Common, intermediate and well-documented HLA alleles in world populations: CIWD version 3.0.0. HLA. (2020) 95:516–31. doi: 10.1111/tan.13811
9. Excoffier L, Laval G, Schneider S. Arlequin (version 3.0): an integrated software package for population genetics data analysis. Evol Bioinform Online. (2007) 1:47–50. doi: 10.1177/117693430500100003
11. Sonnenberg FA, Eckman MH, Pauker SG. Bone marrow donor registries: the relation between registry size and probability of finding complete and partial matches. Blood. (1989) 74:2569–78. doi: 10.1182/blood.V74.7.2569.2569
13. Dehn J, Spellman S, Hurley CK, Shaw BE, Barker JN, Burns LJ, et al. Selection of unrelated donors and cord blood units for hematopoietic cell transplantation: guidelines from the NMDP/CIBMTR. Blood. (2019) 134:924–34. doi: 10.1182/blood.2019001212
14. Fakhoury HA, Jawdat D, Alaskar AS, Al Jumah M, Cereb N, Hajeer AH. Two novel alleles HLA-A*02:433 and HLA-A*02:434 identified in Saudi bone marrow donors using sequence-based typing. Int J Immunogenet. (2014) 41:338–9. doi: 10.1111/iji.12131
15. Fakhoury HA, Cereb N, Jawdat D, Al Jumah M, Alaskar AS, Hajeer AH. Two novel alleles HLA-DRB1*11:150 and HLA-DRB1*14:145 identified in Saudi individuals. Int J Immunogenet. (2014) 41:340–1. doi: 10.1111/iji.12133
16. Fakhoury HA, Jawdat D, Alaskar AS, Al Jumah M, Cereb N, Hajeer AH. Three new HLA-C alleles (HLA-C*14:02:13, HLA-C*15:72 and HLA-C*15:74) in Saudi bone marrow donors. Int J Immunogenet. (2015) 42:359–60. doi: 10.1111/iji.12218
18. Jawdat D, Al-Zahrani M, Al-Askar A, Fakhoury H, Uyar FA, Hajeer A. HLA-A, B, C, DRB1 and DQB1 allele and haplotype frequencies in volunteer bone marrow donors from Eastern Region of Saudi Arabia. HLA. (2019) 94:49–56. doi: 10.1111/tan.13533
20. Gonzalez-Galarza FF, McCabe A, Melo Dos Santos EJ, Takeshita L, Ghattaoraya G, Jones AR, et al. Allele frequency net database. Methods Mol Biol. (2018) 1802:49–62. doi: 10.1007/978-1-4939-8546-3_4
22. Klasberg S, Lang K, Gunther M, Schober G, Massalski C, Schmidt AH, et al. Patterns of non-ARD variation in more than 300 full-length HLA-DPB1 alleles. Hum Immunol. (2019) 80:44–52. doi: 10.1016/j.humimm.2018.05.006
Keywords: bone marrow registry, Saudi Arabia, population genetics, haplotype frequencies, unrelated donors
Citation: Jawdat D, Uyar FA, Alaskar A, Müller CR and Hajeer A (2020) HLA-A, -B, -C, -DRB1, -DQB1, and -DPB1 Allele and Haplotype Frequencies of 28,927 Saudi Stem Cell Donors Typed by Next-Generation Sequencing. Front. Immunol. 11:544768. doi: 10.3389/fimmu.2020.544768
Received: 22 March 2020; Accepted: 18 August 2020;
Published: 22 October 2020.
Edited by: Christian Chabannon, Aix-Marseille Université, France
Reviewed by: Esteban Arrieta-Bolaños, Essen University Hospital, Germany
Martin Maiers, National Marrow Donor Program, United States
Copyright © 2020 Jawdat, Uyar, Alaskar, Müller and Hajeer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Ali Hajeer, firstname.lastname@example.org
†ORCID: Ali Hajeer orcid.org/0000-0003-2727-9964
Dunia Jawdat orcid.org/0000-0002-3615-2909
F. Aytül Uyar orcid.org/0000-0003-0391-3780
Carlheinz R. Müller orcid.org/0000-0002-5359-9606
Ahmed Alaskar orcid.org/0000-0002-0648-3256