Genetic Diversity of HLA Class I and Class II Alleles in Thai Populations: Contribution to Genotype-Guided Therapeutics

Human leukocyte antigen (HLA) class I and II alleles are known to be associated with severe cutaneous adverse reactions (SCARs) upon exposure to certain drugs. Because of genetic differences at the population level, drug hypersensitivity reactions vary, and the common pharmacogenetic markers in one country may differ from those in another. For instance, HLA-A*31:01 is associated with carbamazepine (CBZ)-induced SCARs in Europeans and Japanese, while HLA-B*15:02 is associated with CBZ-induced Stevens–Johnson syndrome/toxic epidermal necrolysis (SJS/TEN) among Taiwanese and Southeast Asians. Such differences pose a major challenge to preventing drug hypersensitivity, because pharmacogenetics cannot be ubiquitously and efficiently translated into the clinic. Therefore, a population-wide study of the distribution of HLA pharmacogenetic markers is needed. This work presents a study of HLA class I and II alleles in 470 unrelated Thai individuals, genotyped by polymerase chain reaction sequence-specific oligonucleotide (PCR-SSO) typing with oligonucleotide probes along stretches of the HLA-A, -B, -C, -DRB1, -DQA1, and -DQB1 genes. The 470 individuals were selected according to their regional origins: North, Northeast, South, Central, and the capital city, Bangkok. The top-ranked HLA alleles in the Thai population were HLA-A*11:01 (26.06%), -B*46:01 (14.04%), -C*01:02 (17.13%), -DRB1*12:02 (15.32%), -DQA1*01:01 (24.89%), and -DQB1*05:02 (21.28%). The results revealed that the South had more alleles of the HLA-B75 family, which a typical HLA-B*15:02 pharmacogenetic test for SJS/TEN screening would not cover. Beyond the nationwide view, comparing HLA alleles from the Thai population with those from European and Asian countries reveals the distribution landscape of HLA-associated drug hypersensitivity across many countries.
Consequently, this pharmacogenetics database offers a comprehensive view of pharmacogenetics marker distribution in Thailand that could be used as a reference for other Southeast Asian countries to validate the feasibility of their future pharmacogenetics deployment.

INTRODUCTION
The human leukocyte antigen (HLA) gene complex is located on chromosome 6p21 and is considered the most polymorphic human genetic system (Shiina et al., 2009). HLA genes encode cell surface molecules that present antigenic peptides to the T-cell receptor (TCR) on T cells (Sette and Sidney, 1998). There are two main classes of HLA genes. HLA class I and II genes encode cell surface heterodimers that play a role in antigen presentation, tolerance, and self/nonself recognition. HLA class I molecules bind peptides that have been synthesized within the individual nucleated cell; the three main HLA class I genes are HLA-A, HLA-B, and HLA-C (Howell et al., 2010). In contrast, HLA class II molecules bind exogenously synthesized peptide ligands taken up through the endocytic pathway and are expressed on antigen-presenting cells (APCs); the six main HLA class II genes are HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1, HLA-DRA, and HLA-DRB1 (Ulvestad et al., 1994). The HLA system plays a critical role in regulating the immune response, tissue or organ transplantation, autoimmunity, vaccine development, disease susceptibility or resistance, and pharmacogenomics (Anania et al., 2017; Illing et al., 2017; Petersdorf, 2017). Over the past decade, associations have been reported between various HLA alleles and different adverse drug reactions, especially severe cutaneous adverse reactions (SCARs).
One prominent report on drug-induced SCARs is the association between the HLA-B*15:02 allele and carbamazepine (CBZ)-induced Stevens-Johnson syndrome/toxic epidermal necrolysis (SJS/TEN) in Han Chinese, Thai, and other Southeast Asian populations (Ferrell and McLeod, 2008; Locharernkul et al., 2008; Hung et al., 2010; Chang et al., 2011; Nguyen et al., 2015). Conversely, in Japanese and European populations, HLA-A*31:01 was shown to be associated with CBZ-induced hypersensitivity reactions. HLA-B*58:01, in turn, could be used as a pharmacogenetic risk prediction marker for allopurinol-induced SJS/TEN in many populations. In addition, associations between HLA class II alleles and adverse drug reactions have been reported; for example, amoxicillin-clavulanate-induced liver injury was found to be associated with the HLA-DRB1*15:01, HLA-DRB5*01:01, and HLA-DQB1*06:02 haplotype in Europeans (Hautekeete et al., 1999; Lucena et al., 2011). Thus, the distributions of HLA alleles and pharmacogenetic markers, which can vary among populations, might affect the incidence of adverse drug reactions or drug dosage responses (Donnell PH and Dolan, 2009).
Although there is a clear need to investigate HLA alleles at the population level, studies on the distribution of Thai HLA alleles have been limited. Puangpetch et al. (2015) previously reported only HLA-B polymorphisms from 986 Thai individuals. The top five HLA-B alleles were HLA-B*46:01 (11.51%), HLA-B*58:01 (8.62%), HLA-B*40:01 (8.22%), HLA-B*15:02 (8.16%), and HLA-B*13:01 (6.95%). However, that work did not report the other HLA class I genes or any HLA class II alleles in the Thai population, even though other HLA alleles play important roles in predicting various adverse drug reactions. Hence, the aim of this study was to comprehensively investigate the distribution of both HLA class I (HLA-A, -B, and -C) and class II (HLA-DRB1, -DQA1, and -DQB1) alleles in Thailand and the potential association of these alleles with adverse drug reactions.

Subjects
We recruited 470 unrelated healthy Thai individuals from the 4th National Health Examination Survey in Thailand, conducted between August 2008 and March 2009; the information was obtained from the National Health Examination Survey Office, Health System Research Institute, Ministry of Public Health, Thailand. The 470 individuals were randomly chosen according to their self-reported origins, which fall into five regional groups: Bangkok (n = 70), Central (n = 100), Northeastern (n = 100), Northern (n = 100), and Southern (n = 100). So that the study would represent the majority of Thai people, subjects in each group had to come from families that had lived in the respective region for more than three generations. Furthermore, these healthy individuals had to have no history of cutaneous adverse drug reactions (CADRs). Thailand is located at the center of Southeast Asia and shares borders with Myanmar (west), Laos (northeast), Cambodia (east), and Malaysia (south). This study was approved by the Ethical Review Committee on Research Involving Human Subjects, Faculty of Medicine, Ramathibodi Hospital, Mahidol University. Written informed consent was obtained from all participants.

HLA Class I and II Genotyping
Genomic DNA was isolated from EDTA blood using the MagNA Pure Compact Nucleic Acid Isolation kits (Roche Applied Science, Mannheim, Germany). The quality of genomic DNA was measured with a NanoDrop® ND-1000 (Thermo Scientific, Wilmington, USA). HLA class I alleles (HLA-A, HLA-B, and HLA-C) and HLA class II alleles (HLA-DRB1, HLA-DQA1, and HLA-DQB1) were genotyped using polymerase chain reaction sequence-specific oligonucleotide (PCR-SSO) typing. Briefly, the DNA samples obtained from participants from the five regions of Thailand were amplified by polymerase chain reaction (PCR). The PCR products were hybridized against a panel of SSO probes on coated polystyrene microspheres, with sequences complementary to stretches of polymorphism within the target HLA class I and II alleles, using the Lifecodes HLA SSO typing kits (Immucor, West Avenue, Stamford, USA). The amplicon-probe complex was then visualized using a colorimetric reaction and fluorescence detection technology on the Luminex® IS 100 system (Luminex Corporation, Austin, Texas, USA). HLA class I and II alleles were interpreted from the probe signals using MATCH IT DNA software version 1.2.2 (One Lambda, Canoga Park, CA, USA).

Statistical Analysis
Allele frequencies were calculated and Hardy-Weinberg equilibrium was tested using the Arlequin program version 3.1. Allele frequencies were compared between regions in Thailand using SPSS software for Windows version 16.0 (SPSS Inc., Chicago, IL). The difference between a given pair of regions was considered significant if the p-value was less than 0.05.
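
The regional comparison described above can be sketched as follows. This is a minimal illustration, not the SPSS procedure used in the study: the text does not state which test was applied, so a Pearson chi-square test on a 2 × 2 table of allele counts (1 degree of freedom, no continuity correction) is assumed here. The function names and all counts are hypothetical.

```python
import math

def allele_frequency(allele_count, n_individuals):
    """Frequency of an allele among the 2N chromosomes of N diploid individuals."""
    return allele_count / (2 * n_individuals)

def chi_square_2x2(a_count, a_total, b_count, b_total):
    """Pearson chi-square (1 df, no continuity correction) comparing the
    number of copies of one allele among all chromosomes of two regions.
    Assumed test; the source does not name the test run in SPSS."""
    table = [
        [a_count, a_total - a_count],
        [b_count, b_total - b_count],
    ]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    grand = sum(row_sums)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / grand
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 df:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: copies of one allele among 200 chromosomes
# (100 individuals) in each of two regions.
chi2, p = chi_square_2x2(30, 200, 12, 200)
significant = p < 0.05  # threshold stated in the text
```

With these made-up counts the difference would be called significant at the 0.05 threshold. For multi-allelic loci and small expected counts, an exact test may be more appropriate than this approximation.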

Population Structure Analysis
Only six HLA genes from class I and II were available for this population structure analysis, which was not enough to investigate substructure among the 470 individuals. We therefore employed the HLA probe polymorphism signals obtained directly from each "stretch" of the six HLA genes. In total, 403 polymorphism probes distributed across the six genes were used. The numbers of probes per gene were as follows: HLA-A (72 probes), HLA-B (92 probes), and HLA-C (77 probes) for HLA class I, and HLA-DRB1 (91 probes), HLA-DQA1 (23 probes), and HLA-DQB1 (48 probes) for HLA class II.
We concatenated the raw data containing probe signals for each HLA gene into one tab-delimited file (Supplemental text: mergeallHLA.txt) containing a 470 × 403 HLA-probe matrix, which was used in both principal component analysis (PCA) (Chaichompoo et al., 2018) and STRUCTURE analysis (Pritchard et al., 2000; Kopelman et al., 2015).
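
The concatenation step can be illustrated with a short sketch. It assumes that each gene's probe signals are held as a list of rows (one row per individual, in the same order for every gene); the function names and the placeholder signals are hypothetical, and only the probe counts per gene are taken from the text.

```python
# Probes per gene as listed in the text; they sum to 403.
PROBES_PER_GENE = {
    "HLA-A": 72, "HLA-B": 92, "HLA-C": 77,
    "HLA-DRB1": 91, "HLA-DQA1": 23, "HLA-DQB1": 48,
}

def concatenate_probe_blocks(blocks):
    """Join per-gene signal blocks column-wise into one matrix.

    `blocks` maps gene name -> list of rows (one row per individual);
    every block must list the individuals in the same order.
    """
    genes = list(PROBES_PER_GENE)
    n_rows = len(blocks[genes[0]])
    merged = []
    for i in range(n_rows):
        row = []
        for gene in genes:
            signals = blocks[gene][i]
            if len(signals) != PROBES_PER_GENE[gene]:
                raise ValueError(f"{gene}: expected {PROBES_PER_GENE[gene]} probes")
            row.extend(signals)
        merged.append(row)
    return merged

def to_tab_delimited(matrix):
    """Serialize the matrix as tab-delimited text, one individual per line."""
    return "\n".join("\t".join(str(v) for v in row) for row in matrix)

# Placeholder signals (all zeros) for 3 hypothetical individuals.
demo_blocks = {
    gene: [[0.0] * n for _ in range(3)]
    for gene, n in PROBES_PER_GENE.items()
}
merged = concatenate_probe_blocks(demo_blocks)
total_probes = sum(PROBES_PER_GENE.values())
```

With 470 individuals instead of the 3 placeholder rows, the same procedure would yield the 470 × 403 matrix described above.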

Principal Component Analysis
Before performing PCA, the HLA-probe data entries were normalized by z-score calculation. In particular, a probe signal X was converted to X′ = (X − X̄)/SD, where X̄ and SD are the mean and standard deviation of the corresponding probe column. This normalization was done to minimize HLA-typing batch effects. We used the cal.pc.linear function with default options from the KRIS R package version 1.1.1 (Chaichompoo et al., 2018) to perform PCA. The PCA was visualized using the plot3views command from the KRIS R package, which displays three main PCA perspectives, namely, PC1 vs. PC2, PC2 vs. PC3, and PC1 vs. PC3.
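
The column-wise z-score normalization can be sketched as below. This is an illustration only: the text does not say whether the sample or population standard deviation was used, so the sample SD is assumed, and setting constant columns to zero is a choice made here to avoid division by zero. The function name and the toy signals are hypothetical; the PCA itself (performed with KRIS in R) is not reproduced.

```python
import statistics

def zscore_columns(matrix):
    """Z-score each column of an individuals-by-probes matrix."""
    n_cols = len(matrix[0])
    means, sds = [], []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        means.append(statistics.mean(column))
        sds.append(statistics.stdev(column))  # sample SD (assumption)
    normalized = []
    for row in matrix:
        normalized.append([
            # Constant columns carry no information; map them to 0.0.
            (x - means[j]) / sds[j] if sds[j] > 0 else 0.0
            for j, x in enumerate(row)
        ])
    return normalized

# Toy signals: 3 hypothetical individuals, 2 hypothetical probes.
signals = [
    [100.0, 10.0],
    [200.0, 20.0],
    [300.0, 60.0],
]
z = zscore_columns(signals)
```

After normalization every probe column has mean 0 and unit standard deviation, so probes measured on different signal scales contribute comparably to the principal components.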

STRUCTURE Analysis
The raw probe data were normalized as for PCA, with an extra step of rounding all X′ values to integers. This conversion was done so that STRUCTURE could treat the normalized signals as variation patterns similar to those of microsatellites. For STRUCTURE analysis, we used a burn-in length of 30,000 with 80,000 MCMC iterations. The analyses were run from K = 1 to K = 10 (K represents the number of Bayesian-inferred clusters in a given population), each with 30 repeats under the same parameter settings. The total 100 STRUCTURE analysis results were used as the input to CLUMPAK (Kopelman et al., 2015), which helped determine the optimal K based on the ad hoc quantity ΔK (Evanno et al., 2005). The STRUCTURE results were visualized using the pophelper function from the pophelper R package version 2.30 (Francis, 2017).
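
The ΔK criterion can be sketched as follows. This is a minimal illustration and not the CLUMPAK implementation: it assumes the commonly used form in which ΔK is the absolute second-order difference of the mean log-likelihoods, |L(K+1) − 2L(K) + L(K−1)|, divided by the standard deviation of L(K) across replicate runs. The function name and the log-likelihood values are hypothetical.

```python
import statistics

def evanno_delta_k(log_likelihoods):
    """Compute Delta K from replicate STRUCTURE log-likelihoods.

    `log_likelihoods` maps K -> list of Ln P(D) values, one per replicate.
    Returns {K: Delta K} for every K that has both neighbours K-1 and K+1.
    """
    means = {k: statistics.mean(v) for k, v in log_likelihoods.items()}
    delta = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in means or k + 1 not in means:
            continue  # Delta K is undefined at the smallest and largest K
        sd = statistics.stdev(log_likelihoods[k])
        if sd == 0:
            continue  # skip K whose replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / sd
    return delta

# Hypothetical log-likelihoods: 3 replicates for each K from 1 to 4.
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-890.0, -895.0, -885.0],
    4: [-885.0, -880.0, -890.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
```

In this toy input the likelihood rises sharply from K = 1 to K = 2 and then flattens, so ΔK peaks at K = 2. Because ΔK needs both neighbours, it cannot be evaluated at the smallest or largest K tested.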
The HLA-B*13:01 allele (IMGT/HLA ID: HLA00152) has been reported to be associated with dapsone- and salazosulfapyridine-induced drug reaction with eosinophilia and systemic symptoms (DRESS). The frequencies of HLA-B*13:01 were similar within the Thai population (p-value = 0.7450) and higher than those in African Americans, Caucasians, Hispanics, and North Americans (Tables 7, 8A). Moreover, the frequencies of the HLA-B*15:02 (7.66%) and -C*08:01 (10.32%) alleles, which are associated with co-trimoxazole-induced SJS/TEN, were higher in the Thai population than in African Americans, Caucasians, Hispanics, and North Americans. Furthermore, the frequency of the HLA-A*33:03 allele, which is associated with ticlopidine-induced liver injury, appeared to be higher (11.17%) than the allele frequencies in Caucasians, Hispanics, and North Americans. The HLA-DQB1*06:02 allele (IMGT/HLA ID: HLA00646), which is associated with amoxicillin-clavulanate-induced liver injury, had a higher frequency in African Americans and Caucasians than in the Thai population (Table 8B).

Population Structure Analysis
We used PCA to observe potential subgroups among the 470 Thai individuals. Instead of using only the six HLA genes (from both class I and II), we extracted 403 polymorphism probes and used them in the analysis. PCA revealed three prominent subpopulations, in which samples from the Northeastern (NE: yellow) and Southern (South: pink) groups were somewhat separated (see plots of PC1 vs. PC2 and PC2 vs. PC3 in Figure 2). The third subpopulation, placed between the Northeastern and Southern groups, comprised samples from Central (sky blue), North (green), and Bangkok (BKK: red). The plot of PC1 vs. PC3 did not clearly show subpopulations. STRUCTURE uses a Bayesian approach to infer the contribution of potential subpopulations (K) from estimated genetic variation frequencies. In our case, 403 HLA polymorphism probe signals were used to represent the genetic variations. Since this work recruited volunteers from five regions based on recent demographic information, we set the maximum K to 10 to cover these five demographic groups. CLUMPAK reported eight as the best number of genetic groups (subpopulations) for both Evanno's (ΔK) and Pritchard's (likelihood of K) methods. Figure 3 shows the patterns of individuals' assignments to K populations. Each row in this figure shows the proportion of each individual's contribution to K subpopulations, from K = 2 to K = 9. In particular, the K colors on each vertical bar reveal the genetic composition (admixture).
[Table residue: allele frequencies of HLA pharmacogenetic markers (e.g., carbamazepine, HLA-B*15:02, SJS/TEN) in the Thai population compared with other populations, including Caucasians (n = 265), Hispanics (n = 234), North Americans (n = 187), and Asians (n = 358), and, for HLA class II, Caucasians (n = 265; Begovich et al., 1992), Japanese (n = 371; Saito et al., 2000), and Han Chinese (n = 10,689).]
HLA-DRB1, human leukocyte antigen-DRB1; HLA-DQA1, human leukocyte antigen-DQA1; HLA-DQB1, human leukocyte antigen-DQB1; DILI, drug-induced liver injury; SJS, Stevens-Johnson syndrome; TEN, toxic epidermal necrolysis; N/A, not available.
Figure 3 shows the admixture profiles for eight genetic groups (K = 8). All 470 individuals are represented by vertical bars, which are grouped according to their five reported geographical origins. Two distinct admixture patterns, those of the Northeastern and Southern regional groups, could be observed, while the admixture patterns of the Bangkok and Central regional groups tended to be similar.
A previous report demonstrated that the Thai population contains four prominent substructures, using 435,503 single-nucleotide polymorphisms (SNPs) collected from two independent studies comprising 992 Thai individuals (Wangkumhang et al., 2013). Their analysis revealed two important points: 1) three main subpopulations are localized in the Northern, Northeastern, and Southern parts of Thailand, while members of the fourth group are scattered throughout the country, and 2) the place of origin of a Thai individual might be discordant with the genetic similarity profile of that place. In other words, people from the north could have migrated to the south and stayed there for more than three generations. Our population genetic analyses showed similar findings: the Northeastern and Southern groups were genetically different, while the Bangkok and Central groups were mixed with and/or scattered among the Northern, Northeastern, and Southern groups. In terms of HLA haplotype distributions over the five regions, we found that the haplotype frequencies differed slightly. However, we did not observe novel HLA haplotypes specific to any subregion. The PCR probes used to call these HLA class I and class II alleles rely on known collections of HLA alleles and might not be able to detect novel HLA haplotypes.
Ethnic-specific genetic variation databases are vital to the identification of good pharmacogenetic markers in Asian countries, such as CYP2C9*3, which is associated with phenytoin (PHT)-induced SJS/TEN in Taiwanese, Japanese, and Malaysians (Chung et al., 2014). Further studies should be done to confirm pharmacogenetic markers from the ethnic-specific SNP databases. With only six HLA genes and 470 individuals, the challenge of this study was the analysis of the data obtained from the PCR-SSO probe platform. To address the lack of genetic polymorphisms in our population genetic study, we extracted probe signals laid across six stretches of HLA. Using the probe signals, we observed some distinct as well as clinal subpopulations. This discrepancy should be clarified in further studies by performing high-resolution DNA typing and recruiting more Thai individuals. In this study, we identified both HLA class I and II alleles in the Thai population. Furthermore, many HLA class I and II alleles were associated with pharmacogenetic markers, which might appear exclusively in particular populations. In particular, a database containing the distribution of specific HLA class I and II alleles will support the development of pharmacogenetic markers for screening drug hypersensitivity reactions.

DATA AVAILABILITY STATEMENT
The datasets used in this study can be found at https://www.ebi.ac.uk/ipd/imgt/hla using the accession numbers HLA-

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethical Clearance Committee on Human Rights Related to Research Involving Human Subjects, Faculty of Medicine, Ramathibodi Hospital, Mahidol University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
PS made substantial contributions to the conception, design, analysis, and interpretation of the data and to drafting the manuscript, and agrees to be accountable for all aspects of the work. PJ, TJ, NK, CC, JP, CNa, WA, ST, CN, and AW made substantial contributions to the conception and analysis of the data and to drafting the manuscript. CS made substantial contributions to the conception and design of the work and to drafting the work, agrees to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved, and provided approval for publication of the content.