High-Resolution HLA Typing of HLA-A, -B, -C, -DRB1, and -DQB1 in Kinh Vietnamese by Using Next-Generation Sequencing

Human leukocyte antigen (HLA) genotyping displays the particular characteristics of HLA alleles and haplotype frequencies in each population. Although it is considered the current gold standard for HLA typing, high-resolution sequence-based HLA typing is currently unavailable in Kinh Vietnamese populations. In this study, high-resolution sequence-based HLA typing (3-field) was performed using an amplicon-based next-generation sequencing platform to identify the HLA-A, -B, -C, -DRB1, and -DQB1 alleles of 101 unrelated healthy Kinh Vietnamese individuals from southern Vietnam. A total of 28 HLA-A, 41 HLA-B, 21 HLA-C, 26 HLA-DRB1, and 25 HLA-DQB1 alleles were identified. The most frequently occurring HLA alleles were A∗11:01:01, B∗15:02:01, C∗07:02:01, DRB1∗12:02:01, and DQB1∗03:01:01. Haplotype calculation showed that A∗29:01:01∼B∗07:05:01, DRB1∗12:02:01∼DQB1∗03:01:01, A∗29:01:01∼C∗15:05:02∼B∗07:05:01, A∗33:03:01∼B∗58:01:01∼DRB1∗03:01:01, and A∗29:01:01∼C∗15:05:02∼B∗07:05:01∼DRB1∗10:01:01∼DQB1∗05:01:01 were the most common haplotypes in the southern Kinh Vietnamese population. Allele distribution and haplotype analyses demonstrated that the Vietnamese population shares HLA features with South-East Asians but retains unique characteristics. Data from this study may be applicable in medicine and anthropology.


INTRODUCTION
Human leukocyte antigen (HLA) genes, which encode major histocompatibility complex proteins in humans, are located on the short arm of chromosome 6 (Alper et al., 2006). The encoded HLA proteins are displayed on the cell surface and can be classified into two distinct classes. Class I HLA proteins (A, B, and C) present intracellular antigens originating from viruses or tumors to cytotoxic T lymphocytes. Class II HLA proteins (DR, DQ, and DP) present extracellular antigens to T-helper cells. HLA genes are highly polymorphic and play an important role in immune-mediated diseases, tumor-development processes, transplanted organ or tissue survival determination, and drug hypersensitivity (Dawson et al., 2001; Dhaliwal et al., 2003; Hung et al., 2005; Avila-Rios et al., 2009; Chen et al., 2015; Thao et al., 2018).
HLA genotyping is a complex procedure due to the extreme degree of polymorphism in the major histocompatibility complex family. The most polymorphic regions, known as the core exons, are exons 2 and 3 in HLA class I genes and exon 2 in HLA class II genes. The sequences of the core exons are the most popular targets for genotyping as they are believed to be essential determinants of antigen specificity, which is informative for transplantation. However, in population genetic and evolutionary studies, many polymorphisms in other exons, introns, and UTRs have been identified and contribute to HLA nomenclature (Marsh and WHO Nomenclature Committee for Factors of the HLA System, 2012). Currently, HLA typing is performed using DNA-based methods, including SSP- (sequence-specific primer), SSO- (sequence-specific oligonucleotide), and RFLP-PCR (restriction fragment length polymorphism polymerase chain reaction) and sequence-based typing (SBT) (Tait et al., 2009; Bontadini, 2012; Erlich, 2012). SBT was considered the gold-standard method for high-resolution HLA genotyping, although this technique may produce uncertain results due to insufficient sequencing and ambiguous haplotype phasing (Erlich, 2012). Recent advancements in next-generation sequencing (NGS) technologies have significantly impacted the HLA-typing process (Abbott et al., 2006; Bentley et al., 2009; Erlich et al., 2011; Erlich, 2012; Shiina et al., 2012; Hosomichi et al., 2013, 2015; Schöfl et al., 2017). These new approaches can overcome the usual phase ambiguity of HLA alleles and enable massively parallel, high-resolution HLA typing. Different NGS-based HLA-typing methods have been established, such as amplicon-based HLA sequencing (Boegel et al., 2012; Shiina et al., 2012; Hosomichi et al., 2013; Schöfl et al., 2017), target enrichment of HLA genes (Wittig et al., 2015), and whole exome or genome sequencing data-derived typing (Liu et al., 2012; Major et al., 2013).
Only a few studies have analyzed HLA allele and haplotype frequencies in the Vietnamese population (Vu-Trieu et al., 1997; Busson et al., 2002; Hoa et al., 2008). Moreover, these studies did not present detailed HLA information because of low resolution or incomplete loci description. There is an urgent need for an HLA-typing procedure that can yield accurate and detailed HLA allele distribution. Previous studies have investigated HLA allele distribution among the Kinh population in northern Vietnam; this study aimed to perform high-resolution HLA typing (3-field) via NGS and determine the frequency of specific alleles and haplotypes of HLA-A, -B, -C, -DRB1, and -DQB1 in southern Kinh Vietnamese populations.

Subjects
A descriptive, cross-sectional study was conducted involving 101 unrelated healthy individuals. All subjects, who originated from Ho Chi Minh City and the surrounding Mekong delta provinces, were self-identified as Kinh Vietnamese and were recruited at the University of Medicine and Pharmacy, Ho Chi Minh City, Vietnam from August to October 2017. The study was approved by the Ethics Committee of the University of Medicine and Pharmacy at Ho Chi Minh City, Vietnam. All subjects were counseled and provided written informed consent for the study.

DNA Extraction
Venous blood (2 ml) was collected from each subject using an EDTA anticoagulant tube. Genomic DNA was extracted from peripheral blood leukocytes using the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's protocol, and samples were stored at −20 °C until analysis.
Genomic DNA quality was assessed by measuring absorbance at 260 nm using a NanoDrop 2000 (Thermo Scientific, MA, United States), and the optical density (OD) ratio (260/280 nm) was calculated to evaluate sample purity. The recommended purified genomic DNA concentration (≥30 µg/µL) and OD ratio (≥1.8) for library preparation were ascertained.

Library Preparation
The HLA TruSight kit (CareDx, Brisbane, CA, United States) was used for library preparation. Library construction began with a long-range PCR for full-length HLA-A, -B, -C, -DRB1, and -DQB1 loci. All amplicons were normalized to prevent sequencing bias between samples by using magnetic beads consisting of carboxy-coated paramagnetic particles (Hawkins et al., 1994). The beads bound saturating amounts of DNA, and the DNA concentration was normalized to a similar concentration across samples after the washing and elution steps (Hosomichi et al., 2014). Subsequently, the DNA amplicons were fragmented into approximately 2-kb pieces, indexed, and pooled for sequencing on the MiniSeq platform (Illumina, San Diego, CA, United States). The pooled library was quantitated before loading on the MiniSeq because the library concentration determines cluster density, which is an important parameter for data quality. As instructed in the Illumina protocol, a Qubit 3.0 fluorometer (Thermo Scientific, Waltham, MA, United States) was used for library quantitation. The pooled library was loaded onto the MiniSeq system when its concentration was ≥10 ng/µL.

Sequencing
Next-generation sequencing was performed on the MiniSeq system. Each sample was examined for average depth of coverage and Q30 quality scores, which were >200 and 85, respectively, for all five loci. The sequences were subsequently analyzed using Assign TruSight HLA v2.0 (CareDx, Brisbane, CA, United States).

HLA Assigned by Assign TruSight HLA v2.0
Qualified FASTQ files from the MiniSeq system were analyzed by Assign TruSight HLA v2.0 (CareDx, Brisbane, CA, United States). Results with 0 core exon mismatch and phasing ≤2 were accepted. Although full-length HLA loci were sequenced, the maximum resolution that the software Assign TruSight HLA v2.0 can provide is 3-field. Higher resolution (4-field) can be achieved if other analysis tools are applied to assign HLA alleles.

Statistical Analysis
For single-locus analysis, allele frequencies were calculated by direct counting, deviation from Hardy-Weinberg (HW) proportions was calculated via chi-square test, and the Ewens-Watterson (EW) homozygosity test of neutrality was also performed via Monte-Carlo implementation of the exact test (Ewens, 1972; Watterson, 1978; Slatkin, 1996). The calculation was executed in PyPop: Python for Population Genomics (Lancaster et al., 2007). For multiple-locus analysis, haplotype frequencies were estimated using an expectation-maximization algorithm by Arlequin ver. 3.5 with default settings (Excoffier and Lischer, 2010); linkage disequilibrium (LD) between all HLA allele pairs was analyzed in PyPop, in which D and Wn of specific allele pairs were calculated (Lancaster et al., 2007). LD between all HLA loci pairs was further calculated and plotted using conditional asymmetric linkage disequilibrium (ALD) measures (Thomson and Single, 2014). The principal component analysis (PCA) of HLA-A, -B, and -DRB1 was performed using Excel 2010 to compare allele distribution between our data (n = 101) and HLA allele frequency data of the Vietnamese Hanoi Kinh population 2 (n = 170), Chinese Canton Han population (n = 264), Indonesian Sundanese and Javanese population (n = 201), Thai population (n = 142), Japanese population 3 (n = 1018), South Korean population 3 (n = 485), and Malaysian Peninsular Malay population (n = 951), which were retrieved from the Allele Frequencies Net Database (allelefrequencies.net) (González-Galarza et al., 2015). Due to the unavailability of 3-field HLA data in previous studies, we converted 3-field to 2-field data. For example, HLA-A∗24:02:01, A∗24:02:13, and A∗24:02:40 were converted to HLA-A∗24:02 with a frequency (0.13861) that was the sum of the three 3-field alleles (0.12871, 0.00495, and 0.00495, respectively). PCA results were plotted using BioVinci software (BioTuring Inc., San Diego, CA, United States).
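The direct-counting and 3-field to 2-field conversion steps described above can be illustrated with a minimal Python sketch. This is not the PyPop or Excel workflow used in the study; the function names `allele_frequencies` and `collapse_to_two_field` are hypothetical, and alleles are written with an ASCII asterisk for convenience. The three input frequencies are the A∗24:02 values quoted in the text.

```python
from collections import Counter, defaultdict


def allele_frequencies(genotypes):
    """Direct counting: every individual contributes two allele copies."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def collapse_to_two_field(freqs):
    """Sum the frequencies of alleles that share their first two fields."""
    collapsed = defaultdict(float)
    for allele, freq in freqs.items():
        gene, fields = allele.split("*")
        two_field = ":".join(fields.split(":")[:2])
        collapsed[f"{gene}*{two_field}"] += freq
    return dict(collapsed)


# 3-field frequencies quoted in the text for the A*24:02 group
three_field = {
    "A*24:02:01": 0.12871,
    "A*24:02:13": 0.00495,
    "A*24:02:40": 0.00495,
}
two_field = collapse_to_two_field(three_field)
print(round(two_field["A*24:02"], 5))  # 0.13861
```

Summing within the 2-field group keeps the total frequency at each locus unchanged, which is what allows the collapsed data to be compared with populations typed at lower resolution.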

RESULTS
Advancements in NGS offer the ability to distinguish between a set of alleles that share two field names and differ in the third field, such as A∗24:02, C∗07:01, and DQB1∗05:02, in one sequencing batch. As the polymorphisms of A∗24:02:40, A∗24:02:13, C∗07:01:02, and DQB1∗05:02:02 are not in the core exons, several traditional PCR and sequencing reactions were required to determine these alleles before NGS methods became available.
No tested locus showed a significant departure from Hardy-Weinberg equilibrium.
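As an illustration of the Hardy-Weinberg check, the sketch below computes a plain chi-square statistic from a list of genotypes at one locus. It is a simplified stand-in, not PyPop's implementation: the function name `hardy_weinberg_chi_square` is hypothetical, and no pooling of rare genotype classes is performed, although such pooling (or an exact test) is usually needed for highly polymorphic HLA loci where many expected counts are small.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hardy_weinberg_chi_square(genotypes):
    """Return (chi-square statistic, degrees of freedom) for one locus.

    genotypes: list of (allele_1, allele_2) tuples, one per individual.
    """
    n = len(genotypes)
    # Observed genotype counts, ignoring allele order within a genotype
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    # Allele frequencies by direct counting (2n allele copies)
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}

    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        if a == b:
            expected = n * freqs[a] ** 2          # homozygote: n * p^2
        else:
            expected = 2 * n * freqs[a] * freqs[b]  # heterozygote: 2n * p * q
        chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected

    k = len(freqs)
    dof = k * (k - 1) // 2  # genotype classes minus estimated parameters minus 1
    return chi2, dof


# Toy data in exact Hardy-Weinberg proportions (1 : 2 : 1)
balanced = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B")]
print(hardy_weinberg_chi_square(balanced))  # (0.0, 1)
```

A statistic close to zero, as in the toy example, corresponds to observed genotype counts that match the expected proportions; a p-value would be obtained by comparing the statistic with a chi-square distribution on the returned degrees of freedom.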

Population Genetic Analysis
Pairwise LD estimates are given in Table 6 with D and Wn. The LD of allele pairs was always statistically significant with 1,000 permutations. LD plots based on ALD measures for HLA loci are shown in Figure 1. Generally, the associations between HLA loci within HLA classes were stronger than between HLA loci in different classes, except for the case of B & DRB1 loci. Both symmetric and asymmetric LD showed that the strongest genetic linkages were between C & B loci and DRB1 & DQB1 loci. The PCA plot of eight Asian populations is shown in  (Hoa et al., 2008). The Japanese and South Korean populations also presented a similar distribution of HLA alleles.
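To make the two global LD statistics concrete, the following sketch computes them for one pair of loci from a table of two-locus haplotype frequencies. It assumes that "D" refers to the normalized multiallelic measure usually written D′ (Hedrick's weighted average of |D′ij|) and that Wn is the Cramér's V style statistic; both are assumptions about the reported measures, and the function name `ld_measures` is hypothetical rather than part of PyPop.

```python
from collections import defaultdict


def ld_measures(hap_freqs):
    """Return (D', Wn) for two loci.

    hap_freqs maps (allele_at_locus_1, allele_at_locus_2) -> haplotype frequency.
    """
    # Marginal allele frequencies at each locus
    p = defaultdict(float)
    q = defaultdict(float)
    for (a, b), f in hap_freqs.items():
        if f > 0:
            p[a] += f
            q[b] += f

    d_prime = 0.0
    w_sum = 0.0
    for a, pa in p.items():
        for b, qb in q.items():
            # Disequilibrium coefficient for this allele pair
            d = hap_freqs.get((a, b), 0.0) - pa * qb
            if d < 0:
                d_max = min(pa * qb, (1 - pa) * (1 - qb))
            else:
                d_max = min(pa * (1 - qb), (1 - pa) * qb)
            if d_max > 0:
                d_prime += pa * qb * abs(d) / d_max
            w_sum += d * d / (pa * qb)

    denom = min(len(p), len(q)) - 1
    wn = (w_sum / denom) ** 0.5 if denom > 0 else 0.0
    return d_prime, wn


# Toy example: two haplotypes in complete association
complete = {("A1", "B1"): 0.5, ("A2", "B2"): 0.5}
print(ld_measures(complete))  # (1.0, 1.0)
```

Both statistics range from 0 (no association) to 1 (complete association), which is why they are convenient for comparing locus pairs with different numbers of alleles.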

DISCUSSION
In recent years, various HLA-typing methods using different NGS approaches have been performed. NGS-based HLA typing can provide high-resolution, unambiguous, phase-defined HLA alleles, avoiding several limitations compared to traditional sequence-based typing methods (Carapito et al., 2016). Our study showed the distribution of HLA-A, -B, -C, -DRB1, and -DQB1 alleles and haplotypes among the southern Kinh Vietnamese population using high-resolution NGS typing (reported at 3-field resolution, which remains ambiguous in many cases).
Highly polymorphic sequences at both HLA class I and class II loci resulted in 28 alleles for HLA-A, 41 alleles for HLA-B, 21 alleles for HLA-C, 26 alleles for HLA-DRB1, and 25 alleles for HLA-DQB1. The most frequent HLA-A alleles found in this study were A∗11:01:01 and A∗24:02:01. The high frequency of HLA-A∗11:01 and A∗24:02:01 is consistent with previous typing results of northern Kinh Vietnamese and other Asian populations, such as the Chinese, Thai, Indonesian, Korean, and Japanese (Lee et al., 2005; Hoa et al., 2008; Yuliwulandari et al., 2009; Shen et al., 2014; Ikeda et al., 2015; Nakkam et al., 2018). Among HLA-C alleles identified in this study, C∗07:02:01 was found to be widely distributed globally, while C∗01:02:01 was common in Asians (Lee et al., 2005; Shen et al., 2014; Ikeda et al., 2015; Nakkam et al., 2018). The predominance of HLA-B∗15 alleles is a major distinguishing characteristic of the Kinh population from the Thai and Chinese groups (Shen et al., 2014; Nakkam et al., 2018). However, this predominance is similar in the Indonesian population (Yuliwulandari et al., 2009). Detailed comparison of B∗15 alleles among the Vietnamese and Indonesians showed similar popularity of B∗15:02, while the second most-frequent B∗15 alleles were B∗15:25:01 and B∗15:13, respectively. HLA-B∗07:05:01, the only B∗07 allele found in Kinh Vietnamese, was the sixth most-frequent HLA-B allele, whereas it is a minor allele in other Asian groups (Whang et al., 2001).
Previously, HLA typing of Asian populations was mainly based on SSO-PCR (Lee et al., 2005; Yuliwulandari et al., 2009; Shen et al., 2014; Ikeda et al., 2015; Nakkam et al., 2018). Because only a finite number of probes are designed to recognize the polymorphisms in the core exons, this technique only allows certain allele typing with 2-field resolution. Alleles were then assigned by software based on SSO-PCR patterns. Hence, the number of alleles determined by SSO-PCR is limited. With full-length HLA sequences provided by NGS, HLA-typing software programs align sequence reads to the entire IMGT/HLA Database to find the best-matching alleles. NGS-based typing, therefore, can provide more diverse HLA assignments. In our study, the number of identified alleles (141 alleles) in 101 subjects was higher than that in the previous study in northern Kinh Vietnamese (115 identified alleles in 170 subjects) (Hoa et al., 2008). Similar results were obtained in the Thai population, in which the numbers of HLA alleles determined by NGS and SSO-PCR were 156 and 144, respectively (Geretz et al., 2018; Nakkam et al., 2018).
Recently, it has been shown that both high-resolution HLA typing and haplotyping are important in hematopoietic stem cell transplantation for both unrelated and related donors in reducing post-transplantation adverse outcomes (Agarwal et al., 2017; Buhler et al., 2019); a single high-resolution HLA mismatch may lead to a similar negative effect on outcomes as a low-resolution one (Fuji et al., 2015; Armstrong et al., 2017). Therefore, it has been suggested that high-resolution HLA typing can reduce the likelihood of missing a clinically significant mismatch compared to traditional low-resolution typing, especially in developing countries where high-resolution HLA typing methods are not widely available (Agarwal et al., 2017). With a 3-field resolution, our typing process can distinguish between HLA-A∗24:02:01, HLA-A∗24:02:13, and HLA-A∗24:02:40 and between HLA-C∗07:01:01 and HLA-C∗07:01:02, which are considered high-resolution mismatches. Although traditional SBT can separate these alleles, it is time- and resource-consuming.
Our study had several limitations that should be considered in interpreting the results. First of all, the absence of other class II HLA descriptions (HLA-DQA1, -DPA1, and -DPB1) makes the study less informative, especially for population genetic purposes. Second, the study sample size was relatively small. This may increase the risk of missing rare HLA alleles in Kinh Vietnamese and reduce the significance of statistical analysis. These limitations will necessitate further studies with comprehensive allele descriptions and larger sample sizes.
It is now also well-recognized that HLA molecules are strongly associated with the pathophysiology of adverse drug reactions, including severe cutaneous adverse reaction (SCAR), agranulocytosis, and liver injury. High prevalence of HLA-B∗15:02, B∗58:01, B∗38:02, DRB1∗08:03, and C∗03:02 suggests that the Kinh Vietnamese population is at a high risk of developing carbamazepine-induced SCAR, allopurinol-induced SCAR, methimazole-induced agranulocytosis, and methimazole-induced liver injury, respectively (Hung et al., 2005; Chen et al., 2015; Thao et al., 2018; Li et al., 2019), while the risk of developing dapsone- or abacavir-induced hypersensitivity is low due to the low prevalence of HLA-B∗13:01 and B∗57:01 (Mallal et al., 2008; Sousa-Pinto et al., 2015; Tempark et al., 2017). Therefore, HLA information is important to clinicians for treatment modality adoption and to healthcare policymakers for constructing personalized medicine strategies.

CONCLUSION
To our knowledge, this is the first report of high-resolution HLA-A, -B, -C, -DRB1, and -DQB1 allele and haplotype frequencies in southern Kinh Vietnamese individuals. These data indicate a homogeneous distribution of HLA alleles between the northern and southern Kinh populations in Vietnam. Although the characteristics of HLA class I and II alleles and haplotypes in the Kinh Vietnamese are similar to those in the Thai, Malaysian, and Indonesian populations, they still retain unique characteristics. Data from this study will be useful in anthropology, immune-mediated diseases, transplantation therapy, and drug hypersensitivity.

DATA AVAILABILITY STATEMENT
Raw data supporting the conclusions of this article are available on NCBI SRA with accession PRJNA609593. The data on HLA allele frequencies and haplotypes presented in this study are available on allelefrequencies.net with accession Vietnam Kinh (n = 101).

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of the University of Medicine and Pharmacy at Ho Chi Minh City, Vietnam. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
TM and MD designed the study and wrote the manuscript. MD, LL, and VN performed the experiments. TD, HV, NN, MD, and TM analyzed the data.

FUNDING
The study was supported by the Department of Science and Technology, Ho Chi Minh City, Vietnam (Grant Number 101/2017/HD-SKHCN).