Haplotypic Associations and Differentiation of MHC Class II Polymorphic Alu Insertions at Five Loci With HLA-DRB1 Alleles in 12 Minority Ethnic Populations in China

Analysis of polymorphic variation in the human major histocompatibility complex (MHC) class II genomic region on the short arm of chromosome 6 helps to clarify the diversity of population structure and the effects of evolutionary processes such as recombination, mutation, genetic drift, demographic history, and natural selection. To investigate associations between polymorphisms of the HLA-DRB1 gene and recent polymorphic Alu insertions (POALINs) in the HLA class II region, we genotyped HLA-DRB1 and five Alu loci (AluDPB2, AluDQA2, AluDQA1, AluDRB1, AluORF10) and determined their allele frequencies and haplotypic associations in 12 minority ethnic populations in China. A total of 42 different HLA-DRB1 alleles were detected, ranging from 12 alleles in the Jinuo to 28 in the Yugur; only DRB1∗08:03, DRB1∗09:01, DRB1∗12:02, DRB1∗14:01, DRB1∗15:01, and DRB1∗15:02 were present in all ethnic groups. POALIN frequencies ranged from 0.279 to 0.514 for AluDPB2, 0 to 0.127 for AluDQA2, 0.777 to 0.995 for AluDQA1, 0.1 to 0.455 for AluDRB1, and 0.084 to 0.368 for AluORF10. Comparison of the five-locus POALIN data for 13 Chinese ethnic populations (including published Han-Yunnan data) with published Japanese and Caucasian data revealed marked differences between populations at the allelic or haplotypic level. All five POALIN loci were in significant linkage disequilibrium with HLA-DRB1 in different populations. AluDQA1 had the highest percentage association with most HLA-DRB1 alleles, whereas the nearby AluDRB1 indel was strongly associated haplotypically with only DRB1∗01, DRB1∗10, DRB1∗15, and DRB1∗16. A total of 30 five-locus POALIN haplotypes were inferred across all populations, with H5 (no Alu insertions except AluDQA1) and H21 (only AluDPB2 and AluDQA1 insertions) as the two predominant haplotypes.
Neighbor-joining trees and principal component analyses of the Alu and HLA-DRB1 polymorphisms showed that the genetic diversity of these genomic markers is strongly associated with the population characteristics of language family, migration, and sociality. This comparative study of HLA-DRB1 alleles and multilocus POALIN lineage frequencies in Chinese ethnic populations confirmed that POALINs, whether investigated alone or together with HLA class II alleles, are informative genetic and evolutionary markers for identifying allele and haplotype lineages and genetic variation within and between populations.

INTRODUCTION
The human major histocompatibility complex (MHC) class II genomic region on the short arm of chromosome 6 contains highly polymorphic classical and non-classical human leukocyte antigen (HLA) class II genes (HLA-DRB1, -DRA, -DQA1, -DQB1, -DQA2, -DQB2, -DPA1, and -DPB1) involved in the regulation of the innate and adaptive immune systems, autoimmunity, and transplantation (Shiina et al., 2004, 2009; Vandiedonck and Knight, 2009; Trowsdale, 2011). The extensive polymorphism of the HLA class II genes has been studied widely to better understand population structure and diversity and the effects of evolutionary processes such as recombination, mutation, genetic drift, demographic history, and natural selection (Meyer et al., 2006; Traherne, 2008; Pierini and Lenz, 2018; Manczinger et al., 2019). For example, at least 2,909 HLA-DRB1 alleles have been described worldwide, with official sequences and designations provided by the IMGT/HLA database (Robinson et al., 2020). Consequently, HLA-DRB1 alleles are often used as genetic markers to assess population structure and differentiation and to obtain information on interpopulation genetic exchange (gene flow) and other demographic events (Di and Sanchez-Mazas, 2011; Sanchez-Mazas et al., 2013; Sanchez-Mazas and Meyer, 2014; Gonzalez-Galarza et al., 2020). Moreover, the HLA-DR molecules encoded by HLA-DRB1 alleles present intracellular or exogenous antigenic peptides to CD4+ T cells, which trigger and regulate the downstream immune responses that defend against pathogen invasion (Chaplin, 2010). This highly polymorphic genomic marker might therefore reveal changes associated with pathogen-mediated pressure on highly heterogeneous and diverse populations (Sun et al., 2015; Weiskopf et al., 2016).
In addition to polymorphic HLA class II genes, the MHC class II region contains a number of polymorphic Alu insertions (POALINs) that are informative ancestral lineage markers in populations. POALINs are insertion/deletion polymorphisms (the Alu element is either present or absent at the integration site) that carry characteristic alleles or haplotypes inherited from different ancestral populations (Bennett et al., 2004; Kulski and Dunn, 2005; Ray et al., 2007). Alu retroelements are short interspersed nuclear elements (SINEs), a class of genomic repetitive DNA elements that first appeared in primates about 65 million years ago and have since been amplified by retrotransposition to an estimated one million copies per human genome (Lander et al., 2001; Batzer and Deininger, 2002). POALINs are useful lineage and evolutionary markers for studying the origin and genomic diversity of human populations because (1) their allele frequency distributions vary significantly among geographically different human populations (Deininger and Batzer, 1999; Jorde et al., 2000; Watkins et al., 2001), and (2) they are identical by descent, arising from a known ancestral state (no Alu insertion), so that their presence or absence defines the ancestral lineages within a population (Antunez-de-Mayolo et al., 2002).
Some MHC Alu family members were used previously as evolutionary molecular markers to infer the ancestral duplication history of HLA class I and class II gene copies (Mnukova-Fajdelova et al., 1994; Svensson et al., 1996; Kulski et al., 1999, 2000). Several studies have also reported the frequencies and distribution of human-specific POALIN loci within the HLA class I region and their inferred haplotypic associations with the HLA-A, -B, and -C loci in different populations (Dunn et al., 2002, 2003, 2005a,b, 2007; Yao et al., 2009, 2010; Kulski et al., 2011, 2019; Mastana et al., 2017; Singh et al., 2019). These associations reflect in part the different haplotypic structures of the MHC class I and class II regions and the linkage of multiple polymorphic loci, especially over long stretches (1-3 Mb) of conserved genomic sequence in human populations known as ancestral haplotypes or conserved extended haplotypes (Alper et al., 2006; Larsen et al., 2014). Although comparative DNA sequence analysis of the entire MHC region between two homozygous HLA haplotypes indicated the presence of POALINs within the MHC class II region (Stewart et al., 2004), the frequencies of five human-specific POALINs (AluDPB2, AluDQA2, AluDQA1, AluDRB1, and AluORF10) in the MHC class II region had previously been determined only for Japanese, Australian Caucasian, and Chinese Han (Yunnan province; Shi et al., 2014) populations. Comparison of the MHC class II five-locus POALIN data for Chinese Han with Japanese and Caucasian data revealed marked differences between the three ethnic groups at the allelic or haplotypic level. In addition, each POALIN was in significant linkage disequilibrium (LD) and/or haplotypically associated (Kulski et al., 2020, 2021) with a variety of HLA-DRB1 alleles in Chinese Han in Yunnan province (Shi et al., 2014).
These results showed that POALINs, whether investigated alone or together with HLA class II alleles, are informative genetic markers for identifying allele and haplotype lineages and variation within and between populations.
Besides the Chinese-speaking Han majority, China has 55 officially recognized minority ethnic populations, which account for about 8% of the overall Chinese population and provide abundant genetic resources for studies of inferred POALIN-HLA haplotypes (Yao et al., 2010). The minority ethnic groups living in southern and southwestern China can be traced back to three major ancient groups, the Di-Qiang, Bai-Pu, and Bai-Yue, whose descendants speak languages of the Tibeto-Burman, Mon-Khmer, and Daic subfamilies, respectively (Table 1); in northwestern China, most ethnic groups speak languages of the Mongolian, Tujue, and Manchu-Tungusic subfamilies of the Altaic language family (Guo, 2000). Although the anthropological, cultural, and linguistic characteristics of some of these ethnic populations have been studied in detail (You, 1994; Guo, 2000; Chu et al., 2006), few published studies have compared the genetic diversity of these populations using genome-wide sequencing or genotyping (Di and Sanchez-Mazas, 2011). Analysis of robust and reproducible genetic markers such as POALINs and HLA-DRB1 alleles in small and isolated ethnic minorities therefore remains an important task for better understanding the human genome and its genetic variability throughout the world.
The aim of the present study was to elucidate the inferred haplotypic associations between the MHC POALINs and classical HLA class II alleles by determining (1) the genetic structure of the five MHC class II POALIN dimorphisms and the HLA-DRB1 allele and haplotype frequencies in 12 minority ethnic populations in China, and (2) correlations between genetic diversity and the four language families of these populations (Table 1). Of these 12 minority ethnic populations, eight have settled in Yunnan province together with the Han people (Han-Yunnan). The Han-Yunnan, who speak Chinese of the Sino-Tibetan language family, migrated from the north to Yunnan province by various routes and at different times, and exhibit genetic characteristics of both northern and southern Chinese groups (Shi et al., 2006). We therefore included published data for the Han-Yunnan, Japanese, and Caucasians as reference populations to compare and correlate the genetic differentiation of the HLA-DRB1 alleles and the five POALINs between populations and language families, using Nei's DA genetic distance in phylogenetic and principal component analyses (PCA).

Ethics Statement
This study was approved by the Committee on the Ethics of the Institute of Medical Biology, Chinese Academy of Medical Sciences (approval number YIKESHENGLUNZI [2012]12). The study protocol conformed to the principles of the Declaration of Helsinki of 1975, as revised in 2008. Written informed consent was obtained from each participant.

Subjects and Samples
A total of 1,201 unrelated individuals were recruited from 12 minority ethnic populations in China (Figure 1). The geographic location and sample size of each population, the language family to which it belongs, and the ancient group from which it originated are listed in Table 1. These populations are descended from four ancient Chinese groups and belong to four different language subfamilies (Guo, 2000; Yao et al., 2010; Di and Sanchez-Mazas, 2011), as outlined in the Introduction and Table 1. The geographic origin, nationality, and pedigree (unrelated for at least three generations) of each individual were ascertained before sampling.

Alu and PCR Assay
The sense and antisense primers used for PCR of the POALINs in the MHC class II region have been reported previously (Shi et al., 2014). Because some of the previously published primers target sites that carry mutations in Chinese Han from Yunnan province, new sense and antisense primer pairs were designed and used for PCR of the five POALINs (Supplementary Table 1). Supplementary Figure 1 maps the locations of the five POALINs relative to the HLA class II genes of the MHC on chromosome 6p21.3. PCR products were separated by electrophoresis in 2% agarose gels stained with ethidium bromide and visualized under ultraviolet light, and each allele was scored by the presence or absence of its specific band. Because the insertion and non-insertion alleles yield PCR products of distinctly different sizes, the Alu-PCR assays clearly distinguish the two alleles, including in heterozygous individuals (Supplementary Figure 2). The POALINs are dimorphic: absence of the Alu insertion at a locus is designated the Alu*1 allele, and presence of the insertion the Alu*2 allele. The overall frequency of the Alu*2 (insertion) allele at each of the five loci was estimated from the genotypes as described in the Statistical Analysis section.
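The scoring step can be illustrated with a minimal Python sketch. It is a hypothetical illustration, not the laboratory's pipeline: the band labels `"ins"` (larger product carrying the Alu insertion) and `"empty"` (smaller product from the site without the insertion), and the function names, are our own assumptions. Each individual's band pattern is mapped to an Alu*1/Alu*2 genotype, and the Alu*2 frequency is then obtained by direct counting, two alleles per individual.

```python
from collections import Counter


def call_genotype(bands):
    """Map the set of observed PCR bands for one individual to an Alu genotype.

    Assumed (hypothetical) labels: "ins" = insertion-carrying product,
    "empty" = product from the site without the insertion.
    """
    bands = frozenset(bands)
    if bands == {"empty"}:
        return "Alu*1/Alu*1"  # homozygous for absence of the insertion
    if bands == {"ins"}:
        return "Alu*2/Alu*2"  # homozygous for the insertion
    if bands == {"ins", "empty"}:
        return "Alu*1/Alu*2"  # heterozygote: both product sizes present
    raise ValueError(f"uninterpretable band pattern: {sorted(bands)}")


def insertion_frequency(genotypes):
    """Direct-counting estimate of the Alu*2 (insertion) allele frequency."""
    counts = Counter(genotypes)
    n_alu2 = 2 * counts["Alu*2/Alu*2"] + counts["Alu*1/Alu*2"]
    n_alleles = 2 * sum(counts.values())
    return n_alu2 / n_alleles


# Hypothetical band patterns for four individuals (not study data).
samples = [{"empty"}, {"ins", "empty"}, {"ins"}, {"ins", "empty"}]
genotypes = [call_genotype(b) for b in samples]
print(genotypes)
print(insertion_frequency(genotypes))
```

Raising an error on an empty or unexpected band set, rather than guessing, mirrors the practice of re-typing samples whose gel pattern cannot be interpreted.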

Allele Linkage Controls for Assessment of HLA-DRB1 Allele and POALIN Associations
To better assess the haplotypic associations between the POALINs and the HLA-DRB1 alleles, we examined their sequence linkages in 95 different MHC haplotype sequences (Kulski et al., 2021) that were sequenced, partially annotated and assembled from HLA-homozygous cell lines by Norman et al. (2017).
The FASTA files of the 95 MHC class I, II, and III genomic sequences were downloaded from the NCBI BioProject archive (accession number PRJEB6763) and submitted to the RepeatMasker webserver to obtain output files listing the annotated members of the interspersed repetitive DNA families, their locations in each sequence, and their similarity or identity to reference sequences of SINEs, LINEs, LTRs, ERVs, DNA elements, small RNAs, and simple repeats (Kulski et al., 2021). The five MHC class II POALINs were readily identified in the RepeatMasker outputs from their locations and flanking sequences and/or neighboring repeats, as described previously. The HLA-DRB1 alleles of all 95 cell line sequences were determined and reported by Norman et al. (2017). Supplementary Table 2 summarizes the sequence linkages between the five POALINs and the HLA-DRB1 alleles determined in 90 of the sequenced haplotypes (Kulski et al., 2021). These linkages served as a reference control to aid interpretation of the haplotypic association analyses in the 15 populations.
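RepeatMasker's standard `.out` table is whitespace-delimited, with the query sequence name, query begin and end, strand (`+`, or `C` for the complementary strand), repeat name, and repeat class/family in fixed column positions. The sketch below reads such a table and keeps SINE/Alu hits that overlap a coordinate window. The helper names are hypothetical and not part of RepeatMasker, and the window coordinates and toy rows in the usage example are invented for illustration, not taken from the cell line sequences.

```python
import io


def parse_rm_out(handle):
    """Yield one dict per annotation row of a RepeatMasker .out table."""
    for line in handle:
        fields = line.split()
        # Data rows start with an integer Smith-Waterman score; header and
        # blank lines do not, so they are skipped.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        yield {
            "query": fields[4],
            "begin": int(fields[5]),
            "end": int(fields[6]),
            "strand": "+" if fields[8] == "+" else "-",  # 'C' = complement
            "repeat": fields[9],
            "class_family": fields[10],
        }


def alu_hits_in_window(records, start, stop):
    """Alu (SINE/Alu) hits overlapping the closed interval [start, stop]."""
    return [r for r in records
            if r["class_family"] == "SINE/Alu"
            and r["begin"] <= stop and r["end"] >= start]


# Toy .out text (invented rows, not real output from the study).
sample = """   SW  perc perc perc  query   position in query    matching  repeat
score  div. del. ins.  sequence  begin  end  (left)  repeat  class/family  begin end (left) ID

  2300  3.1  0.3  0.0  hap01   1200  1510  (8490)  +  AluYb8  SINE/Alu  1  311  (7)  1
  1800  9.8  1.0  0.5  hap01   4000  4290  (5710)  C  AluSx   SINE/Alu  (5) 307  17  2
   300  20.1 2.0  1.0  hap01   5000  5100  (4900)  +  (CA)n   Simple_repeat  1  101  (0)  3
"""
records = list(parse_rm_out(io.StringIO(sample)))
for hit in alu_hits_in_window(records, 1000, 2000):
    print(hit["query"], hit["begin"], hit["end"], hit["repeat"])
```

In practice, a POALIN would be recognized by an Alu hit at the expected position between known flanking features, present in some haplotype sequences and absent from others.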

Statistical Analysis
The frequencies of the five POALINs were calculated from the genotyping data by direct counting. Hardy-Weinberg equilibrium at each locus was assessed using the method of Guo and Thompson (1992). Haplotypes were estimated by maximum likelihood using the PyPop software (Lancaster et al., 2003, 2007). Pairwise LD between POALINs and HLA alleles was calculated using the SHEsis software (Shi and He, 2005). The percentage association between a POALIN insertion and an HLA allele was calculated from the haplotype frequencies generated by PyPop (Lancaster et al., 2003, 2007) as the percentage of the total frequency of that HLA allele carried on inferred HLA class II gene/POALIN haplotypes that contain the POALIN insertion. Percentage associations were considered very strong if 80-100%, strong if over 50% and below 80%, moderate if 20-50%, and low or absent if below 20% (Shi et al., 2014). Differences in POALIN allele and haplotype frequencies were tested for significance with contingency tests (Fisher's exact test), using Bonferroni correction for multiple testing. Statistical significance was defined at the 5% level.
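The contingency testing can be illustrated with a minimal pure-Python sketch: a two-sided Fisher's exact test for a 2 × 2 table (for example, Alu*2 versus Alu*1 allele counts in two populations), followed by a Bonferroni adjustment that multiplies each P value by the number of tests and caps the result at 1. The function names are hypothetical, and the two-sided P value follows the common convention of summing the probabilities of all tables with the same margins that are no more probable than the observed table; the software actually used for these tests is not named in the text.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total_tables = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / total_tables

    p_obs = prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1):
        p = prob(x)
        # The small relative tolerance keeps tables tied with the
        # observed one despite floating-point rounding.
        if p <= p_obs * (1 + 1e-7):
            p_value += p
    return min(1.0, p_value)


def bonferroni(p_values):
    """Bonferroni-adjusted P values (each multiplied by m, capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical Alu*2 / Alu*1 allele counts in two populations (not study data).
p = fisher_exact_2x2(72, 128, 103, 97)
print(p, bonferroni([p, 0.04, 0.3]))
```

Testing allele counts rather than individuals treats each chromosome as an independent draw, which is reasonable only if the loci are close to Hardy-Weinberg equilibrium.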

Phylogenetic Analysis
Nei's DA distance was calculated from the POALIN allele frequencies, HLA-DRB1 allele frequencies, and DRB1/AluDRB1 haplotype frequencies of the different populations using the DISPAN software (Nei, 1973, 1978). Neighbor-joining (NJ) trees were reconstructed from the DA distances with MEGA 7.0 (Tamura et al., 2007). Principal component analysis (PCA) based on either POALIN allele or HLA-DRB1 allele frequencies was performed with SPSS 16.0. For comparative phylogenetic analysis, POALIN and HLA-DRB1 allele frequencies for the Japanese, Caucasian, and Han-Yunnan populations were taken from previous studies (Shi et al., 2014) and analyzed together with the frequencies obtained for the 12 Chinese ethnic populations in this study.
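DISPAN computed the DA distances used in this study. For reference, Nei's DA distance between populations X and Y typed at r loci is usually written DA = 1 - (1/r) Σ_j Σ_i sqrt(x_ij · y_ij), where x_ij and y_ij are the frequencies of allele i at locus j. The minimal Python sketch below implements that formula. It is a hypothetical helper, not DISPAN, and each dimorphic POALIN is represented as one locus with the two alleles Alu*1 and Alu*2.

```python
from math import sqrt


def nei_da(pop_x, pop_y):
    """Nei's DA distance between two populations typed at the same r loci.

    Each population is a list of loci; each locus is a dict mapping an
    allele name to its frequency in that population (summing to 1).
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("both populations must be typed at the same, non-empty set of loci")
    similarity = 0.0
    for locus_x, locus_y in zip(pop_x, pop_y):
        # An allele absent from one population contributes sqrt(0 * y) = 0.
        for allele in set(locus_x) | set(locus_y):
            similarity += sqrt(locus_x.get(allele, 0.0) * locus_y.get(allele, 0.0))
    return 1.0 - similarity / len(pop_x)


def alu_locus(p_insertion):
    """A dimorphic POALIN locus: Alu*2 (insertion) and Alu*1 (no insertion)."""
    return {"Alu*2": p_insertion, "Alu*1": 1.0 - p_insertion}


# Hypothetical insertion frequencies at two loci in two populations (not study data).
pop_1 = [alu_locus(0.45), alu_locus(0.90)]
pop_2 = [alu_locus(0.30), alu_locus(0.98)]
print(round(nei_da(pop_1, pop_2), 4))
```

The same function accepts multi-allelic loci unchanged, so a single HLA-DRB1 locus with many alleles, or DRB1/AluDRB1 haplotypes treated as alleles, can be passed in the same way.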

POALIN Allele Frequencies and Hardy-Weinberg Equilibrium (HWE)
The five POALIN allele frequencies and genotype counts in the 12 Chinese minority populations (Table 3) were compared statistically with those reported previously for the Japanese, Australian Caucasians, and Chinese Han in Yunnan (Shi et al., 2014). In the 12 Chinese minority populations, POALIN frequencies ranged from 0.359 to 0.514 (AluDPB2), 0 to 0.127 (AluDQA2), 0.777 to 0.995 (AluDQA1), 0.1 to 0.455 (AluDRB1), and 0.084 to 0.368 (AluORF10). The significance of the difference between each pair of populations for each POALIN frequency is shown in Supplementary Table 3.
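The Methods specify the Guo and Thompson (1992) procedure for testing HWE. For a dimorphic POALIN, the exact test reduces to enumerating every heterozygote count compatible with the observed allele counts, so no Monte Carlo approximation is needed. The function below is a minimal pure-Python sketch of that enumeration. It is our own illustration with a hypothetical name, not the software used in the study, and it returns the two-sided exact P value.

```python
from math import exp, lgamma, log


def _log_factorial(k):
    return lgamma(k + 1)


def hwe_exact_biallelic(n_11, n_12, n_22):
    """Two-sided exact HWE test for a dimorphic locus (e.g. Alu*1/Alu*2).

    Sums the probabilities of all heterozygote counts, given the observed
    allele counts, that are no more probable than the observed sample.
    """
    n = n_11 + n_12 + n_22
    n_1 = 2 * n_11 + n_12  # copies of allele 1
    n_2 = 2 * n - n_1      # copies of allele 2
    const = (_log_factorial(n) + _log_factorial(n_1) + _log_factorial(n_2)
             - _log_factorial(2 * n))

    def log_prob(het):
        # P(het | n, n_1) = n! / (hom_1! het! hom_2!) * 2^het * n_1! n_2! / (2n)!
        hom_1 = (n_1 - het) // 2
        hom_2 = (n_2 - het) // 2
        return (const + het * log(2) - _log_factorial(hom_1)
                - _log_factorial(het) - _log_factorial(hom_2))

    # Heterozygote counts share the parity of n_1 and cannot exceed min(n_1, n_2).
    log_probs = [log_prob(h) for h in range(n_1 % 2, min(n_1, n_2) + 1, 2)]
    log_p_obs = log_prob(n_12)
    p_value = sum(exp(lp) for lp in log_probs if lp <= log_p_obs + 1e-7)
    return min(1.0, p_value)


# Hypothetical genotype counts (Alu*1/Alu*1, Alu*1/Alu*2, Alu*2/Alu*2).
print(hwe_exact_biallelic(30, 45, 25))
```

Working on the log scale with `lgamma` avoids overflowing the factorials at sample sizes of around 100 individuals per population.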

LD Analysis and Percentage Haplotypic Association Between POALINs and HLA Alleles
D values for global LD between the five POALINs in the 12 ethnic populations are shown in Figure 2. LD between the Alu loci varied among the ethnic populations, from the absence of strong LD (D < 0.54) between any of the Alu loci in the Yugur and Tu Mongolian populations to strong LD (D > 0.8) between four or five Alu loci in the Jinuo, Nu, Bulang, and Wa. The Hani, Lisu, and Jingpo had strong LD (D > 0.8) between two or three Alu insertions, whereas the Dai, Maonan, and Zhuang, descendants of the ancient Bai-Yue and speakers of Daic subfamily languages, had only two Alu loci in strong LD.
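Figure 2 reports D values computed with SHEsis. The sketch below is not SHEsis. It is a minimal, hypothetical Python illustration of the usual two-step route for a pair of dimorphic Alu loci. First, the four two-locus haplotype frequencies are estimated from unphased genotypes by expectation-maximization; only double heterozygotes are phase-ambiguous. Second, D = p(22) - p(2·)p(·2) and Lewontin's normalized D′ = D/Dmax are computed. The text does not specify whether the plotted "D" is normalized, so the function returns both. Genotypes are coded, by assumption, as the number of Alu*2 copies at each locus.

```python
from itertools import product

# Assumed coding: genotype = number of Alu*2 (allele 2) copies, 0, 1 or 2.
ALLELE_PAIRS = {0: (1, 1), 1: (1, 2), 2: (2, 2)}
HAPLOTYPES = list(product((1, 2), repeat=2))  # (locus A allele, locus B allele)


def em_two_locus(genotype_counts, n_iter=500, tol=1e-10):
    """EM estimate of two-locus haplotype frequencies from unphased genotypes.

    genotype_counts maps (gA, gB) -> number of individuals.
    """
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform starting values
    n_chrom = 2 * sum(genotype_counts.values())
    for _ in range(n_iter):
        expected = {h: 0.0 for h in HAPLOTYPES}
        for (ga, gb), count in genotype_counts.items():
            if ga == 1 and gb == 1:
                # E-step: a double heterozygote is either 11/22 (cis) or 12/21 (trans).
                cis = freqs[(1, 1)] * freqs[(2, 2)]
                trans = freqs[(1, 2)] * freqs[(2, 1)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                expected[(1, 1)] += count * w
                expected[(2, 2)] += count * w
                expected[(1, 2)] += count * (1 - w)
                expected[(2, 1)] += count * (1 - w)
            else:
                # At most one locus is heterozygous, so the phase is known.
                for hap in zip(ALLELE_PAIRS[ga], ALLELE_PAIRS[gb]):
                    expected[hap] += count
        # M-step: frequencies = expected haplotype counts / number of chromosomes.
        new = {h: expected[h] / n_chrom for h in HAPLOTYPES}
        converged = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES) < tol
        freqs = new
        if converged:
            break
    return freqs


def ld_d_prime(freqs):
    """Return (D, D') for allele 2 at both loci from haplotype frequencies."""
    p_a = freqs[(2, 1)] + freqs[(2, 2)]
    p_b = freqs[(1, 2)] + freqs[(2, 2)]
    d = freqs[(2, 2)] - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, (d / d_max if d_max > 0 else 0.0)


# Hypothetical genotype counts for two Alu loci in complete LD (not study data).
example = {(0, 0): 25, (1, 1): 50, (2, 2): 25}
print(ld_d_prime(em_two_locus(example)))
```

PyPop and SHEsis handle many alleles and loci at once; restricting the sketch to two dimorphic loci keeps the single ambiguous genotype class explicit.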
Supplementary Table 7 summarizes the percentage associations between HLA-DRB1 alleles and the class II POALINs in the 12 ethnic populations (this study), the Chinese Han in Yunnan (Shi et al., 2014), and the Japanese and Caucasians from previous studies. Overall, the haplotypic associations between AluDQA1 and HLA-DRB1 alleles were strongly similar across all 15 populations. Table 5 summarizes the percentage associations between HLA-DRB1 alleles and AluDRB1. All populations except the Hani and the Dai showed 83-100% association between the AluDRB1 insertion and HLA-DRB1*15 and HLA-DRB1*16. In comparison, the AluDRB1 insertion was linked to HLA-DRB1*01 in six of six homozygous cell lines, to -DRB1*16 in seven of seven, and to -DRB1*15 in 10 of 11, and to none of the other 66 cell lines carrying nine other DRB1 lineage alleles (Supplementary Table 2). Among the other Alu insertions, HLA-DRB1*09 was not found in the Wa, but it had a strong to very strong association (51-100%) with AluDPB2 in 13 populations and a moderate association (31.7%) in the Lisu. For comparison of the haplotypic associations with actual genomic sequence linkages, Supplementary Table 2 shows the percentage linkage between the five POALINs and HLA-DRB1 alleles in the MHC class II haplotype sequences of 90 homozygous cell lines (Kulski et al., 2021). Because of ancestral recombination at sites between the various Alu loci and the DRB1 locus, the linkages detected in the cell lines were not present in all the Chinese ethnic populations, although the general trends were maintained between and within populations.
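The percentage association summarized in Table 5 and Supplementary Table 7 can be sketched in a few lines of Python, assuming that estimated two-locus haplotype frequencies (HLA-DRB1 allele, Alu state) are already available, for example from a PyPop run. The function names and the input layout are hypothetical. The category cut-offs follow the Statistical Analysis section, treating any value of at least 80% as very strong and any value above 50% but below 80% as strong.

```python
def percentage_association(hap_freqs, hla_allele):
    """Percentage of an HLA allele's frequency carried on Alu-insertion haplotypes.

    hap_freqs maps (hla_allele, alu_state) -> estimated haplotype frequency,
    where alu_state is "Alu*1" (no insertion) or "Alu*2" (insertion).
    """
    total = sum(f for (allele, _), f in hap_freqs.items() if allele == hla_allele)
    if total == 0:
        return None  # allele not observed in this population
    return 100.0 * hap_freqs.get((hla_allele, "Alu*2"), 0.0) / total


def association_category(pct):
    """Label a percentage association using the study's stated categories."""
    if pct >= 80:
        return "very strong"
    if pct > 50:
        return "strong"
    if pct >= 20:
        return "moderate"
    return "low or absent"


# Hypothetical haplotype frequencies for one population (not study data).
hap_freqs = {
    ("DRB1*15:01", "Alu*2"): 0.09, ("DRB1*15:01", "Alu*1"): 0.01,
    ("DRB1*09:01", "Alu*2"): 0.03, ("DRB1*09:01", "Alu*1"): 0.07,
}
for allele in ("DRB1*15:01", "DRB1*09:01"):
    pct = percentage_association(hap_freqs, allele)
    print(allele, round(pct, 1), association_category(pct))
```

Because the measure is conditioned on the HLA allele, a rare allele can show a very strong association even when the haplotype itself is uncommon in the population.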

Phylogenetic Trees and PCA Plots
To compare the diversity of these ethnic populations, we constructed phylogenetic trees (Figure 3) and PCA plots (Figure 4) based on POALIN allele, HLA-DRB1 allele, and DRB1-AluDRB1 haplotype frequencies. The NJ tree constructed from the DA distances of POALIN alleles (Figure 3A) revealed two distinct clusters: (1) the Dai, Zhuang, and Maonan of the Daic subfamily in the Sino-Tibetan language family, and (2) the Jingpo of the Tibeto-Burman subfamily in the Sino-Tibetan language family together with the Bulang stemming from the Wa, both of which belong to the Mon-Khmer subfamily in the Austro-Asiatic language family. A third cluster was a stepwise grouping of the Tibeto-Burman Lisu, Nu, Hani, and Jinuo, with the Mongolian Yugur of the Tujue subfamily in the Altaic language family inserted between the Hani and the Jinuo. The Han from Yunnan province grouped at the lower extremity of the 12 Chinese minority ethnic groups, away from the Japanese and the Caucasians, which grouped at the opposite end of the tree from the Daic cluster.
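The trees in Figure 3 were built with MEGA 7.0. To make the clustering step concrete, here is a minimal pure-Python sketch of the standard neighbor-joining algorithm applied to a symmetric distance matrix, such as a table of pairwise DA distances. It is our illustration, not MEGA's implementation. The function name is hypothetical, ties in the Q criterion are broken by input order, and the tree is returned as an unrooted Newick string. The usage example uses a textbook five-taxon matrix, not study data.

```python
def neighbor_joining(names, dist):
    """Neighbor-joining tree as an unrooted Newick string (needs >= 3 taxa).

    names: list of taxon labels; dist: symmetric dict-of-dicts of distances.
    """
    dm = {a: dict(dist[a]) for a in names}  # working copy of the distances
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(dm[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimizes Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * dm[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node u to a and b.
        la = 0.5 * dm[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = dm[a][b] - la
        u = f"({a}:{la:.4f},{b}:{lb:.4f})"
        dm[u] = {}
        for c in nodes:
            if c not in (a, b):
                dm[u][c] = dm[c][u] = 0.5 * (dm[a][c] + dm[b][c] - dm[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Join the last three nodes to a single central node.
    a, b, c = nodes
    la = 0.5 * (dm[a][b] + dm[a][c] - dm[b][c])
    lb = 0.5 * (dm[a][b] + dm[b][c] - dm[a][c])
    lc = 0.5 * (dm[a][c] + dm[b][c] - dm[a][b])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


labels = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
dist = {x: {y: matrix[i][j] for j, y in enumerate(labels)} for i, x in enumerate(labels)}
tree = neighbor_joining(labels, dist)
print(tree)
```

Neighbor-joining does not assume a molecular clock, which suits populations whose divergence has been shaped by unequal drift, migration, and admixture.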
The NJ trees based on HLA-DRB1 allele frequencies and on DRB1/AluDRB1 haplotypes had similar topologies (Figures 3B,C), and both revealed two distinct clusters: (1) … The Han-Yunnan population grouped between the Chinese minority populations and the Japanese, at a genetic distance from the Mongolian Tu and Yugur and from the Caucasians. Thus, the POALIN and HLA-DRB1 allele frequencies both grouped the 13 Chinese ethnic populations according to their respective language subfamilies and families. The main exception was that the POALIN frequencies separated the Tu and Yugur by a greater distance (Figure 3A), whereas the HLA-DRB1 allele frequencies placed them closer together, between the Japanese and the Caucasians (Figures 3B,C). The PCA plots for POALIN alleles (Figure 4A), HLA-DRB1 alleles (Figure 4B), and DRB1-AluDRB1 haplotypes (Figure 4C) showed linguistic clusters of the 15 populations across the four quadrants that were similar to those revealed by the NJ trees (Figure 3). These plots placed the Jingpo closer to the Mon-Khmer subfamily than to the Tibeto-Burman subfamily from which the Jingpo are believed to have originated, and the genetic distance between the Mongolian Tu and Yugur was greater for POALIN alleles than for HLA-DRB1 alleles and DRB1-AluDRB1 haplotypes. The Caucasians were the genetic outgroup relative to the 13 Chinese ethnic populations and the Japanese.

DISCUSSION
In this study, we examined variation in the five POALINs and in HLA-DRB1 allele and haplotype frequencies to further elucidate the association between the MHC class II POALINs and classical HLA-DRB1 alleles in 12 Chinese minority populations. HLA-DRB1 alleles are widely used to assess genetic structure and differences within and between populations (Di and Sanchez-Mazas, 2011; Sun et al., 2015; Weiskopf et al., 2016; Gonzalez-Galarza et al., 2020). The HLA-DRB1 allele frequencies in the 12 Chinese minority populations were similar to those in previous reports (Ogata et al., 2007; Shi et al., 2008, 2010b, 2011; Sun et al., 2015; Tao et al., 2020). In contrast, previous studies of the distribution and frequency of the MHC class II POALIN dimorphisms were limited to three populations, the Caucasians, Japanese, and Chinese Han in Yunnan (Shi et al., 2014), and these published data provided the three outlying comparison populations for the present study. We therefore provide new POALIN frequency data for 12 Chinese minority populations that were selected for genetic analysis because of their culture, known ancient history, and connection to five distinct language subfamilies: Tibeto-Burman, Mon-Khmer, Daic, Mongolian, and Tujue (Table 1).
Phylogenetic trees and PCA (Figures 3, 4) show that the Alu insertion dimorphisms, HLA-DRB1 alleles, and DRB1-AluDRB1 haplotype diversity are strongly associated with the population characteristics of language family, migration, and sociality. The Daic subfamily, including the Dai, Zhuang, and Maonan, always clustered closely together on the basis of the POALIN dimorphisms, HLA-DRB1 alleles, and DRB1-AluDRB1 haplotypes. The Jinuo, Hani, Lisu, and Nu of the Tibeto-Burman subfamily share certain population characteristics owing to their migration from the north, and are therefore genetically closer to the northern Yugur and Tu populations, which belong to the Mongolian tribal family. Surprisingly, the Jingpo of the Tibeto-Burman subfamily are genetically closer to the Mon-Khmer Bulang and Wa than to the other Tibeto-Burman populations, probably because these three populations have long lived closely together in the mountains of western Yunnan and have been exposed to similar pathogens. For example, malaria, a serious infectious disease, has been prevalent in China since 2700 BC, and Yunnan Province is a high-incidence area, especially along the border between China and Myanmar (Cox, 2010; Bi et al., 2013; Diouf et al., 2014). Similarly, the Jinuo and Bulang, who live close together in the same area, may also have experienced strong selective pressure from malaria.
The frequencies of the five POALIN dimorphisms provide unique evolutionary and genetic information on the relationships between the 12 Chinese minority populations. The frequencies of AluDPB2, AluDQA2, and AluDQA1 in the Jingpo differed significantly from those in the other four populations of the Tibeto-Burman subfamily (P < 0.01 after Bonferroni correction). This suggests an expansion of these Alu insertions in the Jingpo people as a consequence of their different population history or environmental effects. In comparison, the Bulang, a member of the Mon-Khmer subfamily, had the highest AluDQA1 frequency (0.995) of all 15 populations. This is the highest frequency, and the closest to genetic fixation in a subpopulation, of any MHC POALIN in world populations, suggesting substantial long-term population isolation. The frequencies of AluORF10 were higher in the Dai, Maonan, and Zhuang (Daic subfamily of the Sino-Tibetan language family) than in the other nine Chinese minority populations. AluDQA1 was the most frequent POALIN in the Tu and the Yugur (0.777 and 0.903, respectively), with a significant difference between these two populations (P < 0.01 after Bonferroni correction). HLA-DRB1*09:01, which had the strongest association (100%) with AluDQA1, was the most frequent HLA-DRB1 allele in the Tu and Yugur (0.127 and 0.134, respectively). According to historical records, all the Altaic-speaking groups, such as the Tu and the Yugur, who speak Mongolian, Tujue, or Manchu-Tungusic languages, originated from the peoples and places overrun by the Mongol Empire and from the border adjacent to northeastern China in the 13th century (Guo, 2000; Chu et al., 2006). HLA-DRB1*12:02 also had strong associations (88.7-100%) with AluDQA1 and a high frequency (0.160-0.550) in eight populations (Hani, Jinuo, Lisu, Nu, Jingpo, Bulang, Wa, and Maonan).
Sun et al. (2015) reported that the DRB1 allele frequency distribution of a Mongolian subpopulation in Yunnan differed from that of a Mongolian population in Inner Mongolia and was much closer to that of the Hani population of Yunnan. They hypothesized that the difference between the two Mongolian populations was due partly to gene flow and pathogen-driven selection. We found large differentiation between the two Mongolian populations for the Alu alleles, but not for the HLA-DRB1 alleles. The Alu analysis placed the Mongolian Yugur within a cluster of Di-Qiang subfamilies, at a substantial distance from the Mongolian Tu, whereas the DRB1 allele frequencies placed the two Mongolian populations closer together, at a genetic distance between the Japanese and Caucasians (Figures 3, 4). We attribute this difference between the two Mongolian populations in the Alu analysis mainly to a twofold difference in the AluDQA1*1 frequencies (Table 3). However, the frequencies of particular DRB1 alleles may have placed the two distinct Mongolian populations closer together because of pathogen-driven selection at that particular gene, in contrast to the more independent and possibly less effective Alu loci. In this regard, the inheritance of genomic loci and/or haplotypes that are identical by descent or identical by state may in part be driven by selection, gene flow, and various social and geographic factors, but this has yet to be defined and investigated with a greater variety of genomic markers for comparative analyses.
Overall, the branching patterns of the relationships between populations and population clusters were similar for the Alu and DRB1 allele frequencies, although the genetic distances between particular populations differed substantially. Most of these similarities are likely due to the haplotypic relationships between the Alu dimorphisms and the DRB1 alleles (Kulski et al., , 2021), as exemplified in this study by the comparison between the NJ trees of the HLA-DRB1 alleles and the HLA-DRB1-AluDRB1 haplotypes (Figure 3). This and previous studies indicate that the closer a dimorphic Alu is to the HLA-DRB1 locus, the stronger its haplotypic linkage/association and resistance to recombination (Kulski et al., , 2011, 2021). This appears to be the case for AluDRB1, which is most strongly associated with HLA-DRB1*15 and -DRB1*16 (Table 5) and lies within 14 kb of the HLA-DRB1 locus. In contrast, AluORF6 and AluDPB2, which are 233 kb and 536 kb, respectively, from the HLA-DRB1 locus (Supplementary Figure 1), are associated with many different DRB1 alleles, possibly because many more recombination events have occurred between their loci. The five genotyped and haplotyped "lineage-by-descent" dimorphic Alu insertions described in this study provide clues to the diversity of the MHC class II region in the 12 Chinese minority populations. However, further studies using fully phased genomic sequences of the MHC class II region in these historically small ethnic communities, which are still strongly linked by ancestry, culture, and language, might provide a better understanding of these POALIN haplotypic associations in the context of human MHC class II diversity, identity by descent (and/or by chance or state), haplotype shuffling, and ancestral recombination (Alper et al., 2006; Larsen et al., 2014).
The POALINs in the current study are all members of the young AluY subfamily, with AluDQA1 and AluDRB1 belonging to the AluY subgroup and AluDQA2, AluDPB2, and AluORF10 belonging to the youngest AluYa5 or AluYb8 subgroups. AluDQA1 appears to be the oldest of the five POALINs on the basis of its having the highest insertion frequency in the 15 populations (Table 3) and its association with most of the HLA-DRB1 supertypes (Supplementary Table 7). Thus, the AluDQA1 insertion is widely distributed in the Chinese ethnic populations and strongly associated, as a haplotype, with all or most of the HLA-DRB1 alleles. The frequency of AluDQA2 was higher in the Caucasians than in the Chinese populations or the Japanese, which supports the hypothesis that AluDQA2 may have originated in Caucasians (Shi et al., 2014).
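The insertion frequencies compared in Table 3 are allele-counting estimates from genotypes of each dimorphic locus. The sketch below shows that calculation together with the Hardy-Weinberg expected heterozygosity; the function names and counts are hypothetical. Under neutral drift, older alleles tend to reach higher frequencies, but frequency is only a rough proxy for age, because drift, selection, and hitchhiking with linked HLA alleles can also raise it.

```python
def insertion_frequency(n_ins_ins, n_ins_abs, n_abs_abs):
    """Insertion-allele frequency of a dimorphic Alu locus from genotype counts.

    n_ins_ins: homozygous for the insertion; n_ins_abs: heterozygous;
    n_abs_abs: homozygous for the pre-insertion (empty) allele.
    """
    n = n_ins_ins + n_ins_abs + n_abs_abs
    if n == 0:
        raise ValueError("no genotyped individuals")
    # Each individual carries two chromosomes; homozygotes contribute two insertions.
    return (2 * n_ins_ins + n_ins_abs) / (2 * n)


def expected_heterozygosity(p):
    """Hardy-Weinberg expected heterozygosity, 2p(1 - p), for a biallelic locus."""
    return 2 * p * (1 - p)


# Hypothetical counts for 100 individuals (not data from this study).
p = insertion_frequency(80, 18, 2)
print(p, expected_heterozygosity(p))
```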
The frequencies of AluDRB1 were highest in the Dai, Maonan, and Zhuang, which belong to the Zhuang-Dong (Kam-Tai) language subfamily. AluDRB1, with frequencies ranging from 0.10 to 0.455, was strongly associated with HLA-DRB1*15 and -DRB1*16 in most populations. However, the percentage association between the AluDRB1 insertion and HLA-DRB1*15 or HLA-DRB1*16 was significantly lower (<55%) in the Hani and Dai than in the other 11 Chinese ethnic groups, including the Han-Yunnan, and in the Japanese and Caucasians (Table 5). This could be due to primer mutation with allelic dropout, deletion of the AluDRB1 element, recombination events, or extensive interbreeding within a population whose founders carried an HLA-DRB1*15 or HLA-DRB1*16 haplotype lacking the AluDRB1 insertion. In contrast to AluDQA1, which is associated with most HLA-DRB1 alleles, the AluDRB1 insertion is largely restricted by linkage (or association) to the HLA-DRB1*01 and -DRB1*10 (DR1 supertypes) and the -DRB1*15 and -DRB1*16 (DR51 supertypes) allelic lineages, which suggests that the insertion occurred after these lineages separated from the DR8, DR52, and DR53 supertypes. On this basis, the AluDQA1 insertion must have occurred much earlier than the AluDRB1 insertion during human evolution and population expansion. These results support the view that the AluDRB1 insertion probably originated in an ancestral HLA-DRB1 allele that was the progenitor of the DR51 supertypes, which include HLA-DRB1*15 and -DRB1*16 (Andersson, 1998; Gibbons et al., 2004).
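This section does not restate how the percentage association in Table 5 is computed. The sketch below assumes that it is the proportion of haplotypes carrying a given DRB1 lineage that also carry the Alu insertion, estimated from phased or inferred two-locus haplotypes. The function name `percent_association` and the haplotype counts are hypothetical.

```python
def percent_association(haplotypes, drb1_lineage, alu_inserted=True):
    """Percentage of haplotypes of one DRB1 lineage that carry the given Alu state.

    haplotypes: (drb1_allele, alu_inserted) pairs from phased or inferred
    two-locus haplotypes, e.g. ("DRB1*15:01", True).
    drb1_lineage: first-field allele group, e.g. "DRB1*15".
    """
    states = [ins for allele, ins in haplotypes if allele.split(":")[0] == drb1_lineage]
    if not states:
        return None  # lineage not observed in this population
    carriers = sum(1 for ins in states if ins == alu_inserted)
    return 100.0 * carriers / len(states)


# Hypothetical haplotype counts for one population (not data from this study).
haplotypes = (
    [("DRB1*15:01", True)] * 18 + [("DRB1*15:01", False)] * 2
    + [("DRB1*16:02", True)] * 4 + [("DRB1*16:02", False)] * 6
    + [("DRB1*09:01", False)] * 30
)
for lineage in ("DRB1*15", "DRB1*16", "DRB1*04"):
    print(lineage, percent_association(haplotypes, lineage))
```

In this made-up population the DRB1*16 value would fall below the 55% level discussed above, while DRB1*15 would not; returning `None` keeps an absent lineage distinct from a 0% association.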
AluDPB2 has frequencies ranging from 0.278 to 0.574 in the 15 populations, with low- to high-level percentage associations with many different HLA-DRB1 alleles (Supplementary Table 7). AluDPB2 has more associations with HLA-DRB1 alleles than AluDRB1 does, probably because the AluDPB2 locus is 536 kb from the HLA-DRB1 locus, which makes numerous ancient recombination events between the two loci likely (Supplementary Figure 1; Kulski et al., 2021). AluORF10 was strongly associated with HLA-DRB1*15 only in Caucasians (89.1%). In contrast, AluORF10 was strongly associated with HLA-DRB1*16 in eight East Asian populations (Jingpo, Wa, Maonan, Zhuang, Tu, Yugur, Han-Yunnan, and Japanese), whereas HLA-DRB1*16 was absent from the Caucasian population. This suggests at least one recombination event at an unidentified junction between the AluORF10 and HLA-DRB1 locations in the ancestral progenitors of the DR51 supertypes.
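A percentage association is a conditional frequency and does not account for how common the insertion is overall. Lewontin's D' normalizes the disequilibrium coefficient D = p_AB - p_A p_B by its maximum possible value given the allele frequencies, which makes strengths of association comparable across populations. The sketch below is offered only as a complementary measure and is not necessarily the statistic used in this study. The function name and all frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D') for allele A at one locus and allele B at another.

    p_ab: frequency of haplotypes carrying both A and B.
    p_a, p_b: frequencies of A and B in the same population.
    D' is signed here; |D'| is often reported instead.
    """
    d = p_ab - p_a * p_b
    # Lewontin's normalisation: divide by the largest |D| the allele frequencies allow.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


# Hypothetical frequencies (not study data): A = AluORF10 insertion, B = DRB1*16.
p_a, p_b, p_ab = 0.20, 0.05, 0.045
d, d_prime = linkage_disequilibrium(p_ab, p_a, p_b)
percent_assoc = 100 * p_ab / p_b  # share of DRB1*16 haplotypes carrying the insertion
print(round(d, 4), round(d_prime, 3), round(percent_assoc, 1))
```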
Although this study focused on Alu and HLA-DRB1 as evolutionary genetic markers of population structure and was not directly related to medical or health issues, it is noteworthy that the Alu indels could have enhancer and other regulatory roles that affect the expression of HLA class II genes and/or other genes in the MHC and elsewhere in the human genome (Hasler and Strub, 2006; Moolhuijzen et al., 2010; Spirito et al., 2019; Goubert et al., 2020; Kulski et al., 2021). Many Alu elements of the AluJ, AluS, and AluY subfamilies are transcriptionally active, with highly expressed self-cleaving ribozyme activity during T-cell activation and thermal and endoplasmic reticulum stress (Hernandez et al., 2020). Furthermore, Wang et al. (2017) identified two Alu indels in the class II region, Alu-5072 and Alu-5075, as potential enhancers of HLA-DRB5 and HLA-DQB1-AS1 associated with lymphoma, Hodgkin lymphoma, and chronic hepatitis B infection. In this regard, Alu-5057 is probably the AluDRB1 indel at the 5′ end of HLA-DRB1 (Supplementary Figure 1). Thus, the question remains whether the other four Alu indels described in this study also have enhancer functions like those that Wang et al. (2017) reported for Alu-5072 and Alu-5075. On the basis of these published findings, the transcriptional activity of Alu elements in the human MHC and their role in epigenetic regulation need to be investigated and better defined. Also, the Alu indels, both as genotypes and as haplotypes within the MHC, could have important functions in cancer, autoimmunity, and immunity to infections that have yet to be investigated.
In a previous study, we used a set of five-locus POALINs from the MHC class I region as lineage markers to determine the haplotypic associations and differentiation of MHC class I polymorphic Alu insertions and HLA-B/Cw alleles in seven Chinese ethnic populations (Yao et al., 2010). The POALIN markers used in this study were limited to five loci in the MHC class II region, but this number was sufficient to micro-differentiate effectively among 15 populations. The advantages of these POALIN lineage markers within the MHC class I and class II regions lie in their practicality: they are well defined, inexpensive to prepare and assay in the laboratory, and produce results that are reasonably easy to interpret. In future work, the more widely studied MHC class I five-locus POALINs (Yao et al., 2010; Abeid et al., 2019; Kulski et al., 2019), other autosomal Alu loci (Antunez-de-Mayolo et al., 2002), and STR loci (Garcia-Obregon et al., 2011) could be included in the haplotype analyses to better resolve genetic distances and diversity between and within the various populations.
In conclusion, the novel finding of this study is that the MHC class II POALIN and HLA-DRB1 allele frequencies both grouped the 12 Chinese minority ethnic populations according to their respective language families and subfamilies. Comparison with previously reported data for the Chinese Han in Yunnan, the Japanese, and Caucasians shows that the POALINs in MHC class II, like the polymorphic class I and class II HLA genes, are informative genetic and haplotype markers that can be used cheaply and simply in studies of population diversity, forensic medicine, and disease research.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
LiS and YY conceived and designed the research. YC, SL, JY, YT, and XZ performed the experiments. YC and JK analyzed the data. SL and YT collected the samples. YC, YY, LiS, and JK wrote and revised the manuscript. All authors contributed to the article and read and approved the submitted version.

FUNDING
This work was supported by a grant from the Yunnan Provincial Science and Technology Department (2008CC021) and the Special Funds for High-Level Health Talents of Yunnan Province (D-201669, L-201615, and H-2018014). The funders had no role in the design of the study, data collection and analysis, decision to publish, or preparation of the manuscript.