Genetic Diversity of MHC B-F/B-L Region in 21 Chicken Populations

The chicken major histocompatibility complex (MHC) on chromosome 16 is the most polymorphic region across the whole genome, and also an ideal model for genetic diversity investigation. The MHC B-F/B-L region is 92 kb in length with high GC content consisting of 18 genes and one pseudogene (Blec4), which plays important roles in immune response. To evaluate polymorphism of the Chinese indigenous chickens as well as to analyze the effect of selection to genetic diversity, we used WaferGen platform to identify sequence variants of the B-F/B-L region in 21 chicken populations, including the Red Jungle Fowl (RJF), Cornish (CS), White Leghorns (WLs), 16 Chinese domestic breeds, and two well-known inbred lines 63 and 72. A total of 3,319 single nucleotide polymorphism (SNPs) and 181 INDELs in the B-F/B-L region were identified among 21 populations, of which 2,057 SNPs (62%) and 159 INDELs (88%) were novel. Most of the variants were within the intron and the flanking regions. The average variation density was 36 SNPs and 2 INDELs per kb, indicating dramatical high diversity of this region. Furthermore, BF2 was identified as the hypervariable genes with 67 SNPs per kb. Chinese domestic populations showed higher diversity than the WLs and CS. The indigenous breeds, Nandan Yao (NY), Xishuangbanna Game (XG), Gushi (GS), and Xiayan (XY) chickens, were the top four with the highest density of SNPs and INDELs. The highly inbred lines 63 and 72 have the lowest diversity, which might be resulted from a long-term intense selection for decades. Collectively, we refined the genetic map of chicken MHC B-F/B-L region, and illustrated genetic diversity of 21 chicken populations. Abundant genetic variants were identified, which not only strikingly expanded the current Ensembl SNP database, but also provided comprehensive data for researchers to further investigate association between variants in MHC and immune traits.


INTRODUCTION
China has the most abundant resources of domestic chickens, which possess highly genetic polymorphisms. As a highly polymorphic region across the genome, major histocompatibility complex (MHC) is a group of tightly linked genes, encoding major histocompatibility antigens, which discriminate between self and non-self cells and regulate immune responses. MHC is closely related to disease resistance and production performance of animals (Bacon, 1987). Chicken MHC-B and MHC-Y regions reside on the micro-chromosome 16, which are located on the same side of nucleolar organizing region (NOR) (Miller and Taylor, 2016). MHC-B and MHC-Y are inherited independently because they are separated by a GC rich region (Miller et al., 2004). Until now, 46 genes within a 2,41,833 bp in MHC-B were identified (Shiina et al., 2007), including three highly polymorphic gene groups: BF, BL, and BG, which encode class I, class II, and class IV glycoproteins on cell surface, respectively (Lamont et al., 1987). The encoded proteins play important roles in rapid allograft rejection, immune response and determining susceptibility/resistance to pathogen infections (Pazderka et al., 1975;Plachy et al., 1992;Kaufman, 2018).
The BF/BL region (Guillemot et al., 1988), a 92-kb core region with a relatively high GC content (Shiina et al., 2007), has been described as a minimal essential MHC region since it contains two classical class I (B-F), two classical class II B (B-LB) genes, and some other genes involved in antigen processing and presentation (Bacon, 1987). This region contains 18 genes and 1 pseudogene (Blec4) (Lamont et al., 1987;Kaufman et al., 1995), of which, two MHC class I genes BF1 and BF2 encode α chain of BF antigen (Ewald and Livant, 2004). BLB1 and BLB2 encode β chains of classic MHC class II molecules (Lamont, 1989;Zekarias et al., 2002). MHC is a polymorphic region that has numerous SNPs and INDELs (Wong et al., 2004), and its variations can represent the whole genomelevel variation to evaluate genomic diversity and population history (Yuhki and Obrien, 1990). The microsatellite locus LEI0258, a tetranucleotide repeat system located within the BF/BL region, has been used to analyze population genetic diversity, structure, distinctiveness, and relationships between populations (McConnell et al., 1999;Fulton et al., 2006;Izadi et al., 2011;Chang et al., 2012). Huang et al. (2016) used LEI0258 to examine the genetic diversity and evolutionary history of south China domestic chickens and revealed that south China domestic chickens mainly originated from the Red Jungle Fowl (RJF). Han et al. (2013) found high genetic diversity of the MHC region in Chinese indigenous chickens by assessing genetic information on LEI0258. In addition to LEI0258, there are also many microsatellite loci such as MCW0370 and MCW0371. The use of these high polymorphism markers is of great significance for the study of chicken MHC diversity (Esmailnejad et al., 2017;Manjula et al., 2021).
However, to the best of our knowledge, there is no singlebase-resolution polymorphism description of MHC BF/BL region in Chinese domestic chickens. In this study, we comprehensively evaluated the polymorphisms of BF/BL region among 16 Chinese domestic populations, as well as White Leghorns (WLs), Cornish (CS), RJFs, and two high inbred lines (lines 6 3 and 7 2 ) (Waters, 1945;Stone, 1975;Bacon et al., 2000) using the WaferGen platform, which is a more costeffective method for identifying variations in targeted regions (De Wilde et al., 2014).

Ethics Statement
All experimental procedures and used animals were approved by the Animal Care and Use Committee of China Agricultural University.

Sample Selection
A total of 195 chickens from 21 populations were used in this study. Sixteen were Chinese domestic chicken populations (Table 1), including Dagu (DG, n = 9), Wenchang (WC, n = 10), Wuding (WD, n = 10), Xiayan (XY, n = 10), Xishuangbanna Game (XG, n = 10), Luxi Game (LX, n = 7), Gushi (GS, n = 10), Wenshang Barred chicken (WS, n = 10), Nandan Yao (NY, n = 10), Yunyang Black-bone chicken (YY, n = 10), Tibetan (TB, n = 10), Beijing You (BY, n = 10), Chahua (CH, n = 10), Dongxiang Blue-eggshell chicken (DX, n = 10), Shouguang (SG, n = 10), and Silkie (SK, n = 10). Two introduced populations were CS(n = 10) and WL(n = 10). The blood samples of first ten local breeds were collected previously by our colleagues from local chicken breed preservation farm. The blood samples of latter six local breeds as well as CS and WL were from Experimental Farm of Poultry Resource in China Agricultural University. The individuals were randomly selected for the assay. RJF (n = 3) was also included in this study, and DNA was kindly provided by Dr. Xiaoxiang Hu. Two highly inbred WL lines 6 3 (n = 8) and 7 2 (n = 8) were that were cultivated by the Avian Disease and Oncology Laboratory (ADOL) of the United States Department of Agriculture. After decades of breeding, the inbreeding coefficient of the two lines reached 99%. Serological tests showed that the two lines have the same MHC B haplotype (B2). Although they have the same haplotype, there were significant differences in Marek's disease (MD) resistance: line 6 3 is resistant to MD tumors but susceptible to both Marek's disease virus (MDV) and avian leukosis virus (ALV). Conversely, the line 7 2 is resistant to ALV infection but susceptible to both MDV and MD tumors (Waters, 1945;Stone, 1975;Bacon et al., 2000). DNA of line 7 2 and 6 3 were extracted from spleens.

Target Enrichment Sequencing
Genome DNA were extracted from blood or tissue samples using DNA extraction kit (TIANGEN). The purity, concentration, and volume of the extracted DNA were tested. WaferGen Smartchip platform was used to perform highly parallelized PCR for the target region and the procedure was followed that by De Wilde et al. (2014). There were total of 431 amplicons were designed using tiling settings, taking into account known SNP positions and with a target annealing temperature of 60 • C to cover target region (approximately 92 kb B-F/B-L region on chromosome 16). Primer3 (Version 2.3.7) global settings for primer design, length setting of amplicons were as follows: 250-300 bp and 301-350 bp, primer length: 18-30 nt, primer pair_max_diff_Tm: 3 • C, and primer TM: 58-62 • C (60 • C as optimal). Primers were listed in Supplementary Table 1. Specific barcodes were added to each sample. Parallelized PCR were cycled in the SmartChip Cycler (WaferGen Biosystems). Library was constructed by PCR pool. The concentration of PCR pool was measured using the dsDNA assay kit on the Qubit fluorometer (Invitrogen) and fragment analysis occurred on BioAnalyzer 2100 (Agilent). The library was sequenced by MiSeq platform.

Reads Mapping and Variant Calling
Chicken reference genome assembly (Galgal6 version) was download from NCBI. 1 Trimmomatic (Trimmomatic 0.39) was used to obtain clean reads, and the parameter of quality control was as follows: SLIDINGWINDOW:5:20 LEADING:3 TRAILING:3 MINLEN:36 (Bolger et al., 2014). SAMtools (Samtools 1.9) was used to create an index of the reference sequence (Li et al., 2009). Burrows-Wheeler Alignment tool (BWA-0.7.17) was used to mapping reads to the reference genome with the command "mem -t 10 -M" (Li and Durbin, 2010). The alignment data stream was piped to SAMtools and converted to a BAM file, which was then sorted according to the reference genome. The MarkDuplicates module in Genome Analysis Toolkit (GATK-4.1.3.0) was used to remove duplicated reads (McKenna et al., 2010). GATK HaplotypeCaller module and GATK GenotypeGVCFs module were used to convert BAM files to GVCF files and then switched to VCF files. The GATK VariantFiltration module was used to hard-filter SNP and INDEL. For SNP, we excluded potential false variants following these criteria:

Comparison Between Chickens Showing MD Resistance and Susceptibility
Comparisons were conducted between the line 6 3 and line 7 2 . Allele frequency and frequency difference of each SNP were calculated with VCFtools (versions 4.0) -freq and -diff-site (Danecek et al., 2011). In each of the lines, loci with genotypic information was retained from more than four individuals. The difference of allele frequency among two populations was subjected to chi-squared testing.

Sequencing and Mapping Summary
On average, ∼1,14,000 (80%) clean reads were obtained from ∼1,42,000 raw reads for each individual. Eighty-eight percentage of reads were aligned to the chicken genome assembly (galGal6) ( Table 2), and the captured specificity reached 99%. The coverage  of target regions ranged from 70 to 84%, with an average of 80%. However, the target region coverage of the line 6 3 and line 7 2 was the lowest, about 70% (Figure 1). The average depth of the 92 kb target regions reached over 240×, among them, the coverage of 55 kb was more than 30×, and the coverage of 64 kb was more than 10×.
The SNP database at Ensembl (release 102) had 4,804 known variations in this region, including 4,572 SNPs and 232 INDELs. In this study, a total of 3,319 SNPs and 181 INDELs were identified in the 21 populations (Table 3). Among them, there were 1,262 known SNPs (38%) and 22 known INDELs (12%); and the rest 2,057 SNPs (62%) and 159 INDELs (88%) were discovered for the first time. The Chinese domestic populations GS, NY, XB, and XY had the most abundant SNPs and INDELs than others, and the highly inbred line 6 3 and 7 2 showed the fewest SNPs and INDELs among the populations. There were no breed-specific SNPs or INDELs identified in this study, and most of them were mutually shared between some populations. We also calculated the allele frequency to analyze the difference in the BF/BL region between the MD resistant line 6 3 and the MDsusceptible line 7 2 . There were only 12 loci with allele frequency differences greater than 0.3, and the largest difference was 0.5, seven of them differed at significant levels (P < 0.05, chi-squared testing). There was no SNP locus that completely separates the two lines (Table 4).
Since the coverage of the different populations was different, we calculated variation density as follows: the numbers of SNPs and INDELs were divided by the coverage of each population. The population diversity was assessed as the number of SNP per kb and the number of INDEL per 10 kb. The top five populations with the highest density of SNPs were NY, XB, GS,XY,and WD,with 20.87,20.29,20.26,19.28,and 18.57 per kb, respectively. The lowest density of SNP was found in RJF, inbred line 6 3 and line 7 2 , with 2.13, 2.35, and 8.16, respectively. Similarly, NY, GS, XB, XY, and WC five populations showed the highest density of INDELs, and the three populations with the lowest density of INDELs were RJF, the line 6 3 , and line 7 2 . The SNP and INDEL densities in the two introduced populations, CS and WL, were lower than those of most of the Chinese indigenous populations (Figure 1).

Characteristic and Distribution of SNPs in Genes
The whole BF/BL region showed a very high genetic diversity, with an average of about 36 SNPs per kb. Among a total of 18 genes, the most diverse genes were BF2 and TAP2, with more than 50 SNPs per kb. About 86% of the SNPs identified in BF2 were newly discovered. The coding region variation was primarily concentrated in the second and third exon of BF2. BF1 and C4 followed with more than 40 SNPs per kb. CENPA, BLEC2, and two BLB genes had the lowest SNP rates of about 20 SNPs per kb (Figure 2). Among the 21 populations, RJF, line 6 3 and line 7 2 had the lowest diversity, while NY, GS, and XG were observed with the highest diversity in both BF genes (Figure 3). The diversity of CS and WL were lower than most of the Chinese domestic breeds. In addition, we also counted the number of variants in other genes (Supplementary Table 2). We then annotated all detected SNPs using VEP and found that most SNPs (54%) were located on the intron, 21% in coding region, about 7 and 4% in the downstream and upstream flanking regions, respectively, 3% in splice region and 5% were situated at UTR regions ( Figure 4A). In the coding region, more than half of the SNP were missense variants; 43% were synonymous variants ( Figure 4B).

Characteristic and Distribution of INDELs in Genes
The ratio of deletion (64%) was significantly higher than that of insertion (36%). Both the largest deletion and insertion detected in this study were 60 and 25 bp, respectively, and both the smallest deletion and insertion were 1 bp. A large proportion (89%) of INDELs were less than 10 bp. The majority was the single basepair INDEL and accounted for 58% of all detected INDELs.  In addition, INDELs were spread out of a 826 bp segment, accounting for 1% of the target region, and the region affected by deletions was larger than that by insertions (Figure 5).
To explore the distribution of INDELs across the 18 genes in the BF/BL region, insertion and deletion of different genes were counted, and the INDEL rate was calculated as the number of INDEL per kb for each gene. We observed that the average of INDEL rate on the target region was 2 per kb. The BLEC2 and TAP2 showed the lowest INDEL rate. The two genes, BF1 and BF2, coding the components of MHC class I molecules, showed the highest INDEL rate. Both showed more than 5 INDELs per kb. Our data showed that the BF1 and BF2 did not have large insertions and deletions, and the length of the INDELs in these genes was within 15 bp ( Table 5). In order to have a deeper insight on the distribution of INDELs in the gene regions, we annotated all the detected INDELs using VEP. We found that 62% INDELs were located on introns, 14% in coding exon, 15% in downstream and upstream flanking region, 5% in the UTR regions, and 2% in splice region (Figure 6A). In the coding region, more than threequarters of the identified variants were frameshift variant, 8% were inframe deletion variant, 7% were inframe insertion variant, and the remaining 3% were protein altering variant ( Figure 6B).

DISCUSSION
Next generation sequencing (NGS) has been widely used for identification of genome-wide genetic variation (Davey et al., 2011;Nielsen et al., 2011). However, chromosome 16 has not been elucidated well by NGS analysis because the sequencing coverage of chr16 was relatively poor (only about 50%), much lower than that of the other chromosomes (Yan et al., 2014). Additionally, NGS is not economically suitable for large-scale samples due to the high cost, therefore many cost-efficient methods have been developed to analyze subsets of genome-scale data, such as target enrichment sequencing of specific genes, which can significantly reduce the costs (Dapprich et al., 2016). WaferGen, a target region sequencing platform, is a PCR based platform that can generate as many as 5,184 amplification reactions on one chip (De Wilde et al., 2014). In this study, we conducted target enrichment sequencing to evaluate the polymorphism in the BF/BL region of 21 diverse chicken populations. Our capture efficiency was 99%, which was higher than a previous study in turkey (capture efficiency 53%) using a custom MHC SureSelect capture array (Reed et al., 2016). The average depth of our data in the target region reached about 200×, and the coverage of target region with more than 30× were 55 kb on average. The mean depth of the target region in this study was three times deeper than similar reported research in human (Jones and Good, 2016), which indicated high accuracy and reliability of the variations identified in this study.
A total of 3,500 variations were detected in the BF/BL region of the 21 chicken populations in this study. Among them, 2,057 SNPs and 159 INDELs were novel, which drastically expand the current Ensembl SNP database. The number of novel INDELs detected in this study was quite large, accounted for 5% of the total variations, but most of them privately appeared in some populations, which suggested that using additional populations with large genetic background differences may detect more genetic variation of the genome. We identified an average of 36 SNPs and 2 INDELs per kb in the BF/BL region, which is higher than that reported by Brandstrom and Ellegren (2007). International Chicken Polymorphism Map Consortium reported an average mutation density of 5 SNP/kb in the chicken genome (Wong et al., 2004), and that is much lower than what we found in the region, which also indicated high polymorphism of MHC BF/BL region. More than one-third of the variations have a minimum allele frequency above 0.05 in the populations, which meets the requirements of molecular markers for genetic analysis and may be implemented in breeding programs for genetic improvement (Mills et al., 2006(Mills et al., , 2011Vali et al., 2008;Maw et al., 2013).
Among these 18 genes, BF gene, and TAP gene showed high diversity, which is consistent with their antigen presenting roles. The BF genes encode typical MHC class I glycoproteins.
The exon 2 and exon 3 of BF genes encoding α1 and α2 domains of MHC-I are related to the resistance of many avian diseases (Hunt and Fulton, 1998;Kaufman et al., 1999;Rogers et al., 2003). TAP molecules are a part of the MHC class I antigen-processing pathway (Pamer and Cresswell, 1998). TAP heterodimer formed by TAP1 and TAP2 transports the antigen peptides into endoplasmic reticulum (ER; Deverson et al., 1990;Trowsdale et al., 1990;Spies et al., 2008). In human, one TAP heterodimer, four tapasin molecules, and four class I-β2m molecules form a peptide-loading complex that assists efficient loading of peptide onto class I molecules (Ortmann et al., 1997). Once the peptide in the lumen of the ER associates with class I molecules, the complex of MHC class I, β2m and the peptide will move to the cell surface, and the peptide will be presented to CD8 + T cells (Pamer and Cresswell, 1998).
As key components of the MHC-I antigen presenting pathway, super high polymorphism of BF gene and TAP gene potentially contribute to highly effective antigen presentation and broadspectrum host resistance to pathogens. Therefore, the high diversity that we found in most of the Chinese local populations implies that they potentially possess a broad-spectrum of disease resistance capability.
Among the domestic populations, NY chicken, GS chicken, XG fowl, and XY chicken were found with the highest polymorphism. These breeds were not yet subjected too strong artificial selection. NY chicken, a local breed in Guangxi province of China, showed the most abundant diversity, but the information about this chicken is limited. Guangxi local poultry resource book described that it was once a pheasant population in   Guangxi, China. The habitat of the NY chicken was primarily in the forests, and it was domesticated after capture by the natives. Previous research suggested that RJFs showed extraordinarily high diversity (Hoa et al., 2016), but it was observed with fewer variations in our study. These might result from that only three RJFs were used in this study. High polymorphism of Chinese domestic chickens was also observed by Qu et al. (2006). They found the average heterozygosity of 78 Chinese indigenous chicken breeds was 0.622 by analyzing 27 microsatellite markers. The diversity of WL and CS in our research is lower than that of most Chinese domestic chicken populations, which might be resulted from historical selection for performance characteristics. Izadi et al. (2011) reported that intensively selected commercial populations seem to have low genetic variability in the MHC regions compared with that of the non-commercial flocks, which are less intensively selected. Manjula et al. (2020) reported that  compared with commercial breeds, Korea native breeds had high genetic diversity in the MHC-B region, and they showed immune capabilities and genetic potential for resistance to many different pathogens. Hillel et al. (2003) used 22 microsatellite markers to analyze the polymorphisms of 52 chicken breeds from different countries and found that the average heterozygosity of local breeds was 0.50, but that of the commercial chicken breeds was only 0.29, which indicated that long-term intensity selection reduces population diversity. In this study, we found the polymorphisms of line 6 3 and line 7 2 were among the lowest, which must be resulted from high-pressure selection and is in good agreement with the other studies (Waters, 1945;Stone, 1975;Bacon et al., 2000).

CONCLUSION
We identified abundant genetic variations in the chicken MHC-B-F/B-L region, which not only strikingly expanded the current Ensembl SNP database, but also provided comprehensive data for researchers to further investigate association between variants in MHC and immune traits. The Chinese domestic breeds showed high diversity, which suggest their broad-spectrum resistance to pathogens. Long-term intensity selection significantly reduced diversity of MHC B-F/B-L region of the highly inbred line 6 3 and 7 2 chickens.

DATA AVAILABILITY STATEMENT
All raw sequence data had been deposited in NCBI Sequence Read Achieve (SRA) under the Bioproject number PRJNA728310. The experiment numbers for the 195 chickens are SAMN19075170-SAMN19075364.

ETHICS STATEMENT
The animal study was reviewed and approved by the Ethics Review Committee for Laboratory Animal Welfare and Animal Experiment of China Agricultural University. Written informed consent was obtained from the owners for the participation of their animals in this study.

AUTHOR CONTRIBUTIONS
YY performed the experiments, analyzed the data, and wrote the manuscript. HZ supported the samples and provided comments on the manuscript. GY provided comments on the data analysis and revised the manuscript. KW provided comments on the data analysis. ZY, CZ, HY, and JL prepared and detected the samples. NY supported the samples and provided comments on the experiment. LL conceived and designed the experiment, provided comments on the data analysis, manuscript writing, and revision. All authors contributed to the article and approved the submitted version.