SNP-Density Crossover Maps of Polymorphic Transposable Elements and HLA Genes Within MHC Class I Haplotype Blocks and Junction

The genomic region (~4 Mb) of the human major histocompatibility complex (MHC) on chromosome 6p21 is a prime model for the study and understanding of conserved polymorphic sequences (CPSs) and structural diversity of ancestral haplotypes (AHs)/conserved extended haplotypes (CEHs). The aim of this study was to use a set of 95 MHC genomic sequences downloaded from a publicly available BioProject database at NCBI to identify and characterise polymorphic human leukocyte antigen (HLA) class I genes and pseudogenes, MICA and MICB, and retroelement indels as haplotypic lineage markers, and single-nucleotide polymorphism (SNP) crossover loci in DNA sequence alignments of different haplotypes across the Olfactory Receptor (OR) gene region (~1.2 Mb) and the MHC class I region (~1.8 Mb) from the GPX5 to the MICB gene. Our comparative sequence analyses confirmed the identity of 12 haplotypic retroelement markers and revealed that they partitioned the HLA-A/B/C haplotypes into distinct evolutionary lineages. Crossovers between SNP-poor and SNP-rich regions defined the sequence range of haplotype blocks, and many of these crossover junctions occurred within particular transposable elements, lncRNA, OR12D2, MUC21, MUC22, PSORS1A3, HLA-C, HLA-B, and MICA. In a comparison of more than 250 paired sequence alignments, at least 38 SNP-density crossover sites were mapped across various regions from GPX5 to MICB. In a homology comparison of 16 different haplotypes, seven CEH/AH (7.1, 8.1, 18.2, 51.x, 57.1, 62.x, and 62.1) had no detectable SNP-density crossover junctions and were SNP poor across the entire ~2.8 Mb of sequence alignments. Of the analyses between different recombinant haplotypes, more than half of them had SNP crossovers within 10 kb of LTR16B/ERV3-16A3_I, MLT1, Charlie, and/or THE1 sequences and were in close vicinity to structurally polymorphic Alu and SVA insertion sites. These studies demonstrate that (1) SNP-density crossovers are associated with putative ancestral recombination sites that are widely spread across the MHC class I genomic region from at least the telomeric OR12D2 gene to the centromeric MICB gene and (2) the genomic sequences of MHC homozygous cell lines are useful for analysing haplotype blocks, ancestral haplotypic landscapes and markers, CPSs, and SNP-density crossover junctions.


INTRODUCTION
The human major histocompatibility complex (MHC), also referred to as human leukocyte antigen (HLA), is investigated continuously because of its importance in the regulation of the innate and adaptive immune system, autoimmunity, and transplantation Vandiedonck and Knight, 2009;Lokki and Paakkanen, 2019). The genomic region of the human MHC encompasses approximately 160 coding genes including three distinct structural regions: class I with the classical and non-classical HLA class I genes (HLA-A, -B, -C, -F, -G, and -E) and ∼39 non-HLA genes, class II with the classical and non-classical HLA class II genes (HLA-DRB1, -DRA, -DQA1, -DQB1, -DQA2, -DQB2, -DPA1, and -DPB1) and class III that harbours more than 60 genes including the complement genes, TNF, NFKBIL2, and many other genes that code for cytokines, transcription factors, structural and developmental proteins ). The MHC class I and class II gene clusters contain numerous sequence duplications, insertions and deletions and considerable sequence diversity or polymorphisms (Trowsdale and Knight, 2013) that have accumulated into distinct multilocus haplotypes with relatively high population frequencies (>1%) Degli-Esposti et al., 1992;Dawkins et al., 1999;Yunis et al., 2003;Goodin et al., 2018). These date from at least the beginning of human expansion and dispersal out of Africa, 50,000-100,000 years ago (Henn et al., 2012;López et al., 2015). The MHC multilocus haplotypes have been associated strongly with many diseases (Lokki and Paakkanen, 2019). On the basis of the large number of known HLA-B alleles, more than 20,000 different MHC multilocus haplotypes might be distributed worldwide in human populations, with less than a hundred in certain localised populations such as the Europeans (Steele and Lloyd, 2015;Jensen et al., 2017;Goodin et al., 2018). The common Northern European HLA haplotype HLA-A1-B8-C7-DRB3-DQ2 (8.1AH) is estimated to have diverged from a single common ancestor about 23,500 years ago (Smith et al., 2006).
Segmental shuffling is a meiotic recombination or crossing over process between different haplotypes Traherne et al., 2006) that often occurs within nucleotide sequences in regions between the alpha, beta, epsilon and delta frozen polymorphic blocks Traherne et al., 2006;Romero et al., 2007), although breakpoints have been reported also within and between the HLA class I genes within the alpha (Lam et al., 2013) and beta blocks (Nair et al., 2006) and between the HLA class II genes within the delta block (Jeffreys et al., 2004;Larsen et al., 2014). Many MHC recombinant haplotypes appear to have originated in relatively recent times (Smith et al., 2006;Lam et al., 2013) due to the pressures of bottlenecks, migrations and gene flow, inbreeding, and outbreeding in various times of abundance and deprivation (van Oosterhout, 2009;Lobkovsky et al., 2019;Wang et al., 2020). With the formation of new human MHC haplotypes during and/or after speciation, many of the high-frequency (>1%) AHs were preserved over numerous generations and migrations; even across different ethnic populations as deduced from the European (Contu et al., 1989;Aly et al., 2006;Bilbao et al., 2006;Smith et al., 2006) and Asian haplotypes (Lam et al., 2013(Lam et al., , 2015(Lam et al., , 2017. Much of genomic sequence diversity, including haplotype diversity, is driven by molecular mechanisms such as DNA repair, replication, single point mutations, indels, recombination, duplication, conversion, transposition, and segmental rearrangements (Gu et al., 2008;Brawand et al., 2014;Lin and Gokcumen, 2019). In addition, interspersed repeat sequences that contribute to >50% of the human genomic content (de Koning et al., 2011) have been implicated in a variety of these DNA molecular processes (Moolhuijzen et al., 2010;George and Alani, 2012;Raviram et al., 2018;Lu et al., 2020). The identification of transposable elements (TEs) near the junctions of duplicated genes (Kulski et al., 1997(Kulski et al., , 1999b and at ectopic and meiotic recombination sites (Myers et al., 2010;Altemose et al., 2017;Kent et al., 2017) emphasise their role in driving genomic diversity. Interspersed REs, because of their mobility, hypermutability, and potential role in meiotic recombination, are an integral part of molecular drive (Dover, 1982) that together with point mutations, gene conversion (Madrigal et al., 1993;Adamek et al., 2015) and balancing selection (van Oosterhout, 2009) with a component of multiplicative fitness (Lobkovsky et al., 2019) probably have generated and maintained haplotypic polymorphisms in the MHC class I regions. This multifunctional role for active REs is evidenced in part by the structural biallelic Alu, SVA, LTR, and HERVs located near to or within putative recombination hotspots throughout the MHC class I, II, and III genomic regions (Kulski et al., 2011). In recent years, the proposed broad roles for TE and polymorphisms in the regulation of meiotic recombination (a mechanism that undoubtedly generated the MHC haplotype diversity in humans) has gained increasing attention (Myers et al., 2008;Zamudio et al., 2015;Altemose et al., 2017;Kent et al., 2017;Bourgeois and Boissinot, 2019).
Both first-generation and second-generation sequencing methods have produced phased genomic sequences of representative MHC haplotypes by using MHC homozygous cell lines (Horton et al., , 2008Stewart et al., 2004;Traherne et al., 2006;Norman et al., 2017). These phased MHC genomic sequences are important reference DNA sequences that provide representative haplotypes for better informed large population studies and for mapping heterozygous sequence reads such as by inference graphs (Dilthey et al., 2016), SNP-LD based haplotype frequencies (Romero et al., 2007) and EEH tests (Lam et al., 2015), especially for disease associations (Alper and Larsen, 2017;Lokki and Paakkanen, 2019). Although Norman et al. (2017) produced an important database for 95 MHC homozygous cell lines of assembled and resolved MHC genomic sequences, they limited their own analysis to the multilocus alleles and haplotypes of the HLA classical class I and class II genes, MUC22 and the structural diversity of C4 duplications. Missing from their analysis are the many REs, repeats and retrotransposable subfamilies, as well as the amplified and duplicated members of the genomic DNA that make up >50% of the human DNA content and that contribute to disease (Ayarpadikannan and Kim, 2014;Payer et al., 2017;Payer and Burns, 2019), gene regulation and recombination (Moolhuijzen et al., 2010;Myers et al., 2010;Altemose et al., 2017;Chuong et al., 2017) and to the duplicated segmental organisation of the human and other primate MHC genomic structures (Kulski et al., 1997(Kulski et al., , 1999aAnzai et al., 2003;Kulski et al., 2004).
The purpose of the present study was to extend the Norman et al. (2017) analysis by investigating the haplotypic linkages between the MHC class I genic and intergenic regions including HLA-F, HLA-G, MICA, MICB, eight HLA pseudogenes (HLA-V, -P, -H, -T, -K, -U, -W, and -J) and a set of previously published biallelic REs, AluOR (Kulski et al., 2014), AluHF, AluHG, AluHJ, AluTF, AluMICB Kulski et al., 2019), HERVK9 (Kulski et al., 2008), MER9 ) and four biallelic SVA haplotypic markers, SVA-HA, SVA-HC, SVA-HB, and SVA-HF (Kulski et al., , 2011. A further aim was to identify and characterise the ancestral SNP-density crossover (XO) loci in DNA sequence alignments of different haplotype blocks or segments from the GPX5 gene in the OR gene region telomeric of HLA-F to the centromeric MICB gene within the MHC class I genomic region. The overall results of the study suggest that the SNP XOs are indicators of haplotype XO, which in turn point to putative ancestral recombination sites that are widely distributed across the 2-Mb-MHC class I genomic region from telomeric of HLA-F to centromeric of MICB.

MATERIALS AND METHODS
The haplotype data of 95 MHC genomic sequences sequenced and assembled from HLA-homozygous cell lines by Norman et al. (2017) at NCBI BioProject with the accession number PRJEB6763 (https://www.ncbi.nlm.nih.gov/bioproject/) were downloaded as Fasta files and used for the analyses described below. The other MHC genomic sequences used in haplotype analyses were the GRChr38.p13 (GCF_000001405.39) of the chromosome 6 reference NC_000006.12 at the NCBI (https:// www.ncbi.nlm.nih.gov/assembly/GCF_000001405.39/), UCSC (https://genome.ucsc.edu/cgi-bin/hgGateway) and eEnsembl (http://asia.ensembl.org/Homo_sapiens/Info/Index) browsers and databases, the eight human reference haplotypes described by Horton et al. (2008), the chimpanzee sequence of Anzai et al. (2003) and the gorilla sequence of Wilming et al. (2013). All of the Fasta sequences downloaded from the public archives were submitted to the RepeatMasker webserver (http://www. repeatmasker.org/cgi-bin/WEBRepeatMasker) for output files of annotated members of the interspersed repetitive DNA families, their locations in the sequence and their relative similarity or identity in comparison to reference sequences of SINEs, LINEs, LTRs, ERVs, DNA elements, small RNA, and simple repeats. For the online analysis, RepeatMasker used the Dfam database (3.0) for the repeat sequence comparisons (Hubley et al., 2016) (http://www.dfam.org) because since 20th May 2019, it no longer had access to the RepBase library of repetitive elements (Bao et al., 2015) previously provided by GIRI (https://www.girinst. org/repbase/). The main difference between the Dfam database and RepBase for our analysis was that Dfam listed many Alu-like short sequences as SVA, whereas we were interested only in the SVA mosaic of 500 to 1,800 bp in RepBase with structures similar to those described by Shen et al. (1994). Thus, we used four dimorphic SVA sequences (SVA-HA, SVA-HC, SVA-HB, and SVA-MIC) previously reported by Kulski et al. (2010) and added three new dimorphic SVA sequence markers to this analysis ( Table 1). Norman et al. (2017) provided the alleles of the HLA-A, -B, and -C class I genes for all the 95 cell line sequences shown in Supplementary Table 1. We confirmed the alleles of the HLA class I genes and included the alleles of HLA-E, -F, and -G, and the MICA and MICB genes and eight HLA-A class I pseudogenes (Supplementary Table 2) in the 95 cell line sequences by comparing them to the IMGT HLA allele sequences (IMGTRelease 3.38.0) using the DNA sequence assembly software Sequencher ver.5.0 (Gencode http://www. genecodes.com). The alleles that were not in the IMGT HLA allele databases (Robinson et al., 2019) at https://www.ebi.ac.uk/ ipd/imgt/hla/ are reported here as "new" without providing any further information about the novel nucleotide or amino acid differences. We also found that the FTQW01000001.1 sequence provided by Norman et al. (2017) as the chimpanzee "Clint" (Pan troglodytes genome assembly, contig: 1_COX_Oct2016_Scaffold, whole genome shotgun sequence) has strong identity with the COX cell line sequence that harbours the 8.1AH haplotype A * 01:01:01:01/ B * 08:01:01:01/ C * 07:01:01:01 (Horton et al., 2008).
We added a laboratory identifier number (ID_1 to ID_95) to each of the Norman et al. (2017) sequences (Supplementary Table 2) for ease of identification in comparative sequence analysis. A shorthand identifier for the MHC CEH/AH haplotypes based on the HLA-B allele such as 7.1CEH/AH, 8.1CEH/AH, 13.1CEH/AH was used as previously described (Degli-Esposti et al., 1992, Dorak et al., 2006. The alleles of the HLA class I genes, MIC genes, HLA class I pseudogenes and HLA class II genes were determined also for the GRChr38p13 genomic reference sequence, which corresponds to the 7.1AH of the PGF homozygous cell line (Horton et al., 2008), shown in Supplementary Table 3. The dimorphic RE and microsatellite markers that were searched for and identified by RepeatMasker in the 95 MHC genomic sequences are shown in Table 1. The RE dimorphisms (absence or presence) were easily recognised in each of the RepeatMasker outputs because of their positions within or close proximity to other TE elements and short tandem repeats (STRs). For example, the MER9/HERVK9-int/MER9 insertion at nucleotide positions (nts) 160655 to 166834 in Supplementary Table 4 is flanked by a string of telomeric LTR16B2/MLT1F1/ STR/AluY/STR/L1ME3 elements and a string of centromeric Charlie9 (nts, 89-303)/Charlie9 (nts, 1283-1803)/L1PA10/LMLT1F1/THE1C elements that are easily identified in the RepeatMasker outputs with the solitary MER5 and HERVK9-int deletion at their corresponding locations.
Comparative sequence alignments between two or more sequences to evaluate SNP densities and determine XO regions between SNP-poor regions (SPR) of <20 SNPs per 100 kb and SNP-rich regions (SRR) of >100 SNPs per 100 kb were performed with the web-based MultiPipMaker alignment program (http:// pipmaker.bx.psu.edu/cgi-bin/multipipmaker) by uploading the Fasta sequence files, a RepeatMasker output file and using the MultiPipMaker setting for single coverage as described by Schwartz et al. (2000) to generate the optimal sequence alignment. SNPs in the alignments were counted twice manually, and an average number was presented in the results. Obvious assembly errors, polynucleotides, simple microsatellite repeats and indels were not counted as SNPs. Also, a series of many adjoining SNPs (e.g., >5 SNPs in a string of 50 nucleotides) or SNPs within 50 bp of obvious sequencing errors with runs of unspecified nucleotides (Ns) and/or inconsistent long strings of deletions were not counted. The length of sequence alignments usually ranged between 50 and 500 kb depending on (1) the segments targeted for the analysis and the ease of SNP manual counting in the pdf outputs of the nucleotide alignments and/or (2) the length of the Percentage Identity Plot (PIP) output for reproduction as a convenient and readable image. The targeted sequences were selected and trimmed from the Fasta files that had been previous downloaded from the NCBI BioProject, accession number PRJEB6763. The software program Genetyx ver.20 (GENETYX Co., Tokyo, Japan) was used with the Selector function set to select and trim to obtain the required Fasta file sequences with the genomic sequence target positions taken from those listed in the RepeatMasker output text file (Supplementary Table 3). The T-Coffee multiple sequence alignment tool at EMBL-EBI (https://www.ebi.ac.uk/Tools/msa/ tcoffee/) was used to construct multiple sequences of ERV3-16A3int in the Fasta format and the CLUSTALW (1.83) format.

MHC Haplotype Sequences
Of the 95 human MHC haplotypes sequenced by Norman et al. (2017), 82 differed at least at one of the 9 loci, HLA-A, -C, -B, -DRB1, -DRB345, -DQA1, -DQB1, -DPA1, and -DPB1. However, there were 46 sequences representing 18 haplotypes that had the same combination of HLA-A, -C, and -B alleles for at least one haplotype pair. Furthermore, 70 sequences represented 19 different HLA-C/-B haplotypes and 67 sequences represented 23 different HLA-A/-C haplotypes (Supplementary Table 1) with a homologous alignment for at least one haplotype pair.
In this study, the haplotypic alleles of 56 loci were analysed, ranging between the OR gene region and the MHC class I Frontiers in Genetics | www.frontiersin.org  Norman et al. (2017) sequences and chimpanzee (Anzai et al., 2003), and therefore are fixed in humans.
region including the classical HLA-A, -B, and -C loci, the non-classical HLA-F, -G, and -E loci, 8 HLA pseudogenes, MICA and MICB, 8 Alu loci, 13 SVA loci, 2 MER9 loci, the HERVK9 locus, 5 LTR or MER5 loci, and 12 STR loci (Tables 1-3, Supplementary Table 2). The 9.5-kb MER5/LTR33 indel between the HLA-C and HLA-B loci also contained  Figure 1). There were numerous other indels ranging between 1 and 40 kb within the beta block sequences (Supplementary Figures 2-4) that were not included as allelic markers in this study. To assess the MHC class I haplotypic integrity of the 95 cell lines, the additional allelic haplotype combinations that we typed were sorted and grouped according to alpha block haplotypes (
(3) A single sequence sample with the HLA-A allele A * 02:05:01 had no AluHG insertion, but had a variant (CAGAGA)n microsatellite number, and different alleles for all the alpha block HLA pseudogenes except for HLA-U * 01:03.
(5) The AluHG insertion with the (CAGAGA)n microsatellite deletion was linked also to HLA-A * 30 in one haplotype and to HLA * A31 with the HERVK9 deletion in another haplotype, but not to the other three with the HERVK9 insertion, probably as a result of past recombinations or conversions.
(8) The HERVK9 insertion was present in 25 cell lines, whereas the other 70 cell lines had the signatory deletion marker, a solitary MER9 that is the deletion product of a recombination between the 5 ′ MER9 and 3 ′ MER9 flanking the 6-kb HERVK9 internal sequence.

Allelic Lineages Within Beta Block Haplotypes
Supplementary Table 6 shows the genic and non-genic allelic markers for the beta block haplotype sequences of 95 homozygous cell lines from the telomeric locus of HERVK9/MER9 microsatellite (TTTC)n known as M13 (Kulski et al., 1997) to the centromeric locus of MICB including five other microsatellite loci (M11, M9, Msx, MSa, and Msb), six dimorphic indels (SVA-HC, SVA-BC, 9.5-kb indel, SVA-HC, and AluP5) and four SNP loci (HLA-C and -B alleles and MICA and MICB alleles). There are 12 HLA-C, 21 HLA-B, 16 MICA, and 7 MICB allelic lineages that are linked together to form at least 54 HLA-C/HLA-B/MIC haplotype lineages. These haplotype lineages were sorted in the sequential order for the absence (allele 1) and presence (allele 2) of the SVA-HB insertion, and the alleles of HLA-C, HLA-B, MICA, and MICB, respectively ( Table 3). The SVA-HB insertion (allele 2) is missing from the chimpanzee and gorilla MHC (data not shown), and its absence is assumed to be the ancestral allele.
The HLA-C lineages with the SVA-HB insertion were 2 C * 01:02:01 linked to HLA-B * 27, 3 C * 02, 9 C * 03, 9 C * 09, 11 C * 06, 6 C * 07:01 and 1 C * 07:18, 9 C * 12, 4 C * 16, and 2 C * 17. Only C * 01, C * 03 and C * 07 had crossed over to be represented by both the absence and presence of the SVA-HB. Eighteen of 56 SVA-HB positive sequences contained SVA-HB duplications and LTR10/HERVI/LTR10 rearrangements that did not correlate with any particular HLA-C lineage or haplotype, suggesting that these variants were likely sequencing assembly errors. Nevertheless, there are two distinct haplotype evolutionary histories for the beta block that are based on the absence or presence of the SVA-HB insertion.
The different HLA-B lineage haplotypes with the SVA-HB insertion were partitioned further into another two lineages: those with the 9-kb deletion between AluY-(AT)n and AluJb-(TTAT)n and those without the 9-kb deletion ( Table 3, Supplementary Figure 1). The 18 HLA-B haplotypes with the 9-kb deletion were all linked to either HLA-C * 06:02:01 or -C * 12:03:01 (Table 3).  (Contu et al., 1989, Bilbao et al., 2006 in the cell lines EJ32B and DUCAF has two specific dimorphic Alu insertions, AluOR and AluOR1 (Supplementary Table 2), located ∼300 kb from the HLA-F gene (Figure 1). This finding confirms that the CPS of some MHC class I haplotypes and AH like the 18.2AH extend well into the OR gene cluster telomeric of the HLA-F gene and the MHC alpha block by at least 1,185 kb. The AluOR insertion was found also in one of two HLA-A * 29 sequenced samples (cell lines PITOUT and MOU, respectively), the HLA-A * 02:05:01 cell line WT49, the HLA-A * 11:01 cell line WT100BIS and the HLA-A * 23:01 cell line WT51 (Supplementary Table 2). The four Warao South American Indian cell lines LZL, AMALA, SPL and RML (Supplementary Tables 1, 2) present an interesting ethnic contrast to the Caucasoid cell lines (Supplementary Table 7). The Warao people who inhabit the rainforests of Orinoco Delta of northeastern Venezuela and western Guyana are an ancient ethnic minority with an extant population of ∼50,000 people. The Warao 62.xAH and 51.xAH have the HLA-A alleles A * 2:17:02, A * 02:04, and A * 02:12 rather than the common Caucasoid A * 02:01:01, but they also carry the AluHG insertion that is linked to most of the Caucasoid A * 02 lineages ( Table 2). One of the Warao cell lines (SPL, ID73) has the HLA-A * 31:02:02 allele linked to the HERVK9 insertion and SVA-HB deletion, which is markedly different to the Caucasoid and Oriental 62.xAHs, but with an alpha block haplotypic structure similar to two Caucasoid HLA-A * 31:02 lineages represented by the English cell line JHAF (ID46) and the Australian cell line MT14B (ID82) (Supplementary Tables 1, 2). The other HLA-A * 31:02 haplotype represented by the European Caucasoid cell line DEU (ID35) is with an AluHG insertion, a HERVK9 deletion ( Table 2) and an SVA-HB insertion ( Table 3), suggesting that a more modern HLA-A * 31 AH was subsequently generated by segmental shuffling exchanges.

Supplementary
Since the exact SNP XO regions between different MHC haplotypes are poorly defined in regard to the intergenic and genic distribution of repeat elements, a detailed comparative examination of DNA sequence alignments of similar and different haplotypes was undertaken using the PIP method of Schwartz et al. (2000). We started with an examination of ∼3 Mb of genomic sequence of the same haplotypes that included the class I region from HLA-F to MICB and the OR gene cluster that included the GPX5, ZNF311, OR2H2, GABBR1, and MOG genes (Table 4, Supplementary Table 8). We then performed a more detailed examination of SNP densities and XOs within the 1.2-Mb OR gene region (Table 5), the 310kb alpha block (Table 6), the 1,172-kb inter alpha and beta blocks (Supplementary Tables 9, 10) and the 307-kb beta block (Tables 7, 8) of the same and/or different HLA-A, -C and -B haplotypes. The alignments and SNP counts were analysed manually across the entire 3 Mb and in 50-kb to 500-kb segments connecting the various segments as a sliding window. Figure 1 summarises the findings of our analysis of more than 250 sequence alignments between different and the same haplotypes with the identification of at least 38 ancestral SNP XO sites between SNP-poor and SNP-rich regions within ∼2.8 Mb from GPX5 to MICB.

SNP Densities Within MHC Class I Homologous Haplotypes
We grouped and aligned 41 sequences to evaluate the variations of SNP density and the degree of homology within the same CEHs/AHs (Table 4 and Supplementary Table 8).
The homologous sequence alignments for 12 of 16 different CEHs/AHs revealed a scarcity of SNPs ranging over ∼1.8 Mb from HLA-F to MICB with <150 SNPs over the entire region at an average of 20 SNPs for 17 sequence alignments. Seven of the 16 different CEHs/AHs were SNP poor (<150 SNPs over ∼3 Mb) from the GPX5 locus in the OR gene region to the MICB gene in the beta block region. These highly homologous sequence runs represented the seven sequences of 8.1AH, five sequences of 7.1AH, two sequences each of 18.2AH, 51.1AH, 57.xAH and 62.xAH, and two of three sequences classified as 62.1AHs. The SNP counts over the same range for the alignment of different haplotypes such as between the 7.1 CEH/AH and 8.1 CEH/AH were >2,000 for ∼3 Mb.
Six CEHs/AHs had regions of substantial diversity that were SNP-rich between the alpha and beta blocks and/or in the OR gene region at the telomeric end of the alpha block. These haplotypes consisted of different-sized, SNP-poor recombinant blocks interspersed between SNP-rich recombinant blocks. The most surprising results were for the comparison between the three sequences of the 62.1 CEH/AH and the three sequences of the 44.1 CEH/AH. The sequence of LAB ID_41 had varying regions of SNP density with three SNP XOs, whereas ID_40 and ID_85 had few detectable SNPs (0.77 SNPs per 100 kb) and no XO SNPs in their alignment from GPX5 to MICB. Similarly, the ID_60 sequence with the 44.1 CEH/AH had at least two SNP XO events, one in the region between HLA-J and HLA-E and another in the region between MUC21 and HLA-C. In comparison, the ID_74 sequence of 44.1 CEH/AH had few SNPs and no detectable SNP XO in the 1.8-Mb sequence block from HLA-F to MICB.

Crossing Over Within the OR Extended Gene Region
The SNP XOs for some CEHs/AHs were detected in the OR gene regions hundreds of kilobases telomeric of the HLA-F gene ( Table 5). For the sequence comparison between haplotype pairs, the OR genomic region was divided into four segmental blocks of 210-300 kb each, ranging from the telomeric GPX5 gene to the ZFP57 gene that are 1,118.4 kb and 42.1 kb telomeric of the HLA-F locus, respectively. The average SNP/210 kb for four different haplotype pairs was 317 SNPs within the genomic region between MAS1L and the start of the alpha block F segment. In paired sequence comparisons of 23 similar haplotypes, a FIGURE 1 | to MICB and the nucleotide position 28.5-31.6 (Mb distance from telomere on chromosome 6, GRCh38.p12 Primary Assembly NC_000006.12; NCBI, UCSC or ENSEMBL browsers on the Web). (A) The OR genomic region from 28.5 to 29.6 Mb, (B) the OR and MHC alpha block region from 29.6 to 30 Mb with the location of the 10 HLA segmental duplications segment F to segment J indicated by the horizontal double arrows between the ERV3-16A3_int insertions (solid red vertical arrow), (C) the TRIM gene cluster, HLA-L and HLA-N pseudogenes, non-classical HLA-E gene, and structural and regulatory gene marker (TUBB to DDR1) from 29.9 to 30.9 Mb, (D) the regulatory gene region (GTF2H4 to PSORS1C3) and the beta block of HLA-C to MICB from 29.9 to 31.6 Mb. The NCR3 to LY6G6F gene markers from 31.6 to 31.8 Mb are near the putative recombination sites identified by Lam et al. (2013) at the start of the MHC class III region. The X within green boxes indicate crossover sites, the blue solid blocks are the recombination sites reported by Lam et al. (2013), and the vertical orange (indels) and open arrows (fixed) are the SVA listed in Table 1. The Alu indels such as AluOR and AluOR1 in Table 1 are indicated as labelled yellow vertical boxes in the Alu or SVA indel row. The location of the THE1, LTR16B, and Charlie are indicated by the vertical bars in their respective rows. The blue dotted bars represent the THE1B subfamily and the purple vertical bars represent the other subfamilies, THE1A, THE1C, and THE1D.
SNP XO was found in 15 pairs within 237 kb between MAS1L and HLA-F. A SNP XO was found near to or within ERV3-16A3, an ancient HERV-16 element at the junction of the F segment in five haplotype pairs, within LTR10A of two haplotype pairs, within MER21C of two haplotype pairs, within AluY/Sc8 of two haplotype pairs and within Tigger2b of one haplotype pair. Most SNP XO were found in loci between ZFP57 and HLA-F (five haplotype pairs), USB and OR2H2 (five haplotype pairs), GABBR1 and MOG (three haplotype pairs) and OR2H2 and GABBR1 (two haplotype pairs), revealing the variability of the SNP XO junctions that were involved with ancestral recombinations. The SNP XO in a region between the OR12D3 and ORD12D2 genes in the sequence alignment of 22_A * 32-B * 38 and 72_A * 32-B * 44:02 is ∼332.2 kb from the HLA-F gene. Moreover, this SNP XO site is in close proximity to the young Alu indel, AluOR, that was detected in Japanese and Caucasians at a frequency of 0.32 and 0.14, respectively ( Table 1). This SNP XO and active Alu insertion site appear to mark a hotspot for meiotic and insertion recombinations. Of the seven other sequence comparisons, no SNP XO was detected in two pairs (ID4 v ID6 and ID78 v ID79) of sequences that ended at the MA1L/LINC01015 segmental block and in five pairs (ID11 v ID16, ID4 v ID51, ID34 v ID94, ID25 v ID26, and ID46 v ID82) that ended near the GPX5 locus because of the absence of sequence for further analysis. The comparisons between two 7.1AH (ID4 v ID51) and two 8.1AH (ID11 v ID16) were striking because the SNP-poor region extended from the alpha block to at least the GPX5 gene that is ∼1118 kb from the HLA-F gene. Both of the extended A * 30-B * 18 haplotypes (ID25 v ID26) carried the haplospecific AluOR insertion and the novel AluOR1 insertion (Table 1).

HLA-A Haplotypes
The alpha block was divided into 10 segments containing the duplicated HLA genes and pseudogenes from segment F with the HLA-F gene to segment J with the HLA-J pseudogene Kulski et al., 1999a,b) for SNP counts and XO analysis ( Table 6) as shown in Supplementary Figure 6 and Supplementary Table 3. The SNP counts were zero to less than 20 SNPs over 320.3 kb of sequence for 20 of 35 similar alpha block haplotype pairs ( Table 6). In contrast, the SNP counts were much greater in the sequence alignments of seven different haplotype pairs ranging between 2,035 and 2,644 SNPs per 320.3 kb at an average of 7 SNPs/kb. The highest average SNP density was 16 SNPs/kb in the A segment and the lowest density was 2 SNPs/kb in the J segment. Because of deletions or XO within the alpha block, some recombinant haplotypes had intermediate SNP numbers such as 692 to 884 SNPs per 320.8 kb for six of the HLA-A * 24 haplotypes, 269 SNPs for the 48_A * 24 vs. 59_A * 24 haplotypes and 707 SNPs for the 46_A * 31 vs. 49_A * 33 haplotypes. The smallest amount of SNP diversity within the alpha blocks of different HLA-A allelic lineages was between the HLA-A * 26 and the HLA-A * 66 haplotypes with only 66 SNP differences across the 320.8 kb alpha block (one SNP per 4.9 kb compared to an average of one SNP per 0.14 kb for an average of seven haplotype pairs). The biggest SNP difference among the different haplotypes was in the 53-kb G-segment and the smallest was within the 15-kb A segment that had only a total of three SNP differences. A previous analysis of the HLA-A * 26/A * 66 loci indicated that the HLA-A * 66 was a product of a gene conversion (Madrigal et al., 1993). Alternatively, the relatively small number of SNP differences across the entire alpha block of the HLA-A * 26/A * 66 haplotypes suggests that they might have evolved from the same AH and diverged slightly by gene conversions and mutagenesis over time because of their age.
An intermediate amount of SNP diversity was detected between HLA-A * 31 and HLA-A * 33 with 707 SNPs within the 320.8-kb alpha block. However, the first 100 kb of the alpha block including the F, V and P duplicated segments was 605 SNPs and the remaining 220 kb of the alpha block from the G segment to the J segment was only 102 SNPs including 11 SNPs in the A segment. This segmental division with sequence homogeneity at the centromeric end of the HLA-A * 31 and HLA-A * 33 haplotype sequences and large diversity at the telomeric end of their alpha blocks has an ancestral SNP XO at the centromeric end of the P segment within the MICG pseudogene (C/T). In comparison to the 707 SNPs across the alpha block of the HLA-A * 31 and -A * 33 haplotypes, there were 2,287 SNPs across the alpha block of the HLA-A * 03 and -A * 33 haplotypes. Coincidently, the HLA-A * 03, -A * 31, and -A * 33 haplotypes all have an HERVK9 insertion. Thus, two thirds of the HLA-A * 31/ * 33 alpha blocks have the same haplotype lineage whereas the other third are evidently from different haplotype lineages.
The largest difference among different HLA-A haplotypes was between HLA-A * 30 and HLA-A * 31 with 2,644 SNPs in the alpha block, whereas on average there were 2,235 SNPs in the alpha block for seven different sequence pairs. The biggest differences between the same HLA-A haplotypes were obtained for the HLA-A * 24 pairs, suggesting that their alpha blocks may have undergone numerous shuffling and exchanges with various other HLA-A haplotypes. No SNPs were detected in the alpha blocks of the HLA-A * 32 haplotype pair and only one SNP was detected in the alpha blocks of an HLA-A * 24 haplotype pair and the HLA-A * 29, -A * 30, and -A * 31 haplotype pairs. The SNP XOs for the same haplotypes at the telomeric end of the alpha block were variable depending on which haplotypes were compared ( Table 6), but mostly involved the F segment with the HLA-F gene and sites within or between the Tigger1 and Charlie20 DNA elements. In genomic sequence comparisons between seven similar HLA-A * 24 haplotype pairs, all of them had the 54-kb deletion of the T and K segments between the H and A segments (Supplementary Figure 5). Also, some XOs occurred in the G segment at A/G 141380 between HAL1 and (ATAAT)n, which is near the AluHG insertion locus and the MICF pseudogene (ID34 v ID48, ID34 v ID59, ID48 v ID68, and ID48 v ID94) (Figure 2).
The sequence comparison between the HLA-A * 02:01:01 and HLA-A * 02:05:01 haplotypes revealed large SNP differences within five segments from F to H of the alpha block (ID5 v ID8 and ID10 v ID8). This difference indicates that an exchange of segments T, K and A had occurred in the HLA-A * 02:05:01 haplotype (Italian cell line WT49) that lacks the haplospecific A * 02 lineage marker AluHG within the G segment (Supplementary Table 2). The sequence comparison between 4_A * 03:01:01:01 and 18_A * 03:01:01:01/A * 24:02:01 with a low SNP density within the segments H to A is noteworthy because it reveals that the 18_A * 03:01:01:01/A * 24, if assembled correctly from the heterozygous Australian Caucasoid cell line LO081785, is an atypical and highly divergent A * 03 haplotype with a HERVK9 deletion (Hap ID19.A * 03.2 in Supplementary Table 6 and ID_18 in Supplementary Table 2). The SNP XO was within the G segment at nucleotide position 150337 C/T located between the Tigger1 and Charlie20a DNA elements ( Table 6).

XO in Regions
Between HLA-J and HLA-C (a) SNP XOs within HLA-A haplotypes. The analysis of XO junctions in the genomic sequences between HLA-J and HLA-C outside the alpha and beta blocks using the same HLA-A and different HLA-C alleles was limited to only 15 comparisons (Supplementary Table 9). Of these, the XO was between HLA-A and HLA-J in two comparisons, between HLA-J and HLA-E in six pairs, and between HLA-E and MUC21 in six pairs. In the sequence comparison between 11_A * 01-C * 07-B * 08 and 2_A * 01-C * 06-B * 57 (example 6 * in Supplementary Table 9), at least five different XO junctions were detected in different regions of segments B to E, including the transition from SPR to SRR in block B, two separate XO transitions in block C, a SRR to SPR XO in block D and a SPR to SRR XO in block E. Multiple XOs were observed also for analyses numbered 5 to 8 in Supplementary Table 9. The entire SNP-poor region (20 SNPs) was extended from HLA-J to HLA-C in the sequence alignment of cell lines with the same haplotype, A * 01-C * 06.
(b) SNP XOs within HLA-C haplotypes. Supplementary Table 10 shows the XO regions for 44 recombinant sequence pairs with the same HLA-C, but different HLA-A allelic lineages. A control homologous sequence pair (analysis 45) with the same HLA-A and HLA-C allelic lineages, 2_A * 01-C * 06-B * 57 and 31_A * 01-C * 06-B * 40, was SNP poor with only 20 SNPs counted over 1,322.9 kb of sequence. In comparison, the other sequences were heterologous (SNP rich) over most of the genomic region between the HLA-A and HLA-C genes. XOs occurred from SNP rich to SNP poor within the HLA-C gene or the 3 ′ non-coding region (NCR) of HLA-C in a comparison of 14 haplotype pairs, implying that recombinations occurred within the coding region of the ancestral HLA-C * 07:18 and some other ancestral HLA-C * 07 allelic lineages. The HLA-C haplotypic alleles that transitioned from SNP-poor to SNP-rich regions within 12 kb of the 3 ′ end of exon 8 of HLA-C included C * 01, C * 04, C * 05, C * 07, C * 08, C * 12, and C * 14 depending on their linkage with HLA-A alleles. The C * 03/HLA-A * 01, -A * 02, -A * 24, and the C * 17/HLA-A * 01, -A * 30 combinations were SNP poor until the PSORS1C3 gene that is located about 100 kb from the HLA-C gene. Some of the HLA-C * 06 and HLA-C * 07 alleles within different recombinant haplotypes (analysis numbers 31 to 34) were homologous (SNP-poor) in the region from the HLA-C locus to the HCG22 locus. This long-range, homologous genomic segment of ∼215 kb between HLA-C and HCG22 includes the various psoriasis candidate genes HCG27, PSORS1C3, POU5F1, TCF19, CCHCR1, PSORS1C2, PSORS1C1, CDSN, and C6orf15 (Nair et al., 2006). However, in combinatorial analyses of recombinant risk haplotypes and alleles genotyped in 678 families with early-onset psoriasis, Nair et al. (2006) determined that HLA-C * 06 was the solitary risk allele that conferred susceptibility to early-onset psoriasis. Other HLA-C/HLA-A recombinant haplotypes were SNP poor over even greater distances ranging between 156 kb from the HLA-C gene to C6orf15 (C * 01) and 730 kb from the HLA-C gene to a region between the LINC02569 and GNL1 loci. XO between SPR and flanking SRRs also occurred within or near to the MUC22 and MUC21 genes that are 234 kb and 279 kb from the HLA-C gene locus, respectively.

SNP-density XOs in Regions Between Different
HLA-C and HLA-B Alleles Table 7 shows the results of SNP counts and XO loci for 59 HLA-C and HLA-B haplotype sequence alignments using various combinations of similar and different haplotypes. As controls for comparing the various HLA-C/HLA-B recombinants, two were different HLA-C and HLA-B haplotypes and six were the same HLA-C and HLA-B haplotypes. The different haplotype pairs yielded an average of 1,656 SNP counts for ∼93 kb, whereas the same haplotype pairs produced a few or no SNPs over the same distance between HLA-C and HLA-B loci. There were 39 sequence pairs of HLA-C/HLA-B recombinants with the same HLA-C allele and a different HLA-B allele, whereas 12 pairs of HLA-C/HLA-B recombinants had the same HLA-B allele and different HLA-C allele. Most SNP XO occurred near to or within the HLA-B or HLA-C coding region depending on which HLA-C/HLA-B recombinant haplotype sequences were aligned. SNP XOs occurred within the 3 ′ NCR or coding region of the HLA-B gene and within a 3-kb portion between a MLT1N2 element and the HLA-B gene (Figure 3) for 28 of 51 HLA-C/HLA-B haplotype alignments ( Table 7). The SNP XOs in the intermediate loci  (48, 59, 68, and 94) with the same HLA-A*24 allelic lineage. The structural biallelic AluHG insertion is linked to HLA-A*02, but is absent in HLA-A*24 ( Table 2). The AluHG TD insertion site CACTTAAACAT (Kulski et al., 2001) is outlined in purple.  Table 7. Numbers above the arrowheads represent the number of different haplotype alignments revealing the SNP crossovers. Although more than half of the crossover sites were found within the 3-kb 3 ′ -NCR of HLA-B, six were found within HLA-C and another 13 were outside the 3-kb 3 ′ -NCR of HLA-B and downstream towards HLA-C.
regions were (1) within the L1PA13 fragment that is ∼9 kb from HLA-C and ∼72 kb from HLA-B, (2) between the L1MB8 and MLT1D elements ∼53 kb from HLA-B, (3) a MIR element ∼44 kb from HLA-B, and (4) within the HERVI ∼21.5 kb from the HLA-B gene.

SNP-density XOs in Regions Between Various HLA-B and MIC Gene Alleles
The SNP XOs in the genomic region between HLA-B and MICA and between MICA and MICB for 31 recombinant haplotypes are listed in  (Figure 4). A SNP XO was found also within the insertion sequence of the SVA-MIC indel ( Table 1) for the haplotypes 86_B * 40/MICA * 008/MICB * 002 and 13_B * 40/MICA * 008/ MICB * 014. SNP XO locations were not identified in five sequence comparisons because the genomic region from HLA-B to MICB was either too SNP-rich (comparisons 29 to 31) or too SNP-poor (comparisons 25 and 26).

DISCUSSION
The comparative sequence analyses of the 95 genomic sequences using RepeatMasker (Hubley et al., 2016) and PIP (Schwartz et al., 2000) confirmed the identity and HLA class I allelic linkages of 12 haplotypic RE markers ( Table 1) that had been genotyped previously in population frequency studies and MHC homozygous and heterozygous cell lines Kulski et al., 2011). We identified another four novel REs and indels; three of them were annotated as AluOR1, AluP5, and SVA-BC indels (Table 1, Figure 1). A fourth indel was 9.5 kb in size and composed of the MER5B and LTR33 subfamily members (Supplementary Figure 1). The deletion variant of the 9.5-kb indel that is located between the SVA-BC and SVA-HB loci within the beta block was linked to all 11 HLA-C * 06:02 alleles, all 4 HLA-C * 12:03 alleles, and 1 of 9 C * 04:01 alleles ( Table 3). The 9.5kb indel, SVA-BC and SVA-HB are all located within a relatively strong recombination hotspot of 82 kb between the HLA-B and HLA-C genes (Figures 1, 3). This intergenic recombination hotspot was identified previously in sperm studies (Cullen et al., 2002) and HapMap studies (Lam et al., 2013). Interestingly, a SNP XO was detected within the SVA-MIC insertion between the MICA and MICB loci (Figure 4) of the A * 02/C * 03:04/B * 40 haplotype pair ID13 and ID86 ( Table 3), suggesting that this insertion locus is near a recombination site that exchanged the MICB * 002:01 allele with MICB * 014. We included eight HLA class I pseudogenes (V, P, H, T, K, U, W, and J) as haplotypic markers in our analysis  Schwartz et al. (2000).
of the alpha block haplotype diversity even though most of them may have no physiological functions or regulatory roles. However, the HLA-H gene is polymorphic and has transcriptional activity, and the signal peptide of the membranebound HLA-H molecule can mobilise HLA-E to the cell surface of mononuclear cells, bronchial epithelial cells and lymphoblastoid cell lines (Jordier et al., 2020). The HLA-H * 02:07 allele was found at a frequency of 19.6% in some East Asian populations (Paganini et al., 2019) and encodes a full-length HLA protein that may have tolerogenic activity (Jordier et al., 2020) in comparison to immunosuppressive activity of its neighbouring gene product, HLA-G (Elliott, 2016). Taken together, the alleles of the pseudogenes confirmed that the alpha block haplotypes consist of multilocus alleles and that the recombinant XOs were mostly at the telomeric and centromeric ends of the block often within the telomeric F segment and/or the centromeric W and J segments ( Table 2), but also in MICF fragment of the G segment close to the biallelic AluHG insertion site (Figure 2) and in proximity of HERVK9 of the HLA-H segment associated with the 54-kb deletion of the HLA-A * 24 lineage (Supplementary Figure 5). Thus, the haplotypic HERVK9, AluHF, AluHG, and AluHJ haplotype for 4136 kb and 8.7 SNPs/100 kb for the A2-B46-DR9 for 2721 kb, which are much higher than our comparative counts. This difference is possibly due to spikes of diversity in their localised regions of sequence. The SNP counts by Lam et al. (2015) within homologous haplotypes represented as nucleotide diversity values were still at least 38× less than that found between 7.1CEH/AH and 8.1CEH/AH, the two different common European MHC heterologous haplotypes of the cell lines PGF and COX, respectively. The count of 2240 SNPs per 320.8 kb (7 SNPs/kb) for PGF (LAB ID_4) and COX (LAB ID_11) in our study when their alpha blocks were aligned ( Table 6) further demonstrates the large SNP diversity between them. In comparison, the average SNP density of the human genome is one SNP per 1.9 kb or between 5 and 9 SNPs per 10 kb (Sachidanandam et al., 2001;Zhao et al., 2003).
The two longest MHC haplotypes in the human genome are considered to be the European 7.1AH and 8.1AH (Horton et al., , 2008 with CPS extending into the telomeric OR gene cluster region beyond the GPX5 gene that is 1.2 Mb telomeric of HLA-F (Figure 1). However, the CPS of at least four other MHC class I haplotype pairs also extended beyond the GPX5 gene: 18.2AH, 51.xAH, 57.1AH, 62.xAH, and 62.1AH ( Table 5, Supplementary Table 8). The CPS of another three HLA-A haplotype pairs also extend beyond the GPX5 gene: HLA-A * 24/C * 03:04, HLA-A * 31/C * 15:02 and HLA-A * 32/C * 12:03 presented in Supplementary Table 8. The SNP XO junctions could not be determined for these haplotypes because they are telomeric of the GPX5 gene (NCBI ID:2880) and located somewhere beyond the region of the available genomic sequences provided for these cell lines (Norman et al., 2017). In comparison, the SNP XO junctions for two other HLA-A * 01/C * 06 and HLA-A * 01/C * 07 pairs and two HLA-A * 03/C * 07:02 and HLA-A * 03/C * 06:02 pairs were closer to the MOG and OR2H2 genes ( Table 5) that are 51 kb and 134 kb from the HLA-F gene, respectively (Figure 1). It seems that the recombinant breakpoint of different haplotype segments or blocks generates the SNP XO site even if the classical HLA alleles are the same. For example, a number of HLA-A * 02 allelic lineages were represented by distinctly different alpha block haplotypes ( Table 2) that evolved from various shuffling events as well as segmental conversions (Table 6). Thus, the SNP XO within the alpha block F segment was strongly correlated with different local HLA-F alleles even if the downstream HLA-A allele within the A segment of the alpha block was the same; that is, the XO between HLA-F * 01:01:01:01 of 18.xAH or 38.a2AH and F * 01:04:01:02 of 60.1AH are both linked to HLA-A * 02. The SNP XO defines the junction of the haplotype XO, which in turn points towards a putative ancestral recombination site. Whether this relationship that is based on a small number of comparative examples can be confirmed in the future will depend on the empirical findings of a much greater number of haplotypes with different HLA-A allelic and haplotypic lineages.
During mammalian and primate evolution, the MHC region transitioned through various genomic rearrangements including recombinant XO events of MIC, HERV16 and HLA class I genes, which resulted in the current structural organisation of the human MHC class I genomic region (Kulski et al., 1999a,b;Kulski et al., 2004). The HERV16, now classified as ERV3-16A3 (repeated throughout the MHC class I and class II regions), along with HLA class I coding and non-coding sequences, seems to have been a recombination site for many of the duplication events by way of unequal XOs (Kulski et al., 1999a,b;Kulski et al., 2004). Ancient hominoid haplotypes undoubtedly were the progenitors to the modern human CEH/AH, but when, how and where is unknown. In addition, each AH is a unique integrated genetic module consisting of many immunologically related proteincoding genes with gene copy number variations, segmental duplications and fragmented or relatively intact transposons and REs that contribute to more than 50% of the genomic content. In this regard, the MHC haplotype comprising a cluster of multilocus, monocistronic expression units is analogous to the polycistronic bacterial operon, which is a functional unit of DNA containing a cluster of genes under the control of a single promoter (Lee and Sonnhammer, 2003;Blumenthal, 2004). However, the MHC haplotype structures are far more complex with their regulatory network of cis-acting multilocus expression units known as haplotype-specific expression quantitative trait loci (eQTL) (Lamontagne et al., 2016;Lam et al., 2017) that are largely controlled by an array of virus-derived REs and DNA transposons, both as binding sites for transcription factors and as sources of regulatory non-coding RNAs (Kulski, 2019;Sznarkowska et al., 2020). Random mutations, methylations and recombinations can generate considerable haplotype diversity that is part of the MHC immune system's response to highly prevalent infectious (Gao et al., 2019;Sanchez-Mazas, 2020) and chemical agents, including those responsible for drug hypersensitivities (Alfirevic and Pirmohamed, 2010).
Various factors have been hypothesised to elucidate the generation and maintenance of MHC CEHs/AHs including recombination suppression, balancing selection and demographic factors such as population bottlenecks, genetic drift, migration and admixture van Oosterhout, 2009;Prohaska et al., 2019). If CEHs/AHs were generated in recent population history, for example, during the last 21,000 to 26,000 years as estimated from the mutation rates of the Caucasian A1-B8-DR3-DQ2 haplotype (Smith et al., 2006) and the Asian A2-B46-DR9 and A33-B58-DR3 haplotypes (Lam et al., 2013), then there may have been insufficient time over a period of a few thousand generations to have disrupted the LRHs by recombination. It seems that there is a time-associated equilibrium between the population amplification of the LRH and its meiotic recombinational breakage over periods of human population coalescent times (Song et al., 2017;Wang et al., 2020). Polymorphism and sequence heterozygosity also might suppress crossing over and recombination (Ohta, 1999;Dluzewska et al., 2018) and it is well-known that an excess of heterozygosity (overdominance) contributes to MHC diversity as one form of balancing selection (van Oosterhout, 2009;Lenz et al., 2016;Lobkovsky et al., 2019). However, in the context of molecular mechanisms and our study, the connection between the silencing of TE/REs and recombination suppression warrants greater consideration (Campos-Sánchez et al., 2016). TEs are known to affect recombination rates by acting as recombination modifiers, activators and suppressors in mice and humans (McVean, 2010;Altemose et al., 2017;Yamada et al., 2017). Although TEs can directly modulate the local recombination environment either through silencing-mediated suppression or the recruitment of recombination hotspots, the silencing of TEs and other repetitive sequences may also contribute directly to recombination suppression. The densities of DNA methylation and repressive chromatin marks associated with the silencing of TEs and other repeats are often negatively correlated with recombination rates (Myers, 2005;Kent et al., 2017). A recombination suppression mechanism discovered and studied in recent years involves the PRDM9-mediated recombination machinery that initiates at specific sequence motifs and alters chromatin structure (Myers et al., 2010;Parvanov et al., 2017). In humans, PRDM9 determines the locations of meiotic recombination hotspots and binds multiple motifs including the ATCCATG/CATGGAT motif of the THE1B repeat for both dependent and independent recombination suppression (Myers et al., 2008;Altemose et al., 2017). The active PRDM9 binding sites are also enriched with other classes of human repeat sequences including L2 LINEs and AluY elements (Myers et al., 2008;McVean, 2010). Recent studies with B6 mice demonstrated that a third of meiotic DNA strand breaks occurred within repetitive sequences of different classes, especially within the DNA transposons like TcMar-Mariner and hAT-Charlie, that resulted in their depletion from the PRDM9 binding sites (Yamada et al., 2017). This finding led the authors to hypothesise that the PRDM9 coevolved with meiotic recombination in order to target active transposons and limit their spread by inactivating or eliminating them by creating mutations or deletions at the PRDM9 binding site. Moreover, a proportion of the duplicated MHC ERV3-16A3_I (HERV16 int) sequences Kulski et al., 1999a,b) have the PRDM9 binding motif ATCCATG/CATGGAT at one or two of its nucleotide positions (Supplementary Figure 7).
Although TEs, methylations and recombinations can influence each other markedly (Myers et al., 2008;McVean, 2010;Moolhuijzen et al., 2010;Jones, 2012;Zamudio et al., 2015;Altemose et al., 2017;Kent et al., 2017), there is a surprising paucity of such studies in the MHC genomic region of different haplotypes (Rakyan et al., 2004;Jongsma et al., 2019). The genomic PRDM9 binding motif ATCCATG/CATGGAT (Altemose et al., 2017) is spread broadly throughout the MHC class I region and can be found in various fragmented TEs: L1, L2, MER5, HUERS-P3-int (2×), HERVK9-int (2×), HERVK14 (2×), LTR73, MER2, MER8, MER41, MER84, Charlie9, MLT1, MLT2B, Tigger1, ERV3-16A3_I, four of ∼1450 Alu elements and various coding regions including the MICB sequence (present study, data not shown). The 38 SNP XOs at the borders of the haplotype blocks that are described in our study appear to reflect regions of historical recombination sites that currently might be involved with recombination suppression and haplotype maintenance (Figure 1). Some TEs are likely to have provided the recombination sequence motifs that Cullen et al. (2002) considered could be due to microsatellites and particularly to those with long tracts of GT repeats. Many of the SNP XO junctions are within 10 kb of TEs that are commonly repeated throughout the MHC genomic region including LTR16B/ERV3-16A3_I, L1, L2, MLT1, THE1, Charlie, Tigger, MST, and MER5 sequences. The presence of the LTR16B/ERV3-16A3_I sequences at the XO and recombination sites is not surprising since this RE was associated with various genomic rearrangements including XO events of MIC, HERV16, and HLA class I genes, which influenced the structural organisation of the MHC locus during primate evolution (Kulski et al., 1997(Kulski et al., , 1999a(Kulski et al., ,b, 2002b. The LTR16B/ERV3-16A3_I and THE1 elements often are located together in close proximity to the XOs. Moreover, there are ATCCATG/CATGGAT motifs in the MER2 and L2 that flank the ERV3-16A3-int extension of the HCP5 gene (Kulski, 2019) that is within a recombination hotspot located between the MICA and MICB genes (Table 8, Figure 4). On the other hand, the THE1 elements in the MHC genomic region are represented by different subfamilies, and most of them lack the ATCCATG motif and therefore might not interact with PRDM9. Some active or young TE insertions are found in regions close to meiotic recombination sites. Therefore, the TE insertions in the MHC class I region that have yet to reach fixation such as the structurally polymorphic Alu and SVA elements (Table 1) are regional markers of insertion bias, and they are in close proximity to SNP XOs that could be active recombination hotspots (Figure 1). This implies that young TE insertion polymorphisms are relatively recent ancestral recombination hotspots; that is, the younger and more haplospecific TE insertions represent the integration and recombination sites of younger haplotype segments (Campos-Sánchez et al., 2016), whereas the fixed TE insertions, such as SVA-16, -T26, -ER, -EG, -M21, and -M22 (Table 1, Figure 1), reveal the older haplotype recombination XO spots of our primate progenitors (Anzai et al., 2003, Wilming et al., 2013. The presence of SNP XOs near to or within the HLA-B and HLA-C genes (Tables 7, 8) suggests that these genes and/or neighbouring TE's L1, Alu, L2, MLT1, MST, MER21, MER41, LTR9, MER1, and MER4 (Kulski et al., 1997) have had a role in recombination (Cullen et al., 2002, Lam et al., 2013 along with the occasional gene conversions (Madrigal et al., 1993, Adamek et al., 2015. Many of these old elements may now contribute to recombination suppression. Further detailed sequence multiple alignment studies at SNP XOs using HLA-B/HLA-C recombinant haplotypes such as previously described by Nair et al. (2006) in their psoriasis association studies may help to resolve this consideration.
The genomic sequences that we analysed in this study have important implications in medical research and treatment (Lokki and Paakkanen, 2019). Genotyping SNPs of the five major HLA class I and class II loci for cross matching provides most of the haplotype information needed for successful transplantation outcomes. However, ignoring SNPs outside the five loci when comparing the same or similar haplotypes may be problematic and misleading and result in choosing the wrong SNP markers for GWAS of disease or phenotypes, although GWAS need to be correlated to the MHC haplotype and not to the SNP per se. Nevertheless, particular haplospecific segments can be used to identify likely "disease" genes or regions (Lokki and Paakkanen, 2019) in comparative haplomics as previously performed for the psoriasis gene (Nair et al., 2006). Haplotypic or haplospecific markers like the well-defined HLA alleles and dimorphic RE markers may assist in narrowing genomic segments and loci towards MHC disease regions. Also, the haplospecific and haplotypic regions of the non-HLA coding regions such as the ∼39 non-HLA genes between HLA-A and HLA-C are still poorly defined and need to be better characterised as essential components of haplotype regulatory modules. We found some SNP XO junctions in the non-HLA gene region between HLA-E and HLA-C that might have important implications in affecting disease. The systematic comparison of various recombinant haplotypes (Nair et al., 2006) is a promising approach that has been little utilised in GWAS (Alper and Larsen, 2017;Lokki and Paakkanen, 2019). Imputing haplotypes from 3D genome structures of single diploid human cells (Tan et al., 2018) along with tagging regulatory TEs using chromosome conformation capture techniques (Raviram et al., 2018) might be new and productive technical approaches to investigate haplotype regulatory modules.

CONCLUSION
Our study confirms that the genomic sequences of MHC homozygous cell lines are useful for analysing MHC haplotypic landscapes and characterising unique CPS, haplotypic markers, and XO zones without a need to use LD or other probabilistic statistical imputations. Comparative sequence analyses confirmed the identity of 12 haplotypic RE markers and revealed that the HERVK9 indel within the alpha block and a SVA indel within the beta block divided the HLA-A/B/C haplotypes into a series of distinctly interrelated historical lineages, and we identified numerous ancestral segmental XOs between different haplotypes within various REs, lncRNA, MUC22, C6orf15, and HLA-C and/or HLA-B genes extending over 2 Mb from the HLA-A to the MICB loci. It is evident from this study and previous studies that there is a vast MHC haplotype and allelic diversity in the human and that we have captured only a fraction of the complexity. Here, we analysed and characterised the polymorphic REs and the duplicated copies of MHC class I genes within genomic sequences of 95 haplotypes that were sequenced and assembled previously by Norman et al. (2017) in order to broaden the framework of the reference sequences so that they can be further improved and utilised to interrogate the human MHC in greater detail. More attention than usual was given to the polymorphic TE and RE at the SNP-density XOs as potential recombination hotspots. A greater emphasis on the commonality and differences of MHC class I recombinants may set the scene for better functional studies involving the described MHC alleles and haplotypes and their role in immunity, transplantation and overall health and well-being.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article/Supplementary Material.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
JK carried out the analyses of the repeat elements, SNP-density XOs and interpretation of the data, and wrote the manuscript. SS and TS analyzed and interpreted parts of the data and provided the alleles for the non-classical HLA class I genes, HLA pseudogenes, and the MICA and MICB genes. All authors checked the final version of the paper.