Igh Locus Polymorphism May Dictate Topological Chromatin Conformation and V Gene Usage in the Ig Repertoire

Vast repertoires of unique antigen receptors are created in developing B and T lymphocytes. The antigen receptor loci contain many variable (V), diversity (D) and joining (J) gene segments that are arrayed across very large genomic expanses and are joined to form variable-region exons of expressed immunoglobulins and T cell receptors. This process creates the potential for an organism to respond to large numbers of different pathogens. Here, we consider the possibility that genetic polymorphisms with alterations in a vast array of regulatory elements in the immunoglobulin heavy chain (IgH) locus lead to changes in locus topology and impact immune-repertoire formation.


INTRODUCTION
The adaptive immune response has evolved to recognize pathogens using antigen-specific receptors expressed on B and T lymphocytes. Two identical immunoglobulin (Ig) heavy chains (IgH) and two identical light chains (Igκ or Igλ) constitute the B-cell receptor (BCR). The two lineages of T cells are distinguished by the type of T-cell receptor (TCR) expressed. TCRαβ is encoded by the Tcra and Tcrb loci, whereas TCRγδ is encoded by the Tcrg and Tcrd loci. Developing B and T cells undergo an ordered set of DNA rearrangements termed V(D)J recombination, using RAG recombinase (RAG1/2) and thereby creating a diverse repertoire of antigen receptors (1). The assembly of antigen receptors involves the juxtaposition of variable (V), diversity (D) and joining (J) gene segments into a V gene exon that encodes the antigen binding domain of antigen receptors. However, there are several barriers which must be overcome to enable a suitably diverse Ig repertoire to emerge. Many of the concepts discussed here are applicable to TCR loci.
Formation of a diverse Ig repertoire is critically dependent on proficient pro-B and pre-B cell function since it is in these cells that IgH and IgL chain genes are assembled through V(D)J recombination, respectively. V(D)J recombination requires that antigen receptor genes undergo ordered rearrangement with D H to J H joining preceding V H to D H J H recombination (Figure 1A). There are ~100 functional Igh locus V H gene exons that must recombine with one rearranged DJ H element, which is assembled from one of 8-12 D H and one of 4 J H gene segments in C57BL/6 mice (Figure 1) (1). The introduction of RAG-dependent DNA breaks at recombination signal sequences (RSSs) adjacent to each rearranging gene segment initiates Igh gene assembly (1). RAG1/2 loads at the recombination center (RC) situated in the region spanning Eµ and the most 3' D H segment, DQ52 (2). RAG1/2 has been proposed to track from the RC to locate a suitable RSS for synapsis and DNA cleavage (3).
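The segment counts quoted above imply a simple combinatorial estimate, sketched here in Python. This is only a back-of-the-envelope illustration: it multiplies the approximate C57BL/6 counts from the text, assumes every combination is equally possible, and ignores junctional diversity, light-chain pairing and the unequal V H usage discussed below. The function name is an illustrative choice, not taken from any published tool.

```python
def combinatorial_vdj(n_v: int, n_d: int, n_j: int) -> int:
    """Count distinct V-D-J segment combinations (segment choice only)."""
    return n_v * n_d * n_j


# Approximate C57BL/6 Igh counts quoted in the text: ~100 V H, 8-12 D H, 4 J H.
low = combinatorial_vdj(100, 8, 4)
high = combinatorial_vdj(100, 12, 4)
print(low, high)  # 3200 4800
```

On these assumptions, segment choice alone yields a few thousand heavy-chain combinations, so any bias in which V H segments reach the RC directly limits the realized repertoire.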

THE V H GENE USAGE CONUNDRUM
The number of V, D, and J gene segments and the availability of those segments for rearrangement determine the composition and complexity of antigen receptor repertoires. The Igh locus is quite large in linear genomic distance, extending 2.9 Mb and containing ~100 functional V H gene segments. V gene usage is only quasi-random in the pre-selected Igh repertoire as V genes rearrange at very different intrinsic frequencies (4-11). V germline transcript levels, transcription factor (TF) binding, RSS quality, and the distribution of a variety of epigenetic marks each make a contribution, but no one variable, or combination of variables, fully accounts for unequal V gene usage (4-6, 8, 12). Although the V gene accessibility hypothesis (13-15) offered an attractive model to explain V gene usage, recent studies have made plain that V gene accessibility is a necessary but insufficient condition for participation in V->DJ rearrangement (16,17). Therefore, the factors underpinning unequal V gene rearrangement frequencies remain to be determined.
It is important to note that the potential contribution of Ig haplotype diversity in these processes has been underappreciated (18). Despite the fact that the Ig loci of natural outbred organisms are known to be extremely diverse, much of our understanding of the mechanisms dictating V(D)J recombination has come from studies of inbred models. However, even in inbred models, it has been demonstrated that Ig genetic diversity is more extensive than initially appreciated, in many cases mirroring (or even exceeding) what has been observed in human populations and other more outbred organisms (19-28). In mouse, for example, a comparison of germline V H sequences between C57BL/6 and BALB/c revealed surprisingly little overlap in the germline repertoires of these two strains. Of the 99 C57BL/6 and 164 BALB/c V H alleles compared, only 5 were found to be identical (23), likely the result of both allelic sequence divergence and structural variation associated with differences in V gene content between the two strains. Extended comparisons of V H germline alleles across additional inbred wild-derived strains, thought to represent diverse mouse sub-species origins, revealed even greater diversity, suggesting that Ig germline variation across commonly used mouse inbred strains is likely to be vast (24). Similar inter-strain diversity has been observed within the mouse D H gene loci as well (19,20). These and other data clearly demonstrate the presence of extensive polymorphism within the Igh locus, and highlight the potential influence of both sequence diversity and Ig gene segment number as significant contributors to Ig repertoire diversity.
There is a growing body of evidence supporting the potential impact of genetic polymorphism on V(D)J recombination. First, multiple studies of the naïve repertoire in human monozygotic

IGH LOCUS ARCHITECTURE AND CONTRACTION ARE IMPLICATED IN REPERTOIRE DIVERSITY
It is essential that all V H genes achieve spatial proximity with the RC located at the Eµ-D H J H domain to produce a fully representative Ig repertoire (Figure 1B) (36,37). Topologically associating domains (TADs) are zones in which intraregional interactions are more frequent than those traversing the boundaries between TADs (37-39). TAD organization reflects the functional partition of chromatin regions by transcriptional activity (37,39), histone modifications (37-40), and replication timing (41), implying a link between function and genome structure. The Igh locus is contained within a 2.9-Mb TAD in pro-B cells (42). 5C studies demonstrate that the murine Igh TAD is subdivided into two highly structured sub-TADs A and C, corresponding to the D H-proximal and D H-distal V H gene families, respectively, while the less structured sub-TAD B includes the intermediate V H gene segments (42). Correspondingly, live pro-B cell imaging indicates that Igh locus topology is organized as a series of three large, intermingled chromatin loops anchored close to the DJ H RC, that provide comparable access between distal V H gene segments with rearranged 3' D H J H (43). Hence, Igh locus topology is best described as a series of three large chromatin loops that are anchored at sub-TAD boundaries.
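The defining property of a TAD, more frequent contacts within the domain than across its boundaries, can be made concrete with a small Python sketch. The contact matrix and domain labels below are invented toy values, not 5C or Hi-C data from the Igh locus, and the function name is hypothetical.

```python
def mean_contacts(matrix, domains):
    """Return (mean intra-domain, mean inter-domain) contact frequency.

    matrix  -- symmetric list of lists of contact counts between bins
    domains -- domain label for each bin, e.g. ["A", "A", "B", "B"]
    """
    intra, inter = [], []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only, diagonal skipped
            if domains[i] == domains[j]:
                intra.append(matrix[i][j])
            else:
                inter.append(matrix[i][j])
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Toy 4-bin map with two hypothetical domains; the counts are invented.
toy = [
    [0, 9, 1, 1],
    [9, 0, 2, 1],
    [1, 2, 0, 8],
    [1, 1, 8, 0],
]
intra, inter = mean_contacts(toy, ["A", "A", "B", "B"])
print(intra, inter)  # 8.5 1.25
```

In this toy map the intra-domain mean exceeds the inter-domain mean several-fold, which is the pattern a TAD or sub-TAD boundary would be expected to produce in a real contact map.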

THE BUILDING BLOCKS OF TAD ARCHITECTURE: LOOP EXTRUSION, CTCF AND ENHANCER-PROMOTER CONTACTS
TAD boundaries are frequently marked by CTCF binding elements (CBEs) in a convergent orientation (40,44), which participate in loop extrusion (45,46). The loop extrusion model posits that chromatin loops are formed when cohesin is loaded onto and reels in DNA in an ATP-dependent process (45-48). Architectural "stripes", visualized within Hi-C maps (49), may form when one cohesin subunit stalls near a strong CTCF loop anchor while the second one slides along the chromatin to form multiple interactions. The extrusion model explains how enhancers can processively track along arrays of promoters separated by long genomic intervals (45,46,50) and has been proposed as the mechanism that enforces deletional class switch recombination (CSR) (51,52) and creates Igh locus contraction during V(D)J recombination (52). However, while sharp TAD boundaries are lost upon CTCF inactivation, compartment organization as well as TAD-like globular chromatin domain structures are preserved in single cell experiments (53) and the impact of CTCF inactivation on the transcriptome is small (54). Cohesin has been shown to promote clustering of enhancer elements in 3D spatial hubs (55). Intra-TAD contacts between regulatory elements facilitated by cohesin loop extrusion can be stabilized by other mechanisms such as homo-dimerization of the structural regulator YY1 (56). The inter-relationship of loop extrusion and a putative promoter-enhancer interactome in the Igh locus remains largely undefined.
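The logic of loop extrusion between convergent CBEs can be illustrated with a deliberately simplified one-dimensional Python sketch. It rests on assumptions that are not in the text: cohesin legs move one bin per step, CBEs are absolute orientation-dependent blocks, and there is no cohesin unloading or stochasticity. The function name, bin coordinates and the ">" / "<" orientation symbols are hypothetical.

```python
def extrude(load_site, cbes, length):
    """Toy one-dimensional loop extrusion.

    load_site -- bin where cohesin is loaded
    cbes      -- dict mapping bin -> ">" (forward CBE) or "<" (reverse CBE)
    length    -- number of bins in the region
    Returns the (left, right) loop anchors once both legs have stalled.
    """
    left = right = load_site
    while True:
        # The leftward leg stalls at a forward CBE, the rightward leg at a
        # reverse CBE; either leg also stops at the end of the region.
        left_stalled = left == 0 or cbes.get(left) == ">"
        right_stalled = right == length - 1 or cbes.get(right) == "<"
        if left_stalled and right_stalled:
            return left, right
        if not left_stalled:
            left -= 1
        if not right_stalled:
            right += 1


convergent = {2: ">", 8: "<"}
print(extrude(5, convergent, 12))          # (2, 8)
print(extrude(5, {2: "<", 8: "<"}, 12))    # (0, 8)
```

With a convergent pair the loop is anchored at both CBEs, whereas flipping the left CBE lets that leg run to the end of the region. Loading cohesin directly at bin 2 of the convergent example stalls the left leg immediately while the right leg slides to bin 8, a crude analogue of the "stripe" pattern described above.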

IS IGH LOCUS TOPOLOGY CONFIGURED BY A PROMOTER-ENHANCER INTERACTOME?
New unpublished work from the Kenter group has identified highly transcribed V H gene promoters and a series of novel enhancers (NEs) that are pro-B cell specific, are involved in anchoring Igh sub-TAD loops and influence Igh repertoire formation in pro-B cells through formation of a promoter-promoter-enhancer hub. This is an interesting proposition as hundreds of V H exon promoters and newly recognized enhancers could participate in an intricate contact interactome that spatially organizes V H segments within the previously defined large chromatin loops and defines access probability to the RC and DJ segments. The presence of intra-TAD promoter-promoter-enhancer interactomes has been documented in several genetic loci and in different developmental and differentiation systems. Here we consider evidence that enhancers and promoters initiate specific interactions in nuclear space and propose that this interactome influences repertoire diversity.
Multiple lines of evidence support the existence of enhancer interactomes in different genomic contexts (57). While some studies link chromatin contacts between regulatory elements to transcriptional activity (58), in other examples these contacts precede gene activation (59). Most prominently, enhancers in olfactory sensory neurons form a large inter-chromosomal hub (60). In other systems, super-enhancers, clustered arrays of enhancer elements in close spatial proximity that can span several kilobases and are linked to the regulation of cell-identity genes with high transcriptional activity (61,62), are highly involved in the formation of specific chromatin contacts. Genome architecture mapping identified abundant three-way contacts between super-enhancers and highly transcribed chromatin regions beyond the pairwise interactions detectable by 3C techniques (58). Similarly, the enhancer elements within a super-enhancer and target promoters can cluster spatially to form a hub structure with simultaneous multi-way interactions as demonstrated by multi-contact 4C for the locus control region of the beta-globin locus (63).

INTERGENIC IGH POLYMORPHISM MAY ALTER ENHANCER AND PROMOTER FUNCTION
When considering the Igh locus it is important to note that genetic differences between inbred strains extend beyond coding variation into intergenic regions. To date, the mouse Igh locus has only been fully characterized in C57BL/6, which, as we noted above, has served as the primary model for characterizing the functional regions and mechanisms that dictate V(D)J recombination. However, a partial assembly, including the proximal region of Igh in the 129S1/SvImJ mouse strain, was published in 2007 (20). A comparison of these haplotypes revealed evidence of both local sequence conservation and divergence, including examples of structural variation and single nucleotide differences (20) (Figure 2). For example, several complex regions (Figure 2A) represent insertions of Ig genes in the 129S1/SvImJ strain that are absent in C57BL/6. In addition, the degree of sequence identity between these two strains varies considerably across the locus, with sequence identities ranging between 88% and 98%. Even in regions characterized by high degrees of homology between 129S1/SvImJ and C57BL/6, SNPs occur at relatively high densities in both coding and intergenic regions (Figure 2B). The impact of such inter-strain haplotype diversity on V(D)J recombination has not been investigated.
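For readers unfamiliar with how figures such as 88% or 98% identity arise, the following Python sketch computes percent identity for a pair of pre-aligned sequences. It is an assumption-laden illustration: the text does not state how the reported identities were calculated, gapped columns are simply skipped here, and the example sequences are invented rather than taken from either strain.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (one of several possible conventions)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented 10-bp example with a single mismatch at the last position.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
```

Whether gaps count as mismatches changes the result substantially in regions with insertions, so identity values from different studies are only comparable when the convention is stated.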
Elsewhere in the genome, deletions or mutations in enhancer sequences can lead to aberrant gene expression and disease phenotypes (66). Genome-wide association studies show that the vast majority of sequence variants associated with common diseases and traits are located in such non-coding parts of the genome (67). In line with long-range enhancer interactions discussed above, misregulation of target gene expression due to variation in enhancers can occur tens of kilobases away in linear sequence space (68).
Promoters and enhancers are characterized by a high density of sometimes overlapping TF binding sites. Active enhancers are established through the recruitment of TFs to those binding sites, which opens chromatin (69). Mechanistically, TFs can mediate promoter-enhancer contacts in a variety of ways, directly or indirectly, through binding of additional factors and structural proteins (reviewed in (57)). For example, in mouse embryonic stem cells, deletion of KLF4 binding sites or KLF4 ablation results in reduced contact frequency in enhancer hubs and diminished expression of multiple target genes (70).
Single nucleotide changes in regulatory sequences can impact the affinity for TF binding (71). Such changes can tip the balance at sites where different factors compete for the same space (72,73) or where regulatory regions have multiple functions (74). Since TF binding is essential for establishing an active enhancer, SNPs detected in Igh intergenic regulatory regions could potentially trigger a cascade of downstream effects. They can alter TF binding in enhancers and promoters, impact long-range enhancer-promoter interactions, and thereby change the composition of promoter-promoter-enhancer interactomes (Figure 2C). In this context it is significant that variation of intergenic CBE sequence can have a profound effect on V H gene usage. V H gene access to the RC was recently shown to be dependent on the quality of the flanking CTCF binding element (CBE) and the related ability of the gene to loop with IGCR1 (16,17). V H 81X is the second gene in the locus and is most prominently used. V H 5-1 is the most D H proximal V H gene in the locus and is very rarely used even though its promoter and recombination signal sequence are intact and similar to those of V H 81X. However, the flanking CBE for V H 5-1 is of poor quality, and replacing it with a functional motif directs looping with IGCR1 and high-frequency recombination. Thus, the dependence of V H usage on CBE quality highlights the importance of the integrity of similar regulatory sequences, which can be altered by Igh polymorphisms.

CONCLUSIONS
Genetic differences, such as those observed between 129S1/SvImJ and C57BL/6 (Figures 2A, B), when considered alongside observations that have been made for regulatory elements elsewhere in the genome, raise important questions about the potential for Ig genetic diversity to impact V(D)J recombination. First, there are numerous examples for which germline V H variants have been shown to contribute to antigen specificity (75-79) and associate with disease and clinical phenotypes in the context of infection, inflammation, and vaccination (31, 32, 80-83). Second, both large structural variants and single nucleotide polymorphisms could modify key regulatory elements, such as CTCF sites, promoters, and enhancers, either through the disruption of these elements (e.g. sequence deletions or loss-of-function SNPs) or the creation of novel elements (e.g. through sequence duplication or gain-of-function SNPs). In addition, structural variants could also be expected to change the spatial organization of interacting regulatory elements by increasing or decreasing the genomic distance between particular elements, or by changing their orientation. These modifications would in turn be expected to impact promoter/enhancer interactomes, lead to changes in the epigenetic landscape, influence the overall locus architecture and TAD structure, and ultimately affect the selection of particular V H, D H, and J H segments into the repertoire. We expect the discovery of such examples to continue as the inclusion of genetic variation in the study of repertoire diversity and dynamics becomes more commonplace.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. Sequence data and related gene annotations for the Igh locus in C57BL/6 were extracted from the mm10 genome reference assembly, available at https://genome.ucsc.edu. The sequence of the proximal Igh region of 129S1/SvImJ was previously published by Retter et al. (20) and is available on GenBank under the accession number AJ851868.3.

AUTHOR CONTRIBUTIONS
Conceptualization and Writing: AK, CW, and J-HS. Funding Acquisition: AK. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by grants to AK from the NIH (R01AI121286, R21AI151892).