Shared Mechanisms for Mutually Exclusive Expression and Antigenic Variation by Protozoan Parasites

Cellular decision-making at the level of gene expression is a key process in the development and evolution of every organism. Variations in gene expression can lead to phenotypic diversity and the development of subpopulations with adaptive advantages. A prime example is the mutually exclusive activation of a single gene from within a multicopy gene family. In mammals, this ranges from the activation of one of the two immunoglobulin (Ig) alleles to the choice in olfactory sensory neurons of a single odorant receptor (OR) gene from a family of more than 1,000. Similarly, in parasites like Trypanosoma brucei, Giardia lamblia or Plasmodium falciparum, the process of antigenic variation required to escape recognition by the host immune system involves the monoallelic expression of vsg, vsp or var genes, respectively. Despite the importance of this process, understanding how this choice is made remains an enigma. The development of powerful techniques such as single cell RNA-seq and Hi-C has provided new insights into the mechanisms these different systems employ to achieve monoallelic gene expression. Studies utilizing these techniques have shown how the complex interplay between nuclear architecture, physical interactions between chromosomes and different chromatin states leads to single-allele expression. Additionally, in several instances it has been observed that high-level expression of a single gene is preceded by a transient state where multiple genes are expressed at a low level. In this review, we will describe and compare the different strategies that organisms have evolved to choose one gene from within a large family and how parasites employ this strategy to ensure survival within their hosts.


INTRODUCTION
Pathogenic organisms, including eukaryotic parasites, have evolved numerous mechanisms to ensure their survival in the different, often hostile environments they encounter as they transition through their complex life cycles. These diverse environments often include infection of multiple host species, each with different stresses that must be overcome for successful completion of the cycle. In particular, infected hosts often mount a vigorous immune response that can drastically reduce parasite numbers or eliminate the infection. One especially important mechanism for prolonged survival inside their host is the ability of parasites to respond to changing environmental conditions through alterations in gene expression. Several of the most dramatic responses involve the mutually exclusive expression of individual members of large, multicopy gene families. This process promotes clonal variability and enables populations of infecting parasites to rapidly adapt to the changing conditions that they encounter, whether maintaining a chronic infection or transitioning from one host to another. The ability to undergo clonal changes in gene expression is key for several processes that are vital for parasite survival, for example nutrient uptake, colonization of different tissues, host cell invasion and, perhaps most dramatically, immune evasion (Cortes and Deitsch, 2017). Mutually exclusive expression of genes encoding variable surface antigens is indeed the primary mechanism underlying the phenomenon of antigenic variation in parasites like Trypanosoma brucei, Plasmodium falciparum, and Giardia lamblia (Duraisingh and Horn, 2016). It enables them to periodically switch their antigenic signature and thereby escape recognition by the host immune system, thus maintaining prolonged, chronic infections. Despite the relevance of this mechanism for parasite survival, little is understood regarding how this is achieved at a molecular level.
While mutually exclusive expression within large, multicopy gene families has been a high-profile subject of research within the parasitology community for many years, it is worth noting that this phenomenon is not a unique feature of pathogens, but rather a process conserved throughout the evolution of the eukaryotic lineage (Dalgaard and Vengrova, 2004;Goldmit and Bergman, 2004). Ranging from the simple choice between two alleles to the activation of a single gene within a larger family that can include thousands of copies, many of the basic mechanisms by which a single gene is chosen and expressed appear to be shared between even the most distant evolutionarily related organisms. For example, both Trypanosoma brucei and Giardia lamblia are referred to as early branching eukaryotes and are thought to be amongst the most evolutionarily divergent eukaryotes in existence today (Morrison et al., 2007;Lukes et al., 2014). Nonetheless, recent work suggests they share several molecular mechanisms for regulating multicopy gene expression with higher eukaryotes, including humans. In this review we describe how well-studied model organisms achieve mutually exclusive gene expression and explore analogies and differences with parasites.

EXAMPLES OF MUTUALLY EXCLUSIVE EXPRESSION IN MODEL EUKARYOTES
Detailed molecular research into the mechanisms regulating mutually exclusive expression has often focused on higher eukaryotic organisms, with the yeast and mammalian model systems providing the majority of the conceptual insights. The genetic systems that are most relevant include 1) the simple selection of one of two alleles for active transcription (mating type switching, immunoglobulin gene recombination and expression), 2) the activation or silencing of entire chromosomes (dosage compensation) and 3) single gene expression within large multicopy gene families (olfactory receptor gene expression). All these systems have proven to be rich sources of information that have influenced our understanding of similar gene expression systems in parasites.

Yeast Mating-Type Switching
Both the fission yeast Schizosaccharomyces pombe and the budding yeast Saccharomyces cerevisiae are free-living, single-celled eukaryotes that can easily be grown in the lab and genetically manipulated. Under specific conditions, haploid cells of different mating types can fuse, resulting in diploid cells which can then undergo meiosis, sporulate and produce haploid cells again. While successful mating requires two cells of different mating types, individual cells can switch their mating type, between P and M cells for S. pombe or between a and α in S. cerevisiae, thus facilitating efficient creation of hybrids and exchange of genetic material. In these examples, the expressed mating type is determined by a transcriptional choice between two different alleles, thus representing a simple binary system of mutually exclusive expression.
The mating-type switch is made possible through three gene cassettes located on the same chromosome. In both S. pombe and S. cerevisiae, one cassette is constitutively transcriptionally active whereas the other two are silent and serve as donors for transposition into the active site (Figure 1). The mating type of the cell is determined by which of the silent cassettes occupies the transcriptionally active site, and mating type switching results from recombination using the alternative donor. Thus, through a recombinational mechanism, cells can switch between mating types. In S. pombe, the expressed cassette is referred to as mat1 and can contain information copied from the two silent cassettes mat2-P and mat3-M (Figure 1A), while in S. cerevisiae, the content of the transcriptionally active MAT locus can be replaced with information from the silent loci HMLα and HMRa (Figure 1B) (Kelly et al., 1988;Haber, 1998). Additionally, the choice of which donor cassette is used for recombination is not random. For example, in S. pombe, the mat2-P cassette is chosen preferentially for recombination in M cells, whereas the mat3-M cassette is preferentially chosen in P cells, thereby increasing the overall probability of mating. A similar directionality is also observed in S. cerevisiae (Klar, 1990). These two single-celled organisms therefore provide an elegant model for mutually exclusive expression that couples transcriptional activation and silencing with genetic recombination.

V(D)J Recombination of Immunoglobulin Genes
The mammalian immune system has evolved to recognize and destroy invading organisms through the continuous production of a vast repertoire of antigen receptors, called immunoglobulins, that are thought to be able to recognize virtually any possible antigen conformation. For this adaptive immune response to be effective it is necessary for the antigen receptor repertoire to be extremely diverse and that each individual receptor expressing cell only express a single immunoglobulin. Similar to mating type switching in yeast, this is achieved through a mechanism that incorporates DNA recombination and mutually exclusive expression of the genes encoding antigen receptors.
Immunoglobulin diversification is generated through a mechanism called V(D)J recombination that ensures that every mature antigen receptor expressing cell expresses a different immunoglobulin. Each immunoglobulin is formed by two identical heavy chains and two identical light chains with each chain consisting of constant (C) and variable (V) regions. The process of creating a unique immunoglobulin begins from a germline containing an array of highly similar gene fragments which can recombine to form single, functional open reading frames that encode a unique antigen receptor (Figure 2). The heavy chain is formed through the recombination of sets of Variable (V), Diversity (D) and Joining (J) genes, whereas the light chain is formed by rearranged V and J genes (Weigert et al., 1978;Early et al., 1980). In mice, the immunoglobulin locus contains 195 V segments, 10 D segments and 4 J segments arranged in tandem within a chromosomal region ~3 Mb long. Epigenetic mechanisms control the order and the site of recombination. Specifically, recombination begins with one heavy chain allele, chosen randomly. If the recombination events result in the creation of a functional heavy chain, only this allele is actively transcribed and the second allele is permanently silenced. In contrast, if the recombination events do not yield a functional heavy chain, the second allele is accessed and recombined in an attempt to generate a functional protein. Only when a functional heavy chain has been generated does a similar recombination occur at the light chain alleles, with a similar feedback mechanism ensuring that only a recombined allele that encodes a functional receptor gets expressed in mature immune cells (see Jung and Alt, 2004, for a review of this process).
Similar to mating type loci switching in yeast, mutually exclusive expression (also called allelic exclusion in this system) is linked to DNA recombination, however here the recombination events have the additional function of diversifying the sequence of the resulting protein, thus serving as a continuous source of variability in antigen recognition and maintaining the enormous breadth of the repertoire of antigen binding receptors.
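The ordered, feedback-controlled recombination described above can be illustrated with a minimal Python sketch. The segment counts are the mouse heavy-chain numbers given in the text. Everything else is an assumption made for illustration: the function names are hypothetical, segments are drawn uniformly at random, and the probability that a rearrangement yields a functional chain (P_PRODUCTIVE) is a placeholder, not a measured value. The count of V-D-J combinations ignores junctional diversity and light-chain pairing, so it understates the true repertoire.

```python
import random

# Segment counts for the mouse heavy-chain locus, as given in the text.
N_V, N_D, N_J = 195, 10, 4

# Hypothetical placeholder: chance that one rearrangement gives a functional chain.
P_PRODUCTIVE = 1 / 3


def combinatorial_diversity(n_v=N_V, n_d=N_D, n_j=N_J):
    """Number of distinct V-D-J segment combinations (junctional diversity ignored)."""
    return n_v * n_d * n_j


def rearrange_heavy_chain(rng, p_productive=P_PRODUCTIVE):
    """Recombine the two alleles in random order, stopping at the first success.

    Returns (allele_index, (v, d, j)) for the expressed allele, or None when
    neither allele produces a functional heavy chain.
    """
    alleles = [0, 1]
    rng.shuffle(alleles)  # the allele that recombines first is chosen at random
    for allele in alleles:
        segments = (rng.randrange(N_V), rng.randrange(N_D), rng.randrange(N_J))
        if rng.random() < p_productive:
            # Feedback step: a functional chain means the other allele is never used.
            return allele, segments
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    print("V-D-J combinations:", combinatorial_diversity())
    outcomes = [rearrange_heavy_chain(rng) for _ in range(10_000)]
    expressed = [o for o in outcomes if o is not None]
    print(f"cells with a functional heavy chain: {len(expressed) / len(outcomes):.2f}")
```

Because the loop returns as soon as one allele succeeds, no simulated cell can express both alleles, which is the essence of allelic exclusion in this toy model.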

Inactivation of X-Chromosome
FIGURE 2 | The ordered recombination of the mouse immunoglobulin heavy chain. The germline possesses 195 V segments (pink), 10 D segments (blue) and 4 J segments (green) in a locus of ~3 Mb. In a first step, J and D segments recombine, followed by a second recombination with V segments.
X-inactivation is an example of mutually exclusive gene expression on a chromosome-wide scale. The sex of mammals is determined by the XX/XY sex-determination system, with differentiation into the male sex determined by genes present on the Y chromosome (Graves, 1995;Lahn and Page, 1997). In females, the presence of two X chromosomes would result in a potentially lethal dose of expression of X-linked genes if the alleles present on both chromosomes were equally transcribed. To ensure proper dosage compensation, one of the X chromosomes is transcriptionally silenced (Lyon, 1961). This silenced X chromosome is condensed into heterochromatin and forms a compact structure within the nucleus called a Barr body (Barr and Bertram, 1949;Boumil and Lee, 2001). This inactivation can be either imprinted or random, depending on the species. Imprinted X-inactivation preferentially silences the paternal X chromosome while in random X-inactivation systems there is an equal chance of paternal or maternal X-inactivation, a cell fate choice that occurs early during embryonic development. Once inactivation has occurred, all resulting cells throughout the lineage will maintain this transcriptional state, resulting in a mosaic pattern of expression in the resulting organism (Graves, 2006;Augui et al., 2011). Random X-inactivation is initiated by competition between transcriptional promoters within a specific locus on each X chromosome called the X-inactivation center (Xic) (Figure 3) (Lee et al., 1996). This locus is responsible for the production of several long noncoding RNAs (lncRNAs), the most prominent of which are Xist and Tsix (Lee et al., 1999).
Stable expression of Xist only occurs on the chromosome destined to be silenced, where it is incorporated into the structure of the chromatin along the full length of the chromosome (Plath et al., 2002;Engreitz et al., 2013). The presence of the Xist RNA is key to initiating the assembly of transcriptionally silent heterochromatin and the partitioning of the silent X-chromosome into the Barr body (Avner and Heard, 2001;Augui et al., 2011). The role of the Xist transcripts represents one of the first examples of how the production of lncRNAs is often the initiating event for establishing mutually exclusive expression.

Olfactory Receptor Gene Expression
The sense of smell, specifically the ability to ascertain environmental odorants, is facilitated by olfactory receptors (OR). In vertebrates, odorants are detected by a collection of G protein-coupled receptors displayed along the cilia and synapses of the olfactory sensory neurons (OSNs) making up the OR family (Buck and Axel, 1991;Monahan and Lomvardas, 2015). There is a wide range of variability in the number of different ORs encoded within the genomes of various vertebrates, with humans possessing ~400 OR genes and mice ~1,000 (Buck and Axel, 1991;Niimura and Nei, 2007). Mutually exclusive expression, with each OSN expressing a single OR, is essential to the functionality of the olfactory system (Chess et al., 1994). Each OR is capable of binding a wide variety of odorants at different affinities resulting in a specific odorant's detection by a unique combination of ORs. The aggregate of the resulting signals from the excited neurons is then processed in the olfactory bulb and cortex, thereby providing crucial sensing of the odorants present in the surrounding environment (Malnic et al., 1999;Monahan and Lomvardas, 2015).
Members of the OR gene family are found throughout the genome, organized into clusters of genes on most chromosomes (Figures 4A,B) (Glusman et al., 2001;Niimura and Nei, 2003). During OSN differentiation and maturation, the chromosomes undergo a systematic nuclear reorganization that results in the physical interaction of the OR gene clusters into phase-separated regions of the nucleus (Figure 4C). This organization is anchored by specific genetic elements at each cluster of OR genes called "Greek Islands." These elements appear to bring the OR genes together and thus play a pivotal role in the nuclear reorganization that enables mutually exclusive expression within this gene family (Lomvardas et al., 2006;Bashkirova and Lomvardas, 2019;Monahan et al., 2019;Pourmorady and Lomvardas, 2021). This nuclear reorganization occurs in a stepwise fashion during OSN differentiation, initially leading to low-level expression of many OR genes prior to high-level, exclusive expression of a single gene in fully differentiated cells (Hanchate et al., 2015;Tan et al., 2015). These discoveries have proven to be important concepts that likely apply to many other multicopy gene families in distantly related organisms, including parasites.

MUTUALLY EXCLUSIVE EXPRESSION IN PARASITES: ANTIGENIC VARIATION
To maintain an infection within a mammalian host, pathogens must be able to evade the immune response, including the production of highly specific antibodies that recognize surface antigens unique to the pathogen. Despite vast evolutionary distances, many eukaryotic parasites have evolved very similar strategies, specifically the development of large, multicopy gene families in which each gene encodes a protein of similar function but that is antigenically distinct. By expressing a single member at a time and systematically cycling through the family over the course of an infection, parasites can perpetuate chronic infections of remarkable length. Mechanisms that establish and maintain mutually exclusive expression are imperative for the success of this type of immune avoidance mechanism. The gene regulatory processes underlying antigenic variation are also used to regulate other biological processes that require clonally variant gene expression, most notably alternative invasion pathways or altered nutrient uptake in Plasmodium. However here we will focus on the large gene families involved in antigenic variation in African trypanosomiasis, human malaria and giardiasis ( Figure 5).

Trypanosoma brucei
Trypanosoma brucei is a unicellular parasite responsible for sleeping sickness in humans and nagana in cattle. Its life cycle alternates between tsetse flies and mammalian hosts, where it lives extracellularly in the bloodstream and other tissues. When infecting mammals, the parasite's surface coat consists of a single antigen called the variant surface glycoprotein (VSG), which forms a thick layer that effectively obscures other surface molecules from recognition by the host immune system. Key to long-term infection is antigenic variation of this coat: to escape clearance by host antibodies, bloodstream form trypanosomes periodically switch the VSG that they express. This is possible thanks to an abundant genomic repertoire accounting for more than 1,000 vsg genes or gene fragments (Horn, 2014). The mRNA encoding the active VSG is transcribed from a specific subtelomeric locus called an Expression Site (ES). The T. brucei genome contains around 20 ESs, but only one is expressed at a time, ensuring that only one VSG is displayed on the surface of every parasite. Thus, mutually exclusive expression in this organism refers to expression of a single vsg ES (Cross, 1975;Kooter et al., 1987). VSG switching can either be in situ, where one ES promoter is silenced while another is activated, or by a recombination event that copies a silent gene (or portion of a silent gene) into the active ES (Li, 2015) (Figures 6A-D).
African trypanosomes are evolutionarily very distant from the higher eukaryotes in which most molecular mechanisms controlling transcription were initially defined and consequently display many unusual characteristics. For example, unlike standard models of eukaryotic transcription in which each protein coding region is contained within an individual gene, each ES is a polycistronic unit of 45-60 kb containing multiple ES-associated genes (ESAGs) and a promoter typically located ~50 kb upstream of the vsg coding region (Glover et al., 2013). Interestingly, polycistronic organization of genes is not unique to the ES, but rather is a genome-wide feature that characterizes this ancient lineage of parasites (Clayton, 2019). Another peculiarity of vsg transcription is that it is carried out by RNA Polymerase I, a unique example of an RNA Pol I transcribed mRNA among eukaryotes (Gunzl et al., 2003). Nonetheless, the basic strategy of employing mutually exclusive expression within a large gene family is similar to what is observed in higher eukaryotes.
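The difference between the two routes of VSG switching can be made concrete with a small Python sketch. It is a toy model under stated assumptions, not a description of the molecular machinery: the class and method names and the gene labels are hypothetical, switching targets are picked uniformly at random, and telomere exchange and partial (mosaic) gene conversion are left out. The figures of about 20 expression sites and roughly 1,000 silent genes come from the text.

```python
import random
from dataclasses import dataclass


@dataclass
class Trypanosome:
    """Toy model: a set of expression sites (ESs), exactly one of which is active."""
    es_contents: list  # label of the vsg gene sitting in each ES
    active_es: int     # index of the single transcribed ES

    def expressed_vsg(self):
        return self.es_contents[self.active_es]

    def in_situ_switch(self, rng):
        """Silence the active ES and activate a different one; no DNA is changed."""
        candidates = [i for i in range(len(self.es_contents)) if i != self.active_es]
        self.active_es = rng.choice(candidates)

    def gene_conversion(self, silent_repertoire, rng):
        """Copy a silent vsg into the active ES, overwriting the resident gene."""
        self.es_contents[self.active_es] = rng.choice(silent_repertoire)


if __name__ == "__main__":
    rng = random.Random(0)
    parasite = Trypanosome([f"vsg_ES{i}" for i in range(20)], active_es=0)
    archive = [f"vsg_silent{i}" for i in range(1000)]  # hypothetical labels
    print("initial:", parasite.expressed_vsg())
    parasite.in_situ_switch(rng)
    print("after in situ switch:", parasite.expressed_vsg())
    parasite.gene_conversion(archive, rng)
    print("after gene conversion:", parasite.expressed_vsg())
```

In this sketch an in situ switch changes only which ES is active, whereas gene conversion keeps the same ES active and replaces its contents, so the previously expressed gene copy is lost from that site.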
Interestingly, vsg transcription starts before parasites enter the mammalian host in the last phase of their development within the tsetse fly, called the metacyclic stage. At this point of their lifecycle, trypanosomes express the metacyclic form of the variant surface glycoprotein (mVSG), as a type of preadaptation to their entry into the mammalian host. Similar to mutually exclusive expression of a single vsg ES, only one of approximately eight mVSG encoding genes is transcribed in each metacyclic parasite, thereby ensuring heterogeneity in the population and increasing the probability of a successful infection (Barry et al., 1998;Ramey-Butler et al., 2015;Muller et al., 2018). While similarly displaying mutually exclusive expression, unlike the vsg ESs of the bloodstream form of the parasite, the mVSG genes are the only example in trypanosomes of monocistronic transcription of a protein coding gene (Ginger et al., 2002).

Plasmodium falciparum
Plasmodium falciparum is the parasite responsible for the vast majority of cases of malaria around the world and it is transmitted between people by Anopheles mosquitoes. The P. falciparum genome encodes several families of clonally variant genes involved in numerous processes including antigenic variation, erythrocyte invasion and erythrocyte permeability (Cortes and Deitsch, 2017). Other malaria species, including the model parasites that infect rodents, also have clonally variant gene families that display variable expression (Brugat et al., 2017;Lin et al., 2018). However, the lion's share of research into transcriptional regulation and mutually exclusive expression has investigated the var gene family. Therefore, for the purpose of this review we will focus our attention on var genes. Upon entry into the human host, after initial replication inside hepatocytes the parasites are released into the bloodstream where they invade and replicate within erythrocytes (Cowman et al., 2016). Once inside the erythrocytes, the parasites make extensive modifications to the host cell, including alterations to the cytoskeleton and insertion into the erythrocyte membrane of a protein called Plasmodium falciparum Erythrocyte Membrane Protein 1 (PfEMP1) (Boddey and Cowman, 2013). This protein is exposed on the erythrocyte surface where it binds to ligands on the vascular endothelium, enabling the infected cells to cytoadhere within capillaries and sequester away from the peripheral circulation. This prevents the infected cells from being cleared by the spleen. However, by exposing PfEMP1 on the erythrocyte surface, the parasite is now vulnerable to the antibody response of its host. To escape recognition, P. falciparum parasites systematically change the expressed PfEMP1, thereby undergoing antigenic variation in a way analogous to T. brucei and G. lamblia. PfEMP1 is encoded by a multicopy family of genes called var (Scherf et al., 1998;Deitsch and Dzikowski, 2017). Unlike T. brucei, the repertoire of var genes is relatively small, limited to 40-90 genes per genome, depending on the isolate (Otto et al., 2018b). Similar to vsg expression, var gene expression is mutually exclusive and regulated at the level of transcription initiation.
All var genes have a common bi-exonic structure, with the first exon encoding the extracellular portion of PfEMP1 and the second coding for the cytoplasmic portion, with a similar sequence among all var genes ( Figure 7). Each gene possesses two promoters: one located approximately 1 kb upstream of the coding region and subject to mutually exclusive activation, and a second within the intronic region. The second promoter is bi-directional and drives the expression of sense and anti-sense lncRNAs (Figure 7) (Calderwood et al., 2003;Epp et al., 2009). The majority of the var gene family is subject to frequent recombination, resulting in the gene family displaying tremendous sequence diversity when the repertoire of var genes from different isolates are compared. However, two genes, referred to as var1csa and var2csa, appear to be conserved in all P. falciparum isolates from around the world and are also found in the related Plasmodium species that infect chimpanzees and gorillas (Otto et al., 2018b;Gross et al., 2021). It has been suggested that these genes could serve an additional function as conserved regulatory elements for coordinating mutually exclusive expression (Mok et al., 2008;Ukaegbu et al., 2015).

Giardia lamblia
FIGURE 6 | Different ways in which VSG switching can be achieved in T. brucei. In situ switching (A), where one promoter is silenced and another one is activated, relies on epigenetic changes and is the only mechanism that does not involve recombination. Telomere exchange (B) or gene conversion (C,D) require DNA recombination and gene rearrangement, either at the telomere level or within the polycistronic unit. If gene conversion occurs, the active gene is lost, and a new gene is copied into the active ES. Red arrows represent active ES promoters while black arrows represent silent ES promoters.
Frontiers in Cell and Developmental Biology | www.frontiersin.org March 2022 | Volume 10 | Article 852239
Similar to T. brucei, Giardia lamblia is an early-branching eukaryotic parasite that is evolutionarily very distant from higher eukaryotes. It infects the intestines of its vertebrate hosts and is one of the major causes of intestinal diseases and diarrhoea throughout the world. It is binucleated, with each diploid nucleus possessing a compact genome of around 12 Mb. Giardia's life cycle alternates between two forms: a motile trophozoite which colonizes the upper intestine and an infective cyst form that enables infection of new hosts through oral-faecal transmission (Adam, 2001). The trophozoite form is coated with a variant-specific protein, VSP, which, like VSG in T. brucei, serves as the dominant antigen recognized by the host immune system, resulting in a strong antibody response. To avoid antibody mediated clearance, these parasites can switch the expressed VSP through mutually exclusive expression from a repertoire of around 200 vsp genes arranged as individual genes or in tandem arrays throughout the parasite's genome (Figure 8). This enables them to display antigenic variation in a way similar to P. falciparum or T. brucei (Nash and Aggarwal, 1986;Nash, 2002). However, the binucleated nature of Giardia poses unique problems for mutually exclusive expression. Since both nuclei are transcriptionally active and functional, it is important that vsp expression is coordinated between the two nuclei so that only a single VSP is ultimately expressed on the surface of the parasite. Giardia appears to accomplish this feat by using an RNAi-like mechanism within the cytoplasm to degrade nearly all vsp transcripts from both nuclei. Only mRNA from a single vsp gene escapes degradation, thus leading to expression of a single VSP on the parasite's surface. What enables transcripts from a single vsp gene to avoid destruction is not understood, although it appears to depend on orthologues of the RNAi machinery (Prucca and Lujan, 2009;Gargantini et al., 2016).
Thus, in this system, mutually exclusive expression is rooted in mRNA stability rather than transcriptional activation and silencing, although the ultimate result of antigenic variation is the same.

SHARED MECHANISMS FOR MUTUALLY EXCLUSIVE EXPRESSION
Ranging from the model yeasts and mammalian systems to the protozoan parasites, these organisms represent an exceptionally broad portion of the eukaryotic evolutionary tree. Nonetheless, many of the molecular mechanisms that underpin mutually exclusive expression are shared, suggesting that they are rooted in the origins of the eukaryotic lineage. By using a comparative approach, it is possible to gain insights into the molecular components and key players that regulate these important processes and to see how each organism employs these tools to solve specific evolutionary problems.

Master Genetic Elements
One characteristic shared by all examples of mutually exclusive gene expression is the presence of non-coding master genetic elements that influence chromatin assembly and transcription at the loci. These include DNA sequence elements, transcriptional enhancers and non-coding RNAs (explored in more detail in section 4.2). For example, in S. pombe, two enhancers are responsible for directionality of the donor choice during mating type switching. The enhancers SRE2 and SRE3 are located next to the two silent donors, mat2-P and mat3-M, respectively, and removal of one skews the choice in the direction of the opposite donor ( Figure 1A). These enhancers guide the interaction of the Swi2-Swi5 complex with the local chromatin, which in turn can recruit Rad51 and guide recombination of the locus (Jia et al., 2004;Jakociunas et al., 2013). A similar phenomenon has been described for S. cerevisiae involving an element named the Recombination Enhancer (RE). The RE is located next to the HML locus and directs recombination towards that locus ( Figure 1B). When deleted, selection of the HML locus for recombination is dramatically reduced (Wu and Haber, 1996). Unlike S. pombe, the two donors in S. cerevisiae are located at the two opposite ends of the chromosome, so enhancer activation and silencing likely also involve chromatin rearrangement to bring the loci together for recombination.
Similarly, the choice of which olfactory receptor gene is expressed involves specific enhancer elements located adjacent to clusters of olfactory receptor genes. The first of these to be identified, termed the H element ( Figure 4A), is a 2-kb homology region conserved between mouse and human sequences that was found to be essential for cis activation of genes on transgenic yeast artificial chromosomes (Serizawa et al., 2003). Using Chromosome Conformation Capture (3C) and DNA/RNA fluorescence in situ hybridization (FISH), the H element was found to interact with OR gene promoters from several different chromosomes and to colocalize specifically with the active OR allele, suggesting it could serve as a singular trans acting element essential for monoallelic expression (Lomvardas et al., 2006). However, deletion of the H element only affected expression of the OR genes found in the adjacent cluster, and there was no global effect on olfactory receptor expression (Fuss et al., 2007). This prompted a search for additional enhancer elements, which led to the discovery of a total of 63 OR enhancers (also called "Greek Islands"), each found adjacent to OR gene clusters ( Figure 4B) (Markenscoff-Papadimitriou et al., 2014;Pourmorady and Lomvardas, 2021). These enhancers are proposed to lead to the formation of a subnuclear olfactory receptor compartment through the actions of the chromatin binding proteins Lhx2, LDB1 and EBF, where they form a single super-enhancer hub that associates specifically with the single active olfactory gene ( Figure 4C) (Monahan et al., 2019). While it has been demonstrated that the formation of this enhancer hub is essential for OR gene transcription, its role in the selection of the active OR remains unclear.
It has been shown that in T. brucei and P. falciparum the genes involved in antigenic variation also cluster together in specific subnuclear locations, however no specific enhancers or genomic elements have yet been identified. Nevertheless, recent advances in genome-wide analysis are beginning to provide insights into potential elements that could play a role in this process. In T. brucei, the unique role of RNA Pol I in transcribing the active vsg gene sets it apart from other mRNA encoding genes in the parasite's genome. However, it has been demonstrated that the vsg promoter sequence can be replaced with a rRNA promoter and still be properly regulated and transcribed in a mutually exclusive fashion (Rudenko et al., 1995). This suggests that the mechanisms of recognition for transcriptional control are more dependent on chromosomal context and positioning near the telomere than on the promoter sequence itself. Recent Hi-C analysis confirmed that the silent ESs cluster within the nucleus, while the active ES localizes to a unique subnuclear structure called the expression site body (Muller et al., 2018). The active vsg gene also interacts with a specific locus on chromosome nine where the spliced-leader (SL) RNA array is located (Faria et al., 2021). This locus consists of a tandem array of ~200 SL RNA genes, each with its own promoter, encoding the SL-RNA that is trans-spliced to the 5′ end of each trypanosome mRNA. This stabilizes the transcripts and provides them with a cap structure (Nelson et al., 1983;Perry et al., 1987). It is proposed that the close proximity of the active ES with the SL-RNA array acts as a posttranscriptional enhancer, ensuring high turnover of SL RNAs to more efficiently produce mature vsg mRNAs.
Using both FISH (Figueiredo et al., 2002;Duraisingh et al., 2005) and Hi-C (Ay et al., 2014), var genes in P. falciparum have been shown to cluster in specific nuclear compartments, with the active var gene being separated from the silent loci, but no specific genetic elements have been connected to this clustering. As previously mentioned, there are two var genes, var1csa and var2csa, that are uniquely conserved amongst all isolates of Plasmodium falciparum, and this conservation extends to related species that infect chimpanzees and gorillas (Gross et al., 2021). This is in stark contrast to the highly polymorphic sequences of all other var genes, leading to speculation that these two genes could play a role in organizing the family or in directly regulating transcription (Mok et al., 2008;Ukaegbu et al., 2015). Two other genetic elements that seem to be important for var gene activation and silencing are the promoter regions upstream of each gene and the conserved intron that interrupts all var coding regions. var promoters can be classified into five groups (UpsA, B, C, D and E), with the UpsA and B types found within subtelomeric regions of the chromosome while the UpsC promoters are found at genes located in the interiors of the chromosomes (Figures 7A,B). UpsD and E are specific to var1csa and var2csa, respectively. While the different promoter types display somewhat different rates of transcriptional activation and silencing, all appear to be co-regulated in terms of mutually exclusive expression. Experiments employing episomal constructs have shown that var promoters must be paired with a var intron to be subject to mutually exclusive expression (Deitsch et al., 2001;Frank et al., 2006;Dzikowski et al., 2007), and this interaction is dependent on specific pairing elements (PEs) located within the upstream region of every var gene and the intron (Avraham et al., 2012).
The precise function of these interactions is unknown, although the intron has been shown to be the source of non-coding RNAs implicated in regulating var gene transcription.

Non-Coding RNAs
An additional layer of control common to several organisms that employ mutually exclusive expression is the involvement of noncoding RNAs. One of the first and best studied examples of lncRNAs involved in mutually exclusive expression is the X-inactive-specific transcript (Xist), which is required for X chromosome inactivation (XCI) (Penny et al., 1996;Wutz and Jaenisch, 2000). The minimal chromosomal region required to ensure XCI is referred to as the X-inactivation centre (Xic) (Augui et al., 2011), which contains the Xist sequence as well as several other key lncRNAs that act as regulators of Xist, including Tsix, Ftx, Jpx/Enox and Xite (Figure 3) (Ogawa and Lee, 2003;Froberg et al., 2013;Maclary et al., 2013;Loda and Heard, 2019). During embryonic development, the Xist transcript is incorporated into the chromatin structure of the inactive X chromosome, spreading from the Xic and "painting" the entire chromosome. This initiates the assembly of condensed heterochromatin and the segregation of the inactive X into the Barr body. In the mouse model, Tsix is transcribed from the Xic in the antisense direction relative to Xist and serves to prevent incorporation of Xist into the chromatin structure of the active X chromosome. Thus, this interplay of lncRNAs is key to the choice of which X chromosome becomes silenced and for the assembly of the condensed chromatin required to suppress gene expression.
Non-coding RNAs have also been implicated in the regulation of V(D)J recombination at the immunoglobulin loci. The immunoglobulin locus provided one of the first examples of ncRNAs, at the time described as "sterile" transcripts. Strikingly, these transcripts are produced only from the chromosomal regions that are poised for recombination (Lennon and Perry, 1985). It has been proposed that transcription of these ncRNAs leads to opening of the chromatin structure and accessibility for recombination (Corcoran, 2010). Indeed, transcription of lncRNAs precedes each recombination event, with lncRNA transcription and recombination occurring first at the site of DJ recombination, then subsequently from the V segments. Both sense and antisense transcripts are produced, with sense transcripts coming from the coding region of V segments and antisense transcripts from the intergenic regions.
While the exact mechanisms are unclear, non-coding RNAs are thought to also play important roles in P. falciparum and G. lamblia in regulation of antigenic variation. Both sense and antisense lncRNAs are known to be produced from a bi-directional promoter within the intron of each var gene in P. falciparum (Figure 7) (Epp et al., 2009). The antisense lncRNA is 1.7 kb long and is only expressed from the active var gene, early in the replicative cycle at the same time as the var mRNA is expressed. Interestingly, episomal expression of the antisense transcript can activate a silent var gene (Amit-Avraham et al., 2015), suggesting this lncRNA is involved in activating the locus. The sense lncRNA is 2.5 kb long and is expressed from exon 2 of all var genes late in the cycle when var genes are normally all silenced (Calderwood et al., 2003;Kyes et al., 2003). No clear function has yet been demonstrated for this lncRNA, but it remains nuclear and is associated with chromatin, suggesting it could function in var gene silencing. Another class of lncRNAs proposed to play a role in var gene regulation are named RUF6 (RNA of Unknown Function-6). These are transcribed from 15 genes that are dispersed among the chromosome-internal var gene arrays (Figure 7A). They have a conserved sequence with an unusual GC-rich content compared to the rest of the P. falciparum genome, which has <20% GC-content (Gardner et al., 2002;Upadhyay et al., 2005). Although their mechanism of action is unknown, FISH showed they localize close to var genes and their expression has been linked to activation of specific var genes (Wei et al., 2015;Guizetti et al., 2016;Barcons-Simon et al., 2020). Within the subtelomeric regions, UpsA and B var genes are separated from the chromosome end by Telomere Associated Repeat Elements (TAREs) that are transcribed into lncRNAs (Broadbent et al., 2011;Sierra-Miranda et al., 2012) (Figure 7B). 
The function of these lncRNAs is not known, although they have been proposed to contribute to var gene regulation. Thus, similar to X-inactivation, there appear to be multiple lncRNAs involved in regulating var gene activation, silencing and mutually exclusive expression. The study of lncRNAs in model organisms indicates that they often bind chromatin and drive gene expression and chromatin rearrangement (Li and Fu, 2019), therefore a role for lncRNAs in the regulation of the var family seems likely.
Non-coding RNAs have also been proposed to play a key role in regulating antigenic variation in G. lamblia. Due to the presence of two nuclei and thus the need to coordinate the choice of which vsp gene is expressed to attain mutually exclusive expression, Giardia evolved a post-transcriptional control mechanism rather than regulating activation and repression at the level of transcription. Two different classes of small RNAs have been implicated in vsp control: small interfering RNAs (siRNAs) and micro-RNAs (miRNAs) (Prucca et al., 2008;Saraiya et al., 2014). Elimination of components of the RNA interference (RNAi) machinery, such as Dicer or RNA-dependent RNA Polymerase (RdRP), disrupts mutually exclusive vsp expression and results in trophozoites expressing more than one VSP on their surface. These experiments are consistent with a model in which all vsp genes are actively transcribed, but a cytoplasmic complex including Argonaute, Dicer and RdRP limits expression to a single vsp mRNA, through the production of either miRNAs or siRNAs (Prucca et al., 2008;Saraiya et al., 2014). In either case, it has been proposed that the vsp mRNA that is ultimately stabilized and actively translated is determined by a threshold in the amount of transcript available in the cytoplasm. There is also evidence for a role for epigenetic mechanisms controlling the level of transcription from each vsp gene, thus contributing to which transcript reaches the threshold (Kulakova et al., 2006;Sonda et al., 2010;Carranza et al., 2016). Thus, the cumulative evidence clearly implicates aspects of RNAi in vsp control; however, the exact mechanism for how expression is limited to a single VSP remains unknown. For a more detailed review of the proposed theories, refer to .
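The threshold model described above can be made concrete with a small toy calculation. The sketch below is purely illustrative and is not taken from any of the cited studies: the function name, the gene labels, the abundance values and the rule that the most abundant transcript at or above the threshold is the one that escapes degradation are all assumptions made for the example.

```python
def expressed_vsps(transcript_levels, threshold, rnai_active=True):
    """Toy model of post-transcriptional vsp selection (hypothetical).

    transcript_levels: dict mapping a vsp label to its cytoplasmic abundance.
    threshold: abundance a transcript must reach to escape RNAi (assumed).
    rnai_active: False mimics loss of Dicer or RdRP.
    Returns the list of vsp labels whose mRNA is translated.
    """
    if not rnai_active:
        # Without RNAi, every transcribed vsp mRNA is translated.
        return sorted(transcript_levels)
    above = [g for g, level in transcript_levels.items() if level >= threshold]
    if not above:
        return []
    # Assumption: only the most abundant transcript above threshold survives.
    return [max(above, key=lambda g: transcript_levels[g])]


levels = {"vsp_a": 120, "vsp_b": 40, "vsp_c": 15}  # made-up abundances
print(expressed_vsps(levels, threshold=100))
print(expressed_vsps(levels, threshold=100, rnai_active=False))
```

With RNAi active the toy model yields a single expressed VSP, whereas disabling it yields all three, which mirrors the qualitative outcome reported for Dicer or RdRP depletion. The actual mechanism by which the threshold is sensed remains unknown.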

DNA Recombination and Repair
The DNA recombination and repair machinery are known to play an important role in the biology of multicopy gene families and mutually exclusive expression in multiple organisms, either by driving diversification of family members or through direct involvement in choosing which gene is expressed. In model systems, the two most studied examples are yeast mating-type switching and V(D)J recombination in the immunoglobulin loci. As previously mentioned, mating-type switching in yeast requires recombination of the locus where the active gene resides with one of two silent donor loci (Figure 1). In both yeast systems, the switch is initiated by a double strand break (DSB) at the active gene followed by Rad51/Rad52-mediated homologous recombination using one of the silent donor loci as the template for repair (Malone and Esposito, 1980;Aboussekhra et al., 1992). In S. pombe this DNA rearrangement results from two events during S-phase. The first, called imprint, involves the introduction of one or two ribonucleotides in a specific position (the MPS1 site) of the mat1 locus during lagging-strand synthesis of DNA replication. This insertion is inherited by only one of the two daughter cells and during subsequent leading-strand synthesis, the imprint causes stalling of the replication fork, which triggers recombination between mat1 and one of the two donor loci (mat2P or mat3M). As a result, one of the daughter cells has switched mating type (Dalgaard and Klar, 1999;Arcangioli and De Lahondes, 2000). In S. cerevisiae, the recombination event happens during G1 phase and is dependent on the expression of the endonuclease HO, which introduces a DSB at the MAT locus. As in S. pombe, this DSB is repaired by homologous recombination using the opposite mating-type donor cassette. The expression of HO is restricted to mother cells that have divided at least once (Kostriken et al., 1983). For a recent detailed review of the mechanisms employed by both yeasts, see (Thon et al., 2019).
The tightly regulated action of the recombination machinery is also key for the generation of mature immunoglobulins. The process starts with the generation of DNA breaks at a specific sequence called the Recombination Signal Sequence (RSS) within one of the heavy chain alleles. This sequence lies next to each antigen receptor segment and it is crucial for recognition by the RAG1/2 recombinase complex (Swanson, 2004). Once one of the two alleles recombines successfully it produces a functional heavy chain that assembles with a surrogate light chain and forms the pre-B cell receptor. The pre-BCR initiates a feedback signal to inhibit rearrangement of the other heavy chain allele through the assembly of inaccessible chromatin and repression of RAG1/2. This also promotes a similar recombination at the light chain allele, including an analogous feedback to ensure only one allele is expressed and a mature B cell receptor is formed (detailed review in (Schatz and Ji, 2011)). Importantly, in order to have successful allelic exclusion, there must be asynchronicity between recombination of the two alleles, meaning that one of them has to recombine before the other. The choice of which allele recombines first is determined by differences in replication timing, with the allele that replicates early in S-phase undergoing the initial rearrangement (Mostoslavsky et al., 2001).
Regarding antigenic variation in parasites, recombination plays an especially prominent role for vsg gene expression in T. brucei. In addition to switching which expression site is activated, T. brucei has also evolved additional switching mechanisms involving DNA recombination (Figure 6). For example, the parasites can employ gene conversion to replace the entire active vsg gene with the vsg gene from a silent subtelomeric expression site. The homology region used for recombination can vary, leading to conversion events that can extend from the promoter region to the telomeric repeats. Alternatively, parasites can also undergo telomere exchange where the telomere ends, including the vsg expression sites, are exchanged without sequence loss. Lastly, multiple fragments from different vsg pseudogenes can be merged to form a "mosaic" vsg in a phenomenon referred to as segmental gene conversion (Myler et al., 1984;Hall et al., 2013;Mcculloch et al., 2015). Efficient recombination between intact vsg genes requires RAD51, BRCA2 and the homologous recombination machinery, although a minor amount of switching by conversion has been detected in the absence of RAD51 (Mcculloch and Barry, 1999;Hartley and Mcculloch, 2008). Similar to recombinational switching in yeast and immunoglobulin genes, it was demonstrated that switching by conversion is initiated by a DSB in the expression site (Boothroyd et al., 2009). While it is not clear what directs a DSB to occur specifically in the active expression site, it is possible that high transcriptional activity makes the locus more vulnerable to DNA lesions. It has been shown that RNA-DNA hybrids (R-loops) can accumulate at the active expression site, resulting in genomic instability (Briggs et al., 2019), and as described in the next section, the active ES was also shown to be depleted of nucleosomes. 
It was proposed that a combination of this depletion with high levels of transcription could result in natural DSBs (Figueiredo and Cross, 2010;Stanne and Rudenko, 2010).
The ability to recombine variant genes is not only important for switching which gene is expressed, but also to drive diversification of the family between different parasite strains. For example, P. falciparum var genes undergo much more rapid diversification than the rest of the genome, and this diversification results from frequent recombination between variant gene family members during asexual replication (Bopp et al., 2013;Claessens et al., 2014;Otto et al., 2018a). Since Plasmodium lacks components of nonhomologous end joining (NHEJ), it relies entirely on homologous recombination (HR) to repair DNA breaks (Kirkman et al., 2014). Additionally, since the parasites are haploid during their asexual stage, DSBs repaired by HR must use homologous sequences from elsewhere in the genome as template for repair. When such breaks occur at or near var genes, the creation of chimeric var genes through gene conversion events is favoured (Siao et al., 2020). Most var genes are also located in subtelomeric regions that are vulnerable to telomere healing and telomere exchange, which can result in cascades of recombination between var genes that ultimately give rise to new mosaic var genes and drive rapid diversification of the family (Zhang et al., 2019).

Histones and Chromatin Modifications
A key component that all the described systems have in common is that activation and silencing of genes are associated with changes in chromatin modifications and assembly. Genes that need to be kept in a silent state are associated with silencing histone marks and heterochromatin formation, whereas expressed genes are associated with activating histone marks and a euchromatic state. Interestingly, for systems that employ recombination as the main means of gene switching, like yeast and immunoglobulin genes, the donor sequence is always associated with increased nuclease accessibility and activating histone modifications, such as acetylation (Klar et al., 1998;Bergman et al., 2003). The components involved in epigenetic control of multicopy gene families are largely conserved between the major model organisms. For example, gene activation is typically associated with acetylation on histone 3 and 4 (H3/H4ac), or with methylation of lysines 4 and 36 on histone 3 (H3K4me/H3K36me). In contrast, gene silencing is usually driven by methylation on histone 3, including H3K9 or H3K27. Several conserved chromatin modifying proteins are involved in this process. Among the most conserved are heterochromatin-associated protein HP1 (Swi6 in yeast), the histone demethylases Lsd1 and Lsd2, the nucleosome remodelling proteins SWI/SNF and the polycomb proteins PRC1/PRC2 (Ekwall et al., 1995;Okamoto et al., 2004;Holmes et al., 2012;Brockdorff, 2013;Jaeger et al., 2013;Tan et al., 2013;Ji et al., 2019). As a consequence of its early divergence from the eukaryotic lineage, the chromatin organization and histone code of Trypanosoma brucei are only partially conserved. In addition to the canonical histones, T. brucei possesses four histone variants: H2Az, H2Bv, H3v and H4v. Additionally, the core histones appear to possess fewer modifications than in other eukaryotes, and the majority of them are not conserved at the sequence level (Maree and Patterton, 2014). 
Despite these divergences, epigenetic control has clearly been shown to be involved in activation and silencing of vsg expression. Silent vsg loci are enriched in H1, H2A, H3 and H3v, whereas the active expression site is depleted of nucleosomes (Figueiredo and Cross, 2010;Stanne and Rudenko, 2010). Additionally, a few chromatin modifiers have been implicated in vsg control. DOT1B is a histone methyltransferase that trimethylates lysine 76 of histone H3 in T. brucei and is required for vsg silencing (Figueiredo et al., 2008). Similarly, an orthologous member of the ISWI family of SWI2/SNF2-related chromatin-remodelling proteins (TbISWI) was demonstrated to be involved in downregulation of ES expression (Hughes et al., 2007), and knockdown of the chromatin remodeller and transcription elongation factor FACT/SPT16 results in permissive chromatin and vsg derepression (Denninger et al., 2010). T. brucei DNA is not methylated, but presents a modified base, β-D-glucopyranosyloxymethyluracil, called Base J (Gommers-Ampt et al., 1993). This base is enriched at trypanosome telomeres and at silent vsg genes but is not present at the active expression site (Van Leeuwen et al., 1997) and appears to be involved in termination of polycistronic transcription rather than transcriptional initiation (Reynolds et al., 2014). More recently, a protein complex specific to the active vsg was identified. VSG Exclusion 1 (VEX1) was shown to be sequestered at the active vsg locus and to associate with VEX2, an orthologue of the nonsense-mediated-decay helicase, UPF1. Together, this complex sustains chromatin accessibility and active transcription. Additionally, when the complex is depleted, vsg expression becomes heterogeneous (Glover et al., 2016;Faria et al., 2019).
In P. falciparum, the histone tails are largely conserved in sequence with those of model eukaryotes; however, the linker histone H1 is missing (Gardner et al., 2002). Some additional histone variants have been identified, namely H2Az and H2Bz, with the latter seemingly unique to Apicomplexans. These two variant histones are associated with the activating histone marks H3K9ac and H3K4me3 and are enriched at the active var gene (Bartfai et al., 2010;Hoeijmakers et al., 2013;Petter et al., 2013). Additionally, Plasmodium has devoted specific histone marks, which are typically distributed throughout the genome in model organisms, to clonally variant gene families including var genes. Specifically, H3K9 trimethylation is enriched at the promoters of silent var genes and is recognized by the silencing heterochromatin protein 1 (PfHP1), whereas H3K36me3, deposited by the methyltransferase PfSet2, has been found on both silent and active var genes (Flueck et al., 2009;Salcedo-Amaya et al., 2009;Jiang et al., 2013;Ukaegbu et al., 2014). PfSet10, a methyltransferase responsible for methylation of H3K4, has been associated with var gene activation. Interestingly, this enzyme localizes to the active var gene during the late stage of asexual replication, when var genes are not expressed, so it has been proposed to play a role in maintenance of epigenetic memory through multiple cell cycles (Volz et al., 2012). ATAC-Seq has been used to analyse nucleosome distribution at var genes and detected several peaks of accessibility, but not a clear correlation with the active var gene. Interestingly, it was proposed that increased accessibility of the two RUF6 genes flanking the active var gene could play a role in var transcriptional activation (Ruiz et al., 2018).
In G. lamblia, the precise involvement of epigenetic control remains unclear. It appears that acetylation of the upstream region of vsp genes is associated with activation and it is known that RNA interference can induce the deposition of epigenetic marks (Kulakova et al., 2006;Francia, 2015). However, when histone deacetylases (HDACs) were knocked out, no major change was observed in vsp gene expression (Sonda et al., 2010).

Nuclear Position and Chromatin Assembly
Spatial organization of DNA within the nucleus is crucial to gene regulation. In 1949 Barr and Bertram first reported a "nucleolar satellite" in the nucleus of feline nerve cells. This nuclear body was only found in cells isolated from female cats and was not observed in male cats. They went on to speculate that "the nucleolar satellite may be derived from the heterochromatin of the sex chromosomes." (Barr and Bertram, 1949). This structure, known as the Barr body, was further determined to be formed by one of the two sex chromosomes in female cells (Ohno and Hauschka, 1960), leading to the hypothesis that this condensation resulted in silencing and inactivation of the chromosome (Lyon, 1961). The silenced X chromosome was shown to localize either to the nucleolus or the nuclear periphery (Barr and Bertram, 1949;Klinger and Schwarzacher, 1960), consistent with the heterochromatic positioning associated with transcriptional silencing (Towbin et al., 2013;Holla et al., 2020). Nuclear organization and assembly of silent chromatin have also been implicated in monoallelic expression of olfactory receptors in olfactory sensory neurons. Interchromosomal interactions and nuclear repositioning are key to the stabilization of the stochastic selection of a single OR from the thousands of available alleles (Chess et al., 1994). Following activation of a single OR gene, there is a confluence of the OR gene clusters (Armelin-Correa et al., 2014) resulting in increased interactions between the Greek Island enhancer elements and the active OR allele (Markenscoff-Papadimitriou et al., 2014;Monahan et al., 2019). This allows for the formation of a super-enhancer-like unique nuclear phase, allowing for the local accumulation of activating factors (Figure 4C). Key to the repression of the silent ORs is their aggregation into a small number of distinct heterochromatic foci (Clowney et al., 2012;Lyons et al., 2013).
Numerous lines of evidence also support the conclusion that correct recombination and assembly of the active immunoglobulin locus is guided by nuclear positioning (reviewed in Jhunjhunwala et al., 2009). Consistent with what is observed for X chromosome inactivation and OR gene expression, in progenitor cells, the Ig loci are associated with the nuclear lamina and kept in a silent state. Subsequently in the gene undergoing recombination, chromatin looping allows the D and J segments to re-localize, while the V region remains tethered to the nuclear periphery. During B cell commitment, the V region is also relocated away from the nuclear lamina. Once productive rearrangement of the locus has occurred, the non-functional Ig allele is repositioned into heterochromatin at the nuclear periphery (Skok et al., 2001;Kosak et al., 2002;Roldan et al., 2005).
Similar dynamics involving nuclear relocation and chromatin reorganization have been observed in parasites, including T. brucei which, as mentioned above, lacks identifiable enhancers that could mediate chromosome looping or repositioning. One of the first pieces of evidence in T. brucei was provided by the identification of an extranucleolar RNA Polymerase I transcription site during the bloodstream stage of the parasite's life cycle. This site, named the Expression Site body (ESB), is where the active ES is located and transcribed (Navarro and Gull, 2001). Remarkably, partial nuclear relocation is also observed for the silent ESs. During the bloodstream stage, the silent ESs are separated from the active ESB, but are located in extranucleolar clusters, whereas in the insect form all ESs are sequestered in the nuclear heterochromatic periphery (Landeira and Navarro, 2007). The relocation of silent ESs during the bloodstream stage is proposed to promote more rapid activation. The ESB is maintained throughout the cell cycle and is dependent on the cohesin complex. Depletion of this complex interferes with antigenic switching and promotes activation of a previously silent ES (Landeira et al., 2009). Similarly, depletion of the nuclear periphery proteins 1 and 2 (NUP-1/2) results in increased VSG switching, providing additional evidence that nuclear positioning is crucial in trypanosome antigenic variation (Dubois et al., 2012;Maishman et al., 2016). In a recent publication, Budzak and others identified three different nuclear bodies associated with the ESB: the Cajal body, the spliced-leader array body (SLAB) and the NUFIP body. They proposed that these bodies facilitate the high requirement for splicing that needs to occur at this site (Budzak et al., 2022).
Nuclear localization is also thought to play an important role in control of var gene expression in P. falciparum. var genes are known to cluster at the nuclear periphery, although some argument remains as to whether all var genes cluster together in one locus, as suggested by recent Hi-C profiling, or instead in multiple spots as suggested by FISH experiments (Freitas-Junior et al., 2000;Freitas-Junior et al., 2005;Lemieux et al., 2013;Ay et al., 2014). Similar to T. brucei, the active var gene appears to relocalize into a discrete euchromatic location, separated from the silent var genes (Duraisingh et al., 2005;Ralph et al., 2005). The repositioning of the active var promoter to a different nuclear location was also demonstrated using extrachromosomal var promoters located on episomes (Dzikowski et al., 2007). These experiments confirmed that the active gene is separated from silent genes, but also showed that more than one promoter can localize within the active site at the same time, suggesting that localization alone is not enough to maintain mutually exclusive expression (Voss et al., 2006;Dzikowski and Deitsch, 2008). This conclusion was also supported by observations in parasites expressing two var genes at the same time (Joergensen et al., 2010). Interestingly, it was shown that active members of another clonally variant multicopy gene family, the rifins, relocate to the same active nuclear compartment as var genes.

A Two-step Process for Selecting a Gene for Activation
Models of mutually exclusive expression have typically presumed that activation of a single copy or allele is a strictly controlled process, where in every cell only one gene can be activated at any given time without exceptions. Recent technical advances, especially the ability to observe certain phenomena at single cell resolution, have partially challenged this dogma. Employing single cell RNA-sequencing, two independent groups analysed the transcriptomes of single olfactory sensory neurons (OSNs) during development in order to establish when OSNs select a single OR gene for expression. They discovered that before committing to expression of a single OR gene, immature OSNs express low levels of multiple genes. It is not known if the single gene that is ultimately fully activated is selected from the subset of genes that are initially expressed at a low level, a model described as "winner-take-all"; however, it is clear that selection of a single OR gene requires initial multiple gene transcription (Hanchate et al., 2015;Tan et al., 2015).
A similar phenomenon has recently been described in African trypanosomes prior to selection of a metacyclic vsg (mvsg) for expression. As described in the introduction, T. brucei begins to express a VSG coat when still in the insect host at the metacyclic stage (Tetley et al., 1987;Ramey-Butler et al., 2015). Hutchinson and others analysed single cell transcriptomes of parasites in the salivary glands of tsetse flies and identified the presence of two metacyclic populations: one, described as pre-metacyclic, expressing multiple mVSGs at low levels and one expressing a single mVSG at a higher level . Additionally, in the mammalian host, single cell RNA-Seq and single cell RT-PCR showed that transcription initiation happens at several ES promoters in every cell, but that productive elongation only occurs at one (Kassem et al., 2014;Muller et al., 2018). In both systems, the choice of a single gene for activation is hypothesized to occur through a two-step process in which initially multiple genes are expressed followed by selection of a single gene in the fully mature cell. Both examples also highlight how studying certain phenomena at the population level is not sufficient to decipher all aspects of the underlying mechanisms in detail. For a recent review on the importance of cell-to-cell studies in T. brucei, check .
Although analysis of var gene expression at the single cell level in P. falciparum has not yet been published, a two-step selection scenario similar to ORs and vsg has been hypothesized based on the study of clonal populations and mathematical modelling using data derived from in vitro cultures (Recker et al., 2011). This model describes an optimized hypothetical var gene network wherein parasites initially enter an intermediate "many" state in which several var genes are expressed at low levels. This is followed by selection and high-level activation of a single var gene, and the encoded PfEMP1 becomes the dominant antigen expressed by the population of parasites. In addition to these studies of cultured parasites, in vivo infections showed that at the onset of a bloodstream infection, multiple var genes are detectably transcribed at a low level (Wang et al., 2009;Bachmann et al., 2016). Similarly, it was demonstrated that erasing the epigenetic var memory by promoter titration or passage through the mosquito results in erythrocytic-stage parasites expressing a subset of var genes before establishment of mutually exclusive expression (Dzikowski et al., 2007;Fastman et al., 2012). These studies suggest that a two-step process for selection of a single gene for mutually exclusive expression might be a common pathway found throughout the eukaryotic lineage.
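The two-step scenario described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions and does not reproduce the model of Recker et al. (2011): the function name, the expression values, the size of the intermediate subset and, in particular, the "winner-take-all" rule that the final gene is drawn from the initially expressed subset are hypothetical choices made for illustration.

```python
import random


def two_step_selection(genes, n_low=5, low=1.0, high=100.0, seed=None):
    """Toy two-step selection of one gene from a family (hypothetical).

    genes: non-empty list of gene labels.
    Returns (many_state, single_state, winner), where each state maps
    every gene label to an arbitrary expression level.
    """
    rng = random.Random(seed)
    # Step 1: transient "many" state, a random subset expressed at low level.
    candidates = rng.sample(genes, min(n_low, len(genes)))
    many_state = {g: (low if g in candidates else 0.0) for g in genes}
    # Step 2: one candidate is fully activated and the rest are silenced.
    # Assumption: the winner comes from the initially expressed subset.
    winner = rng.choice(candidates)
    single_state = {g: (high if g == winner else 0.0) for g in genes}
    return many_state, single_state, winner


family = [f"var{i}" for i in range(60)]  # made-up labels for a ~60-member family
many, single, winner = two_step_selection(family, n_low=5, seed=1)
print(sorted(g for g, level in many.items() if level > 0), "->", winner)
```

Each run passes through an intermediate state with several weakly expressed genes before settling on a single highly expressed one. Whether the final gene is in fact drawn from the initially expressed subset remains an open question for the systems discussed here.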

CONCLUSION
Mutually exclusive expression of genes from multicopy families appears to be a strategy conserved throughout eukaryotic evolution. As described in this review, several layers of control interact in a complex mechanism that ultimately results in the expression of a single family member. Understanding the molecular details underlying mutually exclusive expression remains a challenge in all eukaryotes, from model systems to the evolutionarily distant protozoan parasites. Nonetheless the development of new methodologies and modern technological advances have shed new light on this puzzle and provided hints that many of the mechanisms involved are likely to be shared throughout the eukaryotic lineage. This represents an exciting time to work in this field given that more discoveries are likely as new methods are refined and are applied to additional biological systems.
With regard to the pathogenic organisms, while many basic strategies are shared, it is also clear that the divergent nature of the parasites' genomes has led to differences in some aspects of how antigenic variation is controlled. For example, the substantial difference in the size of the antigen encoding gene families in T. brucei when compared to P. falciparum could be partially responsible for the different strategies evolved by the two parasites. In addition, the polycistronic nature of kinetoplastid transcription prevents T. brucei from relying solely on transcriptional regulation the way P. falciparum appears to. Similarly, the bi-nucleated structure of G. lamblia requires post-transcriptional control in the parasite's cytoplasm, thus ensuring that only a single mRNA is expressed despite the existence of two transcriptionally active nuclei.
Despite these differences, much can be learned by exploring and comparing the strategies of different organisms. A better understanding of how mutually exclusive expression contributes to antigenic variation will undoubtedly improve our understanding of pathogenesis and virulence and might also translate into new disease intervention strategies. Thus, the advantages of comparative studies that apply the lessons learned in model systems to organisms of significance to human health continue to hold great promise.

AUTHOR CONTRIBUTIONS
All three authors contributed equally to the conception, writing and editing of the manuscript.