An Evolutionary View of Trypanosoma cruzi Telomeres

As in most eukaryotes, the linear chromosomes of Trypanosoma cruzi end in a nucleoprotein structure called the telomere, which is preceded by regions of variable length called subtelomeres. Together, telomeres and subtelomeres are dynamic sites where DNA sequence rearrangements can occur without compromising essential interstitial genes or chromosomal synteny. Good examples of subtelomere involvement are the expansion of human olfactory receptor genes, variant surface antigens in Trypanosoma brucei, and Saccharomyces cerevisiae mating types. T. cruzi telomeres are made of long stretches of the hexameric repeat 5′-TTAGGG-OH-3′, and its subtelomeres are enriched in genes and pseudogenes from the large gene families RHS, TS, and DGF-1, as well as DEAD/H-RNA helicase and N-acetyltransferase genes, intermingled with sequences of retrotransposon elements. In particular, members of the trans-sialidase type II family appear to have played a role in shaping the current T. cruzi telomere structure. Although the structure and function of T. cruzi telomeric and subtelomeric regions have been documented, recent experiments are providing new insights into T. cruzi's telomere-subtelomere dynamics. In this review, I discuss the co-evolution of telomeres, subtelomeres, and the TS gene family, and the role that these regions may have played in shaping T. cruzi's genome.


INTRODUCTION
Trypanosoma cruzi causes Chagas disease, a debilitating and often lethal illness affecting millions of people in Latin American countries. T. cruzi populations are highly variable, and this variability is due in part to a clonal population structure, genome plasticity, and an abundance of repeated sequences. Nearly 50% of the total genome is made of repeated sequences, some of which code for protein families such as the trans-sialidases (TS), mucins (MUC), mucin-associated surface proteins (MASP), dispersed gene family 1 (DGF-1), and Retrotransposon Hot Spot (RHS) proteins, as well as retrotransposon elements.
The trans-sialidase family is divided into eight groups (Freitas et al., 2011), and the family name derives from enzymes included in the first group that can transfer host sialic acid onto the parasite's surface MUC. The rest of the TS groups have no enzymatic activity but code for surface glycoproteins involved in infectivity, adhesion, and evasion of the host immune response. MUC are not only receptors for sialic acid (Di Noia et al., 1995) but, together with MASP proteins (Bartholomeu et al., 2009), provide a developmentally regulated shield that protects the parasite in hostile environments. DGF-1 are integrin-like proteins (Gonzalez et al., 2009; Kawashita et al., 2009; Lander et al., 2010) that are developmentally regulated, but whose role has not been deciphered. In T. brucei, the Retrotransposon Hot Spot (RHS) family codes for nuclear and perinuclear proteins, and its sequences are targets for the insertion of RIME/ingi non-LTR retrotransposons (Bringaud et al., 2002). In T. cruzi, RHS genes are frequently interrupted by the non-LTR retrotransposon L1Tc (Bringaud et al., 2006).
Early experiments in Manning's lab (Peterson et al., 1989; Ruef et al., 1994) revealed that two members of the trans-sialidase type II family (TSII) located in the vicinity of the telomere were highly expressed, whereas another member, in a more internal chromosomal location, showed a lower expression level (Peterson et al., 1989). After this original report, Freitas-Junior et al. (1999) and Chiurillo et al. (1999) confirmed the presence of TSII gene members in the subtelomeres of several T. cruzi strains. The cloning of T. cruzi's telomeric and subtelomeric regions using a vector-adaptor complementary to the last nine nucleotides of the telomere (Chiurillo et al., 1999, 2002; Kim et al., 2005) allowed us to define the sequence of the telomeric repeat and to obtain a detailed view of a large segment of the subtelomere.
Subtelomeres are often described as regions of genomic instability (Baird, 2018), and this observation is supported by the abundance of retrotransposon fragments and pseudogenes in T. cruzi telomeric clones. Subtelomeres also act as a buffer region that minimizes chromosome damage when telomere attrition occurs. Despite this unstable environment, complete gene copies of DGF-1, TS, RHS, DEAD/H-RNA helicase, and N-acetyltransferase in T. cruzi subtelomeres are expressed, indicating that positive selection is preserving the integrity of these genes at these locations (Moraes Barros et al., 2012). At the end of the subtelomeres there is a 189 base pair sequence with homology to the 3′ and 5′ UTR sequences of a gp85 gene (TSII), which we dubbed the 189 bp junction (Kim et al., 2005). A reconstruction of the possible events associated with the creation of this junction was proposed by Kim et al. (2005) and Chiurillo et al. (2017), in which tandem repeats of TSII genes underwent breakage, excision, and rejoining to generate the junction. After this junction, the chromosomes are capped by runs of hexameric repeats of variable length ending in the single-strand terminus 5′-TTAGGG-OH-3′.
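As a small illustration of the terminal structure described above, the following Python sketch measures the run of perfect TTAGGG copies at the 3′ end of a sequence. It is a toy under stated assumptions: the function name terminal_repeat_tract and the example sequence are hypothetical, and real telomeric clones would require tolerance for mismatches and partial repeat units.

```python
import re

TELOMERIC_REPEAT = "TTAGGG"  # hexameric repeat reported in the text


def terminal_repeat_tract(seq: str, unit: str = TELOMERIC_REPEAT) -> tuple[int, int]:
    """Return (start, n_repeats) for the run of `unit` copies ending at the 3' end.

    Only perfect, contiguous copies that reach the very end of `seq` are
    counted; this strictness is a simplifying assumption of the sketch.
    """
    seq = seq.upper()
    match = re.search(f"(?:{unit})+$", seq)
    if match is None:
        return len(seq), 0
    return match.start(), (match.end() - match.start()) // len(unit)


# Toy chromosome end: made-up subtelomeric bases followed by four repeats.
toy_end = "ACGTACGTCCATG" + TELOMERIC_REPEAT * 4
tract_start, n_repeats = terminal_repeat_tract(toy_end)
print(tract_start, n_repeats)
```

The returned start coordinate marks where a junction sequence would be expected to abut the repeat tract in a real clone.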

THE EVOLUTION OF T. cruzi TELOMERES AND THE ROLE OF MEMBERS OF THE TSII GENE FAMILY IN SHAPING ITS GENOME
The intimate association between telomeres and TSII (gp85) sequences raises many questions about the creation of T. cruzi telomeres. How and when were TSII family members incorporated into the 189 bp junction? Were they part of transposon elements that eventually evolved into a primitive telomere? Did they come along with the telomeric repeats?
When we examine the structure of eukaryotic telomeres, we observe a tremendous diversity in telomeric repeats and telomere-associated sequences, the extreme case being Diptera, where transposons assumed the role of telomeres. Telomeres arose during eukaryogenesis from the need to protect the ends of linear chromosomes from exonuclease degradation, prevent chromosome rearrangements, and facilitate the replication of the DNA lagging strand. This process may have started with the endosymbiosis of an ancestral phagotrophic host cell (Cavalier-Smith, 2002) and an α-proteobacterium carrying group II introns. Group II introns are eubacterial mobile retroelements with ribozyme and reverse transcriptase activities. Eventually, group II introns were transferred into the host cell genome and gave rise to the spliceosome, the non-LTR transposons, and telomerase (Garavis et al., 2013; Podlevsky and Chen, 2016). According to this hypothesis, these non-LTR transposons were inserted at several genome locations, including the chromosome ends, giving rise to the prototelomeres.
On the other hand, although the matter is still debated, Roggentin et al. (1993) proposed, based on the sequence homogeneity of sialidases within the animal kingdom and their homology with their highly variable bacterial counterparts, that sialidases originated in ancestors of the Echinodermata and deuterostome animals, and that these genes were later horizontally transferred to bacteria via viruses. However, Schwerdtfeger and Melzig (2010) argued that the irregular appearance of sialidases in invertebrates does not support a common evolution of this gene family. In agreement with this view, we also observe this kind of patchy inheritance in the Trypanosomatidae family, a taxon that includes pathogenic flagellates, since of all genera within this family the TS genes are present only in the genus Trypanosoma.
It is assumed that the first trypanosome to acquire sialidase genes was T. brucei or a common ancestor of both T. brucei and T. cruzi. The separation of Trypanosoma brucei from Trypanosoma cruzi occurred approximately 100 million years ago during the breakup of Gondwanaland (Briones et al., 1995; Stevens et al., 1999), and since then the TS family has expanded exclusively in the T. cruzi clade (Chiurillo et al., 2016a). During this long separation, the two species developed different survival strategies within their vectors and hosts. T. brucei uses the telomeric-subtelomeric compartment as a specialized expression site for variable antigenic determinants. In contrast, T. cruzi acts like a stealth invader, jamming the vertebrate host immune system through the simultaneous expression of multiple surface antigens (Millar et al., 1999), among which members of the TS family play an important role.
How the expansion of TSII genes occurred in T. cruzi is unknown, but several events of gene mutation, duplication, recombination, transposition, and genetic drift likely generated the current picture of the TS family. Vestiges of these events are the mixture of sequences derived from retrotransposon elements and Retrotransposon Hot Spot (RHS) genes scattered through the genome, and particularly in T. cruzi subtelomeres (Figure 1). The abundance of these elements at the subtelomere suggests the active participation of these regions in shaping the parasite genome (Kim et al., 2005). The telomeric repeat 5′-TTAGGG-OH-3′ is found in the Plantae, Chromalveolata, Excavata (where trypanosomatids are included), and Rhizaria supergroups, which is why some authors (Dressen et al., 2007; Fulneckova et al., 2013) have suggested that it was the repeat of the ancestral eukaryotic telomeres. Thus, a safe assumption is that the TS sequences in the subtelomeric transition (189 bp junction) appeared after the hexameric repeats were already fixed. The fixation of the 189 bp junction may reflect its adaptation to the telomeric function, so that this sequence now appears to be an integral part of the T. cruzi telomere. Chiurillo et al. (2017) have proposed that the 189 bp junction may act as a seed to stabilize and/or create new telomeres via telomerase, but it is also possible that it was adopted as a recognition site for other proteins that contribute to the telomeric function.

GENE FAMILIES AND SUBTELOMERES
In T. cruzi, apart from the TS family, other gene families underwent large expansions, namely RHS, DGF-1, MASP, and MUC, and these genes tend to form clusters at multiple locations in the genome, in some cases covering whole chromosomes (Weatherly et al., 2009; Berna et al., 2018; Callejas-Hernández et al., 2018). Except for the MUC and MASP families, these expanded families are often found at terminal chromosomal locations (Weatherly et al., 2009; Moraes Barros et al., 2012; Callejas-Hernández et al., 2018). Therefore, it is not surprising to detect members of these families when T. cruzi telomeres and subtelomeres are cloned (Chiurillo et al., 1999; Freitas-Junior et al., 1999). Thus, the T. cruzi genome is organized in blocks of syntenic non-repeated gene sequences at more interstitial locations, and non-syntenic blocks consisting of repeated genes that are intermingled with retrotransposons and other highly repeated elements, some of which are located at subtelomeres. The presence of many repeated genes may promote recombination events that, although not desirable for housekeeping genes, favor the generation of variability in genes coding for surface proteins.
The uniqueness of terminal chromosomal locations (subtelomeres) as a ground for the generation of new capabilities has been well documented in other organisms, including yeast (Haber, 1998), Plasmodium (Scherf et al., 2008), and T. brucei (Horn, 2004), as well as in the expansion of human odor receptors (Mefford et al., 2001). Based on these observations and on the composition of T. cruzi telomeres and subtelomeres, we proposed that subtelomeres were places where the variability of some of these multigene families was generated, and that in later events the gene variants were mobilized to different locations in the genome (Kim et al., 2005).

HYPOTHETICAL MECHANISMS TO GENERATE GENE VARIABILITY
What are the mechanisms that may generate gene variability in T. cruzi, and how has the mobilization of gene variants occurred? Is the variability generated by gene conversion or unequal crossing over? Are these processes still occurring in T. cruzi populations?
Our first attempt to address these questions was through in silico simulation studies using as inputs real gene sequences generated by the T. cruzi genome project (Azuaje et al., 2007). The simulations evaluated the generation of variability by applying different mutagenic pressures to housekeeping genes and to members of gene families present at subtelomeres. The premise was that mutation rates would be a trade-off between the generation of variability to expand adaptive capabilities and the need to preserve important core functions. This study concluded that housekeeping genes were more robust against the introduction of random point mutations than genes coding for surface proteins, and that the most effective mechanism to introduce variability was gene conversion. The energetic burden of keeping a large number of pseudogenes is an indication that they play an important role in the parasite, an observation that prompted us to include pseudogenes of RHS, TS, and DGF-1 in our simulations. The results confirmed the potential of pseudogenes to contribute to the generation of variability. This finding contradicts the idea that pseudogenes are merely relics of gene deterioration or pseudogenization (Rogers et al., 2011).
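To make the contrast between the two sources of variability concrete, the Python sketch below applies random point mutation and a single gene conversion tract to a toy sequence. It is not the model of Azuaje et al. (2007): the function names, mutation rates, tract length, and the random "gene" and "pseudogene" are all hypothetical choices made only for illustration, and the donor is assumed to have the same length as the recipient.

```python
import random


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def point_mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each base, with probability `rate`, by a different base."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def gene_convert(recipient: str, donor: str, tract_len: int, rng: random.Random) -> str:
    """Copy `tract_len` bases from `donor` into the same position of `recipient`."""
    start = rng.randrange(len(recipient) - tract_len + 1)
    end = start + tract_len
    return recipient[:start] + donor[start:end] + recipient[end:]


rng = random.Random(0)  # fixed seed so the toy run is repeatable
gene = "".join(rng.choice("ACGT") for _ in range(300))
# Hypothetical diverged pseudogene copy that serves as the conversion donor.
pseudogene = point_mutate(gene, 0.15, rng)

by_mutation = point_mutate(gene, 0.001, rng)            # low per-base rate
by_conversion = gene_convert(gene, pseudogene, 60, rng)  # one 60-base tract
print(hamming(gene, by_mutation), hamming(gene, by_conversion))
```

In this toy setting a single conversion event can import many pre-existing differences from the donor at once, whereas point mutation at a low per-base rate changes few positions per generation; how that translates to real T. cruzi genes depends on parameters that the sketch does not attempt to estimate.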
An important piece of information came from experiments using T. cruzi artificial chromosomes (pTACs) carrying the 189 bp junction, hexameric repeats, and drug selection markers (Curto et al., 2014). When T. cruzi epimastigotes were transformed with these pTACs, the pTACs replicated with surprising stability for 150 generations in the presence of the selection drug, or 60 generations without it. In other words, the pTACs showed no detectable sequence exchange with the host chromosomes, and they replicated and segregated without centromeres, or perhaps the telomere was fulfilling this role. In a more recent experiment, Chiurillo et al. (2016b) addressed the possibility of chromosomal sequence exchanges by introducing a cutting site for the rare-cutting meganuclease I-SceI within the RHS gene of one pTAC (pTAC-D6CISceI*) harboring larger portions of T. cruzi subtelomeres. After transforming this pTAC into T. cruzi cells expressing the meganuclease I-SceI, and confirming that double-strand breaks (DSBs) were produced, the probing pTAC was examined to check whether the DSBs were repaired. Out of seven clones studied, six showed repairs, as evidenced by the disappearance of the I-SceI site and the reversion to the original pTAC (pTAC-D6C*). The most likely explanation is that the repair events used as a template the host subtelomere homologous to the pTAC. The seventh clone was also repaired with loss of the I-SceI site, but the sequences around the repair site shared homology with the subtelomere of another chromosome (ectopic recombination). Several conclusions can be derived from these experiments: first, chromosomal exchanges at the subtelomere can be promoted by the introduction of DSBs; second, the repair mechanism for these DSBs is homologous recombination (HR), or some version of it (Dressen et al., 2007); third, events involving exchanges with non-homologous chromosomes (ectopic recombination) can occur, with the possibility of generating gene variants.
These experiments do not rule out that similar exchanges can occur at more interstitial locations. The absence of detectable recombination in the earlier pTAC experiments was likely due to a lack of sufficient homology with any given chromosome subtelomere, and/or to DSBs being strictly necessary to induce recombination.
In contrast to most eukaryotes, in T. brucei and T. cruzi the most important mechanism for DSB repair is HR, with minor participation of microhomology repair (MHR). To detect MHR-mediated DSB repair, it was necessary to abolish HR (Glover et al., 2008).
In T. cruzi, among the potential genes that participate in HR, RAD51 plays a central role in facilitating homologous strand invasion (Gomes Passos Silva et al., 2018). An interesting activity discovered after massive gamma irradiation of T. cruzi cells was tyrosyl-DNA phosphodiesterase I (Tdr-I), an enzyme that plays an important role in topoisomerase I-mediated DNA DSB repair (Das et al., 2010). No genes for NHEJ have been found in T. cruzi (Gomes Passos Silva et al., 2018). MHR seems to occur in chromosomal rearrangements when a single DSB is introduced by the Cas9 endonuclease (Lander et al., 2015; Soares Medeiros et al., 2017).
An efficient HR repair mechanism in T. cruzi may explain the rapid recovery of karyotype and cell growth after massive irradiation with 500 Gy of gamma radiation (Garcia et al., 2016), and the difficulty in generating mutations (indels) in CRISPR experiments involving a single DSB vs. gene replacement (Lander et al., 2015; Soares Medeiros et al., 2017). In addition, the prevalence of a very stringent HR repair mechanism in genomes with a large number of repeated sequences, like that of T. cruzi, hampers (but does not eliminate) chromosome-internal recombination events that can be detrimental to the organism.
Thus, we believe that an important source of variability in some T. cruzi surface proteins is subtelomeric recombination promoted by DSBs, followed by dispersion of these variants either by transposition or by ectopic recombination events. Further duplication events and genetic drift produced the current clusters that we see at several locations in the T. cruzi genome.
Along this line, experiments with the meganuclease I-SceI in T. brucei (Dressen et al., 2007) revealed that the introduction of a DSB in a telomeric VSG gene promoted antigenic switching via gene conversion. The break is resolved by a replication mechanism induced by the break itself (break-induced replication, BIR). Similarly detailed studies to determine how DSB repair occurs in T. cruzi are missing, due in part to the lack of an RNAi machinery and/or inducible promoters coupled to CRISPR-Cas9 vectors.
As mentioned before, despite this stringent HR mechanism, interstitial chromosome rearrangements leading to karyotype changes can occur via ectopic recombination within multigene families through duplicated sites flanking these sequences (Figures 1A,B).

HOW CAN DSBs BE INTRODUCED IN T. cruzi SUBTELOMERES?
In T. cruzi, retrotransposon elements occupy nearly 5% of the genome, and among these elements the L1Tc non-LTR retrotransposon has the necessary machinery for its own mobilization (Macías et al., 2018). The L1Tc genome codes for an AP endonuclease (NL1Tc) that can introduce breaks for the insertion of the retrotransposon and that also plays a role in repairing DSBs produced by daunorubicin (Olivares et al., 2013). Other retrotransposon elements, like SLACS and CZAR, code for site-specific endonucleases (Macías et al., 2018). Thus, some DSBs may be introduced in the subtelomeres by retrotransposon nucleases, and HR repairs the break using homologous chromatids as templates or, on occasion, non-homologous chromatids (ectopic recombination). In T. cruzi there are active L1Tc transposons and no RNAi machinery to counteract their activities. Several observations about the organization of multigene families reveal close associations with retrotransposon sequences. For example, most DGF-1 and TSII copies are flanked by RHS (Olivares et al., 2000; Kim et al., 2005) or L1Tc retrotransposon sequences, and the duplications of RHS and L1Tc genes on both sides of these genes suggest the occurrence of ectopic recombination events (Figure 1; Olivares et al., 2000). Although L1Tc appears to be randomly distributed in T. cruzi's genome, 50% of its copies are associated with RHS genes flanked by the putative insertion site 5′-TGCAGACAT-OH-3′ (Olivares et al., 2000; Figures 1B,C). L1Tc copies are also found inserted downstream of the sequence GA(x)2AxGa(x)5txTATG↑A(x)11↑, where arrows mark the single-strand cleavage sites (Bringaud et al., 2006). How frequently DSBs leading to genetic recombination and gene variability occur is difficult to assess since, as shown in the experiments with the meganuclease I-SceI (Chiurillo et al., 2016b), DSBs are mainly repaired by HR using homologous chromosomes as templates.
The MASP superfamily is associated with the site-specific retrotransposon TcTREZO (Souza et al., 2007). These elements are frequently found flanking MASP genes, thus providing potential sites for HR. Since TcTREZO is species-specific, it must have appeared after the separation of T. cruzi and T. brucei, and it may have played an important role in the expansion of the MASP family. MASP proteins present highly conserved N- and C-terminal sequences and a variable middle region; in addition, the 5′ and 3′ UTRs of their genes are highly conserved (Bartholomeu et al., 2009). Sequence evolution in this gene family is therefore quite different from that of the other repeated families. Interestingly, other retrotransposons and members of the gp85, DGF-1, and RHS gene families frequently interrupt MASP clusters.
Duplication and mobilization of genes can also occur by piggybacking on the retrotransposon reverse transcriptase machinery, through the transduction of genes neighboring the retrotransposon insertion site (Figure 1D). The termination signal for transposon transcriptases is usually weak, so transcription can run through neighboring segments, which can then be duplicated and mobilized elsewhere (Xing et al., 2006). So far no experiments have been conducted to address this type of event in T. cruzi, although the size of the DNA segments that non-LTR retrotransposons can transduce is usually small (1 or 3 Kbp), making it an unlikely mechanism to mobilize large genes like DGF-1 (>10 Kbp).
In T. brucei, in the case of non-recombinational gene conversion for antigenic switching, alternatives for the generation of subtelomeric DSBs have been proposed, such as accelerated transcription, conflicts between the replication and transcription machineries, and TERRA (Telomeric Repeat-containing RNA) transcription leading to the formation of R-loops (Nanavaty et al., 2017; Saha et al., 2019). Since antigenic variation is vital for the survival of T. brucei populations, it is not surprising that redundant mechanisms exist to ensure that antigenic switching occurs. In the case of T. cruzi, no such extensive studies have been done, and it is possible that R-loop formation or transcription-replication conflicts also contribute to the generation of subtelomeric DSBs. In this review, I favor the role of retrotransposons in the generation of DSBs, given the ubiquity of these elements in the T. cruzi genome, their close association with important surface antigen families, and the absence of an RNAi machinery.
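The two sequence signatures quoted above can be searched for with ordinary regular expressions, as in the following Python sketch. The function names and the toy sequence are hypothetical; reading every x as "any base" and treating the lowercase consensus positions as fixed are simplifying assumptions that the source does not state.

```python
import re

INSERTION_SITE = "TGCAGACAT"  # putative 9-nt insertion site quoted in the text

# Consensus GA(x)2AxGa(x)5txTATG^A(x)11^ with x read as any base.
# Lowercase positions are treated as fixed here (an assumption of this sketch).
# The lookahead checks the A(x)11 stretch without consuming it, so each match
# ends exactly at the first cleavage mark.
CONSENSUS = re.compile(r"GA[ACGT]{2}A[ACGT]GA[ACGT]{5}T[ACGT]TATG(?=A[ACGT]{11})")


def find_exact(seq: str, motif: str = INSERTION_SITE) -> list[int]:
    """0-based start positions of every exact (possibly overlapping) motif hit."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq.upper())]


def first_cleavage_sites(seq: str) -> list[int]:
    """0-based index of the base immediately after the first cleavage mark."""
    return [m.end() for m in CONSENSUS.finditer(seq.upper())]


# Toy sequence assembled from made-up filler plus one copy of each signature.
consensus_hit = "GA" + "CC" + "A" + "C" + "GA" + "CCCCC" + "T" + "C" + "TATG" + "A" + "C" * 11
toy = "CC" + INSERTION_SITE + "GG" + consensus_hit

print(find_exact(toy), first_cleavage_sites(toy))
```

On real assemblies both strands would need to be scanned, and a position weight matrix would probably be more appropriate than a strict pattern; the sketch only shows how the notation in the text maps onto a searchable expression.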

CONCLUSIONS
Once T. cruzi telomeres were fixed, members of the TSII family positioned at the subtelomeres co-evolved to become part of the transition to the telomeric repeat. Although the reason for the fixation of this junction is still unknown, it suggests a potential telomeric function for this region. The variability of some surface proteins and their localization at the subtelomeres together with retrotransposon elements suggest that these regions are grounds for the generation of genetic variability. We propose that DSBs introduced in the subtelomeres by retrotransposon nucleases are repaired by homologous recombination, and that when the repair involves non-homologous chromatids there is a possibility of generating gene variants. These variants are mobilized elsewhere either by transposition or by ectopic recombination. Gene families increased their numbers by gene duplication to achieve higher expression levels. The MUC and MASP superfamilies likely evolved in later events not related to the subtelomeres.

AUTHOR CONTRIBUTIONS
The author confirms being the sole contributor of this work and has approved it for publication.