In and Outs of Chuviridae Endogenous Viral Elements: Origin of a Potentially New Retrovirus and Signature of Ancient and Ongoing Arms Race in Mosquito Genomes

Background Endogenous viral elements (EVEs) are sequences of viral origin integrated into the host genome. EVEs have been characterized in various insect genomes, including mosquitoes. A large EVE content has been found in Aedes aegypti and Aedes albopictus genomes among which a recently described Chuviridae viral family is of particular interest, owing to the abundance of EVEs derived from it, the discrepancy among the chuvirus endogenized gene regions and the frequent association with retrotransposons from the BEL-Pao superfamily. In order to better understand the endogenization process of chuviruses and the association between chuvirus glycoproteins and BEL-Pao retrotransposons, we performed a comparative genomics and evolutionary analysis of chuvirus-derived EVEs found in 37 mosquito genomes. Results We identified 428 EVEs belonging to the Chuviridae family confirming the wide discrepancy among the chuvirus genomic regions endogenized: 409 glycoproteins, 18 RNA-dependent RNA polymerases and one nucleoprotein region. Most of the glycoproteins (263 out of 409) are associated specifically with retroelements from the Pao family. Focusing only on well-assembled Pao retroelement copies, we estimated that 263 out of 379 Pao elements are associated with chuvirus-derived glycoproteins. Seventy-three potentially active Pao copies were found to contain glycoproteins into their LTR boundaries. Thirteen out of these were classified as complete and likely autonomous copies, with a full LTR structure and protein domains. We also found 116 Pao copies with no trace of glycoproteins and 37 solo glycoproteins. All potential autonomous Pao copies, contained highly similar LTRs, suggesting a recent/current activity of these elements in the mosquito genomes. Conclusion Evolutionary analysis revealed that most of the glycoproteins found are likely derived from a single or few glycoprotein endogenization events associated with a recombination event with a Pao ancestral element. A potential functional Pao-chuvirus hybrid (named Anakin) emerged and the glycoprotein was further replicated through retrotransposition. However, a number of solo glycoproteins, not associated with Pao elements, can be found in some mosquito genomes suggesting that these glycoproteins were likely domesticated by the host genome and may participate in an antiviral defense mechanism against both chuvirus and Anakin retrovirus.

Background: Endogenous viral elements (EVEs) are sequences of viral origin integrated into the host genome. EVEs have been characterized in various insect genomes, including mosquitoes. A large EVE content has been found in Aedes aegypti and Aedes albopictus genomes among which a recently described Chuviridae viral family is of particular interest, owing to the abundance of EVEs derived from it, the discrepancy among the chuvirus endogenized gene regions and the frequent association with retrotransposons from the BEL-Pao superfamily. In order to better understand the endogenization process of chuviruses and the association between chuvirus glycoproteins and BEL-Pao retrotransposons, we performed a comparative genomics and evolutionary analysis of chuvirus-derived EVEs found in 37 mosquito genomes.
Results: We identified 428 EVEs belonging to the Chuviridae family confirming the wide discrepancy among the chuvirus genomic regions endogenized: 409 glycoproteins, 18 RNA-dependent RNA polymerases and one nucleoprotein region. Most of the glycoproteins (263 out of 409) are associated specifically with retroelements from the Pao family. Focusing only on well-assembled Pao retroelement copies, we estimated that 263 out of 379 Pao elements are associated with chuvirus-derived glycoproteins. Seventy-three potentially active Pao copies were found to contain glycoproteins into their LTR boundaries. Thirteen out of these were classified as complete and likely autonomous copies, with a full LTR structure and protein domains. We also found 116 Pao copies with no trace of glycoproteins and 37 solo glycoproteins. All potential autonomous Pao copies, contained highly similar LTRs, suggesting a recent/current activity of these elements in the mosquito genomes.
Conclusion: Evolutionary analysis revealed that most of the glycoproteins found are likely derived from a single or few glycoprotein endogenization events associated with a recombination event with a Pao ancestral element. A potential functional Pao-chuvirus hybrid (named Anakin) emerged and the glycoprotein was further replicated through

INTRODUCTION
Viruses have long-term and intricate interactions parasitizing host cells and both viruses and hosts are subject to an endless arms race (Forterre and Prangishvili, 2009). A large body of evidence currently supports that virus/host interactions can occur at both the protein and nucleic acid levels. One clear example of the last, is that viral genomic sequences can be integrated into the host genome (Feschotte and Gilbert, 2012;Johnson, 2019). The process of viral genome integration is called endogenization and normally occurs as a "life cycle" stage in viral groups such as retroviruses and phage DNA viruses (Weiss, 2016). However, recent studies have shown that genomes, or genomic regions, of non-integrative viruses can also be found integrated into various eukaryotic genomes (Katzourakis and Gifford, 2010;Feschotte and Gilbert, 2012;Johnson, 2019). These viral loci have been called endogenous viral elements (EVEs).
Endogenization can occur by two main mechanisms: through non-homologous recombination mediated by double-strand break repair pathway of the host cell; or mediated by proteins, such as reverse transcriptases and integrases, from the endogenous retrotransposons-envelope-free retrovirus-like elements (Katzourakis and Gifford, 2010). Recent findings on mosquito genomes suggest that the latter mechanism is likely to be the most important, since the abundance and diversity of EVEs are positively correlated with retrotransposon abundance and activity (Palatini et al., 2017;Whitfield et al., 2017). Likewise, EVEs and some retrotransposons families were found in neighboring loci in the Aedes aegypti Aag2 genome assembly (Whitfield et al., 2017). Such association does not appear to be only a physical co-localization, but a result of putative antiviral mechanism mediated by the activity of some retrotransposon elements (Whitfield et al., 2017;Tasseto et al., 2019). Lastly, Crava et al. (2020) have shown that TEs from BEL-Pao superfamily are enriched into piRNA clusters found in the AaegL5 Ae. aegypti genome assembly and this enrichment can influence their association with EVEs.
EVEs, are generally found as fragments of exogenous viral genomes and it is therefore unlikely that they are able to generate new virus particles or infect new cells. Therefore, there are three non-mutually exclusive hypotheses regarding the fate and impact of these elements on the host genome: (i) EVEs may evolve neutrally accumulating mutations and degenerate over time; (ii) EVEs may be co-opted by the host genomes, giving rise to new functional host genes; and (iii) EVEs may play an antiviral role, generating small RNAs which degrade cognate exogenous viral RNA or non-functional viral proteins that hinders proper assembly/maturation of a new viral particles or blocks the viral receptor on the host cell surface (Robinson et al., 1981;Ito et al., 2013;Armezzani et al., 2014). The vast majority of studies on EVEs found in mosquito genomes focus on the role of piRNA production as a post-transcriptional regulatory mechanism of exogenous circulating viruses (Théron et al., 2013;Palatini et al., 2017;Whitfield et al., 2017). On the other side, there is only one study that considers the potential role of EVEs at the mRNA expression level (Pischedda et al., 2019).
Chuviridae is a recently discovered RNA viral family of negative-sense single-stranded viruses characterized by metatranscriptomic and bioinformatic analysis only (Shi et al., 2016). The information available on this family is limited to its distribution (it likely infects several arthropod species including mosquitoes), its variable genomic structure (unsegmented, bisegment and circular), and the presence of a number of EVEs in species from Amphipoda, Hemiptera, Coleoptera, Hymenoptera and Diptera genomes (Shi et al., 2016). An in-depth descriptive analysis of chuvirus-derived EVEs exists only for the Ae. aegypti Aag2 cell-line genome (Whitfield et al., 2017) displaying three intriguing features: a large abundance of chuvirus-derived EVEs compared to other viral families (it is the second most abundant EVE family, outnumbered only by the Rhabdoviridae family); an association with retroelements from the BEL-Pao superfamily; and a striking difference in the viral genome fragment endogenized-a much higher quantity of glycoprotein (Gly) compared to RNA dependent RNA polymerase (RdRp) and nucleoprotein (NP) sequences (Whitfield et al., 2017). Endogenization of different viral regions is expected to be influenced by the virus genome structure and orientation of the RNA replication process. Whitfield et al. (2017) have shown that EVEs derived from another non-segmented negative-sense single-stranded virus from the Rhabdoviridae family occur in the following order of abundance: NP − > Gly − > RdRp resembling the order in which the viral genomes is replicated in this viral family. On the other hand, Chuviridae viruses, which are similar to Rhabdoviridae in genomic structure (NP-Gly-RdRp), shows a very different endogenization pattern, with eighty-seven endogenous Gly sequences, only four from RdRp and no endogenized nucleoproteins (Whitfield et al., 2017). It may indicate that chuvirus-derived EVEs found in Ae. aegypti genome were either derived from an exogenous chuvirus with segmented or non-segmented genome and for some reason Gly has been majorly endogenized, or that the endogenization of chuvirus genomic regions occurred evenly at the beginning but only Gly was maintained through the evolutionary time.
Endogenous repetitive elements are abundant in metazoan genomes and mosquito genomes from the Aedes genus are particularly full of retrotransposons/retroviruses (Nene et al., 2006;Goubert et al., 2015;Whitfield et al., 2017). One of the most abundant LTR retrotransposons/retroviruses are from the BEL-Pao superfamily which is also very abundant and widespread in other metazoan genomes (Malik et al., 2000). Elements from this group have two long terminal repeat (LTR) regions, between 100 and 900 base pairs, and two coding regions-a capsid protein with a GAG domain, and a polyprotein, which commonly has four domains: aspartic protease (PR), reverse transcriptase (RT), RNAse H (RH), and integrase (INT). Moreover, at least three elements (Roo, Tas, and Cer-7) have an envelope-like protein (Gly) downstream of the polyprotein, which were acquired from Gypsy, Phlebovirus, and Herpesvirus, respectively (Felder et al., 1994;Browning et al., 1996;Malik and Henikoff, 2005).
In view of all the intriguing aforementioned chuvirus/BEL-Pao/host features, we investigated in depth which biological phenomenon has generated the high abundance of chuvirus glycoproteins found in mosquito genomes and examined the role these glycoproteins may play in BEL-Pao retroelements and mosquito biology. Our results showed, for the first time, that these EVEs are broadly present in mosquito genomes and that a large majority of the glycoproteins are physically associated with elements from the Pao family. We also found that most of the chuvirus-derived glycoproteins are structurally associated with potentially autonomous Pao elements and are likely to play a role in viral particle formation as an envelope protein. However, we also found structurally conserved solo glycoproteins, suggesting a potential role as an antiviral defense mechanisms.

Data Collection
A non-redundant database of proteins including all taxa and a database of chuvirus genomes were obtained from NCBI (last accessed January 2018). Mosquito genomes were retrieved from NCBI and Vectorbase (last accessed January 2018) and chuvirusderived EVEs already identified in mosquito genomes were retrieved from Whitfield et al. (2017). Mosquito genome sources and assembly metrics are presented in Supplementary Material 1. All command lines used in the present study can be found in Supplementary Material 2.

EVE Screening
Two BLAST-based strategies were used. The first used the chuvirus genome dataset as a query in a tBLASTx (Altschul et al., 1989) screening against mosquito genomes, and the second used the EVEs from Whitfield et al. (2017) as a query in a BLASTn analysis against mosquito genomes. All EVE regions were extracted from mosquito genomes in two ways: I-only the ungapped aligned region; II-ungapped aligned region plus 10 kb of each flanking region.
The resulting sequences were used as a query in a BLASTx analysis against the non-redundant protein database with different filters in order to eliminate false positives and false negatives chuvirus EVEs. Such filtering was performed in order to avoid two detected issues. First, as a number of viral proteins are still not annotated in the databases, if one considers the first hit alone in order to determine the viral origin one would disregard some EVEs. Second, some wrongly annotated viral proteins would cause the first hit to be miss-annotated as viral, while the following two or more hits would indicate that the sequence belongs to another taxon, generating a false positive result. Queries that showed four or fewer matches were annotated as an EVE if the best match was a viral protein. For queries with five or more subject matches, the proportion of the five best matches was taken into consideration, in accordance with the following criterion: a sequence is annotated as an EVE only if three or more subjects are viral proteins.
The EVE sequences containing the flanking sequences were clusterized using CD-HIT (Li and Godzik, 2005) to remove redundancy. Flanking sequences were retained to avoid clusterization of identical or very similar copies while allowing the removal of the same EVE copy recovered using the two search strategies (chuvirus genomes and chuvirus EVEs).

Chuvirus EVE Characterization
Using the above mentioned strategy, we obtained a total of 428 chuvirus-derived EVEs from mosquito genomes, of which 409 were glycoproteins, 18 RNA-dependent RNA polymerases and one nucleoprotein region. However, a number of these were found in small-size contigs and in close proximity to indeterminate "NNN" regions. In order to avoid genome assembly problems, we restricted our analysis to contigs bearing EVEs with at least 4 kb (Llorens et al., 2011)

of each flanking region and no undetermined bases (Supplementary Material 3).
For the remaining 322 sequences, we extracted potential coding regions using the EMBOSS getorf 1 tool, and ORFs containing fewer than 100 amino acids were removed from further steps, resulting in 279 sequences.

Flanking Sequence Analysis
All EVEs plus flanking regions passing the aforementioned filters were translated using the EMBOSS getorf software, and the resulting ORFs were used in a domain-signature analysis with BATCH-CDD (Marchler-Bauer et al., 2011) to characterize the genomic context of each EVE locus. Sequences with BEL-Pao signature domains-GAG, PR, RT, RNAse H, and INT (whether or not flanked by long terminal repeats-LTRs)were considered to be putative hybrids of a BEL-Pao element and chuvirus-derived sequences. For graphical representation, genetic maps are constructed with karyoploteR R-package (Gel and Serra, 2017). In order to identify orthologous EVE copies, we aligned EVEs plus flanking regions (around 20 kb of each side) with MAFFT (Katoh et al., 2001).

Search for Homologous BEL-Pao Elements
Nucleotide sequences of complete BEL-Pao elements containing domain signatures and LTRs recovered in the previous analysis were used as queries in a BLASTn analysis against the respective mosquito genomes to recover all homologous BEL-Pao copies. Sequences retrieved in this step were recovered with 10 kb of each flanking region and screened for the presence of chuvirusderived EVEs. Long terminal repeat (LTR) regions from all sequences retrieved were evaluated using LTR_FINDER with default parameters (Xu and Wang, 2007).
All EVEs were sorted, based on RdRp, Gly, or NP proteins and BEL-Pao Retrotransposons based on whether they are associated with chuvirus-derived glycoproteins or not. The structural conservation of the copies were sorted into the following categories: (i) potentially active retrotransposons containing both LTRs, BEL-Pao conserved protein domains (GAG-PR-RT-RH-INT) with or without a chuvirus-derived glycoprotein; (ii) defective elements which have at least one BEL-Pao domain and one or no LTRs; (iii) solo LTRs; and (iv) solo glycoproteins ( Table 1).

Molecular Modeling
Molecular modeling was performed for three groups of glycoproteins: I-Solo glycoproteins, corresponding to glycoproteins without the BEL-Pao signature in their flanking regions; II-glycoproteins found inside of BEL-Pao element boundaries (LTRs)-in order to select some Gly proteins found inside BEL-Pao retrotransposons (group II), an amino acid distance matrix was created using UGENE with simple similarity distance algorithm maintaining gap regions (Okonechnikov et al., 2012). One random sequence was chosen for molecular modeling from each cluster exhibiting more than 90% similarity (Supplementary Material 4); and IIIglycoproteins from bona fide chuviruses previously characterized by metatranscriptomics in mosquitoes.
The Phyre2 server (Kelley et al., 2015) was used to select a template for each protein of interest. The 3D structure of each template selected was obtained from the Protein Data Bank (PDB) (Berman et al., 2000) and submitted, together with the Gly sequence, to the Modeller package version 9.23 multimer modeling algorithm. The predicted models were evaluated for their stereochemical parameters using the Procheck tool (Laskowski et al., 1993).

Solo LTRs
LTRs from previously characterized elements were used as a query against mosquito genomes in a BLASTn analysis. Matches were recovered with 10 kb of the flanking regions. Sequences with a single LTR matching the mean length of LTRs identified using full BEL-Pao elements (around 670 base pairs) and no surrounding BEL-Pao domains were considered to be Solo LTRs (Supplementary Material 5).

Evolutionary Analysis
Two analyses were performed. The first was based on Gly protein sequences using only amino acid sequences with more than 100 amino acid residues. This analysis included most EVEs identified and chuvirus Gly from the literature and NCBI. The second analysis involved the evolutionary history of BEL-Pao retrotransposons and used amino acid sequences from reverse transcriptase and RNAse H (the most abundant domains present on BEL-Pao as characterized in the present study) (Supplementary Material 6). The second analysis used both BEL-Pao (with and without glycoprotein) and representative BEL-Pao retroelements from the five branches of the BEL-Pao superfamily included in the Gypsy Database 2.0 2 .
The amino acid sequences for both analyses were aligned with the MAFFT algorithm (Katoh et al., 2001) and automatically edited using Gblocks (Castresana, 1999) with relaxed parameters (Supplementary Material 2). The most likely amino acid substitution model was determined using SMS (Lefort et al., 2017) on the ATCG online platform 3 .
Phylogenetic trees reconstruction were carried out using MrBayes v3.2.2 × 64 (Ronquist et al., 2010), starting with three seeds and 1,000,000 generations. The convergence of the independent runs was detected when the standard deviation of all three seeds was lower than 0.05. The burn-in removed the first 25% of trees sampled and the remaining 75% were used to generate a posterior probability consensus tree and the phylogenies were visualized using iTOL (Letunic and Bork, 2007).

PCR Validation of Chuvirus Solo-Glycoproteins
Five solo glycoproteins without BEL-Pao signature were selected for PCR validation in three species available at the insectary of the Departamento de Entomologia-Instituto Aggeu Magalhães: Aedes aegypti, Aedes albopictus, and Culex quinquefasciatus, as well as the natural population of mosquitoes of the same species collected from different points at the Recife city ( DNA extraction was performed from pools of 5 individuals (females and males) of each species with the protocol established by Ayres et al. (2003), the quality and concentration of DNA was evaluated with NanoDrop 2000 (Thermo Fisher Scientific). Primers to amplify the endogenous gene of the sodium channel (sodium channel protein para-like LOC109432678), were used as control to evaluate the DNA samples integrity before the EVE screening. The primers sets that amplify part of the EVE and a region of the mosquito genome were designed with Primmer3 (Koressaar and Remm, 2007) and validated with PrimmerBLAST (Ye et al., 2011) and OligoAnalyzer Tool (Owczarzy et al., 2008) against the mosquitoes reference genomes ( Figure 1A). Primers information is shown in Supplementary Material 7. PCR was performed with GoTaq-Flexi G2 DNA-Polymerase following the Promega manufacturer's protocol. All PCR reactions are conducted in a final volume of 25 µL containing 1 µL of each primer (10 µM), 2uL of dNTP (0.      program: Initial denaturation at 94 • C for 2 min, 45 cycles at 94 • C for 1 min (denaturation), primer annealing at 51-59 • C for 50 s (depending on specific TM of each primer pair), extension at 72 • C for 1 min followed by a final extension at 72 • C for 5 min. DNA amplification was visualized with 2% agarose gel stained with ethidium bromide and bands with expected were extracted, purified and sequenced with DNA ABI Prism 3100 Genetic Analyser (Applied Biosystems) from both forward and reverse strands. Forward and reverse electropherograms are analyzed with Geneious Prime version 2019.1.3 (Kearse et al., 2012) and consensus sequences were generated. Contigs from sequenced products are then aligned with EVEs identified by in silico analyses with Aliview (Larsson, 2014).

Supporting Evidence of Widespread Chuvirus Endogenization in Different Mosquito Genomes
Four hundred and twenty-eight EVEs were identified in 32 out of 37 genomes screened. These elements correspond to 409 glycoprotein fragments, 18 RNA-dependent RNA-polymerases (RdRp) and one nucleoprotein (Supplementary Material 8).
After the exclusion of EVEs from small contigs and those in close proximity to uncertain sequences/assembled bases (NNNs), 279 sequences remained. Two hundred forty-one of these presented glycoprotein conserved amino acid domains.
In view of the previously described and confirmed abundance of chuvirus-derived glycoprotein EVEs, a more in-depth characterization of these EVEs was performed. Endogenous glycoproteins varied in size from 117 to 1977 nucleotides (median = 880) and amino acid length from 39 to 659 residues. The amino acid identity with each chuvirus genome used as a query varied from 29.03 to 56.41% (average = 32.83, SD = 6.31), the alignment length varied from 35 to 910 amino acids (average = 283.77, SD = 191.16) and e-value varied from 9,00E-05 to 0.0 (average = 6,00E-07, SD = 6,00E-06) (Supplementary Material 8). We could not identify orthologous EVE copies based on flanking region alignment analyses.

Glycoproteins Are Mostly Associated With Elements From the BEL-Pao Retrotransposon Superfamily
Screening for BEL-Pao protein domain signatures and LTRs we found glycoproteins in three different contexts (Figures 1, 2 and Table 1). Thirty-seven glycoproteins were categorized as solo glycoproteins, since no BEL-Pao protein domain or LTR signature was found ( Figure 1A). Two hundred glycoproteins were flanked by BEL-Pao protein domains but with only one LTR or none, characterizing them as defective BEL-Pao elements ( Figure 1B). Sixty-three glycoproteins were associated with BEL-Pao domains flanked by two LTRs, characterizing them as potentially active elements. Thirteen of the BEL-Pao elements including chuvirus-like glycoproteins had all domains and complete LTRs. We call these elements "Anakin." This name refers to the putative "switching sides" of viral glycoprotein found into a potential new retrovirus and solo glycoprotein that is likely counteracting against this retrovirus and circulating Chuviruses (Anakin element, Figure 1C).
We also identified solo LTRs in seven genomes ( Table 1). These vary in number from only one in the Anopheles gambiae Pimpirena assembly to 82 solo LTRs in the Aedes albopictus C636 assembly (Supplementary Material 5). Interestingly, the number of solo LTRs was found to be greater than the number of LTRs associated with full BEL-Pao elements in the Aedes albopictus C636 and Anopheles maculatus BTQ1 assemblies ( Table 1).

Endogenous and Exogenous Glycoproteins Have Similar 3D Structures
Although BLAST and phylogenetic analysis indicate that the EVEs found share a common ancestor with chuviruses, the amino acid identity is considerably low (between 25 and 50%). Another way to obtain further support for the viral origin of these endogenous sequences is through 3D structure modeling. If these polypeptides resemble viral glyco/envelope proteins in 3D space or have similar protein domains (Malik and Henikoff, 2005), this would corroborate their viral origin, as well as the protein function (Webb and Sali, 2017).
The templates identified by the Phyre2 tool represent many type B glycoproteins homotrimers from different viruses (Figure 3). It was possible to reconstruct threedimensional models for all glycoproteins analyzed. All pdb files of modeled glycoproteins are available on Supplementary Material 9 and Ramachandran Plots region values can be seen in Supplementary Material 10. Only two 3D models (AnMacBtQ1_01 and Wuhan Mosquito Virus 8) had more than 1.0% residues in disallowed regions. For all glycoproteins modeled the residues in core regions varied between 82.3 and 88.3% and in allowed regions from 9.3 to 14.2%.
The TM-alignment between representatives of solo glycoproteins, glycoproteins fused with BEL-Pao elements and bona fide chuvirus glycoproteins demonstrates a similarity of three-dimensional structures greater than 0.7 (Figure 3B), providing strong evidence that these glycoproteins are folded in a similar way.

Evidence of a Chuvirus ENV-Like Protein Captured by Elements From the BEL-Pao Superfamily
Bona fide chuviruses with non-segmented genomes (sequences in yellow) form a basal clade in the glycoprotein evolutionary tree (Figure 4), confirming the common origin of EVE glycoproteins and non-segmented chuviruses. It is also worth noting the existence of a basal clade of Aedes aegypti EVEs clustered with one bona fide chuvirus-Mos8Chu0 ( Figure 5A).
Four main findings can be reported regarding EVE clades: (i) the presence of various high-identical EVE copies, indicated by clades containing several sequences with near-zero branch lengths (Figures 5B,D); (ii) the presence of glycoproteins inside BEL-Pao retroelements from RepBase intervening into various EVE clades ( Figure 5C); (iii) the presence of several sologlycoproteins embedded into retroelements clades (Figure 5B), and (iv) the presence of sequences described as bona fide chuviruses (Lara Pinto et al., 2017;Pauvolid-Corrêa et al., 2016) inside and outside of clades mostly composed of EVE copies ( Figure 5D).
The evolutionary tree using the RT and RH regions from both BEL-Pao retroelements, including elements containing or not chuvirus glycoproteins, shows four clearly defined clades (Figure 6). One of these has 181 elements without glycoproteins closely related to BEL elements in Branch 1 (Figure 6). The other three are closely related clades with Pao elements from Branch 2, of which 129 are composed of elements with glycoproteins (red scalene triangles) and 122 of elements without glycoproteins (blue scalene triangles, Figure 6). It is clear that chuviruses glycoproteins are specifically associated with elements from the Pao family-the now called Anakin elements.
There is a large number of Pao elements without glycoproteins, but comparing it with the Anakin elements the last are more abundant and widespread, in a larger number of mosquito species, than Pao elements without glycoproteins (Figure 2 and Supplementary Material 11).

PCR Validation of Solo-Glycoproteins in Different Mosquito Strains
Five primer pairs were designed to amplify both chuvirus/mosquito integration junction of solo-glycoprotein regions. Four lineages of Ae. albopictus, two lineages of C. quinquefasciatus and two lineages of Ae. aegypti were screened ( Table 2 and Supplementary Material 12). At least one EVEmosquito boundaries were sequenced successfully for all five solo glycoproteins (Table 2), showing a consistent alignment to the corresponding genomic region in the available genomes FIGURE 3 | Molecular modeling of glycoproteins. (A) Tridimensional models where I AegAag2_20 element, an example of solo-glycoprotein; II AAlC636_23 element, an example of glycoprotein fusioned with a complete Pao element; and III represents the glycoprotein of Mos8Chu0. (B) Represents the tridimensional glycoproteins B alignments between: I: AegAag2_20 in purple and AAlC636_23 in blue, with TM-score equals to 0.79415; II: AegAag2_20 in purple and Mos8Chu0 in yellow, with TM-score equals to 0.76639; and III: AAlC636_23 in blue and Mos8Chu0 in yellow, with TM-score equals to 0.80812. Material 12) confirming the integration of the EVEs and highlighting that these insertions were conserved and an ancient component of these genomes.

DISCUSSION
Viruses and transposable elements share a common evolutionary history and hence shared several features, particularly for some retroviruses and retrotransposons that are differentiated by the presence/absence of an envelope gene (glycoprotein). The envelope protein is responsible for the infection capacity of the former (Hayward, 2017). But more than that, these entities exchange genetic information between them and with their host genome (Drezen et al., 2017;Frank and Feschotte, 2017;Gilbert and Cordaux, 2017;Sinha and Johnson, 2017). Several instances of exchange of functional genes that allowed major changes in their evolutionary history are known such as the acquisition of a complete herpesvirus by a piggyBac transposon (Inoue et al., 2017) or the emergence of retroviruses by envelope capture FIGURE 4 | Phylogenetic reconstruction of glycoproteins from Chuviridae family. Bayesian tree reconstructed after 5,000,000 generations from 3 seeds with standard deviation mean between final trees equal to 0.02. Posterior probability are depicted over each node, only values greater than 90 were plotted; Tip label colors denotes: Sequences identified as EVEs derived from Chuviridae family, blue and red color represents EVEs identified in Culicinae and Anophelinae subfamilies, respectively; Pink labels are retrotransposons of BEL-Pao superfamily available on RepBase and that showed similarity with chuviruses glycoproteins; Light orange labels are chuviruses identified from samples of different arthropods sampled in China by Shi et al. (2016); green labels are sequences described in Brazil as chuviruses by Lara Pinto et al. (2017); dark orange are chuvirus available on NCBI (access KX924631.1). Sequence structure: LTR, Long Terminal Repeat-red triangle; GAG Capsid protein-orange square; PR, Protease-light green square; RT, Reverse Transcriptase-light blue square; RH, RNAse H-blue square; Integras-purple square and GLY, Glycoprotein-pink square. Phylogeny available on: https://itol.embl.de/tree/200133261329821563538693. (C) clade with complete BEL-Pao Retrotroelements (Anakin); (D) clades with BEL-Pao derived elements associated with chuvirus glycoprotein and potential chuvirus sequences described by Lara Pinto et al. (2017). Sequences identified as EVEs derived from Chuviridae family, blue and red color range labels represent EVEs identified in Culicinae and Anophelinae subfamilies, respectively, retrotransposons of BEL-Pao superfamily available on RepBase and that showed similarity with chuviruses glycoproteins, range labels colored with purple; Chuviruses identified in China by Shi et al. (2016), range labels colored with light orange; Sequences described in Brazil as chuviruses by Lara Pinto et al. (2017), range labels colored with light green; Chuvirus available on NCBI (access KX924631.1), range label colored with dark orange; LTR, Long Terminal Repeat-red triangle; GAG, Capsid protein-orange square; PR, Protease-light green square; RT, Reverse Transcriptase-light blue square; RH, RNAse H-blue square; Integrase-purple square and GLY, Glycoprotein-pink square.  (Kim et al., 2004). Here, we characterized a new event of gene sequence exchange between viruses, the mosquito host genome and retrotransposons with the acquisition of a glycoprotein (envelope) gene from a chuvirus by a Pao retrotransposon (named Anakin) suggesting a past and likely current arms race between these entities.
The Chuviridae family was first described by Shi et al. (2016) in several arthropod species (including mosquitoes) through metatranscriptomic sequencing. Chuviruses have three possible genomic structures: a complete or bi-segmented circular genome, identified in ticks, crab, flies, spiders, cockroaches and mosquitoes, and a linear structure, identified in flies and crabs. In the bi-segmented structure, the glycoprotein gene is always flanked downstream by a nucleoprotein and by a viral particle protein (Shi et al., 2016). In the same study, the authors identified endogenous chuvirus elements in both mosquito and other insect genomes. Two subsequent studies also identified EVEs derived from chuvirus in mosquito genomes (Whitfield et al., 2017;Russo et al., 2019) but focused only on the Aedes aegypti genome. Whitfield et al. (2017) characterized two interesting features of chuvirus-derived EVEs from Ae. aegypti: (i) large differences among endogenized genomic regions, with several glycoproteins and few RdRps and nucleoproteins; and (ii) enrichment of BEL-Pao retroelements around these EVEs (Whitfield et al., 2017;Russo et al., 2019).
Virus genome replication is tightly linked to the virus genome structure. Replication origin and orientation may favor the endogenization of the genomic regions that are first copied/transcribed, since these regions are produced more abundantly than the last replicated/transcribed regions (Whitfield et al., 2017;Russo et al., 2019). Although little is known about chuvirus replication, the position of the genes in nonsegmented genomes suggests that endogenization should occur more frequently for nucleoproteins or RdRp (in the terminal regions of the virus genomes) than for glycoproteins (around the middle of the virus genome). However, for segmented genomes, each segment could be integrated independently and hence one should expect to find a similar amount of different integrated viral genomic regions. We detected more glycoproteins than nucleoproteins and RdRps regions integrated into the mosquito genomes. This endogenization pattern is not expected to any segmented or non-segmented Chuvirus genomes and hence does not allow us to reach a conclusion about the genomic structure of the original chuvirus genome. But, some additional evidence points to another more likely explanation to explain the glycoprotein discrepancy. We detected that many of these glycoproteins are integrated into potentially active BEL-Pao retrotransposons, more specifically into retroelements from the Pao family (Figure 6). It is important to note that a recent preprint partially confirmed these findings of chuvirus EVEs into the LTR boundaries of BEL-Pao elements (Crava et al., 2020). In addition, the phylogenetic analysis showed that both glycoprotein associated or not with Pao retrotransposons domains are embedded into multiple phylogenetic supported clades of Pao retroelements (Figure 4) and a similar amount of solo glycoproteins and other solo Pao derived domains (19 RH-RT) were found scattered into the mosquito genomes suggesting that the high glycoprotein copy number is a result of the replication of Pao retrotransposons with posterior decay of Pao protein domains and that this protein may be an integral part of these elements as an envelope gene. It is important to note that most of the findings reported in this study were derived from genome sequences of cultured cells of Ae. albopictus (C6/36) and Ae. aegypti (Aag2) and lab-reared strain from Ae. aegypti (Lvp) that have been maintained in the lab for many years, therefore further investigation on wild mosquito populations are necessary to access the evolutionary implications of the findings in natural populations.
Acquisition and loss of envelope proteins have been detected in a number of viruses and retrotransposons, blurring the distinction between these entities. Substantial evidence already exists showing that many retroviruses can turn into an intragenomic lifestyle when the ENV protein is lost or becomes defective as a result of mutations and that various retroviruses have originated from retrotransposons (intragenomic lifestyle) that acquired new envelope genes (Malik et al., 2000). Specifically for BEL-Pao retrotransposons, there are three wellstudied examples of ENV-like protein acquisition by active retroelements (Malik et al., 2000). The Tas retrotransposon from Ascaris lumbricoides acquired an ENV protein from Phlebovirus (Felder et al., 1994). The Cer retrotransposon from Caenorhabditis elegans acquired an ENV protein from Herpesvirus (Browning et al., 1996) and an ENV was acquired from a Gypsy retrovirus by a Roo retrotransposon (Malik and Henikoff, 2005). We describe here a fourth event, involving Pao retrotransposons and glycoproteins of chuvirus, supported by the strong phylogenetic association between glycoproteins of Pao retrotransposons and EVEs from the Chuviridae family and the identification of glycoproteins inside complete Pao structures flanked by LTRs. This recombination event associated with further retrotransposition of Anakin explains the high abundance of glycoproteins inside mosquito genomes when compared with the other chuvirus proteins.
The distribution of EVEs derived from chuvirus in mosquito species from both the Culicinae and Anophelinae subfamilies dates the integration of chuvirus glycoproteins into the ancestor of the Culicidae family, around 190 million years ago (Hedges et al., 2006). The endogenization of a chuvirus glycoprotein may have occurred directly into a Pao retrotransposon, thereby giving rise to the Pao retrovirus ( Figure 7A). Alternatively, this hybrid element may have emerged from a recombination event involving a Pao retroelement and a chuvirus-derived glycoprotein after its endogenization ( Figure 7A). After viral envelope protein endogenization into the host genome, two major events may occur: exaptation or molecular domestication of the viral protein leading to a new host function or the emergence of an antiviral mechanism. The domesticated viral envelope may be selected as a countermeasure against the cognate circulating virus to effectively prevent new virus particles from entering the cell. For instance, host endogenous envelope proteins may be produced and exported to the extracellular space by the host cells. These polypeptides will then compete for the host cell receptors with circulating viruses (Robinson et al., 1981;Ito et al., 2013;Armezzani et al., 2014;FIGURE 7 | General scheme of the chuvirus derived EVE integration into the germinative cells of the host genome, fusion with Pao retroelements and potential antiviral mechanisms by receptor competition. (A) Bonafide circulating chuvirus infecting a host cell and releasing their single-strand RNA into the cell cytoplasm (1); RT from retrotransposons fortuitously recognize the viral genome and retro transcribe to dsDNA (2); dsDNA is directly integrated into the host genome by double-strand break repair mechanisms or by recognition and integration using integrase proteins from retroelements (3) either in different genomic loci (4) or into a Pao retroelement boundaries (5); Expression of Pao proteins (6); (B) chuvirus-derived glycoproteins are dispersed in many genomic loci but have been replicated substantially by the Pao + chuvirus glycoprotein element (Anakin). Anakin retrovirus can assemble viral particles with envelope proteins allowing it to infect new host cells (1); Complete or defective solo glycoprotein are translated and exported to the extracellular environment (2) and can block new viral infection of both Anakin and chuvirus by binding to cell membrane virus protein receptors (3).
Johnson, 2019). The similarity of solo glycoprotein and glycoproteins from bona fide chuviruses and the Anakin retrovirus, as shown by sequence identity (Supplementary Material 13) and comparison of three-dimensional structures (Figure 5A), indicates that some of these endogenous glycoproteins may play a role in antiviral defense mechanisms against circulating chuviruses or even against Anakin retroviruses through competition for cell receptors (Figure 7B). This hypothesis is corroborated by the conservation of several solo glycoprotein integrations in different populations of Ae. aegypti, Ae. Albopictus, and Cx. quinquefasciatus species (Table 2). But, functional experiments are warranted to evaluate the transcription/translation of EVEs, its role as a receptor-blocker and the production of chuvirus and Anakin virus particles.
Finally, our results show the importance of taking EVEs into account in metaviromic studies (Nouri et al., 2018). Some of the chuvirus EVEs detected in our study are highly identical at the nucleotide level (99.08-100%) with previously described circulating chuviruses Kaiowa, Guato, Cumbaru, and Croada (Lara Pinto et al., 2017;Pauvolid-Corrêa et al., 2016). Although, we cannot rule out the possibility that some of these sequences may indeed have come from circulating chuviruses that infect Ae. albopictus species, the high identity values of these "viruses" with the EVEs found here strongly suggest that these previously defined chuviruses are in fact transcribing endogenous elements from the Ae. albopictus genome (Supplementary Material 14).
In this study we revealed the diversity of endogenous virus elements derived from the Chuviridae family in mosquito genomes. Our results show that such EVEs are widely distributed across the Culicidae family and are possibly involved in two major processes: the replication of Pao retroelements, through the acquisition of chuvirus glycoproteins; and a possible antiviral response against bona fide chuviruses and Anakin retroviruses originating from a fusion of Pao retroelements and the chuvirus envelope gene. These results shed new light on the dynamic evolution of EVEs and retrotransposons inside the mosquito genomes and point to the need for in vitro and in vivo experiments to test the hypothesis raised above using different mosquito populations, considering that our results are mainly based on cell culture assembly genomes.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in the Figshare, https://figshare.com/articles/In_and_outs_of_Chuviridae_ endogenous_viral_elements_origin_of_a_retrovirus_and_ signature_of_ancient_and_ongoing_arms_race_in_mosquito_ genomes/11336258.