Endogenous Retroviruses in Fish Genomes: From Relics of Past Infections to Evolutionary Innovations?

The increasing availability of fish genome sequences has provided new insights into the diversity and host distribution of retroviruses in fish and other vertebrates. This distribution can be assessed through the identification and analysis of endogenous retroviruses, which are proviral remnants of past infections integrated in genomes. Retroviral sequences are probably important for evolution through their ability to induce rearrangements and to contribute regulatory and coding sequences; they may also protect their host against new infections. We argue that the current mass of genome sequences will soon strongly improve our understanding of retrovirus diversity and evolution in aquatic animals, with the identification of new/re-emerging elements and host resistance genes that restrict their infectivity.


INTRODUCTION
Knowledge of the infection history of living organisms is essential for understanding the (co)evolution of infectious agents with their hosts, in particular on key aspects such as immunity. In this regard, viruses, which can spread between individuals and often cause disease, constitute a major point of interest.
Viruses form the most numerous and diverse group of genetic entities. They generally consist of single- or double-stranded RNA or DNA genomes enclosed within a protein coat. Among them, retroviruses are single-stranded positive-sense RNA viruses that replicate through a DNA intermediate. After infection, a retrovirus reaches the cytoplasm of the host cell and produces DNA through the reverse transcription of its RNA genome into complementary DNA (cDNA). This step is catalyzed by the reverse transcriptase (RT), an RNA-dependent DNA polymerase that is generally encoded by the retrovirus itself. Subsequently, the cDNA molecule is integrated into the host nuclear genome through the action of the retroviral integrase, forming a provirus. Provirus genes are transcribed and translated by the host machinery, like classical cellular genes.
Retroviruses are delimited by long terminal repeats (LTRs), which carry a promoter sequence and are involved in interactions with the integrase for insertion. Retrovirus genomes classically contain three open reading frames: gag (5′, group-specific antigen), encoding core and structural proteins; pol (polymerase), producing a polyprotein with RT, protease and integrase domains; and env (3′, envelope), which codes for coat proteins. Additional accessory proteins are encoded by more complex retroviruses. After integration, ectopic homologous recombination can occur between both LTRs. This eliminates one LTR copy and the intervening sequence, generating the solo LTRs that are frequently found in genomes. Evolutionary switches can occur between retroviruses and noninfectious LTR retrotransposons through the gain or loss of envelope genes (Malik et al., 2000).
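The formation of a solo LTR by recombination between the two LTRs of a provirus can be sketched as a toy model. This is a minimal illustration, not a method from the literature cited here: the `Provirus` class, the function names and the short sequences are hypothetical, and the two LTRs are assumed to be identical, as they are expected to be at the time of integration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provirus:
    """Toy provirus: 5' LTR + internal region (gag-pol-env) + 3' LTR."""
    ltr: str       # hypothetical LTR sequence, assumed identical at both ends
    internal: str  # hypothetical internal coding region

    def sequence(self) -> str:
        # Full-length integrated element as it sits in the host genome.
        return self.ltr + self.internal + self.ltr


def recombine_ltrs(provirus: Provirus) -> str:
    """Return what remains after homologous recombination between both LTRs.

    One LTR copy and the intervening sequence are lost; a solo LTR is left.
    """
    return provirus.ltr


def deleted_length(provirus: Provirus) -> int:
    """Number of nucleotides removed from the locus by the recombination."""
    return len(provirus.sequence()) - len(recombine_ltrs(provirus))


# Illustrative sequences only; they are not real retroviral sequences.
toy = Provirus(ltr="TGTAGT", internal="ATGGGG")
print(toy.sequence())        # full-length element
print(recombine_ltrs(toy))   # solo LTR
print(deleted_length(toy))   # internal region plus one LTR
```

In this simplified picture the deleted segment always equals the internal region plus one LTR, which is why solo LTRs carry no coding sequence.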
If they have infected the germ line, integrated proviruses, also called endogenous retroviruses (ERVs), can be maintained within host genomes over millions of years by "vertical" transmission from parents to offspring (Feschotte and Gilbert, 2012). Identifying ERVs in genomes, which is now facilitated by the ever increasing amount of sequence data, provides an idea of the infection history by retroviruses and might give some hints on their past and current diversity, as well as on their mutation rate (Patel et al., 2011; Aiewsakun and Katzourakis, 2015). This can also help to reassess the age of retrovirus groups and to understand the coevolution with their hosts (Keckesova et al., 2009; Gilbert and Feschotte, 2010). Finally, such genomic analyses might identify so far unknown retroviruses and full-length elements with the potential for retained infectivity and reemergence (Wildschutte et al., 2016). The analysis of relics of past infections in genomes, also called the "viral fossil record", is termed paleovirology (Patel et al., 2011).

VERTEBRATE ENDOGENOUS RETROVIRUSES
Endogenous retroviruses have been identified in all vertebrate lineages (Hayward et al., 2015). Despite the relatively low frequency of integration that is expected in germ cells compared to somatic cells, ERVs are major components of some tetrapod genomes: with over 100,000 integrated elements, they constitute as much as 8% of the human genome (Figure 1). ERVs are also present in birds, reptiles and amphibians, with higher genome content in birds: with 30,420 elements, they cover 1.4% of the chicken genome, compared to 0.1% of the Xenopus genome (ca. 2,300 elements; Holmes, 2011; Chalopin et al., 2015) (Figure 1).
In vertebrates, all retroviruses belong to a single large family called Retroviridae, possibly of monophyletic origin. Retroviridae are subdivided into two subfamilies (Orthoretrovirinae and Spumaretrovirinae) and seven genera: Alpharetroviruses, Betaretroviruses, Gammaretroviruses, Deltaretroviruses, Epsilonretroviruses, Lentiviruses, and Spumaviruses (Figure 2). While some vertebrate ERVs are too divergent to be assigned to specific exogenous retrovirus genera, ERVs have been classified into three main classes based on phylogenetic evidence: class I, with elements related to Gamma- and Epsilonretroviruses; class II, including elements grouping with Lentiviruses, Alpha-, Beta-, and Deltaretroviruses; and class III, with Spumavirus-like sequences, which represents the most divergent retroviral group (Gifford et al., 2005; Hayward et al., 2015) (Figure 2). Two additional clades have been recently recognized: HERVS/L (Human endogenous retroviruses S/L)-like and SnRV (Snakehead fish retrovirus)-like elements (Hayward et al., 2015).
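The three-class scheme described above amounts to a lookup from the most closely related exogenous genus to an ERV class. The following sketch encodes only the assignments stated in the text; the dictionary name, the function name and the genus spellings used as keys are illustrative choices, and elements with no listed genus (for example those in the HERVS/L-like or SnRV-like clades) are simply returned as unassigned.

```python
from typing import Optional

# Genus-to-class assignments as stated in the text (class I, II, III).
ERV_CLASS_BY_GENUS = {
    "Gammaretrovirus": "I",
    "Epsilonretrovirus": "I",
    "Alpharetrovirus": "II",
    "Betaretrovirus": "II",
    "Deltaretrovirus": "II",
    "Lentivirus": "II",
    "Spumavirus": "III",
}


def erv_class(related_genus: str) -> Optional[str]:
    """Return the ERV class for a related genus, or None if unassigned."""
    return ERV_CLASS_BY_GENUS.get(related_genus)


print(erv_class("Epsilonretrovirus"))  # class of Epsilon-like elements
print(erv_class("Spumavirus"))         # class of foamy virus-like elements
print(erv_class("SnRV-like"))          # not part of the three-class scheme
```

A real classification would rest on phylogenetic placement of the element rather than on a label, so this table is only a summary of the outcome.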
Lineage-specific differences in ERV clade distribution have been observed in tetrapods (Herniou et al., 1998; Gifford et al., 2005; Katzourakis et al., 2009; Hayward et al., 2015). Class I elements are widespread in tetrapods, with a stronger contribution of Gammaretroviruses compared to Epsilonretroviruses. Class II ERVs are largely confined to mammals and birds, with high copy number of Betaretroviruses in mammals and Alpharetroviruses in birds and alligator (Figure 2). Lentivirus-like ERVs are found in lagomorphs and carnivores, while Delta-like ERV sequences have not been reported so far (Hayward et al., 2015). Insertions of Foamy virus (Spumavirus) have been identified in the genomes of some mammals (Han and Worobey, 2012b). HERVS/L-like sequences are present in almost all tetrapods tested, and SnRV-like elements are found in some reptiles (Hayward et al., 2015) (Figures 1 and 2).
FIGURE 1 | Endogenous retrovirus (ERV) genome coverage, copy number, and family distribution in different vertebrate species. Values were retrieved from Chalopin et al. (2015). na: not available. The distinction between SnRV-like and Spuma-like elements requires further investigation, since the SnRV-like clade has been defined only recently (Hayward et al., 2015).
FIGURE 2 | Phylogenetic analysis of zebrafish ERVs and other fish retroelements. The phylogeny was reconstructed from an alignment of RT (201 amino acids, translated from element copies) by Maximum Likelihood using PhyML (Guindon et al., 2010) with optimized parameters (best of NNI and SPR; optimized invariable sites). Branch values represent supporting aLRT non-parametric statistics. Zebrafish and exogenous retrovirus sequences are highlighted in green and orange, respectively. Gypsy LTR retrotransposon sequences were used as an outgroup. The tree was drawn using FigTree (http://tree.bio.ed.ac.uk/software/figtree/).
The Retroviridae family is restricted to vertebrates. However, other types of ERVs of independent origins have been detected in tunicates, which are the closest living relatives of vertebrates within chordates. Two divergent ERV families have been identified in the genome of the marine appendicularian Oikopleura dioica, one of them having possibly gained its envelope gene from a paramyxovirus (RNA virus; Volff et al., 2004; Henriet et al., 2015). Retroviruses of independent origins have been detected in more divergent invertebrates and in plants (Malik et al., 2000).

FISH ENDOGENOUS RETROVIRUSES
Like other vertebrates, fish are infected by exogenous retroviruses. Retroviruses have been found to be associated with tumors in fish intensively cultured for food, in wild fish populations showing signs of sickness, and in fish reared in the laboratory (Lepa and Siwicki, 2011; Coffee et al., 2013).
One example is the Walleye dermal sarcoma virus (WDSV), an Epsilonretrovirus associated with skin tumors in the walleye, a freshwater perciform (Walker, 1969). The snakehead retrovirus (SnRV) has been identified in a cell line derived from the striped snakehead fish (Frerichs et al., 1991). The SSSV virus has been isolated from Atlantic salmon swim bladder sarcomas (Paul et al., 2006).
The genomes of different fish species have been analyzed for the presence of ERV sequences. The species with sequenced genomes analyzed included (from the most closely related to humans to the most divergent): the coelacanth, which is a lobe-finned fish related to tetrapods; teleost fish and other ray-finned fish; cartilaginous fish including sharks; and the sea lamprey, which is a jawless fish.
Altogether, ERV sequences are present in fish genomes, but with a lower content than in mammals (0.01-1% of the genome depending on the species; Chalopin et al., 2015) (Figure 1). In teleosts, the ERV contribution to genomes typically ranges from 0.033% in the compact Fugu genome, with ca. 1,800 insertions, to 0.76% in zebrafish, with more than 30,000 insertions. ERV content is an order of magnitude lower in elephant shark, coelacanth and lamprey, with a genome coverage of ca. 0.007% and 100-700 insertions. Epsilon-like (class I) and Spumavirus-related (class III) ERVs are the major retroviral elements found in fish genomes (Figure 2). A more divergent ERV clade was found in the genomes of zebrafish, coelacanth and sea lamprey, but also of amphibians; it might represent an ancestral branch of vertebrate ERVs and encompasses SnRV retroviruses (Hayward et al., 2015).

Coelacanth
The major category of ERVs in coelacanth is constituted by SnRV-like retroviruses, but Epsilon-like elements are also present (Hayward et al., 2015). One insertion of an Epsilon-like element called CoeERV1-1 has been found at orthologous positions in the genomes of the two extant species of coelacanths, suggesting that the integration is at least 6-8 million years old (Naville et al., 2014). Complete CoeERV1-1 elements and LTRs are 7.2 kb and 475 nt in length, respectively. A total of 258 fragments of variable sizes similar to CoeERV1-1 have been identified in the whole genome of the African coelacanth Latimeria chalumnae, most of them being internally deleted or present as solo LTRs. Interestingly, CoeERV1-1 sequences are closely related to turtle and crocodile ERV sequences (Figure 2). This suggests horizontal transfer between reptiles and coelacanths, or infection of both lineages by related retroviruses (Naville et al., 2014).
In addition, an endogenous foamy virus-like element (Spumavirus) called CoeEFV has been identified in the genome of the African coelacanth (Han and Worobey, 2012a). Besides gag, pol, and env, two additional putative open reading frames have been detected at positions similar to mammalian foamy virus accessory genes but with no significant similarity. CoeEFV probably invaded the coelacanth genome more than 19 million years ago (Han and Worobey, 2012a).
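The CoeERV1-1 lengths reported above allow a rough size accounting of what is lost when a complete element is reduced to a solo LTR. This is a back-of-the-envelope sketch under two stated assumptions: the rounded figure of 7.2 kb is taken as exactly 7,200 nt, and a complete element is assumed to carry two full 475-nt LTRs. The variable names are illustrative.

```python
# Lengths taken from the text; 7.2 kb is treated as exactly 7,200 nt (assumption).
FULL_LENGTH_NT = 7200
LTR_LENGTH_NT = 475

# Internal region between the two LTRs of a complete element.
internal_nt = FULL_LENGTH_NT - 2 * LTR_LENGTH_NT

# Sequence removed when recombination leaves a single (solo) LTR.
lost_nt = FULL_LENGTH_NT - LTR_LENGTH_NT

# Share of the complete element that disappears from the locus.
fraction_lost = lost_nt / FULL_LENGTH_NT

print(f"internal region: {internal_nt} nt")
print(f"lost on solo-LTR formation: {lost_nt} nt ({fraction_lost:.1%})")
```

Under these assumptions, more than nine tenths of the element is removed, which is consistent with most CoeERV1-1-like fragments being far shorter than the complete element.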

Ray-Finned Fish
Ray-finned fish genomes mostly contain Epsilon-like sequences, with different sublineages (Herniou et al., 1998; Basta et al., 2009; Figures 1 and 2). Some of these sequences occasionally include additional open reading frames, encoding for example a 2′,3′-cyclic nucleotide 3′-phosphodiesterase (CNPase) or macro domain proteins (Basta et al., 2009). ERVs have been found in different ray-finned fish species, including an element showing 200-300 copies in the very compact genome of the pufferfish Tetraodon nigroviridis (Fischer et al., 2005).
The best studied Epsilon-like ERV in teleost fish is ZFERV from zebrafish (Shen and Steiner, 2004) (Figure 2). The ZFERV provirus is 11.2 kb in length and phylogenetically related to the salmon swim bladder sarcoma virus (SSSV). Transcription is predominantly detected in the thymus of both larval and adult fish, in the form of several transcripts. ZFERV is amplified in zebrafish T-cell leukemia (Frazer et al., 2012). The fusion core of the ZFERV envelope protein has been characterized at the functional level (Shi et al., 2015). In addition to ZFERV-related elements, we could detect two additional families of Epsilon-like sequences in the zebrafish genome that clearly segregate in a phylogeny, indicating the presence of divergent ERVs in this fish (Figure 2). The high degree of similarity between ERV elements within each phylogenetic group suggests recent introduction and/or spreading in the zebrafish genome.
Endogenous Foamy virus (EFV) sequences (Spumaviruses) have been detected in the genome of several teleost fish species including the platyfish, the zebrafish and the Atlantic cod (Llorens et al., 2009; Schartl et al., 2013) (Figure 2). The molecular phylogeny of EFVs is consistent with the host phylogeny; their distribution supports an ancient marine evolutionary origin, with possible host-virus coevolution. In the platyfish Xiphophorus maculatus, several almost intact envelope-encoding copies are present (Schartl et al., 2013). The presence of nearly non-corrupted elements in divergent teleost species and the patchy distribution of the virus suggest independent infectious introductions into the germ line. Even if exogenous Foamy viruses have not been described so far for teleosts, exogenous versions of the virus might have been recently active and could still be infectious in ray-finned fishes.
Finally, the zebrafish genome contains several SnRV-like sequences, possibly with two distinct families (Figures 1 and 2). Neither Gamma sequences nor class II elements have been detected so far in teleosts (Hayward et al., 2015).

Cartilaginous Fish
Screening of the genome of the elephant shark Callorhinchus milii for retroviral sequences (CmiERVs) has revealed the presence of three (nearly) complete ERV insertions and many short ERV fragments (Han, 2015). Phylogenetic analysis revealed three major lineages, one clustering with the snakehead fish retrovirus SnRV (Figure 2), and two grouping with Epsilonretroviruses from walleye and amphibians (Han, 2015).

Lamprey
Very divergent elements were found in the genome of the lamprey; they group together with Spumaviruses according to our RT-based phylogeny, but with only very low support (Figure 2). ERV abundance is lower in the lamprey compared to most other vertebrates (Hayward et al., 2015) (Figure 1).

DO FISH ERVs FULFIL FUNCTIONS USEFUL FOR THEIR HOST?
While most ERVs have no clear role for their hosts, recent results have demonstrated beneficial functions that have increased the fitness of the organism (Warren et al., 2015; Naville et al., 2016). Such data have been obtained in mammals, but almost nothing is known about potentially useful roles in genome function and evolution in fish. Given the common evolutionary origin of mammalian and fish ERVs, we believe that these sequences could have advantageous functions in fish too.
As mobile and repeated sequences, ERVs are mediators of genomic plasticity (Goodier and Kazazian, 2008). They can disrupt sequences through insertion, and can recombine to mediate DNA rearrangements. Such events might destroy important genomic sequences and negatively affect the fitness of their hosts. On the other hand, the effect of ERVs on genome plasticity might catalyze genome evolution and generate advantageous rearrangements, for example new duplicated genes or gene combinations, with a possible role in speciation. In primates, ERVs have mediated genomic rearrangements during evolution (Hughes and Coffin, 2001). Interestingly, homologous recombination between LTRs has been shown to be involved in post-speciation genome divergence in coelacanths (Naville et al., 2014).
In mammals, ERVs and other transposable elements are able to modify gene expression through the regulatory sequences they carry. They can even put multiple genes under the same new regulation and rewire complete regulatory networks (Feschotte, 2008). For instance, lineage-specific ERVs have dispersed multiple interferon-inducible enhancers independently in different mammalian genomes (Chuong et al., 2016). Again, nothing is known about such a role for ERVs in fish. However, another transposable element, a non-autonomous DNA transposon called EnSpmN6_DR, has shaped the repertoire of p53 target genes in zebrafish through the spreading of embedded p53-responsive elements to the vicinity of genes (Micale et al., 2012). This indicates that regulatory network rewiring by transposable elements, and potentially by ERVs, is possible in fish.
Endogenous retroviruses (ERVs) and other transposable elements have been described as a source of non-coding RNA genes, including microRNA (miRNA) and long non-coding RNA (lncRNA) genes. A strong contribution of ERVs to non-coding RNA genes has been observed in human and mouse stem cells (Fort et al., 2014). Only 5% of miRNAs are derived from transposable elements in fish (compared to ca. 20% in human), mostly from DNA transposons (Qin et al., 2015). About 20% of zebrafish lncRNAs contain TE sequences, but with little contribution of LTR elements (Kapusta et al., 2013). The reduced contribution of ERV-like sequences is not completely surprising, since the zebrafish genome is dominated by DNA transposons.
Finally, retroviruses and other transposable elements have been shown to serve as new protein-coding genes during evolution in vertebrates and other lineages. Some of these genes have important functions for their host (Volff, 2006; Aswad and Katzourakis, 2012; Warren et al., 2015; Naville et al., 2016). In mammals, genes derived from endogenous retrovirus envelope sequences encode proteins called Syncytins that are involved in the formation of the placenta (Lavialle et al., 2013). Almost nothing is known about retrovirus-derived genes in fish. However, a gene called Gin2, derived from an integrase, has been identified in fish and some other vertebrate lineages (Marín, 2010; Chalopin et al., 2012). This gene was detected in all ray-finned fish species tested as well as in coelacanth and cartilaginous fish (elephant shark; Chalopin et al., 2012). The HHCC zinc finger from the ancestral integrase has been kept, suggesting that the GIN2 protein is able to bind DNA or RNA. Gin2 is expressed during gastrulation in zebrafish (Chalopin et al., 2012).
Interestingly, some retrovirus-derived genes are involved in resistance against retroviral infections in vertebrates. This is the case for the mouse Friend virus susceptibility-1 (Fv1) gene (Best et al., 1996). Fv1, which is derived from the gag gene of the MERV-L endogenous retrovirus family, protects the host against infection by the murine leukaemia virus (MLV) and other types of retroviruses. The Fv1 protein blocks MLV infection through interaction with the capsid protein (Nair and Rein, 2014). Other genes derived from ERV gag and env genes are involved in resistance against retroviral infection in mouse, sheep, cat, and chicken (Varela et al., 2009; Aswad and Katzourakis, 2012). We propose that retrovirus-derived genes that are still to be identified might be involved in resistance against retroviral infections in fish too.

CONCLUSION
This survey of ERVs highlights the relative paucity of knowledge in fish compared to mammals, and a number of missing data. In particular, several ERV groups lack any related exogenous viruses from which they would have originated. As an example, no fish Foamy virus has been described, despite the presence of endogenous foamy viruses in several fish species. This illustrates well how useful the analysis of endogenous viral sequences in genomes is for better characterizing the diversity of infectious agents. Further studies on new "aquatic" genomes may uncover additional families or even new types of viruses. Comparative analysis of new genomes will also help to reevaluate the age and evolutionary history of retroviruses. For example, detection of Foamy-like endogenous elements in fish strongly supports an ancient marine evolutionary origin (Schartl et al., 2013).
Much work is also required to better understand the evolutionary impact of ERVs on fish genomes: how they contribute to the evolution of genome architecture, RNA and protein repertoire and regulatory networks. Strong effects have been reported in mammals, which present a much lower level of diversity of transposable elements than fish. ERVs and other types of mobile and repeated sequences have the potential to play a very significant role in the huge level of biological diversity observed in fish, which affects many aspects including development, morphology, physiology, behavior, reproduction, and ecology (Volff, 2005). Of particular interest for aquaculture are retrovirus-derived resistance genes restricting new infections (Aswad and Katzourakis, 2012; Chuong et al., 2016). Candidates for such genes could be identified through a screening of genomes for virus-derived sequences that can then be tested at the functional level.
Finally, recent work suggests that a similar analysis can also be applied to RNA viruses without a DNA stage and to DNA viruses, for which integrated genomic forms have also been detected in vertebrates (Katzourakis and Gifford, 2010; Holmes, 2011). Accordingly, DNA sequences related to Parvoviruses, which possess linear single-stranded DNA genomes, have been identified in the genome of a pufferfish (Liu et al., 2011). The functional interconnections between host and viruses within genomes in fish might thus be of wider significance than previously thought.

AUTHOR CONTRIBUTIONS
J-NV and MN drafted the manuscript; MN analyzed sequence data.

FUNDING
This work has been supported by a grant from the Ecole Normale Supérieure de Lyon.