Additional ORFs in Plant LTR-Retrotransposons

LTR-retrotransposons share a common genomic organization in which the 5′ long terminal repeat (LTR) is followed by the gag and pol genes and the element terminates with the 3′ LTR. Although the proteins encoded by gag and pol are considered sufficient for LTR-retrotransposon transposition, a number of elements carrying additional open reading frames (aORFs) have been described. In some cases, the presence of an aORF can be explained by a phenomenon similar to retroviral gene transduction, but in these cases the aORFs are present in only one or a few copies. In contrast, many elements contain aORFs, or their derivatives, in all or most of their copies. These aORFs are most frequently located between pol and the 3′ LTR, and they can be in sense or antisense orientation with respect to gag-pol. Sense aORFs include those encoding ENV-like proteins, so called because they share some structural and functional similarities with retroviral ENV proteins. Antisense aORFs between pol and the 3′ LTR are also relatively frequent; for example, they are present in characterized LTR-retrotransposon families such as maize Grande, rice RIRE2, and Silene Retand, although their possible roles have not yet been determined. Here, we discuss the current knowledge about these sense and antisense aORFs in plant LTR-retrotransposons and suggest their possible origins, evolutionary relevance, and function.


INTRODUCTION
LTR-retrotransposons are transposable elements (TEs) characterized by the presence of two long direct repeats (long terminal repeats, LTRs) flanking an internal region that contains the gag and pol genes encoding proteins required for transposition (Figures 1A,B). The LTRs provide the promoters and terminators associated with the transcription of the LTR-retrotransposon by RNA polymerase II (Kumar and Bennetzen, 1999). The internal region contains the primer binding site (PBS) and the polypurine tract (PPT), both used during the retrotransposition process. The PBS is a 10-20-nucleotide sequence located next to the 5′ LTR that can partly base-pair with the 3′ end of a cytoplasmic tRNA. The PBS is used to prime the synthesis of the first DNA strand during reverse transcription. The PPT is a short stretch of purine-rich DNA (8-49 nt) located in the internal region next to the 3′ LTR and is used to prime the synthesis of the second DNA strand during reverse transcription. The gag and pol genes in the internal region encode all the proteins necessary for reverse transcription and integration that are not provided by the cell. Gag encodes the structural proteins, including capsid (CA) and nucleocapsid (NC), that assemble into virus-like particles (VLPs) (Dodonova et al., 2019). Pol encodes the proteins that provide the enzymatic machinery for reverse transcription and integration into the host genome: aspartic proteinase (AP), reverse transcriptase (RT), RNase H (RH), and integrase (INT) (Kumar and Bennetzen, 1999).
As sequence data accumulate, sequences encoding additional proteins (aORFs) are being recognized more and more frequently in the internal region of plant LTR-retrotransposons (Neumann et al., 2019). aORFs can be found in LTR-retrotransposon families with low or high copy numbers, in sense or antisense orientation with respect to the gag-pol genes, and upstream or downstream of them (Steinbauerová et al., 2011). Among them, those located between pol and the 3′ LTR, in sense or antisense orientation with respect to gag and pol, are the most frequent.
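The categories used in this paragraph (sense or antisense, upstream or downstream of gag-pol) can be made explicit with a small sketch. The Python below is only an illustration under stated assumptions: the element is oriented from 5′ LTR to 3′ LTR, gag-pol lies on the "+" strand, and the names Feature and classify_aorf, as well as all coordinates, are hypothetical and not taken from any annotation tool or published element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """An annotated interval on the element (1-based, inclusive coordinates)."""
    name: str
    start: int
    end: int
    strand: str  # "+" = same strand as gag-pol, "-" = opposite strand


def classify_aorf(aorf: Feature, gag_pol: Feature) -> str:
    """Describe an aORF by orientation and position relative to gag-pol.

    Assumption: the element is oriented 5' LTR -> 3' LTR, so any feature
    starting after the end of gag-pol lies between pol and the 3' LTR.
    """
    orientation = "sense" if aorf.strand == gag_pol.strand else "antisense"
    if aorf.end < gag_pol.start:
        position = "upstream of gag-pol"
    elif aorf.start > gag_pol.end:
        position = "between pol and 3' LTR"
    else:
        position = "overlapping gag-pol"
    return f"{orientation}, {position}"


# Hypothetical coordinates, for illustration only.
gag_pol = Feature("gag-pol", 1500, 6000, "+")
print(classify_aorf(Feature("aORF_a", 6500, 7400, "+"), gag_pol))
print(classify_aorf(Feature("aORF_b", 6500, 7400, "-"), gag_pol))
```

In this sketch, the two example aORFs occupy the same interval and differ only in strand, which is the distinction between the sense and antisense aORFs discussed in the following sections.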

PLANT LTR-RETROTRANSPOSONS WITH CONSERVED aORFs IN SENSE ORIENTATION BETWEEN pol AND 3′ LTR
Retroviruses and LTR-retrotransposons share many structural features. The main difference is that all retroviruses contain a third coding domain in their internal region, called env, that is located between pol and the 3′ LTR. Env encodes proteins that interact with cellular receptors and mediate fusion of the host and viral membranes (Nisole and Saïb, 2004). Since retrotransposons have an intracellular retrotransposition cycle, it was initially thought that they do not need the ENV protein and, in fact, most do not have it. However, coding domains in addition to gag and pol, located between pol and the 3′ LTR in the same sense as gag and pol (aORFs-3S), have been described in some LTR-retrotransposons in insects (Song et al., 1994) and in plants (Vicient et al., 2001; Wright and Voytas, 2002; Laten and Gaston, 2012). They encode proteins with certain similarities to ENV, suggesting that env-like genes may also exist in at least some LTR-retrotransposons (Figures 1C,D).
One of the conserved characteristics of the retroviral env domains is that they code for proteins with transmembrane domains, a characteristic that some of these aORFs-3S also have, which may suggest some functional similarity (Vicient et al., 2001). In addition, the retroviral mRNA is usually spliced to give rise to a message capable of expressing the envelope protein, and similar splicing events have been reported for some plant retrotransposons such as barley Bagy2 (Vicient et al., 2001). Together, the data suggest that aORFs-3S could encode proteins with functional similarities to retroviral ENV, and they are therefore often called env-like domains (Carvalho et al., 2010). However, an ENV-like function in plants remains controversial because the plant cell wall may represent a barrier to the interaction of ENV with cellular receptors present in the plasma membrane. As a consequence, it has been proposed that ENV-like proteins in plants may have a different function than in retroviruses, for example modifying the molecular size exclusion limit of plasmodesmata (Carrington et al., 1996) or serving as chaperone proteins that facilitate replication (Havecker et al., 2004).

PLANT LTR-RETROTRANSPOSONS WITH CONSERVED aORFs IN ANTISENSE ORIENTATION BETWEEN pol AND 3′ LTR
Coding domains in addition to gag and pol, located between pol and the 3′ LTR in antisense orientation with respect to gag and pol (aORFs-3R), have been described in different plant LTR-retrotransposons. Some examples are maize Grande1, rice RIRE2, Retand from Silene latifolia, and PvRetro13 from Phaseolus vulgaris (Martínez-Izquierdo et al., 1997; Ohtsubo et al., 1999; Kejnovsky et al., 2006; Figures 1E-H). They are also present in non-plant species, for example in Boty from Botrytis cinerea and Sclerotinia sclerotiorum (Zhao et al., 2011; Figure 1I). The function of these aORFs-3R is not known, but their presence in most of the copies of a family, with a degree of sequence conservation similar to that of other retrotransposon-encoded proteins, suggests that they may be important for the retrotransposition process (Ohtsubo et al., 1999; Gómez-Orte et al., 2013).
Although a complete analysis of the presence, species distribution, and types of these aORFs-3R is not yet available, the current data indicate that some of them are distributed across several species. For example, the sequences of the aORFs-3R of RIRE2, Wallabi, and Gran3, from different species of the genus Oryza, show similarities with the aORF-3R of Grande from Zea species (Ohtsubo et al., 1999). Retand aORFs-3R from S. latifolia show homology with sequences from other species, and sequences similar to the PvRetro13 aORF-3R were detected in other species (Gao et al., 2014). These homologies indicate an ancient origin, at least for some of the aORFs-3R.

ORIGIN OF ANTISENSE aORFs
Retroviruses have the potential to capture complete cellular genes, or parts of them, in a process known as gene transduction. Gene transduction events have also been described in some Class I TEs. For example, human L1 retrotransposons can capture gene fragments by transduction (Goodier et al., 2000). There are also some described examples in plants. Bs1, a maize LTR-retrotransposon, has transduced sequences from different host genes (Elrouby and Bureau, 2010). A total of 400 genes have been identified as transposon-captured genes in maize (Schnable et al., 2009), 672 in rice, and 1343 in sorghum (Jiang and Ramachandran, 2013). In maize, the majority of TE-captured genes were from Helitron elements, in rice from Pack-MULEs, and in sorghum from LTR-retrotransposons, and a high percentage of LTR-retrotransposon-captured genes are still functional in sorghum (Jiang and Ramachandran, 2013). However, retrotransposon-transduced gene sequences are usually in the same sense as the retrotransposon gag and pol genes. Moreover, the lack of gene sequences similar to the aORFs outside the TEs suggests that aORFs are not transduced gene sequences. Another possible origin of the aORFs is the insertion of a second TE that, once inserted, lost part of its structure and became part of the element. However, although nested insertions of TEs are relatively frequent in plant genomes (SanMiguel and Bennetzen, 1998), the lack of similarity between the aORF sequences and those of other TEs or viruses does not support this hypothesis. Consequently, the origin of most of these aORFs remains unknown.

FUNCTION OF ANTISENSE aORFs
No clear similarities with other proteins in databases have been described for any of the proteins encoded by aORFs-3R. However, some of these peptides localize in the nucleus, such as the one encoded by Grande (GENE23; Gómez-Orte et al., 2013), and the one encoded by PvRetro13 (ORF2) contains a conserved SMC (structural maintenance of chromosomes) domain that binds DNA and acts in organizing and segregating chromosomes for partitioning (Marchler-Bauer et al., 2011). These results suggest that these proteins may fulfill some nuclear function. Moreover, the aORF-3R protein encoded by Retand (ORF4) contains a transposase 28 domain (pfam04195), suggesting a possible role in the retrotransposition process. During retrotransposition, the pre-integration complex (PIC) produced in the cytoplasm must translocate to the nucleus (McLane et al., 2008). In some retrotransposons, like the fission yeast Tf1, the nuclear localization signal is provided by GAG (Kim et al., 2005), but in retroviruses some accessory proteins are involved (Vogt, 1997; Kogan and Rappaport, 2011). Many retroviruses encode ORFs in addition to gag, pol, and env, called accessory factors. These ORFs can be in antisense orientation and located between pol and the 3′ LTR. The accessory ORFs encode proteins essential for the regulation of transcription (Tat), for the transport of unspliced and partially spliced viral RNAs from the nucleus into the cytoplasm (Rev), and for other functions (Vif, Vpr, Vpu, Vpx, and Nef) (Sauter and Kirchhoff, 2018). Together, these observations suggest that the proteins encoded by aORFs-3R in LTR-retrotransposons may play a role similar to that of some retroviral accessory proteins, regulating the retrotransposition process.

TRANSCRIPTION OF aORFs-3R
The transcription of gag-, pol-, and env-like genes in LTR-retrotransposons is directed by a promoter located in the 5′ LTR. In maize Grande, the region corresponding to gene23 (aORF-3R) is ubiquitously transcribed at a relatively high level in antisense orientation with respect to the gag-pol genes. This transcription is directed by a promoter located in the upstream region of gene23 (Gómez et al., 2006; Vicient, 2010; Gómez-Orte et al., 2013). A weak ubiquitous expression was also detected for Retand aORFs-3R (Kejnovsky et al., 2006). Antisense promoters are not unusual in retroviruses. For example, HTLV has an antisense promoter located inside the 3′ LTR (Barbeau and Mesnard, 2011). Antisense promoters have also been identified in other retrotransposons, for example apple Mdoryco1-1 (Wang et al., 2017), Arabidopsis thaliana AtRE1 (Kato et al., 2005), and Drosophila hydei micropia (Lankenau et al., 1994). However, micropia transcribes two antisense RNAs of 1.0 and 1.6 kb with no protein-coding capacity. Antisense non-coding transcripts have also been described for the Drosophila non-LTR TART element (Danilevskaya et al., 1999) and for mouse IAP elements, which have both sense and antisense promoter activities in their LTRs (Druker et al., 2004). In view of these last examples, we cannot rule out that, at least in some cases, the production of antisense RNAs itself has a regulatory role. Antisense transcripts may produce double-stranded RNAs (dsRNAs) when they hybridize with the genomic RNA of the LTR-retrotransposon, generating 21-24-nt siRNAs, which can inhibit retrotransposition through a dsRNA-mediated silencing mechanism. These two possible functions, one based on the generation of a protein and the other on the generation of dsRNA, are not mutually exclusive and may together provide fine regulation of the LTR-retrotransposition mechanism.
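The base-pairing argument behind the dsRNA hypothesis can be illustrated with a short sketch. The Python below is a minimal example, assuming a made-up 30-nt RNA sequence and hypothetical helper names (reverse_complement, sirna_windows); it only shows that a transcript from the opposite strand is fully complementary to the sense RNA over the shared region, and counts how many 21-24-nt windows fit in a duplex of that length. It makes no claim about how siRNAs are actually produced in the cell.

```python
# Watson-Crick complement table for RNA (A, C, G, U only).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def sirna_windows(duplex_len: int, sizes=range(21, 25)) -> dict:
    """Count the windows of each size (21-24 nt) that fit in a duplex."""
    return {k: max(duplex_len - k + 1, 0) for k in sizes}


# Hypothetical 30-nt stretch of sense (genomic) RNA, for illustration only.
sense = "AUGGCUAGCUAGGAUCCGAUCGAUUAGCAA"

# A transcript from the opposite strand over the same region.
antisense = reverse_complement(sense)

# Full complementarity: reverse-complementing the antisense RNA recovers
# the sense RNA, so the two can pair along the whole shared region.
assert reverse_complement(antisense) == sense

print(antisense)
print(sirna_windows(len(sense)))
```

The window count is a purely combinatorial upper bound on distinct 21-24-nt fragments of the duplex; which of them, if any, accumulate as siRNAs is an experimental question.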

CONCLUSION
Additional open reading frames located between the pol gene and the 3′ LTR are present in some plant LTR-retrotransposon families. Sense aORFs show some functional and structural characteristics similar to the env genes in retroviruses, although their possible roles in retrotransposition remain unclear. Antisense aORFs are also present in different retrotransposon families, but their functions are still unknown. The nuclear localization identified in some cases and the comparison with the antisense genes of retroviruses suggest that they may play a regulatory role in retrotransposition. Antisense transcription may also play a regulatory role itself, through a dsRNA-mediated silencing mechanism. In conclusion, we believe that more attention should be paid to the presence of this type of additional ORF in TE annotations, and that the possible presence of antisense and spliced transcripts should also be examined. Finally, we think that research into the possible functions of these transcripts and of the proteins they encode would be worthwhile.

AUTHOR CONTRIBUTIONS
CV drafted the manuscript with contributions of JC. Both CV and JC revised and approved the manuscript.

FUNDING
This work was funded by grants AGL2016-78992-R from FEDER/Ministerio de Ciencia, Innovación y Universidades-Agencia Estatal de Investigación (Spain) and by the CERCA Programme of the Generalitat de Catalunya. We also acknowledge financial support from the Spanish Ministerio de Economía y Competitividad through the "Severo Ochoa Programme for Centres of Excellence in R&D" 2016-2019 (SEV-2015-0533).