Small RNA-Omics for Plant Virus Identification, Virome Reconstruction, and Antiviral Defense Characterization

RNA interference (RNAi)-based antiviral defense generates small interfering RNAs that represent the entire genome sequences of both RNA and DNA viruses as well as viroids and viral satellites. Therefore, deep sequencing and bioinformatics analysis of the small RNA population (small RNA-ome) allow not only for universal virus detection and genome reconstruction but also for complete virome reconstruction in mixed infections. Viral infections (like other stress factors) can also perturb the RNAi and gene silencing pathways that regulate endogenous gene expression and repress transposons and host genome-integrated endogenous viral elements, which can potentially be released from the genome and contribute to disease. This review describes the application of small RNA-omics for virus detection, virome reconstruction and antiviral defense characterization in cultivated and non-cultivated plants. Reviewing available evidence from a large and ever-growing number of studies of naturally or experimentally infected hosts revealed that all families of land plant viruses, their satellites and viroids spawn characteristic small RNAs which can be assembled into contigs of sufficient length for virus, satellite or viroid identification and for exhaustive reconstruction of complex viromes. Moreover, the small RNA size, polarity and hotspot profiles reflect virome interactions with the plant RNAi machinery and make it possible to distinguish between silent endogenous viral elements and their replicating episomal counterparts. Models for the biogenesis and functions of small interfering RNAs derived from all types of RNA and DNA viruses, satellites and viroids as well as endogenous viral elements are presented and discussed.

INTRODUCTION
Viral small RNAs were discovered in Nicotiana benthamiana plants inoculated with potato virus X (genus Potexvirus, family Alphaflexiviridae) using RNA blot hybridization, and their abundance was found to gradually increase over the time course of viral infection (Hamilton and Baulcombe, 1999). The pioneering work of Kreuze et al. (2009) and the follow-up studies listed in Supplementary Tables S1, S2 and discussed below have established that both RNA and DNA viruses as well as viral satellites and viroids can be identified and their genomes partially or fully reconstructed by deep sequencing and bioinformatic analysis of the small RNA population (small RNA-ome) from an infected plant. Likewise, small RNA deep sequencing can be used for virus detection and assembly of viral genomes from fungi (Vainio et al., 2015; Yaegashi et al., 2016; Donaire and Ayllón, 2017; Velasco et al., 2018) and from invertebrate animals (Aguiar et al., 2015; Fung et al., 2018), including insect vectors of the plant viruses transmitted in a propagative manner (Xu et al., 2012; Fletcher et al., 2016; de Haro et al., 2017). Such universality of the small RNA-omics approach for virus diagnostics is based on evolutionary conservation of the small RNA-generating RNA interference (RNAi) and gene silencing machinery that regulates gene expression and defends against invasive nucleic acids such as transposons, transgenes and viruses in most eukaryotes (Ghildiyal and Zamore, 2009; Nayak et al., 2013; tenOever, 2016). In the case of mammals, where an interferon system has evolved to limit viral infections, the potential contribution of RNAi to antiviral defenses remains a matter of debate (Jeffrey et al., 2017; tenOever, 2017). Nonetheless, virus-derived small RNAs generated by RNAi and/or other mechanisms are detectable by deep sequencing and could be used for virus identification in mammals and other vertebrates (Parameswaran et al., 2010; Wang F. et al., 2016).
This review article compiles and summarizes growing evidence demonstrating a universal power of small RNA-omics for diagnostics of all types of plant viruses and for exhaustive reconstruction of viromes of various cultivated and non-cultivated plants. Plant virus diagnostics and genome assembly by high-throughput sequencing of other target nucleic acids such as long single-stranded or double-stranded RNA and, in the case of DNA viruses, circular viral DNA enriched by a rolling circle amplification method have been previously reviewed (Boonham et al., 2014; Roossinck et al., 2015; Wu Q. et al., 2015; Jones et al., 2017; Roossinck, 2017; Maliogka et al., 2018; Jeske, 2018). Furthermore, this review illustrates mechanisms of biogenesis and function of small interfering RNAs (siRNAs) derived from plant RNA and DNA viruses, which have been dissected by deep sequencing and bioinformatics analysis of viral small RNA profiles in the model plant Arabidopsis thaliana (family Brassicaceae) and its RNAi-deficient mutant lines, and discusses conservation of these mechanisms in other plant species and families. Finally, the review presents models for biogenesis and possible functions of siRNAs derived from endogenous viral elements (EVEs) which represent host genome-integrated counterparts of extant or ancient episomal DNA viruses of the families Caulimoviridae and Geminiviridae and of some RNA viruses, and highlights characteristic differences in siRNA size, polarity and hotspot profiles between a silent EVE and its episomal copy that can potentially be released from the host genome to cause systemic and transmissible infection.

ALL FAMILIES OF LAND PLANT VIRUSES AND VIROIDS SPAWN SMALL RNAs IN INFECTED HOSTS
Reviewing a large number of studies that applied small RNA deep sequencing for virus identification and antiviral defense characterization in naturally or experimentally infected host plants (compiled in Supplementary Tables S1, S2) revealed that all 26 families of land plant viruses (Supplementary Table S1 and Supplementary List S1) and 2 families of viroids (Supplementary Table S2) spawn small RNAs that can be assembled into contigs representing partial or complete virus/viroid genomes. Such characteristic small RNAs, whose size-class and polarity profiles (analyzed in many but not all cases) are consistent with those of bona fide siRNAs generated by the plant RNAi machinery (described below), have been reported for circular single-stranded (ss)RNA viroids replicating in both chloroplasts (Avsunviroidae) and nuclei (Pospiviroidae) (Supplementary Tables S1, S2 and Supplementary List S1). Likewise, siRNAs have been reported for ssRNA satellites (unassigned family) associated with several helper (+)ssRNA viruses from Bromoviridae, Secoviridae, Tombusviridae, and Virgaviridae as well as for ssDNA betasatellites (Tolecusatellitidae) associated with helper ssDNA viruses of Geminiviridae (Supplementary Table S1 and Supplementary List S1). So far, ssDNA alphasatellites (Alphasatellitidae) associated with helper ssDNA viruses of Geminiviridae and Nanoviridae (see Supplementary List S1) have not been analyzed by small RNA sequencing. It is conceivable that alphasatellites, like their helper viruses, are also targeted by the plant RNAi machinery. The same assumption applies to those genera (and species) within the above-listed plant virus/satellite/viroid families for which small RNA deep sequencing data are not yet available (Supplementary List S1, genera highlighted in red).
The host plant species that accumulated viral small RNAs sufficient for virus, satellite or viroid identification belong to 53 botanical families of the kingdom Plantae, with 49 families from the clade Angiosperms (Supplementary List S2 and Supplementary Tables S1, S2). Within angiosperms, the families of eudicots (n = 37), monocots (n = 10) and ANITA grade basal angiosperms (n = 2) are each represented by at least one plant species. The basal angiosperms are represented by Amborella trichopoda (family Amborellaceae), which generated small RNAs from three endogenous caulimovirids of a tentative genus Florendovirus (Caulimoviridae), and water lily (Nymphaeaceae), which accumulated small RNAs from (i) cucumber mosaic virus (Cucumovirus, Bromoviridae), (ii) an uncharacterized virus from the genus Cytorhabdovirus (Rhabdoviridae) and (iii) an uncharacterized, likely endogenous, caulimovirid (Kreuze, 2014). Outside angiosperms, representatives of only a few plant families have been characterized for the presence of viral small RNAs. These include queen sago palm (Cycadaceae), the moss Physcomitrella patens (Funariaceae) and the green alga Chara corallina (Characeae), which all accumulated small RNAs from putative endogenous caulimovirids (Kreuze, 2014), and European water clover (Marsileaceae), which accumulated small RNAs from turnip yellows virus (Polerovirus, Luteoviridae) (Kreuze, 2014).
Only a few viruses that can infect land plant species outside the clade Angiosperms have been identified so far, including a few viruses and virus-like agents identified in gymnosperms and ferns (Hull, 2014). Recently, EVEs of the family Caulimoviridae have been identified in the genomes of gymnosperms, ferns and club mosses (Diop et al., 2018; Gong and Han, 2018). Taking into consideration EVEs of the genus Florendovirus (Caulimoviridae) identified in the genomes of angiosperms (Geering et al., 2014), almost every vascular land plant (Tracheophyta) contains endogenous caulimovirids, some of which can potentially give rise to episomal viruses as reported for EVEs from the genera Badnavirus, Solendovirus, and Petuvirus (Ndowora et al., 1999; Lockhart et al., 2000; Richert-Pöggeler et al., 2003; discussed below). Since all land plants and green algae possess the small RNA-generating RNAi machinery (You et al., 2017), it is conceivable that gymnosperms, ferns and club mosses can potentially accumulate siRNAs derived from endogenous caulimovirids and/or their extant not-yet-identified episomal counterparts. Likewise, genomes of all land plants contain long terminal repeat (LTR) retrotransposons of the families Metaviridae (Ty3/Gypsy) and Pseudoviridae (Ty1/Copia) which can give rise to siRNAs as has been reported for the angiosperms A. thaliana (Creasey et al., 2014; Masuta et al., 2017), strawberry (Šurbanovski et al., 2016), mangrove, maize (Alejandri-Ramírez et al., 2018) and wheat (Sun et al., 2013); transposon-derived small RNAs were also reported for gymnosperms such as Picea glauca (Liu and El-Kassaby, 2017) and Cryptomeria japonica (Ujino-Ihara et al., 2018).
Eukaryotic algae host large dsDNA viruses of the family Phycodnaviridae and a few species of RNA viruses (Hull, 2014). The presence of the small RNA-generating RNAi machinery in green and red algae (You et al., 2017; Lee et al., 2018) argues for its possible role in antiviral defenses. A recent study of the brown alga Fucus serratus has revealed predominantly 21-nucleotide (nt) small RNAs with 5′ U derived from both strands of a (−)ssRNA bunya/phlebo-like virus (unassigned Bunyavirales) and from both strands of an LTR Copia retrotransposon (Pseudoviridae), which is indicative of a functional antiviral RNAi response (Waldron et al., 2018).

TOWARD MORE EXHAUSTIVE RECONSTRUCTION OF COMPLEX VIROMES BY DEEP SMALL RNA SEQUENCING AND BIOINFORMATICS
In a few reported cases, virus-derived small RNAs were below the detection limit of small RNA sequencing, although the corresponding virus could be identified by other methods. For instance, in a co-infected rice plant, PCR-positive for rice tungro spherical virus (RTSV, Waikavirus, Secoviridae) and rice tungro bacilliform virus (RTBV, Tungrovirus, Caulimoviridae), only RTBV-derived siRNAs could be readily identified by deep sequencing, while RTSV-specific reads were negligible and comparable to those in a control "virus-free" plant (Zarreen et al., 2018). Since another waikavirus and viral species from other genera of Secoviridae are readily identified by small RNA sequencing (see Supplementary Table S1 and references therein) and because rice plants can generate siRNAs from many types of RNA and DNA viruses (Yan et al., 2010; Jiang et al., 2012; Xu et al., 2012; Kreuze, 2014; Rajeswaran et al., 2014a; Hong et al., 2015; Wu J. et al., 2015; Xu and Zhou, 2017; Yang et al., 2017; Jimenez et al., 2018; Lan et al., 2018; see Supplementary Table S1 and Supplementary List S1), the failure to identify RTSV-specific siRNAs can be explained by a low titer of the virus. In this and other similar cases, deeper sequencing of the small RNA-ome could have helped to uncover viral siRNAs. Indeed, a recent systematic study has established that, at low sequencing depths, most bioinformatic pipelines that use de novo assembly of small RNAs fail to identify low-titer persistent viruses in apple and grapevine, while those viruses can be readily identified at higher sequencing depths (Massart et al., 2018). In another study that compared small RNA-seq with long RNA-seq, a member of the genus Cytorhabdovirus (Rhabdoviridae) could be identified by de novo assembly only in the long RNA dataset, because of the low abundance of the viral reads in the small RNA dataset (Pecman et al., 2017).
Notably, in the same study, 14 distinct viral/viroid species representing various viral families could be readily identified by sequencing small RNAs from eight different hosts (Pecman et al., 2017; see Supplementary Tables S1, S2). Furthermore, viral species representing all four genera of Rhabdoviridae including Cytorhabdovirus have been identified by small RNA sequencing from 10 different plant species (Roy et al., 2013b,c; Kreuze, 2014; Hartung et al., 2015; Yan et al., 2015; Verdin et al., 2017; Yang et al., 2017; Mwaipopo et al., 2018) (see Supplementary Table S1).
It has been speculated that occasional failure to detect viral small RNAs is due to virus-encoded silencing suppressor proteins that block viral siRNA biogenesis. However, available evidence indicates the contrary. For example, wild-type RNA viruses of the families Bromoviridae (cucumber mosaic virus), Potyviridae (turnip mosaic virus) and Virgaviridae (oilseed rape mosaic virus) spawn highly abundant siRNAs constituting 30-70% of total (viral + host) small RNA sequencing reads, whereas their suppressor protein-deficient mutant derivatives spawn siRNAs of much lower abundance that correlates with strongly reduced titers of viral genomic RNA (Garcia-Ruiz et al., 2010; Wang et al., 2010, 2011; Malpica-López et al., 2018). The most extreme example of an RNA virus spawning highly abundant siRNAs is pelargonium line pattern virus (Pelarspovirus, Tombusviridae) whose siRNAs constituted ca. 90% of total small RNA reads in symptomless N. benthamiana (Pérez-Cañamás et al., 2017). Among DNA viruses, cauliflower mosaic virus (Caulimovirus, Caulimoviridae)-derived siRNAs constitute ca. 50% of total small RNA reads in A. thaliana at late stages of infection (Blevins et al., 2011), which could allow for de novo assembly of viral siRNA reads into a single terminally redundant contig representing the entire circular viral genome of 8 kb (Seguin et al., 2014).
The universal power of small RNA sequencing for identification and reconstruction of known and unknown RNA and DNA viruses in single and mixed infections can be further exemplified by several comprehensive studies. One such study reconstructed the virome of tomatoes in China by sequencing small RNAs from 170 samples and identified 22 viruses representing 12 genera of RNA and DNA viruses and 2 viroids (see Supplementary Tables S1, S2): the complete genomes were reconstructed from small RNAs by de novo and reference-based assembly for 13 of the 22 viruses and near-complete ones (>90% genome sequence) for another 5 viruses. Likewise, small RNA-omics has been successfully used for reconstruction of the complex viromes of vegetatively cultivated sweet potatoes (Kreuze et al., 2009; Cuellar et al., 2011; Kashif et al., 2012; De Souza et al., 2013; Mbanzibwa et al., 2014; Cuellar et al., 2015; Untiveros et al., 2016; Cao et al., 2017; see Supplementary Table S1 and Supplementary List S1) and grapevines (Zhang et al., 2011, 2014; Alabi et al., 2012; Giampetruzzi et al., 2012; Wu et al., 2012; Glasa et al., 2014, 2015; Seguin et al., 2014; Maliogka et al., 2015; Chiumenti et al., 2016a; Eichmeier et al., 2016; Barrero et al., 2017; Czotter et al., 2018; see Supplementary Tables S1, S2 and Supplementary List S1), which led to identification of novel RNA and DNA viruses and viroids among other virome components. Mwaipopo et al. (2018) have identified 15 viral species from 10 genera of RNA and DNA viruses in the virome of common bean (see Supplementary Table S1 and Supplementary List S1). Verdin et al. (2017) have used small RNA-omics to survey 46 species of ornamental plants and identify multiple known and unknown viruses representing 22 genera of RNA and DNA viruses and 2 viroids (see Supplementary Tables S1, S2), although some of the novel viruses were represented only by short contigs.
In an effort to improve identification and reconstruction of different types of viral genomes from various hosts, Barrero et al. (2017) have developed bioinformatics algorithms for small RNA assembly using selected 21-nt and 22-nt versus 24-nt reads, which represent the most abundant size-classes of viral siRNAs (discussed below). This work followed earlier studies where a host genome filtering step had been employed to allow for assembling viral small RNAs into longer contigs (Seguin et al., 2014) that, in some cases, represented the complete genomes of both RNA and DNA viruses and viroids (Seguin et al., 2014). It should be noted that the host genome filtering step removes from a small RNA-ome not only the host-derived endogenous small RNAs but also, in some cases, viral small RNA reads occasionally matching the host genome, which may lead to incomplete assembly of a viral genome (Seguin et al., 2014). This would also be the case for those viruses that have EVE counterparts integrated in the host genome. Nonetheless, for representatives of both DNA (Caulimoviridae and Geminiviridae) and RNA (Virgaviridae) viruses, host genome filtering allowed for de novo assembly of complete viral genomes, although this step was implemented after small RNA assembly into contigs by a short read assembler and was followed by assembly of all the unmapped contigs by a long read assembler (Seguin et al., 2014). A recently developed tool, VirusDetect (Zheng et al., 2017a), implements for small RNA-seq datasets both filtering through a host genome and mapping onto a database of viral reference genomes to enrich for viral reads before assembly. The latter approach may not be applicable for identification of novel viruses with low homology to any known virus in the reference database. Other issues concerning reliability, reproducibility and sensitivity of virus diagnostics by small RNA sequencing and bioinformatics analysis have been recently investigated and discussed by Massart et al. (2018).
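The two read-selection steps described above (size-class selection and host genome filtering) can be illustrated with a minimal Python sketch. This is not the algorithm of Barrero et al. (2017), Seguin et al. (2014) or VirusDetect: the function names, the DNA-alphabet reads and the exact-match host filter held in memory are all assumptions made for brevity, and a real pipeline would more likely use a short-read mapper against the host genome.

```python
# Hypothetical helper names; reads are assumed to be adapter-trimmed
# sequences in the DNA alphabet (T instead of U).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def host_substrings(host_genome, sizes=(21, 22, 24)):
    """Index every host substring of the given lengths, on both strands."""
    index = set()
    for k in sizes:
        for i in range(len(host_genome) - k + 1):
            sub = host_genome[i:i + k]
            index.add(sub)
            index.add(reverse_complement(sub))
    return index


def select_reads(reads, sizes=(21, 22), host_index=frozenset()):
    """Keep reads of the chosen size-classes that do not match the host exactly."""
    wanted = set(sizes)
    return [r for r in reads if len(r) in wanted and r not in host_index]


if __name__ == "__main__":
    host = "ACGT" * 10                      # toy "host genome"
    reads = [host[0:21], "G" * 21, "T" * 24]
    index = host_substrings(host)
    print(select_reads(reads, sizes=(21, 22), host_index=index))
```

Note that such an exact-match filter also discards any viral read that happens to match the host genome, which is the caveat raised above for viruses with EVE counterparts.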
Importantly, the sensitivity of small RNA-seq for detection of potyviruses (Potyviridae) in N. tabacum is reportedly 10 times higher than that of quantitative RT-PCR (Santala and Valkonen, 2018). Likewise, sweet potato feathery mottle virus (Potyvirus, Potyviridae) with extremely low titer (below immune detection) and symptomless in sweet potato could be assembled by sequencing viral small RNAs that covered the complete reference genome at an average 470 reads per nucleotide (Kreuze et al., 2009). Notably, Potyviridae is among those viral families for which many species from various hosts have been identified by small RNA-seq (see Supplementary Table S1 and references therein).
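A figure such as "reads per nucleotide" is an average depth of coverage over the reference genome. The following Python sketch shows one way such a value could be computed from mapped reads; the input format (0-based start position and read length per alignment) and the function names are assumptions for illustration, not the procedure used in the cited studies.

```python
def per_base_depth(genome_length, alignments):
    """Count how many mapped reads cover each reference position.

    alignments: iterable of (start, length) pairs, 0-based start,
    assumed to be ungapped matches on a linear reference.
    """
    depth = [0] * genome_length
    for start, length in alignments:
        for pos in range(max(start, 0), min(start + length, genome_length)):
            depth[pos] += 1
    return depth


def coverage_summary(depth):
    """Return mean depth and the fraction of positions covered at least once."""
    covered = sum(1 for d in depth if d > 0)
    return {
        "mean_depth": sum(depth) / len(depth),
        "fraction_covered": covered / len(depth),
    }


if __name__ == "__main__":
    toy = per_base_depth(10, [(0, 5), (3, 5), (8, 5)])
    print(toy, coverage_summary(toy))
```

Mean depth alone can hide uncovered regions, so the fraction of covered positions is reported alongside it.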
Concerning identification of novel viruses, analysis of small RNA contigs using BlastN and BlastX is recommended (Massart et al., 2018). In this case, any bioinformatics approach that makes viral contigs longer, such as those described above and additional ones in Massart et al. (2018), should help to identify a virus or virus-like agent with very low homology to known viral sequences available in the NCBI GenBank or other databases.
A notable limitation of the small RNA-omics approach is its inability to fully and reliably reconstruct from mixed infections genome sequences of two (or more) strains or genetic variants of the same virus when they share high sequence identity. Thus, two strains of Potato virus Y (Potyvirus, Potyviridae) coinfecting a potato plant could be de novo assembled into strain-specific small RNA contigs only for a 1 kb 5′-portion of the virus genomes which share 75% nucleotide identity. The remaining portions of the genomes sharing >87% identity merged into one chimeric small RNA contig and could be separated only by a reference-based approach (Turco et al., 2018). Likewise, two strains of Pepino mosaic virus (Potexvirus, Alphaflexiviridae) with 82-87% identity in co-infected tomatoes (Turco et al., 2018) and two strains of Potato virus X (Potexvirus) with 80% identity in a co-infected potato (Kutnjak et al., 2014) could be distinguished by small RNA sequencing, but their (near-)complete genome sequences were reconstructed only by using reference-based approaches. In these and similar cases, however, recombinant viral genomes potentially present in the mixed virome quasispecies population cannot be reliably reconstructed from small RNA reads. Such reconstruction can possibly be achieved by single long molecule sequencing (e.g., using PacBio or Nanopore technology) in combination with small RNA-seq. It is worth noting that viral small RNAs faithfully represent a consensus master genome sequence and minor variants in the quasispecies populations of both RNA and DNA viruses (Seguin et al., 2014, 2016; Kutnjak et al., 2015). Thus, small RNA sequencing-based identification and subsequent correction of three single nucleotide cloning errors in the genome of an RNA tobamovirus (Virgaviridae) enabled construction of a fully biologically active infectious clone of the virus (Seguin et al., 2014; Malpica-López et al., 2018).
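How a consensus master sequence and minor variants might be read off small RNA reads aligned to a reference can be sketched in Python as follows. The sketch assumes ungapped, forward-oriented alignments given as (0-based start, read sequence) pairs; the function name and the 5% minor-variant threshold are hypothetical choices and are not taken from the cited studies.

```python
from collections import Counter


def consensus_and_variants(reference_length, alignments, min_fraction=0.05):
    """Build a majority-rule consensus and list minor variants per position.

    alignments: iterable of (start, read_sequence) pairs (assumed ungapped).
    Positions without any coverage are reported as 'N'.
    """
    columns = [Counter() for _ in range(reference_length)]
    for start, seq in alignments:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < reference_length:
                columns[pos][base] += 1

    consensus = []
    variants = {}
    for pos, column in enumerate(columns):
        if not column:
            consensus.append("N")
            continue
        depth = sum(column.values())
        major = column.most_common(1)[0][0]
        consensus.append(major)
        # Report non-consensus bases whose frequency reaches the threshold.
        minor = {
            base: count / depth
            for base, count in column.items()
            if base != major and count / depth >= min_fraction
        }
        if minor:
            variants[pos] = minor
    return "".join(consensus), variants


if __name__ == "__main__":
    reads = [(0, "ACGT")] * 9 + [(0, "ACTT")]
    print(consensus_and_variants(5, reads))
```

A pileup of this kind does not link variants at distant positions to one another, which is one reason why closely related strains and recombinants cannot be resolved from short reads alone.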
Since viral siRNAs are relatively more stable than long RNA and DNA molecules, small RNA deep sequencing can be applied in paleovirology. Thus, Smith et al. (2014) have reported identification and reconstruction of an ancient isolate of Barley stripe mosaic virus (Hordeivirus, Virgaviridae) by sequencing small RNAs extracted from a 700-year-old seed of barley, with 99.4% of the contemporary virus reference genome being covered by small RNA contigs. Likewise, Hartung et al. (2015) have used small RNA-seq and reference-based assembly to reconstruct citrus leprosis virus cytoplasmic type 2 (Cilevirus, unassigned family) and citrus leprosis virus nuclear type (Dichorhavirus, Rhabdoviridae) from herbarium specimens of orange fruit peels collected in 1967 and 1948, respectively. The genome sequences of these isolates were found to be respectively 99% and 80% identical to those of contemporary isolates of the two viruses that had previously been identified by sequencing and de novo assembly of small RNAs from fresh citrus samples (Roy et al., 2013a,b,c) (Supplementary Table S1).

COMPONENTS OF THE PLANT SMALL RNA-GENERATING RNAi AND GENE SILENCING MACHINERY
Components and functionality of the plant small RNA-generating RNAi machinery in endogenous gene silencing pathways have been dissected using combined genetic, small RNA-seq, long RNA-seq and biochemical approaches for the model eudicot A. thaliana using a comprehensive collection of its RNAi-deficient mutant lines. Similar but less comprehensive studies have also been performed for other angiosperms such as the eudicot N. benthamiana and the monocot Oryza sativa, and also for non-angiosperms such as the moss P. patens. Based on these studies as well as small RNA-omics, transcriptomics and phylogenomics analyses of other plant species, the endogenous small regulatory RNAs have been divided into miRNAs and siRNAs which are generated from stem-and-loop ssRNA and dsRNA precursors, respectively, by Dicer-like (DCL) family proteins. Both miRNAs and siRNAs are then loaded onto Argonaute (AGO) family proteins to guide the resulting RNA-induced silencing complexes (RISCs) to complementary target RNA and/or DNA molecules (reviewed in Rogers and Chen, 2013; Borges and Martienssen, 2015; Fang and Qi, 2016). Plant endogenous siRNAs are further subdivided into several classes including inverted repeat hairpin-derived siRNAs (hpsiRNAs), natural antisense transcript siRNAs (natsiRNAs), double-strand-break-induced siRNAs (diRNAs), secondary siRNAs and heterochromatic siRNAs (hcsiRNAs) (Borges and Martienssen, 2015). Secondary siRNAs include phased siRNAs (phasiRNAs), trans-acting siRNAs (tasiRNAs) and epigenetically activated siRNAs (easiRNAs), and their biogenesis requires miRNA-directed cleavage of target mRNA or other Pol II transcripts and then involves complementary RNA synthesis by RNA-dependent RNA polymerase (RDR) family protein(s) (RDR6 or RDR1), followed by DCL4- and/or DCL2-mediated processing of the resulting dsRNA into 21-nt and 22-nt siRNAs.
Secondary siRNAs are loaded onto AGO1 or AGO2 clade proteins to post-transcriptionally repress their target RNAs through in cis or in trans cleavage and degradation (Borges and Martienssen, 2015). HcsiRNAs are produced from multiple heterochromatic loci with cytosine-methylated DNA at CG, CHG, and CHH sites and modified histones, which include inactive transposons and repetitive DNA elements, by a mechanism that involves plant-specific DNA-dependent RNA polymerase IV (Pol IV) transcribing methylated DNA, RDR2 converting the Pol IV transcripts into dsRNA and DCL3 processing the resulting dsRNA into 24-nt siRNAs (Blevins et al., 2015). HcsiRNAs are loaded onto AGO4/6/9 clade proteins to maintain the heterochromatic state through siRNA-directed DNA methylation (RdDM) of cognate DNA loci mediated by the de novo methyltransferase DRM2. Maintenance of RdDM also requires another plant-specific DNA-dependent polymerase, Pol V, while initiation of RdDM likely involves Pol II (reviewed in Borges and Martienssen, 2015; Matzke et al., 2015).
Plant miRNAs of 21 and 22 nts in length, produced from the MIR genes Pol II transcripts by DCL1 and loaded onto AGO1, act post-transcriptionally through cleavage or translational repression of endogenous target mRNAs (reviewed in Rogers and Chen, 2013) and do not contribute directly to the repression of RNA or DNA viruses mediated by siRNA-generating RNAi machinery (Figure 1; described below). However, several miRNAs are involved in antiviral defense via regulation of plant defense genes and components of the RNAi machinery (reviewed in Carbonell and Carrington, 2015; Pooggin, 2016). Moreover, some miRNAs are involved in recognition of transcriptionally active LTR (and other) transposons and initiation of RDR6- and DCL4-dependent biogenesis of abundant 21-nt and 22-nt easiRNAs (Creasey et al., 2014). Production and activity of easiRNAs likely leads to a transition from post-transcriptional to transcriptional silencing when Pol II transcription would be replaced by Pol IV/V transcription, switching from 21-nt or 22-nt to 24-nt siRNA production and epigenetic silencing by RdDM (Borges and Martienssen, 2015). Similar mechanisms might be involved in recognition and repression of EVEs upon their initial integration into the host genome or after stress-dependent activation of silent EVEs (discussed below).
All classes of plant endogenous (and viral) small RNAs are methylated at the 2′-OH of the 3′-terminal nucleotide by the methyltransferase HEN1, which protects small RNAs from degradation. Notably, suppressor proteins of some (+)ssRNA viruses (Tobamovirus, Potyvirus, Tombusvirus) can interfere with HEN1-mediated methylation of miRNAs and viral and endogenous siRNAs, likely through binding and sequestering small RNA duplexes produced by DCLs (Akbergenov et al., 2006; Blevins et al., 2006; Lózsa et al., 2008; Malpica-López et al., 2018). Such suppressor activity, however, does not lead to decreased accumulation of viral siRNAs (Malpica-López et al., 2018).
Representatives of dsDNA-RT (Cauliflower mosaic virus, Caulimovirus, Caulimoviridae) and ssDNA (Cabbage leaf curl virus, Begomovirus, Geminiviridae) viruses do not interfere with HEN1 activities in A. thaliana.
FIGURE 1 | Small interfering RNA (siRNA)-directed RNA interference (RNAi) and gene silencing in plants. Double-stranded RNA (dsRNA) is a trigger of RNAi and gene silencing at both post-transcriptional and transcriptional levels (PTGS and TGS). Dicer-like (DCL) family enzymes catalyze processing of dsRNA into siRNA duplexes. One of the duplex strands gets associated with an Argonaute (AGO) family protein and the resulting RNA-induced silencing complex targets cognate genes for PTGS through target mRNA cleavage and/or TGS through target DNA methylation (CH3). Positive feedback loops reinforce both PTGS and TGS. In PTGS, the mRNA cleavage products (aberrant RNAs) are converted by RNA-dependent RNA polymerase (RDR) activity into dsRNA precursors of siRNAs. In TGS, the methylated DNA and the target (to-be-methylated) DNA are transcribed by Pol IV and Pol V, respectively. The Pol IV transcripts are converted by RDR activity into dsRNA precursors of siRNAs, while the Pol V transcripts serve as scaffolds that interact with siRNA-AGO complexes and recruit DNA methyltransferase that mediates de novo RNA-directed DNA methylation (RdDM). Pol II is likely involved in initiation of RdDM and TGS by producing aberrant transcripts that are recognized by RDR and converted into dsRNA precursors of siRNAs.
Loading of endogenous (and viral) small RNAs onto specific AGOs is determined largely by the small RNA 5′-terminal nucleotide and length, but also by other factors (reviewed in Borges and Martienssen, 2015; Carbonell and Carrington, 2015; Fang and Qi, 2016). Thus, 21-nt and 22-nt small RNAs with 5′ U are predominantly loaded onto AGO1 (or, in some cases, AGO10), those with 5′ C onto AGO5, and those with 5′ A onto AGO2 (or, in some cases, AGO7), whereas 24-nt small RNAs with 5′ A are preferentially loaded onto AGO4/6/9 clade proteins. It is not known which AGO (if any) is specific for small RNAs with 5′ G such as the abundant 20-nt 5′ G-RNAs identified in Musa acuminata banana (Rajeswaran et al., 2014b).
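The sorting preferences summarized above can be written as a simple lookup, shown here as a Python sketch. It encodes only the predominant associations stated in the text (the alternative AGO10 and AGO7 associations and any other sorting factors are deliberately ignored), and the function name and the "unknown" label are assumptions for illustration rather than an established prediction tool.

```python
def predicted_ago(read):
    """Return the AGO (or clade) predominantly associated with a small RNA.

    Based only on read length and 5'-terminal nucleotide; reads given in
    the DNA alphabet are converted to RNA (T -> U) first.
    """
    seq = read.upper().replace("T", "U")
    if not seq:
        return "unknown"
    length, first = len(seq), seq[0]
    if length in (21, 22):
        # 5' U -> AGO1, 5' C -> AGO5, 5' A -> AGO2
        return {"U": "AGO1", "C": "AGO5", "A": "AGO2"}.get(first, "unknown")
    if length == 24 and first == "A":
        return "AGO4/6/9"
    return "unknown"


if __name__ == "__main__":
    for example in ("T" + "A" * 20, "C" * 21, "A" * 22, "A" * 24, "G" * 20):
        print(len(example), example[0], predicted_ago(example))
```

Applied to a viral small RNA dataset, such a lookup gives only a first approximation of which AGOs might be loaded, since sorting is also influenced by the other factors mentioned above.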
Interestingly, infections of A. thaliana with cucumber mosaic virus (Cucumovirus, Bromoviridae) and turnip mosaic virus (Potyvirus, Potyviridae) lead to activation of DCL4- and RDR1-dependent production of endogenous 21-nt siRNAs from multiple protein-coding genes. These virus-activated siRNAs (vasiRNAs) are incorporated into AGO1 and AGO2 RISCs and silence the corresponding host genes post-transcriptionally (Cao et al., 2014). This and other observations indicate that viral infections can dramatically perturb the endogenous plant small RNA pathways.
The small RNA-generating RNAi machinery evolved in the common ancestor of land plants and green algae (You et al., 2017). DCLs, RDRs, and AGOs diversified early in land plants. Interestingly, although the four distinct classes of DCLs that generate miRNAs (DCL1) and various types of endogenous and viral siRNAs (DCL2, DCL3, DCL4) are present in all angiosperms, some (but not all) of the three siRNA-generating DCL genes may be occasionally absent in some species of non-angiosperm lineages (gymnosperms, monilophytes, lycophytes, bryophytes, or ferns), with DCL2 being most frequently lost (Ma et al., 2015; You et al., 2017). Likewise, Pol IV/V or some other components of the angiosperm RdDM machinery appear to be occasionally missing in some species of the non-angiosperm lineages (Ma et al., 2015; You et al., 2017).

BIOGENESIS AND FUNCTION OF VIRAL siRNAs
The biogenesis and functionalities of 21-nt, 22-nt and 24-nt siRNAs in plant antiviral defenses, dissected mostly in the eudicot A. thaliana using its mutants deficient in core components of the RNAi machinery (DCLs, AGOs, RDRs, HEN1, Pol IV/V, etc.) and, to a much lesser extent, in the eudicot N. benthamiana and the monocot Oryza sativa, have been reviewed in great detail (Pooggin, 2013, 2016; Carbonell and Carrington, 2015). Likewise, viral strategies of suppression and evasion of antiviral RNAi have been reviewed (Pooggin, 2013, 2016, 2017; Csorba et al., 2015). This review illustrates the mechanisms of viral siRNA biogenesis and action in the aforementioned angiosperm plant species infected with RNA and DNA viruses (Figures 2-4) and, based on analysis of the available small RNA-seq data (Supplementary Tables S1, S2 and Supplementary List S1), discusses possible conservation of these mechanisms in other angiosperm and non-angiosperm plants infected with all types of viruses, satellites, and viroids.

RNA Virus-Derived siRNAs
The hallmark of plant infections with RNA viruses is accumulation of 21-nt and 22-nt viral siRNAs that are generated by DCL4 and DCL2, respectively (Xie et al., 2004; Akbergenov et al., 2006; Blevins et al., 2006; Bouché et al., 2006; Fusaro et al., 2006; Garcia-Ruiz et al., 2010; Wang et al., 2010). RNA viruses with all types of genomes, (+)ssRNA, (−)ssRNA and dsRNA, spawn predominantly 21-nt and 22-nt small RNAs (see Supplementary Table S1 and references therein). The relative abundance of 21-nt vs. 22-nt viral siRNAs differs depending on host plant species or other factors (e.g., specific activities of viral silencing suppressors expressed by different types of RNA viruses in targeting different components of RNAi machinery; Csorba et al., 2015) and likely reflects the relative expression levels of DCL4 and DCL2 and/or their relative activities in accessing and processing siRNA precursors (reviewed in Pooggin, 2016). When both DCL4 and DCL2 activities are diminished by knockout or knockdown mutations in A. thaliana (Bouché et al., 2006; Garcia-Ruiz et al., 2010; Wang et al., 2010) and N. benthamiana (Qin et al., 2017), DCL3 takes over to produce RNA virus-derived 24-nt siRNAs, albeit much less efficiently than DCL4 or DCL2, likely because of the predominantly nuclear activities of DCL3 in endogenous RdDM and in defense against DNA viruses (discussed below). Nonetheless, in rare cases, 24-nt small RNAs are detectable (together with more abundant 21-nt and 22-nt siRNAs) in wild-type plants infected with some (+)ssRNA and (−)ssRNA viruses in some hosts (see Supplementary Table S1 and references therein).
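The size-class comparison described above can be illustrated with a minimal Python sketch. It assumes that virus-mapped small RNA reads are already available as plain sequence strings; the function name `size_class_fractions` and the `DCL_BY_SIZE` lookup are hypothetical helpers introduced here for illustration, and the size-to-DCL assignment simply restates the associations given in the text for RNA viruses.

```python
from collections import Counter

# Size classes and the Dicer-like enzymes that the text associates with them
# for RNA virus-derived siRNAs (illustrative lookup, not an exhaustive model).
DCL_BY_SIZE = {21: "DCL4", 22: "DCL2", 24: "DCL3"}


def size_class_fractions(reads, sizes=(21, 22, 24)):
    """Return the fraction of reads in each size class, plus an 'other' bin."""
    counts = Counter(len(read) for read in reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    fractions = {size: counts.get(size, 0) / total for size in sizes}
    fractions["other"] = 1.0 - sum(fractions.values())
    return fractions


if __name__ == "__main__":
    # Toy reads: six 21-mers, three 22-mers and one 24-mer.
    toy_reads = ["A" * 21] * 6 + ["A" * 22] * 3 + ["A" * 24]
    for size, fraction in size_class_fractions(toy_reads).items():
        print(size, DCL_BY_SIZE.get(size, "-"), round(fraction, 3))
```

In such a profile, a predominance of the 21-nt and 22-nt bins would be in line with the hallmark of RNA virus infection described above, whereas the biological interpretation of any given ratio remains host- and virus-dependent.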
For all types of RNA viruses [(+)ssRNA, (−)ssRNA, dsRNA], 21-nt and 22-nt small RNAs are produced from both strands of the viral genome without any strong bias in most cases (Supplementary Table S1 and references therein), suggesting that those small RNAs are bona fide siRNAs processed by DCL4 and DCL2 from perfect dsRNA precursors in the form of 21-nt and 22-nt duplexes, respectively. However, a strong (+)strand bias of viral small RNAs has been reported for some (+)ssRNA viruses such as cucumber mosaic virus (Cucumovirus, Bromoviridae) in tomato and N. benthamiana but not N. tabacum, and citrus tristeza virus (Closterovirus, Closteroviridae) in four out of nine different Citrus species hosts, and for some members of Tombusviridae and Virgaviridae in some hosts. On the other hand, (+)ssRNA viruses of Tymoviridae often exhibit (−)strand bias in viral siRNAs (see Supplementary Table S1 and references therein).

FIGURE 2 | [...] and the RNA(-) templates, respectively. Both processes produce dsRNA intermediates that are recognized by host DCL enzymes. DCL4 is a primary dicer that catalyzes processing of viral dsRNA into 21-nt siRNA duplexes. DCL2 is a secondary dicer that generates 22-nt siRNA duplexes. The resulting primary viral siRNAs are associated with available AGO1/5/10 clade or AGO2/3/7 clade proteins to form RNA-induced silencing complexes containing either sense or antisense strand of the siRNA duplex. Both sense and antisense siRNA-AGO complexes can potentially target the complementary viral RNA(-) and RNA(+), respectively, for cleavage. RDR6 (or RDR1) can potentially convert the cleavage products into dsRNA precursors of secondary viral siRNAs, which reinforces antiviral RNAi. This model was adopted and extended from Pooggin (2016).
Interestingly, preferential accumulation of (+)strand-derived small RNAs has been reported for a satellite RNA of beet black scorch virus (Betanecrovirus, Tombusviridae) but not for the helper virus itself that spawned 22-nt and 21-nt siRNAs from both strands in N. benthamiana (Xu et al., 2016). A strong (+)strand bias for cymbidium ringspot virus (Tombusvirus, Tombusviridae)-derived small RNAs in N. benthamiana was initially attributed to DCL-mediated processing of secondary structures of single-stranded viral genomic RNA (Molnár et al., 2005; Donaire et al., 2009; Szittya et al., 2010). However, among DCLs only DCL1 has evolved to process hairpin-like secondary structures of miRNA precursor transcripts, while optimal substrates for other DCLs (DCL2, DCL3 and DCL4) are (i) perfect dsRNAs produced by RDR activities (i.e., precursors of tasiRNAs, phasiRNAs, easiRNAs and hcsiRNAs), (ii) perfect dsRNAs formed via annealing sense and antisense transcripts from the same DNA locus (natsiRNAs) or (iii) perfect and near-perfect dsRNAs formed through folding back inverted-repeat transcripts (hpsiRNAs) (reviewed in Borges and Martienssen, 2015). Interestingly, viral small RNAs produced in oilseed rape mosaic virus (Tobamovirus, Virgaviridae)-infected A. thaliana dcl2 dcl3 dcl4 triple mutant plants have a strong (+)strand bias and a much broader size range, unlike the predominantly 21-nt and 22-nt viral siRNAs produced from both strands in control wild-type plants (Malpica-López et al., 2018). This finding has led to a hypothesis that, in conditions when siRNA-generating DCLs are absent or their activities are diminished, DCL1 or other RNase III enzymes (Shamandi et al., 2015; Elvira-Matelot et al., 2016) can access and inefficiently process secondary structures of highly abundant viral genomic and subgenomic RNAs (Malpica-López et al., 2018).
A more trivial reason for the observed strand biases could be technical issues with cDNA library preparation protocols, as was more recently demonstrated for the aforementioned cymbidium ringspot virus-N. benthamiana pathosystem, leading to a conclusion that viral siRNAs are produced from perfect dsRNA precursors rather than viral ssRNA secondary structures and therefore do not exhibit any strand bias (Harris et al., 2015). Taken together with other findings and considerations (Carbonell and Carrington, 2015; Pooggin, 2016), the findings described above support a current model for the biogenesis and function of RNA virus-derived siRNAs (Figure 2), in which both DCL4 and DCL2 can access dsRNA intermediates of viral replication and transcription (mediated by viral RNA-dependent RNA polymerase, vRdRP) and process them into 21-nt and 22-nt siRNA duplexes, respectively. These duplexes are loaded onto available AGO proteins from AGO1/5/10 and AGO2/3/7 clades, and one of the duplex strands (sense or antisense) remains in mature RISC complexes. The viral siRNA-AGO RISCs can potentially access cognate viral ssRNA of both sense and antisense polarities through complementary interactions, and the AGO slicer activity would result in cleavage of target viral ssRNAs (Carbonell et al., 2012; Schuck et al., 2013; Garcia-Ruiz et al., 2015). The cleavage products can potentially be recognized by available host RDR(s), which would generate dsRNA precursors of secondary viral siRNAs (Figure 2). The proportion of RDR-dependent secondary siRNAs in the total viral siRNA population is likely very low, because production of RDR6- and/or RDR1-dependent siRNAs could be revealed only for some RNA virus mutant derivatives lacking functional suppressor proteins [Cucumovirus (Wang et al., 2010) and Potyvirus (Garcia-Ruiz et al., 2010), but not Tobamovirus (Malpica-López et al., 2018)], whereas wild-type virus-derived siRNAs are usually RDR-independent (reviewed in Pooggin, 2016).
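The strand-bias comparisons discussed above can be expressed as a simple ratio. The following Python sketch assumes that each virus-mapped read has already been assigned a strand symbol ('+' or '-') by an aligner; the function names and the tuple-based input format are hypothetical choices made for illustration only.

```python
def strand_counts(alignments):
    """Count reads per strand from (read_id, strand) pairs.

    `strand` is expected to be '+' or '-'; other values are ignored.
    """
    plus = sum(1 for _, strand in alignments if strand == "+")
    minus = sum(1 for _, strand in alignments if strand == "-")
    return plus, minus


def strand_bias(plus_count, minus_count):
    """Return the fraction of reads derived from the (+) strand.

    A value near 0.5 indicates no strand bias; values near 1.0 or 0.0
    indicate (+)strand or (-)strand bias, respectively.
    """
    total = plus_count + minus_count
    if total == 0:
        raise ValueError("no mapped reads")
    return plus_count / total


if __name__ == "__main__":
    toy = [("r1", "+"), ("r2", "-"), ("r3", "+"), ("r4", "+")]
    plus, minus = strand_counts(toy)
    print(plus, minus, strand_bias(plus, minus))
```

As noted in the text, an apparent bias in such a ratio may also arise from the library preparation protocol, so the number by itself does not identify the siRNA precursor.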
The requirements of particular DCLs for biogenesis of viral siRNAs and involvement of other components of the RNAi machinery in biogenesis or action of viral siRNAs have so far been investigated only for (+)ssRNA, ssDNA and dsDNA-RT viruses. Nonetheless, analysis of the viral siRNA size, strand bias and coverage profiles for (−)ssRNA and dsRNA viruses revealed similarities to those of (+)ssRNA viruses (see Supplementary Table S1 and references therein), suggesting that the mechanism presented in Figure 2 likely operates in defense against all types of RNA viruses and in all land plants that possess the depicted components of the RNAi machinery.

FIGURE 4 | Model for the biogenesis and action of episomal caulimovirid-derived siRNAs. Viral DNA is released from the virion into the nucleoplasm. Gaps in this discontinuous dsDNA left after reverse transcription are repaired by the host repair enzymes to create covalently closed dsDNA. Both repaired and unrepaired forms of viral dsDNA are transcribed by host Pol II. The repaired dsDNA gives rise to pregenomic RNA (pgRNA). Then, pgRNA (and its spliced versions in some genera of Caulimoviridae) is transported to the cytoplasm to serve as a polycistronic mRNA for coat protein (CP) and reverse transcriptase (RT). pgRNA also serves as a template for reverse transcription by viral RT within a pre-virion made of CP. On the unrepaired dsDNA, abrupt termination of Pol II transcription at the (-)strand DNA gap (the Met-tRNA primer binding site) results in production of 8S RNA, as demonstrated for members of Caulimovirus and Tungrovirus, or a much shorter RNA in the case of Badnavirus. 8S RNA forms a viroid-like secondary structure that is converted into dsRNA (likely by Pol II). The resulting dsRNA serves as a decoy to engage all DCLs in massive production of 21-, 22-, and 24-nt viral siRNAs, which are then associated with AGO family proteins. Stable secondary structure of the pgRNA leader sequence interferes with complementary interaction of viral siRNA-AGO complexes with pgRNA. Less abundant viral siRNAs are also produced from dsRNA precursors representing other regions of the viral genome, which are likely produced by aberrant antisense transcription of viral circular dsDNA mediated by Pol II or from viral sense RNAs by RDR activity. This model was adopted and extended from Blevins et al. (2011) and Pooggin (2016).
Coverage of both strands of an RNA virus genome with siRNA sequencing reads is usually uniform, although local hot and cold spots of siRNAs are evident for all types of RNA viruses, which may reflect DCL preferences for certain dsRNA sequences and/or preferential stabilization of certain siRNA sequences by AGOs. Relative abundance of dsRNAs derived from genomic vs. subgenomic RNAs may also account for non-uniform global distribution of siRNAs along the virus genome (e.g., Ruiz-Ruiz et al., 2011; Turco et al., 2018). In the case of segmented or multipartite viruses with RNA or DNA genomes, differences in relative abundance of siRNAs derived from different genomic components (e.g., Wang et al., 2010; Aregger et al., 2012; Zheng et al., 2017b; Patil and Arora, 2018) may reflect their unequal replication/transcription in a sampled plant tissue. A dsRNA decoy strategy of silencing evasion evolved by pararetroviruses (Caulimoviridae; Blevins et al., 2011; Rajeswaran et al., 2014a; discussed below) and perhaps some other viruses can also account for major hotspots of siRNA production.
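The notion of hot and cold spots along a genome can be made concrete with a per-position depth profile. The Python sketch below is a simplified illustration that assumes 0-based alignment start positions on a linear reference; the `fold` threshold used to call a hotspot is an arbitrary placeholder chosen for this example, not a value taken from the studies cited above, and reads spanning the origin of a circular genome would need additional wrap-around handling.

```python
def coverage_profile(alignments, genome_length):
    """Return per-position read depth.

    `alignments` is an iterable of (start, length) pairs with 0-based starts.
    Positions beyond the end of the reference are clipped (linear assumption).
    """
    depth = [0] * genome_length
    for start, length in alignments:
        for pos in range(start, min(start + length, genome_length)):
            depth[pos] += 1
    return depth


def hotspots(depth, fold=5.0):
    """Return positions whose depth exceeds `fold` times the mean depth."""
    if not depth:
        return []
    mean = sum(depth) / len(depth)
    if mean == 0:
        return []
    return [pos for pos, d in enumerate(depth) if d > fold * mean]


if __name__ == "__main__":
    # Ten 21-nt reads stacked at position 10 and a single read at position 60.
    toy = [(10, 21)] * 10 + [(60, 21)]
    profile = coverage_profile(toy, genome_length=100)
    print(hotspots(profile, fold=3.0))
```

Computing such profiles separately for each strand and size class would allow the hotspot patterns of 21-nt, 22-nt, and 24-nt siRNAs to be compared.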

Episomal DNA Virus-Derived siRNAs
The hallmark of DNA virus infections of land plants is accumulation of 24-nt viral siRNAs, in addition to 21-nt and 22-nt siRNAs (see Supplementary Table S1 and references therein). So far, the biogenesis and function of DNA virus-derived siRNAs have been dissected for the ssDNA virus Cabbage leaf curl virus (bipartite Begomovirus, Geminiviridae) and the dsDNA-RT virus Cauliflower mosaic virus (Caulimovirus, Caulimoviridae) using combined blot hybridization, small RNA-seq and biochemical approaches in A. thaliana and its RNAi-deficient mutant lines (Akbergenov et al., 2006; Blevins et al., 2006, 2011; Shivaprasad et al., 2008; Aregger et al., 2012; Seguin et al., 2014). Likewise, beet curly top virus (Curtovirus, Geminiviridae) and its suppressor-deficient mutants have been investigated with a main focus on involvement of components of RdDM and histone modification machinery in antiviral responses (Raja et al., 2014; Jackel et al., 2015, 2016; Coursey et al., 2018), although small RNA sequencing was not employed in those studies. For other DNA viruses, much less information is available (reviewed in Pooggin, 2013, 2016). Based on the available evidence, the models shown in Figures 3, 4 were proposed earlier (Pooggin, 2013, 2016) and are now updated with a few details and generalized by omitting subgenomic RNAs and specific viral suppressor proteins present in some but not all genera within Geminiviridae (Hanley-Bowdoin et al., 2013; Pooggin, 2013, 2016; Ramesh et al., 2017) and Caulimoviridae (Pooggin, 2013, 2016; Pooggin and Ryabova, 2018).
In ssDNA geminivirus (Geminiviridae)-infected cells (Figure 3), bidirectional transcription mediated by Pol II generates read-through sense and antisense transcripts that form perfect dsRNA precursors of the primary viral siRNAs representing the entire circular virus genome (including the promoter region not covered with viral mRNAs). These dsRNA precursors are processed preferentially by DCL4, DCL2, and DCL3 generating 21-nt, 22-nt, and 24-nt siRNAs, respectively, while DCL1 contributes much less efficiently to generation of 21-nt siRNAs (Aregger et al., 2012). All viral siRNA classes likely associate with AGO proteins to target viral transcripts for cleavage, as was deduced from virus-induced gene silencing studies using derivatives of cabbage leaf curl virus in A. thaliana single, double, and triple DCL mutants (Aregger et al., 2012). AGO-viral siRNA complexes have not been characterized yet for any ssDNA virus. RDR6- and DCL4-dependent secondary siRNAs represent only a minor proportion of 21-nt viral siRNAs, but appear to play a role in cell-to-cell spread and amplification of antiviral silencing (Aregger et al., 2012) (Figure 3). Notably, Pol IV, Pol V and RDR2, being involved in endogenous 24-nt siRNA-directed DNA methylation (RdDM) pathways (Borges and Martienssen, 2015; Matzke et al., 2015), are not required for the biogenesis of ssDNA (and dsDNA) virus-derived siRNAs (Aregger et al., 2012; Jackel et al., 2016), indicating that episomal DNA viruses evade RdDM and transcriptional silencing of viral genes in the nucleus, despite accumulation of highly abundant 24-nt viral siRNAs (Pooggin, 2013).
Consistent with the evasion of transcriptional gene silencing, Pol II bidirectional promoter and terminator regions of bipartite and monopartite Geminiviridae are usually devoid of siRNA hotspots, and the hotspots of 21-nt, 22-nt, and 24-nt viral siRNAs of both polarities are usually concentrated within viral ORFs (Aregger et al., 2012; Fuentes et al., 2016; see other references in Supplementary Table S1).
In episomal dsDNA-RT virus (Caulimoviridae)-infected cells (Figure 4), Pol II monodirectional transcription generates pregenomic RNA (pgRNA), which is transported from the nucleus to the cytoplasm for translation on ribosomes and reverse transcription mediated by viral reverse transcriptase (RT) in pre-virions composed of viral coat protein (CP) (reviewed in Pooggin and Ryabova, 2018). Discontinuous dsDNA produced via reverse transcription of pgRNA is transported back to the nucleus, where the discontinuities on both strands are sealed by the host DNA repair machinery, creating covalently closed dsDNA templates for Pol II transcription. If the discontinuity on the (−)strand at the Met-tRNA RT primer binding site is not repaired, Pol II produces a run-off transcript (so-called 8S RNA), which is converted into perfect dsRNA, likely by Pol II-mediated synthesis of complementary RNA, generating the dsRNA precursor of viral siRNAs (Figure 4). The latter hypothesis is based on genetic evidence, small RNA-seq and precise transcript mapping for cauliflower mosaic virus (Caulimovirus) in A. thaliana (Blevins et al., 2011) and small RNA-seq and transcript mapping for RTBV (Tungrovirus) in O. sativa (Rajeswaran et al., 2014a). The 8S dsRNA of ca. 600 bp serves as a decoy engaging all four DCLs in massive production of 21-nt (DCL1 and DCL4), 22-nt (DCL2), and 24-nt (DCL3) siRNAs (Blevins et al., 2011) (Figure 4), thereby protecting other regions of the virus genome (ca. 8 kbp), including the Pol II promoter, from repressive siRNAs. The dsRNA precursors of much less abundant 21-nt, 22-nt, and 24-nt siRNAs generated by respective DCLs from non-decoy regions are likely produced via aberrant Pol II-mediated antisense transcription on covalently closed circular viral dsDNA, because possible involvement of RDR1, RDR2, RDR6, Pol IV, or Pol V was ruled out for cauliflower mosaic virus in A. thaliana (Blevins et al., 2011). Note that RDR gamma-clade genes present in the A. thaliana genome (RDRs 3a, 3b, and 3c) and genomes of other plant species have not been conclusively shown to be involved in biogenesis of endogenous or viral siRNAs, although a tomato Ty-1/Ty-3 RDR gene is implicated in defense against tomato yellow leaf curl virus (Begomovirus, Geminiviridae) and amplification of viral siRNAs (Butterbach et al., 2014). Based on the position of the (−)strand discontinuity downstream of the Pol II transcription start site and the viroid-like, strong secondary structure of the pgRNA leader, the 8S RNA-like decoy strategy of silencing evasion has been predicted for most but not all genera of Caulimoviridae (Pooggin and Ryabova, 2018). In particular, members of the genus Badnavirus have the (−)strand discontinuity at a very short distance from the Pol II start site, which may not be compatible with efficient production of a dsRNA decoy (Pooggin and Ryabova, 2018). Indeed, small RNA-seq data obtained for six episomal species of Badnavirus in persistently infected Musa acuminata banana plants have revealed that hotspots of viral 21-nt, 22-nt, and 24-nt siRNAs are not concentrated in the pgRNA leader region and highly abundant siRNAs occur within ORFs (Rajeswaran et al., 2014b). Similar results have been obtained for other species of the genus Badnavirus in raspberry (Kalischuk et al., 2013), pagoda, taro (Kazmi et al., 2015) and grapevine (Howard and Qiu, 2017). Nonetheless, the Pol II promoter region is devoid of siRNA hotspots in five of the six banana badnaviruses and viral circular dsDNA is not methylated in any of the six viruses (Rajeswaran et al., 2014b), indicating that badnaviruses can evade RdDM and transcriptional silencing similar to other genera of Caulimoviridae and ssDNA viruses (Pooggin, 2013). Consistent with RdDM evasion, highly abundant 24-nt viral siRNAs accumulating in cauliflower mosaic virus-infected A. thaliana were barely detectable on AGO4 protein (using immunoprecipitation followed by RNA blot hybridization), whereas AGO1-associated 21-nt and 22-nt viral siRNAs were readily detectable in this pathosystem (Blevins et al., 2011).
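Whether siRNAs are concentrated in a decoy-like region can be estimated by comparing the observed read fraction in that region with the fraction expected from its length alone. For the approximate sizes given above (a decoy of ca. 600 bp in a genome of ca. 8 kbp), the region makes up 600/8000 = 7.5% of the genome, so uniform coverage would place about 7.5% of reads there. The Python sketch below is an illustration under stated assumptions: the region coordinates and read positions are hypothetical, and the function name `region_enrichment` is introduced here only for the example.

```python
def region_enrichment(read_starts, region_start, region_end, genome_length):
    """Return observed/expected read fraction for a genome region.

    The region is the half-open interval [region_start, region_end) in
    0-based coordinates. The expected fraction assumes uniform coverage,
    i.e. it equals region length divided by genome length.
    """
    if not read_starts:
        raise ValueError("no mapped reads")
    in_region = sum(1 for pos in read_starts if region_start <= pos < region_end)
    observed = in_region / len(read_starts)
    expected = (region_end - region_start) / genome_length
    return observed / expected


if __name__ == "__main__":
    # Hypothetical example: 75 of 100 reads start within a 600-bp region
    # placed at coordinates 0-599 of an 8000-bp genome.
    toy_starts = [100] * 75 + [5000] * 25
    print(region_enrichment(toy_starts, 0, 600, 8000))
```

In this toy case the region holds 75% of reads against an expected 7.5%, i.e. roughly a tenfold enrichment; a ratio near 1 would instead indicate that reads are not concentrated in the chosen region.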
In the case of multipartite circular ssDNA viruses of the family Nanoviridae, very little information is available on viral siRNAs (see Supplementary Table S1 and references therein). It is conceivable that all four DCLs including DCL3 are involved in viral siRNA biogenesis and defense against Nanoviridae, because of their similarities to Geminiviridae in nucleus-based rolling circle replication and to betasatellites of Geminiviridae in small size and monodirectional transcription (reviewed in Gronenborn, 2004; Mandal, 2010). Indeed, 21-nt, 22-nt, and 24-nt siRNAs represent both virion and complementary strands of tomato yellow leaf curl China betasatellite (Tolecusatellitidae) and cotton leaf curl Multan betasatellite (Tolecusatellitidae), similar to their helper viruses of Geminiviridae (Yang et al., 2011; see Supplementary Table S1). In the case of Geminiviridae and Nanoviridae and their satellites (Alphasatellitidae and Tolecusatellitidae), dsRNA precursors of viral siRNAs may also arise from pervasive sense and antisense transcription mediated by Pol II on multimeric linear dsDNA byproducts of recombination-dependent replication (see Pooggin, 2013).
Among EVEs of Caulimoviridae, certain integrants from the genera Badnavirus, Solendovirus, and Petuvirus can give rise to episomal virus infections upon abiotic stress and/or interspecific hybridization (Ndowora et al., 1999; Lockhart et al., 2000; Richert-Pöggeler et al., 2003; reviewed in Chabannes and Iskra-Caruana, 2013). In the case of petunia vein clearing virus (PVCV, Petuvirus), some of its EVE copies integrated in the genome of Petunia hybrida are arranged in tandem repeats, which allows for Pol II transcription of complete pgRNA followed by reverse transcription, giving rise to episomal viral dsDNA (Richert-Pöggeler et al., 2003). The release of episomal PVCV upon abiotic stress correlates with accumulation of abundant 21-22-nt viral siRNAs and less abundant 24-nt viral siRNAs, while in the absence of episomal virus, EVE-derived siRNAs are barely detectable by blot hybridization (Noreen et al., 2007). Furthermore, the integrated PVCV sequences are methylated and associated with modified histones, suggesting that epigenetic silencing through RdDM is involved in repression of these EVEs (Richert-Pöggeler et al., 2003; Noreen et al., 2007). Small RNA deep sequencing analysis of Citrus sp. has revealed that the Petuvirus-like EVEs representing Citrus endogenous pararetrovirus spawn predominantly 24-nt siRNAs along with less abundant 21-nt and 22-nt siRNAs (Barrero et al., 2017), further supporting the involvement of RdDM. Likewise, the Petuvirus-like EVE sequences at the centromeres of Fritillaria imperialis are associated with predominantly 24-nt siRNAs and methylated cytosines at both symmetric (CG, CHG) and asymmetric (CHH) sites, which is a clear signature of RdDM (Becher et al., 2014). Furthermore, EVEs of the genus Florendovirus spawn predominantly 24-nt siRNAs of low abundance in Eucalyptus grandis (Marcon et al., 2017).
21-24 nt siRNAs derived from both strands of Florendovirus EVEs have also been reported for other angiosperms such as grapevine and Amborella trichopoda (Geering et al., 2014), although relative levels of 24-nt vs. 21-nt and 22-nt species were not analyzed. Small RNA sequencing has not been reported so far for the infective endogenous caulimovirids from the genus Badnavirus integrated in the Musa balbisiana diploid B genome or the B genome of Musa acuminata × balbisiana hybrid species (Ndowora et al., 1999) and from the genus Solendovirus integrated in the genomes of Nicotiana sp. (Lockhart et al., 2000). In the case of six episomal viruses of the genus Badnavirus in Musa acuminata (with triploid AAA genome), virus-derived siRNAs of 21-nt and 22-nt classes are more abundant than those of 24-nt class (Rajeswaran et al., 2014b).
Based on the conserved methylation pattern of non-infective Solendovirus EVEs in Nicotiana glutinosa and some other Nicotiana species, it has been proposed that those EVEs might confer resistance to exogenous viral counterparts through an epigenetic mechanism (Mette et al., 2002; Gregor et al., 2004). Likewise, endogenous caulimovirid sequences integrated in the genomes of Solanum species and up to 83% identical to Tobacco vein clearing virus (the infective Solendovirus integrated in Nicotiana sp.; Lockhart et al., 2000) are methylated at nonsymmetrical cytosines (CHH) and associated with predominantly 24-nt siRNAs (as estimated by blot hybridization), which is the hallmark of RdDM and transcriptional silencing (Staginnus et al., 2007). In this and other cases, the silent EVE-derived siRNAs can potentially maintain RdDM (24-nt siRNAs), preventing the release of infective EVE copies (if any), and, at the same time, target any cognate (incoming) episomal virus in a sequence-specific manner at both post-transcriptional (21-22 nt siRNAs) and transcriptional (24-nt siRNAs) levels.
Similar to caulimovirids, ssDNA viruses of the family Geminiviridae have also become integrated in their host plant genomes (Murad et al., 2004; Filloux et al., 2015; and references therein). However, unlike endogenous caulimovirids, geminiviral EVEs have not so far been reported to give rise to episomal viruses. Interestingly, geminiviral EVEs integrated in the genomes of water yam (Dioscorea alata) and other Dioscorea species appear to express the viral replication protein Rep (Filloux et al., 2015). In D. alata, one of the two EVEs possessing uninterrupted ORFs for Rep and replication enhancer (but not coat) proteins spawns siRNAs covering both strands of the integrated sequence, with 21-nt and 24-nt species being the first and the second most abundant (Filloux et al., 2015), indicative of post-transcriptional and transcriptional silencing.
In rare cases, EVEs representing plant RNA viruses have been reported. Interestingly, an EVE representing RNA1 of the segmented (+)ssRNA virus Cucumber mosaic virus (CMV, Cucumovirus, Bromoviridae) is integrated in the genome of Glycine max (soybean) (Da Fonseca et al., 2016), while EVEs representing a satellite RNA of this virus are integrated in the genomes of Nicotiana species (Zahid et al., 2015). In both cases, these EVEs spawn 24-nt siRNAs, indicative of epigenetic silencing through RdDM (Zahid et al., 2015; Da Fonseca et al., 2016), which is in contrast to the replicating CMV and its satellite that both spawn almost exclusively 21-nt and 22-nt siRNAs in infected host plants (Shimura et al., 2011; Fang et al., 2015; Shen et al., 2015; see other references in Supplementary Table S1). Notably, the CMV RNA1 is integrated in the soybean genome as an inverted repeat of its near-complete 3.4 kb sequence, interrupted with an intervening 0.5 kb sequence of plant origin, and siRNAs are derived exclusively from the invertedly repeated viral sequence, with 22-nt siRNAs being the most abundant, followed by 21-nt and 24-nt classes (Da Fonseca et al., 2016). Based on the similarities to an inverted repeat IR-71 in A. thaliana spawning DCL2-, DCL3-, and DCL4-dependent 22-nt, 24-nt, and 21-nt hpsiRNAs (Henderson et al., 2006) and the integration just downstream of an LTR Copia retrotransposon promoter (Da Fonseca et al., 2016), the CMV RNA1 EVE inverted repeat region is likely transcribed by Pol II and the resulting transcript with invertedly repeated sequence folds back to form a hairpin dsRNA precursor of siRNAs processed by soybean DCL2, DCL3, and DCL4 (Da Fonseca et al., 2016) (Figure 5). It can be further speculated that the Pol IV- and RDR2-dependent pathway generating 24-nt siRNAs is also involved in this case and other cases of EVE loci containing inverted repeats (Figure 5).
As discussed above for transcriptionally active LTR transposons (reviewed in Borges and Martienssen, 2015), the switch from Pol II transcription to Pol IV/Pol V transcription can, in a similar manner, lead to the establishment and maintenance of epigenetic silencing at EVE loci. In the case of EVEs lacking inverted repeats, Pol II transcripts might be recognized and converted to dsRNA by RDR6 (or RDR1), followed by DCL4-and DCL2-mediated production of 21-nt and 22-nt siRNAs and post-transcriptional silencing. Likewise, Pol II bidirectional sense and antisense transcription at some EVE loci might presumably generate dsRNA precursors of 21-nt and 22-nt siRNAs (Figure 5). Potential access of DCL3 to both RDR-dependent and RDR-independent dsRNAs produced at EVE loci in the nucleus would lead to production of 24-nt siRNAs and eventual switch to RdDM and epigenetic silencing maintained by the Pol IV/RDR2/DCL3 pathway (Figure 5). The relative abundance of 21-nt and 22-nt vs. 24-nt siRNAs would reflect relative contribution of post-transcriptional versus transcriptional silencing pathways targeting a particular EVE. In most cases, EVEs are expected to be targeted by the RdDM machinery maintaining epigenetic silencing and generating predominantly 24-nt siRNAs, which would prevent release of those infective EVEs described above for Caulimoviridae.
Virus infections (like other stress factors) and especially activities of viral suppressor proteins might potentially perturb the RdDM machinery and lead to activation and release of transposons and infective EVEs, which would contribute to disease severity. For the purpose of virus diagnostics and virome reconstruction, the distinguishing features of episomal dsDNA and ssDNA viruses would be (i) high abundance of viral siRNAs of both 21-22 nt and 24-nt classes, reflecting activities of the components of both post-transcriptional and transcriptional silencing machineries, and (ii) coverage of the entire circular viral genome sequences with siRNAs with characteristic hotspots within ORFs rather than Pol II promoter regions (as discussed above; also see Supplementary Table S1 and references therein for episomal infections of Caulimoviridae and Geminiviridae). The distinguishing features of silent EVEs of Caulimoviridae and Geminiviridae as well as RNA viruses would be predominantly 24-nt siRNAs of lower abundance, which might be more broadly distributed along the reference genomes of their episomal counterparts and which would represent only those sequences of a reference episomal virus genome that are integrated in the host genome. Furthermore, siRNAs derived from inverted repeats of EVEs (if any) are expected to be more abundant and likely enriched in 22-nt and 21-nt classes in addition to the 24-nt class. Further research using small RNA deep sequencing is needed to investigate size, 5′-nt identity, polarity, and hotspot profiles of EVE-derived siRNAs as well as the mechanisms of epigenetic silencing and activation of the infective EVEs giving rise to episomal viruses.
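The distinguishing features listed above (relative abundance of 21-22 nt vs. 24-nt classes and the extent of reference genome coverage) can be summarized per reference sequence, as in the following Python sketch. It is a simplified illustration rather than a validated diagnostic: the function name, the input format, and the "dominant class" label are assumptions introduced here, and the sketch deliberately reports descriptive features only, leaving the decision between episomal virus and silent EVE to the investigator.

```python
def sirna_profile(read_lengths, covered_positions, genome_length):
    """Summarize siRNA features for one reference genome.

    read_lengths      -- lengths of reads mapped to the reference
    covered_positions -- reference positions covered by at least one read
                         (duplicates are allowed and are collapsed)
    genome_length     -- length of the reference episomal virus genome
    """
    total = len(read_lengths)
    if total == 0 or genome_length <= 0:
        raise ValueError("need mapped reads and a positive genome length")
    n_21_22 = sum(1 for n in read_lengths if n in (21, 22))
    n_24 = sum(1 for n in read_lengths if n == 24)
    return {
        "frac_21_22": n_21_22 / total,
        "frac_24": n_24 / total,
        "genome_covered": len(set(covered_positions)) / genome_length,
        # Descriptive label only; not a classification of the infection type.
        "dominant": "24-nt" if n_24 > n_21_22 else "21/22-nt",
    }


if __name__ == "__main__":
    print(sirna_profile([21, 21, 22, 24], [0, 1, 2, 2, 3], genome_length=10))
```

Partial genome coverage combined with a predominance of the 24-nt class would be in line with the features expected for a silent EVE, whereas full coverage with abundant 21-22 nt and 24-nt classes would be in line with an episomal DNA virus.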
Putative protective potential of EVE-derived siRNAs against cognate virus infections proposed earlier (Mette et al., 2002; Bertsch et al., 2009; Da Fonseca et al., 2016) and discussed above is further supported by the studies of transgenic plants expressing siRNAs from hairpin RNA transgenes carrying invertedly repeated viral sequences. Such RNAi-transgenic plants are resistant and, in some cases, immune to infection with the corresponding virus, and the immunity correlates with transgenic production of phloem-mobile 24-nt siRNAs (reviewed in Pooggin, 2017). Interestingly, abundance of transgene-derived 24-nt siRNAs appears to correlate with immunity or increased resistance not only to DNA viruses such as tomato yellow leaf curl virus (Begomovirus, Geminiviridae) in tomato (Leibman et al., 2015; Fuentes et al., 2016; see Pooggin, 2017), but also RNA viruses such as zucchini yellow mosaic virus (Potyvirus, Potyviridae) in cucurbits (Leibman et al., 2011), prunus necrotic ring spot virus (Ilarvirus, Bromoviridae) in sweet cherry (Zhao and Song, 2014a,b) and lettuce infectious yellows virus (Crinivirus, Closteroviridae) in N. benthamiana (Qiao et al., 2018).

Viroid-Derived siRNAs
The hallmark of viroids of the family Avsunviroidae that replicate in chloroplasts is accumulation of almost exclusively 21-nt and 22-nt siRNAs, whereas the hallmark of viroids of the family Pospiviroidae that replicate in nuclei is accumulation of 24-nt siRNAs, in addition to 21-nt and 22-nt siRNAs (see Supplementary Table S2 and references therein). For both types of viroids, siRNAs cover both strands of the entire circular viroid genome, suggesting that DCLs recognize and process perfect dsRNA intermediates of viroid replication or dsRNAs generated by RDR activities.

FIGURE 5 | Model for the biogenesis and action of endogenous viral element (EVE)-derived siRNAs. EVEs integrated in the plant genome can potentially be transcribed by Pol II in sense or antisense direction. dsRNA precursors of siRNAs can arise either via annealing of Pol II sense and antisense transcripts produced from the same EVE region or via RDR-mediated synthesis of complementary RNA on sense or antisense transcripts (scheme on the left). The resulting dsRNAs can potentially be processed into 21-nt, 22-nt, and 24-nt siRNAs by DCL4, DCL2, and DCL3, respectively. Likewise, Pol II can potentially transcribe the EVE loci containing inverted repeats and the resulting transcript can fold back to form the hairpin dsRNA precursor of siRNAs (scheme in the middle). The hairpin dsRNA is preferentially processed by DCL2 into 22-nt siRNAs and less efficiently by other DCLs into 24-nt (DCL3) and 21-nt (DCL1 or DCL4) siRNAs. Pol II-dependent siRNAs can potentially direct DNA methylation at cognate EVE sequences. The methylated EVE sequences are then transcribed by Pol IV and the resulting transcripts are converted by RDR2 into dsRNA, followed by DCL3 processing into 24-nt siRNAs (scheme on the right). Pol IV-dependent 24-nt siRNAs mediate maintenance of RdDM at cognate EVEs and, in the case of infective EVEs, prevent their release from the genome as an episomal and transmissible virus. All types of siRNAs derived from EVEs can potentially target a cognate episomal virus released from the genome or an incoming exogenous virus.
Because RNAi activity was not reported in chloroplasts or other plastids, Di Serio et al. (2009) have proposed that peach latent mosaic viroid (Pelamoviroid, Avsunviroidae)-derived 21-nt siRNAs and less abundant 22-nt siRNAs are generated by DCL4 and DCL2, respectively, in the cytoplasm of infected peach cells. Likewise, DCL4 and DCL2 have been proposed to mediate the biogenesis of predominantly 21-nt siRNAs and less abundant 22-nt siRNAs derived from both strands of apple hammerhead viroid (tentative Pelamoviroid) in apple (Zhang et al., 2014). The mechanism of biogenesis of chloroplastic viroid-derived siRNAs remains to be further investigated.
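The size-class hallmarks and the coverage of both strands described above can be tabulated with a few lines of code. The following is only a minimal sketch, not a method taken from the studies cited here: the function name `size_polarity_profile`, the input format (a list of read sequence and mapped strand pairs) and the example reads are all hypothetical, and read mapping to the viroid genome is assumed to have been done beforehand.

```python
from collections import Counter


def size_polarity_profile(reads, sizes=(21, 22, 24)):
    """Fraction of reads per size class and strand.

    reads: iterable of (sequence, strand) pairs, strand being "+" or "-".
    This input format is an assumption made for illustration only.
    Reads outside the requested size classes are ignored.
    """
    counts = Counter()
    for seq, strand in reads:
        if len(seq) in sizes and strand in ("+", "-"):
            counts[(len(seq), strand)] += 1
    total = sum(counts.values())
    return {
        size: {
            strand: (counts[(size, strand)] / total if total else 0.0)
            for strand in ("+", "-")
        }
        for size in sizes
    }


# Hypothetical reads already mapped to a viroid genome.
example_reads = [
    ("A" * 21, "+"),
    ("C" * 21, "-"),
    ("G" * 22, "+"),
    ("U" * 24, "+"),
    ("A" * 30, "+"),  # outside the 21/22/24-nt classes, ignored
]
profile = size_polarity_profile(example_reads)
for size, by_strand in profile.items():
    print(size, by_strand)
```

In such a table, a profile made up almost entirely of 21-nt and 22-nt reads would be consistent with the Avsunviroidae hallmark, and a substantial 24-nt class with the Pospiviroidae hallmark; any numerical cutoff for calling a class present would be a further assumption and is deliberately left out.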
In the case of nuclear viroids, their replication in the nucleus via an asymmetric rolling circle mechanism (likely mediated by Pol II; reviewed in Katsarou et al., 2015; Rao and Kalantidis, 2015) generates perfect dsRNA replicative intermediates that might be accessed by DCL3 to generate 24-nt siRNAs. RNA blot hybridization analysis of N. benthamiana and its DCL knockdown lines infected with potato spindle tuber viroid (PSTVd, Pospiviroid, Pospiviroidae) has revealed that 21-nt, 22-nt, and 24-nt viroid siRNAs are generated by DCL4, DCL2 and DCL3, respectively (Dadami et al., 2013; Katsarou et al., 2016). DCL1 may also contribute to the biogenesis of 21-nt viroid siRNAs, as evident in dcl2 dcl3 dcl4 triple knockdown lines (Katsarou et al., 2016). This is reminiscent of DNA viruses targeted by all four DCLs in A. thaliana, with both DCL4 and DCL1 contributing to production of 21-nt viral siRNAs (Blevins et al., 2011; Aregger et al., 2012). Interestingly, DCL1 plays a major role in processing 21-nt siRNAs from the dsRNA decoy likely generated by Pol II from viroid-like 8S RNA of cauliflower mosaic virus (Blevins et al., 2011) (Figure 4). Small RNA sequencing and blot hybridization have demonstrated that RDR6 is not required for the biogenesis of PSTVd-derived siRNAs of any size class, although viroid accumulation was increased at the early (but not the late) time point in N. benthamiana rdr6 knockdown plants (Di Serio et al., 2010). This supports the hypothesis that dsRNA intermediates of viroid replication are processed by DCLs. PSTVd-derived siRNAs are sorted by multiple AGOs based on 5'-nt identity and size, with AGO1, AGO2, and AGO3 being preferentially associated with 21-nt and 22-nt siRNAs, while AGO4, AGO5, and AGO9 additionally bind 24-nt siRNAs, as demonstrated by immunoprecipitation of A. thaliana AGOs agro-expressed in N. benthamiana followed by small RNA sequencing (Minoia et al., 2014).
This indicates that viroid siRNAs form active RISCs that can potentially mediate both post-transcriptional and transcriptional silencing. Indeed, siRNAs derived from both nuclear and chloroplastic viroids can direct silencing of host plant mRNAs through sequence-specific cleavage and degradation (Navarro et al., 2012; Adkar-Purushothama et al., 2015, 2018), although viroids can also indirectly affect host gene expression at both post-transcriptional and transcriptional levels (Castellano et al., 2015; Tsushima et al., 2015; Torchetti et al., 2016; Zheng et al., 2017c). Notably, nuclear viroids can also trigger de novo RNA-directed DNA methylation (RdDM) of transgenes containing viroid sequences (Wassenegger et al., 1994; Dalakouras et al., 2016). Whether or not viroid-derived 24-nt siRNAs are involved in this RdDM process remains unclear (Dalakouras et al., 2016).
Interestingly, the Dianthus caryophyllus retroviroid-like element integrated in the genome of carnation could be identified by small RNA sequencing and assembly (Verdin et al., 2017), suggesting that it is potentially silenced epigenetically like other EVEs.
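The sorting of viroid siRNAs by 5'-nt identity and size, as described above for AGO-bound PSTVd siRNAs, is typically inspected as a table of 5'-terminal nucleotide counts per size class. The sketch below is an illustration under stated assumptions: the function name `five_prime_by_size` and the example sequences are hypothetical, reads are assumed to be given as plain strings, and DNA-alphabet reads (T) are converted to RNA (U) for counting.

```python
def five_prime_by_size(seqs, sizes=(21, 22, 24)):
    """Count 5'-terminal nucleotides for each siRNA size class.

    seqs: iterable of read sequences (hypothetical input format).
    Reads of other lengths, or starting with a non-ACGU base, are skipped.
    """
    table = {size: {nt: 0 for nt in "ACGU"} for size in sizes}
    for seq in seqs:
        seq = seq.upper().replace("T", "U")  # sequencers usually report T
        if len(seq) in table and seq[0] in "ACGU":
            table[len(seq)][seq[0]] += 1
    return table


# Hypothetical reads, e.g. from one AGO immunoprecipitation library.
example_seqs = [
    "U" + "A" * 20,  # 21-nt, 5'-U
    "T" + "A" * 20,  # 21-nt, 5'-T counted as 5'-U
    "A" + "C" * 23,  # 24-nt, 5'-A
    "N" + "A" * 20,  # ambiguous 5' base, skipped
]
table = five_prime_by_size(example_seqs)
for size, by_nt in table.items():
    print(size, by_nt)
```

Comparing such tables between libraries (for example, one per immunoprecipitated AGO) would show which size and 5'-nt combinations are enriched in each; the comparison itself is not implemented here.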

CONCLUDING REMARKS AND OUTLOOK
As described above, all families of land plant viruses and viroids spawn characteristic small RNAs whose deep sequencing and bioinformatics analysis allow for virus identification and virome reconstruction. The small RNA size, polarity and hotspot profiles are indicative of virus interactions with components of the plant small RNA-generating RNAi machinery and make it possible to distinguish between exogenous viruses and silent EVEs, some of which can potentially be released from the plant genome to establish systemic and transmissible infections. Based on the conservation of the small RNA-generating RNAi machinery in all land plants and eukaryotic algae, the small RNA-omics approach is universal for diagnostics of known viruses, identification of viruses or virus-like agents associated with diseases of unknown etiology, and exhaustive reconstruction of viromes of any plant species. Specialized but partially redundant functions of DCLs, RDRs, and AGOs in both endogenous and antiviral siRNA pathways revealed in the model plant A. thaliana imply that in those species of land plants that lack some of the paralogs of the DCL, RDR, or AGO family genes (Ma et al., 2015; You et al., 2017), the remaining paralogs can still function in defense against viruses, EVEs and LTR retrotransposons. Further research on non-model plants is needed to characterize components of the RNAi machinery and their roles in biogenesis and function of viral siRNAs. This research will certainly be facilitated by CRISPR-based gene editing technology, as, for example, recently implemented in tomato to reveal the role of one of the four paralogs of DCL2 in 22-nt siRNA biogenesis and antiviral defense. As mentioned above, the combination of small RNA sequencing with direct sequencing of long RNA and DNA molecules using PacBio and Nanopore technologies should enable reconstruction of complete viral genomes and discovery of their mutant and recombinant variants in mixed virome quasispecies.
The successful application of small RNA sequencing for reconstruction of viruses from old seeds and dried plant tissues described above opens up great opportunities for more exhaustive virome reconstruction from herbaria and other collections of dried plant materials used in paleovirology.
Notably, viral siRNA size and hotspot profiles of those propagative RNA viruses that replicate both in the plant and the insect vector differ substantially (Xu et al., 2012; Fletcher et al., 2016; de Haro et al., 2017). Unlike plants, insects possess Dicer-independent, PIWI-interacting RNA (piRNA)-generating machinery that may contribute to the antiviral defenses mediated by Dicer-dependent siRNA-generating machinery (Mongelli and Saleh, 2016; Sattar and Thompson, 2016). It will be interesting to investigate if transmission of propagative and non-propagative plant viruses by their insect or other eukaryotic vectors (Hull, 2014) is regulated by the vector siRNA and/or piRNA pathways.

AUTHOR CONTRIBUTIONS
MP wrote the manuscript and prepared the figures and the Supplementary Tables and Lists.

FUNDING
My current research work on small RNAs is supported by grants from French INRA (SPE Department project ViroMix) and French ANR Foundation (AAPG2018 project Rome).

ACKNOWLEDGMENTS
I would like to thank all members of the plant virus study groups of the International Committee on Taxonomy of Viruses (ICTV) and participants of the EU COST Action FA1407-DIVAS for their help in verifying and completing the data compiled in Supplementary Tables S1, S2. I apologize if I have missed any relevant publications (those not listed in Supplementary Tables S1, S2) or provided incorrect information, and I would be grateful if the authors could inform me of such omissions or mistakes. I thank Dr. Rajendran Rajeswaran for critical reading of the manuscript.