Tracing Mobile DNAs: From Molecular to Population Scales

Transposable elements (TEs, transposons) are mobile DNAs that are prevalent in most eukaryotic genomes. In plants, their mobility has contributed greatly to genetic diversity, which is essential for the adaptive changes and evolution of a species. The mobile nature of transposons has also been actively exploited in plant science research to generate genetic mutants in non-model plant systems. On the other hand, transposon mobilization can have detrimental effects on host genomes, and transposons are therefore mostly silenced by epigenetic mechanisms. TEs have been studied as major silencing targets and have been a central feature in the remarkable growth of the plant epigenetics field. Despite the importance of transposons in plant biology and biotechnology, their mobilization and the underlying mechanisms remain largely unknown. This is mainly because of the sequence repetitiveness of transposons, which complicates their detection and analysis. Recently, attempts have been made to develop new experimental methods that detect active transposons and their mobilization behavior. These techniques reveal TE mobility at various levels, including the molecular, cellular, organismal, and population scales. In this review, we highlight the novel technical approaches in the study of mobile genetic elements and discuss how these techniques have advanced transposon research and broadened our understanding of plant genome plasticity.


INTRODUCTION
Transposable elements (TEs or transposons) are stretches of DNA that move around genomes and are ubiquitous in most eukaryotic genomes (Feschotte, 2008; Lisch, 2012; Chuong et al., 2017). In particular, the genomes of major food crops such as barley, wheat, and maize contain myriads of transposons that make up more than 80% of their genomes (Tenaillon et al., 2010). Among the diverse types of transposons, the long terminal repeat (LTR) retrotransposon is the predominant type in most plant genomes (Casacuberta and Santiago, 2003; Grandbastien, 2015; Cho, 2018; Satheesh et al., 2021) and will thus be the main focus of this review. The mobilization of an LTR retrotransposon is mediated by the reverse transcription of TE mRNAs to cDNAs (also referred to as extrachromosomal DNA, ecDNA), which takes place in virus-like particles (VLPs) and is followed by insertion into new genomic positions by the integrase (Cho et al., 2019; Satheesh et al., 2021). Because the mobile nature of transposons poses a potential danger of genomic instability, they are subject to the host genome's epigenetic silencing pathways, including chromatin modification and DNA methylation (Slotkin and Martienssen, 2007; Matzke and Mosher, 2014). On the other hand, transposons are one of the major sources of genetic diversity, which is critical for the evolution and adaptive changes of plants (Lisch, 2012; Dubin et al., 2018). In addition, TEs have been actively exploited in the plant science field as useful mutagenic reagents. For example, Tos17 in rice is specifically activated by in vitro tissue culture, and the resulting random insertional mutants tagged with Tos17 are important genetic resources in rice functional genomics (Hirochika et al., 1996; Hirochika, 2010). Similarly, Tnt1 was used to generate genetic mutants in Medicago truncatula, Brachypodium distachyon, and Glycine max (D'Erfurth et al., 2003; Tadege et al., 2008; Revalska et al., 2011; Cui et al., 2012; Nandety et al., 2020), and the maize Ac/Ds DNA transposon system was used as a functional genomics tool in Arabidopsis, Oryza sativa, and Glycine max (Long et al., 1993; Mathieu et al., 2009; Wang et al., 2013). Despite the vast importance of transposons, little is known about the regulatory mechanisms of their mobilization, largely because of the lack of experimental methods that can detect transposition events with sufficient sensitivity and precision.
It is well documented that transposons can be transcriptionally activated by environmental challenges and in specific cell types and developmental stages (Martínez and Slotkin, 2012; Cho, 2018; Cho et al., 2019). However, the mobilization of activated transposons rarely occurs, likely because of complex regulation at the post-transcriptional steps (Hung and Slotkin, 2021; Kim et al., 2021b). Owing to the scarcity of transposition events and the technical difficulty of detecting them, it has been challenging to study transposon mobilization. In the past, transposon insertion was inferred from phenotypic abnormalities caused by deleterious mutations of a gene disrupted by TE integration. For example, some of the epigenetic recombinant inbred lines (epiRILs) generated from the met1 mutant in Arabidopsis exhibited various abnormal phenotypes, which were associated with gene disruption caused by the transposition of the Evade retroelement (Reinders and Paszkowski, 2009). A PCR-based technique called transposon display (TD) and its derivative methods are usually the experimental approaches of choice to detect and locate new insertions of a transposon of interest (Kim et al., 2021a). Briefly, an adapter of known sequence is ligated to the ends of restriction enzyme-digested DNA. PCR amplification with primers specific to the adapter and the transposon ends yields amplicons containing the genomic regions flanking the transposon of interest. Although TD is an efficient and versatile method to study transposition events, it has certain fundamental limitations; for instance, new insertions of a high-copy-number transposon are difficult to amplify and are hardly detected. In addition, TD requires prior knowledge of TE sequences and thus relies on the quality of TE annotation. Most importantly, TD can only reveal insertions that are meiotically inherited and fixed in the genome; it is therefore unable to detect transpositions in real time or those that happened in somatic cells (Figure 1).
Over the last several years, there have been significant efforts to unveil the landscape of transpositions in the plant genomes by developing novel experimental methods. These innovative approaches reveal the mobilomes at varying scales from molecular to population levels. In this review, we will introduce and discuss the up-to-date experimental techniques tracing mobile DNAs in the plant genomes.

Molecular Level
The mobilization cycle of an LTR retrotransposon consists of transcription, reverse transcription, and integration into new genomic positions. Because the direct detection of transposon integration is more challenging, the DNA intermediate, which is the final product of the reverse transcription reaction and the direct substrate of integration, has been studied to infer transposon mobility. In this section, the cutting-edge methods that detect the DNA intermediates of LTR retrotransposons are highlighted (Figures 2A,B).

Detection of Linear ecDNA
The reverse transcription of a transposon gives rise to linear extrachromosomal DNAs (eclDNAs), and it is the linear form of ecDNA that is capable of integrating into genomic DNA (Cho et al., 2019; Wang et al., 2021a,b). In an attempt to detect eclDNA, Griffiths et al. (2018) established a method named sequence-independent retrotransposon trapping (SIRT). SIRT employs adapter ligation to the ends of eclDNAs and specific amplification targeted to the conserved primer-binding site (PBS) sequence, which is located immediately downstream of the 5′ LTR. Using this method, a novel family of LTR retrotransposons named DODGER was identified in the Landsberg erecta ecotype of Arabidopsis carrying a mutation in MET1 (Griffiths et al., 2018). Unfortunately, SIRT exhibited limited robustness when tested in crop genomes, presumably because of the large size of crop genomes and the abundance of transposon-related sequences. An improved method named amplification of LTR extrachromosomal DNA followed by sequencing (ALE-seq) was then developed, which is able to detect the LTRs of larger crop genomes (Cho et al., 2019). ALE-seq uses two primers specific to sequences of the adapter and the PBS in two separate reactions: in vitro transcription and reverse transcription. Using this method, Cho et al. (2019) identified a new Copia-family LTR retrotransposon, Go-on, in heat-stressed rice plants. Importantly, the ALE-seq method is particularly useful in non-reference crop species because the final amplicon product can reveal the full-length sequence of the LTR region. Such a reference- and annotation-free approach was successfully tested in tomato pericarp samples and identified a novel Gypsy-family retroelement, Fruit-Induced RetroElement (FIRE) (Cho et al., 2019). Although ALE-seq is sensitive enough to identify eclDNAs from crop genomes, it can only sequence the 5′ LTR regions, and further improvement is needed to cover the full length of a TE. Altogether, ALE-seq is a versatile, efficient, and high-throughput method for identifying active LTR retroelements in crop genomes.

[Figure caption fragment] Because of the scarcity of newly copied DNA, the transposon display method is unable to amplify these DNAs, which are illustrated as faint bands. (C) The new TE copy that mobilized in germline cells is inherited by the next generation. The transgenerationally maintained new TE DNA can be amplified efficiently and is visible as a discrete band in gel electrophoresis.

Detection of Circular ecDNA
The two LTRs of an eclDNA are bound by integrases, and homodimerization of the integrases brings the two ends of the eclDNA close to each other, a structure that is then recognized as a DNA double-strand break by the cellular DNA damage response pathways (Møller et al., 2015, 2016; Lanciano et al., 2017). The homologous recombination and non-homologous end joining pathways repair the LTR-LTR gap, resulting in single-LTR and double-LTR extrachromosomal circular DNAs (eccDNAs), respectively. As a by-product of an activated LTR retrotransposon (albeit incapable of integration), eccDNA is considered to represent active TE mobility. Lanciano et al. (2017) established an experimental method called mobilome-seq that specifically sequences circular DNAs, including retrotransposon-derived eccDNAs. The mobilome-seq procedure begins with digestion of linear DNA (mostly derived from genomic DNA), followed by random amplification of the remaining circular DNA by isothermal strand displacement amplification (i.e., rolling circle amplification, Figure 2B). Unlike ALE-seq, mobilome-seq has the additional advantage that it can sequence full-length retroelements; however, it also reads sequences derived from organellar circular DNAs, which must be removed by additional filtering steps and thus compromise the sequencing efficiency (Satheesh et al., 2021). Nonetheless, mobilome-seq can be a useful approach to investigate active retroelements because it requires relatively low sequence coverage, which is particularly valuable for studies using rare plant materials and samples with limited availability. For example, Lanciano et al. (2017) discovered the PopRice retrotransposon family, which becomes active in the rice endosperm. In addition, Thieme et al. (2017) identified Houba, a Copia-like retrotransposon in rice that was activated by treatment with chemical inhibitors of RNA Polymerase II and DNA methylation. Moreover, Esposito et al. (2019) found that nightshade, a Copia/Ale retrotransposon in potato, produces large amounts of eccDNA in non-stressed plants but is no longer active under cold stress, presumably because of the hypermethylation induced by cold stress. More recently, mobilome-seq revealed that Onsen, a Copia-like retrotransposon specifically activated in heat-stressed Arabidopsis plants, produces eccDNAs mostly from two copies, AT1G11265 and AT5G13205 (Roquis et al., 2021). In summary, mobilome-seq is a useful method for detecting retrotransposon mobility by sequencing eccDNAs.

Long-Read Sequencing
One of the challenges in the study of transposons is that TE sequences are repetitive in genomes and thus cause serious ambiguity in their analysis. This is particularly troublesome when analyzing short-read sequencing data. Recently, long-read sequencing technologies have advanced remarkably and greatly improved the accuracy of transposon sequence analysis. For example, Panda and Slotkin (2020) applied Oxford Nanopore Technologies (ONT) sequencing to DNA methylation-deficient mutants of Arabidopsis, which significantly improved the quality of TE annotation. In an independent work by Lee et al. (2020), the ONT method was applied to the VLP fraction collected from epigenetic mutants of Arabidopsis. This allowed direct identification of active transposable elements in their full lengths and also revealed diverse forms of DNA intermediates. Overall, long-read sequencing is a game-changer in the field of transposon research and is expected to unveil aspects of transposon mobilization that could not previously be studied.

Cellular Level
In the previous section, we focused on the methods that detect the DNA intermediates produced from active LTR retrotransposons, which can be used as a proxy for TE mobility. It is important to note that the presence of DNA intermediates is a good indication of TE activation; however, it does not necessarily represent transposition events directly. Although no robust method has yet been established in plants to detect transposition events at the cellular level, the transposition reporter systems used in humans and yeast have served as standard methods for assessing transposon mobility. In this section, the transposition reporter assay systems that reveal transposon insertion at the cellular level are introduced (Figure 2C).
A retrotransposition reporter system was first proposed in yeast using the Ty retroelement TyH3 carrying an intron fragment (Boeke et al., 1985). Heidmann et al. (1988) later developed an improved version using the neo (neomycin phosphotransferase) gene cassette (neoRT). In this system, the neo gene is disrupted by an artificial intron containing polyadenylation signals, so that functional neo protein can be produced only from the transposed, intron-free DNA (Heidmann et al., 1988). This method allows the transposition efficiency to be determined when cells are grown in selective G418-containing media (Heidmann et al., 1988). Similar methods have been developed to study the mobilization of other types of TEs, including the intracisternal A-type particles (IAPs) in mice and the long interspersed elements (LINEs) in Drosophila and human cells (Heidmann and Heidmann, 1991; Jensen and Heidmann, 1991; Tchenio et al., 1993; Maestre et al., 1995; Esnault et al., 2000). The retrotransposition assay system was further improved by Moran et al. (1996), who developed a reporter cassette consisting of an antisense copy of the neo gene incorporated into two human L1 elements (L1.2 and LRE2) in a cultured human cell line (Rangwala and Kazazian, 2009). In addition, alternative methods have been developed by employing blasticidin S deaminase, the his3 auxotrophic marker, and a lacZ colorimetric indicator (Curcio and Garfinkel, 1991; Tchenio and Heidmann, 1992; Goodier et al., 2007). However, such intron-containing reporter systems have the drawback that the retrotransposition assay depends on antibiotic resistance and is assessed by counting colonies, which usually takes a long time and has relatively low throughput. Recently, this classical method has been improved by replacing the antibiotic resistance genes with fluorescence (Ostertag et al., 2000) and bioluminescence genes (Xie et al., 2011), which dramatically increases the sensitivity and throughput and enables large-scale screening experiments. In summary, retrotransposition assay systems have been widely used to determine the transposition rate of a retroelement, mostly in non-plant systems. Introducing such systems into plants will enable single-cell detection of transposition and greatly improve our understanding of transposon mobilization.

Droplet Digital PCR
The retrotransposition reporter assay system described in the previous section is potentially useful for cell- and tissue-level detection of transposition events. The synthetic retrotransposon mobility assay is powerful because it enables direct visualization of transposition; however, such a transgenic approach can be challenging in many non-reference plant species. Determining copy number changes of an endogenous TE is one of the easiest alternative methods to assess transpositional activity. It is worth noting, however, that quantitative real-time PCR, whose readout is on a logarithmic scale, has difficulty resolving subtle differences in copy number (Fan and Cho, 2021). Droplet digital PCR (ddPCR) is a far more accurate and sensitive technique that allows for digital measurement of DNA copy number (Hindson et al., 2013; Doi et al., 2015; Campomenosi et al., 2016; Głowacka et al., 2016; Fan and Cho, 2021). In a ddPCR experiment, DNA amplification is performed in thousands of nanoliter-scale droplets, each of which gives a positive or negative fluorescence signal (Figure 2D). The resulting digital data are then processed using a Poisson probability distribution to derive copy numbers. In fact, we previously showed that ddPCR can be a robust method that accurately detects the copy number of a retrotransposon. Importantly, the ddPCR technique requires only a trace amount of DNA and can therefore be performed on DNA extracted from small amounts of tissue and rare samples.
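The Poisson step mentioned above can be illustrated with a short calculation. The sketch below is only an illustration, not the procedure of any specific instrument or of the cited studies: the function names, the droplet counts, and the use of a single-copy reference gene for normalization are all assumptions made for the example. It relies on the standard relation that, if a fraction p of droplets is positive, the mean number of target copies per droplet is -ln(1 - p).

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Mean target copies per droplet from the fraction of positive droplets.

    Poisson correction: lambda = -ln(1 - p), with p = positive / total.
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total (saturated wells cannot be corrected)")
    return -math.log(1.0 - positive / total)


def relative_copy_number(te_positive: int, te_total: int,
                         ref_positive: int, ref_total: int,
                         ref_copies: float = 1.0) -> float:
    """TE copy number relative to a reference gene of known copy number.

    `ref_copies` is an assumption of this sketch (1 = single-copy reference).
    """
    lam_te = copies_per_droplet(te_positive, te_total)
    lam_ref = copies_per_droplet(ref_positive, ref_total)
    if lam_ref == 0:
        raise ValueError("reference assay has no positive droplets")
    return ref_copies * lam_te / lam_ref


# Hypothetical droplet counts, chosen only to make the arithmetic easy to follow.
estimate = relative_copy_number(7500, 10000, 5000, 10000)
print(f"estimated TE copies per reference copy: {estimate:.2f}")
```

Normalizing to a reference assay run on the same DNA sample cancels out the droplet volume and the amount of input DNA, which is why neither appears in the calculation.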

Population Level
Next-Generation Sequencing-Based Transposable Element Mapping
The Arabidopsis 1001 Genomes Project produced massive paired-end short-read whole-genome sequencing data from 1,135 accessions of a worldwide collection (Weigel and Mott, 2009; Cao et al., 2011; Alonso-Blanco et al., 2016). A similar effort has been made in rice, generating sequencing data from more than 3,000 germplasm accessions (Li Z. et al., 2014). With relatively well-assembled and annotated reference genomes available for both plant species, TE insertion polymorphisms have been intensively profiled at the population level. Several software tools have been developed so far to systematically identify transposon insertions. These tools take advantage of diverse sequencing read information; for instance, split reads in Transposon Insertion Finder (TIF) (Nakagome et al., 2014), SPLITREADER (Baduel et al., 2021b), and RTRIP (Liu et al., 2020), discordant read pair alignment in TRACKPOSON (Carpentier et al., 2019), and a combination of the two as demonstrated in TEPID (Stuart et al., 2016). Additionally, in a recent work by Baduel et al. (2021a), the SPLITREADER and TEPID pipelines were integrated, building an intensive map of the TE landscape in Arabidopsis.
The split-read method first searches for reads containing the end sequences of a TE and target site duplications (TSDs), which are identical sequences flanking a TE that are created as a result of transposition (Figure 2E). In TIF, the read sequences tagged with transposon end sequences are mapped to the reference genome to identify the locations of de novo insertions (Nakagome et al., 2014). Similarly, SPLITREADER extracts reads that do not map properly to the reference genome and forcibly maps them to the 5′ and 3′ TE sequence extremities (within a range of 300 bp) by soft clipping (Baduel et al., 2021b). Then, the bona fide insertions and their locations are identified by mapping the clipped reads to the reference genome. Recently, Liu et al. (2020) tested a similar method in rice and generated the RTRIP database, which contains a comprehensive profile of transposon insertion polymorphisms in the rice 3K genome project.
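The split-read logic can be sketched on a toy example. The code below is a deliberately simplified illustration and does not reproduce TIF, SPLITREADER, or RTRIP: it assumes forward-strand reads only, exact string matching in place of a real aligner, and made-up sequences; the function name `find_insertions` and its parameters are hypothetical. It finds reads that span a genome-TE junction, places their genomic flanks on the reference, and reports the overlap between the upstream and downstream flanks as the TSD.

```python
def find_insertions(reads, reference, te_5p, te_3p, min_flank=8, max_tsd=20):
    """Toy split-read caller: forward strand only, exact string matching."""
    left_ends = set()     # reference coordinate just past the upstream flank
    right_starts = set()  # reference coordinate where the downstream flank begins
    for read in reads:
        # Upstream junction: genomic flank followed by the TE 5' end.
        i = read.find(te_5p)
        if i >= min_flank:
            pos = reference.find(read[:i])
            if pos != -1:
                left_ends.add(pos + i)
        # Downstream junction: TE 3' end followed by genomic flank.
        j = read.find(te_3p)
        if j != -1:
            flank = read[j + len(te_3p):]
            if len(flank) >= min_flank:
                pos = reference.find(flank)
                if pos != -1:
                    right_starts.add(pos)
    # The two flanks overlap on the reference by exactly the TSD.
    calls = []
    for end in sorted(left_ends):
        for start in sorted(right_starts):
            if 0 < end - start <= max_tsd:
                calls.append((start, end, reference[start:end]))
    return calls


# Made-up reference and TE; the insertion duplicates reference[20:25].
reference = "ACGTTGCAAGGCTTAACCGGTATCGATCGGATCCTTAGGCATGCAAGTCC"
te = "TGTAGTA" + "G" * 10 + "TACTACA"
reads = [
    reference[10:25] + te[:10],    # spans the upstream junction
    te[-10:] + reference[20:35],   # spans the downstream junction
    reference[0:25],               # ordinary genomic read, no TE sequence
]
calls = find_insertions(reads, reference, te_5p=te[:7], te_3p=te[-7:])
for start, end, tsd in calls:
    print(f"insertion with TSD {tsd} at reference[{start}:{end}]")
```

In real data the flank placement would come from soft-clipped alignments rather than `str.find`, and both strands and sequencing errors would have to be handled.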
The discordant read pair method relies on read pairs in which one mate maps to the target TE and the other maps to a distant genomic region (Figure 2F). TRACKPOSON, for instance, first maps all reads of a given accession onto each TE family represented by a single consensus sequence, and then maps the unmapped mates to the rice reference genome to determine their locations (Carpentier et al., 2019). The transposition landscape revealed by these methods showed that transposon proliferation is most strongly associated with the presence of a transposon at a specific location, which was somehow activated during the evolutionary process (Carpentier et al., 2019).
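A minimal sketch of the discordant-pair idea is given below. It is not the TRACKPOSON implementation: the input format (pairs of already-aligned mates given as `(sequence name, position)` tuples), the clustering window, and the support threshold are assumptions chosen for illustration. The code keeps pairs in which exactly one mate maps to a TE consensus, collects the genomic positions of the other mates, and reports clusters supported by several pairs as candidate insertions.

```python
from collections import defaultdict


def call_insertions(pairs, te_names, window=500, min_support=2):
    """Cluster genomic anchors of TE-discordant read pairs (toy version).

    pairs: iterable of ((name1, pos1), (name2, pos2)) mate alignments.
    te_names: names of TE consensus sequences used as mapping targets.
    """
    anchors = defaultdict(list)  # (te, chrom) -> genomic mate positions
    for (name1, pos1), (name2, pos2) in pairs:
        if (name1 in te_names) == (name2 in te_names):
            continue  # both mates genomic, or both on a TE: not informative
        if name1 in te_names:
            anchors[(name1, name2)].append(pos2)
        else:
            anchors[(name2, name1)].append(pos1)

    calls = []
    for (te, chrom), positions in sorted(anchors.items()):
        positions.sort()
        cluster = [positions[0]]
        # Appending None as a sentinel flushes the final cluster.
        for pos in positions[1:] + [None]:
            if pos is not None and pos - cluster[-1] <= window:
                cluster.append(pos)
                continue
            if len(cluster) >= min_support:
                calls.append((te, chrom, cluster[0], cluster[-1], len(cluster)))
            cluster = [pos]
    return calls


# Hypothetical alignments: three pairs support one site, one pair is a singleton.
pairs = [
    (("Chr1", 1000), ("TE_A", 50)),
    (("TE_A", 4800), ("Chr1", 1200)),
    (("Chr1", 1100), ("TE_A", 10)),
    (("Chr1", 90000), ("TE_A", 30)),
    (("Chr1", 5000), ("Chr1", 5300)),
]
insertions = call_insertions(pairs, te_names={"TE_A"})
for te, chrom, start, end, support in insertions:
    print(f"{te} near {chrom}:{start}-{end} supported by {support} pairs")
```

Requiring several supporting pairs per cluster is a simple guard against chimeric reads and mapping artifacts; the appropriate threshold depends on sequencing depth.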

CONCLUDING REMARKS AND FUTURE PERSPECTIVES
We reviewed the recent technical advances in transposon research by highlighting several new methods for identifying active TEs and detecting transposition events (Figure 2). These novel experimental methods, together with improved transposon annotation aided by long-read sequencing technologies and population-scale genome resequencing databases, will open a new window onto the long-standing mystery of jumping genes. Although the experimental techniques described above have greatly improved our ability to observe transposition events, several issues remain to be addressed. Firstly, detection of transposition events at the single-cell level is the obvious next task. To this end, a novel approach for the single-cell detection of transposon mobilization is highly desired. Secondly, single-cell genomics will vastly benefit transposon biology. The transposition reporter systems introduced above rely on artificially engineered TE sequences. The investigation of native TEs and their transposition at high resolution will only be possible when single-cell genomics technologies become more widely available. Thirdly, the detection sensitivity of transposon research tools will have to be improved further. The new experimental tools for studying transposons, such as ALE-seq and mobilome-seq, have mostly been tested in epigenetic mutants, in which transposons become unusually mobile. Although these methods were sensitive enough to discover novel active retroelements (Go-on and PopRice), moderately active TEs were difficult to identify (Cho et al., 2019). Considering the rarity of DNAs that represent activated TE intermediates or are derived from transposed copies, improving the detection sensitivity of these methods will help identify new transposons that are present in small niches of cells or activated only to a moderate level.
Altogether, the technical advances of transposon research at varying scales have greatly contributed to our understanding of TE life cycle and will broaden the breadth of knowledge on mobile genetic elements and genome plasticity.

AUTHOR CONTRIBUTIONS
WF, LW, JiC, HL, and EK drafted the manuscript. EK and JuC edited the manuscript. JuC revised the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
The work was supported by the grants from the National Natural Science Foundation of China (31970518, 32150610473, and 32111540256), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB27030209), and the General Program of Natural Science Foundation of Shanghai (21ZR1470700). EK was the recipient of a President's International Fellowship Initiative (PIFI) young staff fellowship (2021FYB0001) from CAS.