EDITED BY : Dóra Szakonyi, Ana Confraria, Concetta Valerio, Paula Duque and Dorothee Staiger PUBLISHED IN : Frontiers in Plant Science

### Frontiers Copyright Statement

© Copyright 2007-2019 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use. ISSN 1664-8714 ISBN 978-2-88963-025-7 DOI 10.3389/978-2-88963-025-7

### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# PLANT RNA BIOLOGY

Topic Editors:

Dóra Szakonyi, Instituto Gulbenkian de Ciência, Portugal

Ana Confraria, Instituto Gulbenkian de Ciência, Portugal

Concetta Valerio, Instituto Gulbenkian de Ciência, Portugal

Paula Duque, Instituto Gulbenkian de Ciência, Portugal

Dorothee Staiger, Bielefeld University, Germany

Artwork by Dóra Szakonyi

Discoveries from the past decades revealed that RNA molecules are much more than inert intermediates between the coding DNA sequences and their functional products, proteins. Today, RNAs are recognized as active regulatory molecules influencing gene expression, chromatin organization and genome stability, thus impacting all aspects of plant life including development, growth, reproduction and stress tolerance.

Innovations in methodologies, the expanding application of next-generation sequencing technologies, and the creation of public datasets and databases have exposed a new universe of RNA-based mechanisms and led to the discovery of new families of non-coding RNAs, uncovered the large extent of alternative splicing events, and highlighted the potential roles of RNA modifications and RNA secondary structures. Furthermore, considerable advances have been made in identifying RNA-binding and processing factors involved in the synthesis and maturation of different forms of RNA molecules as well as in RNA processing, biochemical modifications or degradation.

This Research Topic showcases the broad biological significance of RNAs in plant systems and contains eight original research articles, one review and four mini-reviews, covering various RNA-based mechanisms in higher plants. Emerging new technologies and novel multidisciplinary approaches are empowering the scientific community and will expectedly bring novel insights into our understanding of the mechanisms through which RNA is regulated and regulates biological processes in plant cells.

Citation: Szakonyi, D., Confraria, A., Valerio, C., Duque, P., Staiger, D., eds. (2019). Plant RNA Biology. Lausanne: Frontiers Media. doi: 10.3389/978-2-88963-025-7

# Table of Contents

### *Editorial: Plant RNA Biology*

Dóra Szakonyi, Ana Confraria, Concetta Valerio, Paula Duque and Dorothee Staiger

### RNA STRUCTURE

*New Era of Studying RNA Secondary Structure and its Influence on Gene Regulation in Plants*

Xiaofei Yang, Minglei Yang, Hongjing Deng and Yiliang Ding

### SPLICING

*A Role of U12 Intron in Proper Pre-mRNA Splicing of Plant* Cap Binding Protein 20 *Genes*

Marcin Pieczynski, Katarzyna Kruszka, Dawid Bielewicz, Jakub Dolata, Michal Szczesniak, Wojciech Karlowski, Artur Jarmolowski and Zofia Szweykowska-Kulinska

*Ectopic Transplastomic Expression of a Synthetic MatK Gene Leads to Cotyledon-Specific Leaf Variegation*

Yujiao Qu, Julia Legen, Jürgen Arndt, Stephanie Henkel, Galina Hoppe, Christopher Thieme, Giovanna Ranzini, Jose M. Muino, Andreas Weihe, Uwe Ohler, Gert Weber, Oren Ostersetzer and Christian Schmitz-Linneweber

*Alternative Splicing as a Regulator of Early Plant Development*

Dóra Szakonyi and Paula Duque

### RNA STABILITY

*Functional Characterization of SMG7 Paralogs in* Arabidopsis thaliana

Claudio Capitao, Neha Shukla, Aneta Wandrolova, Ortrun Mittelsten Scheid and Karel Riha

### NON-CODING RNAs

*Under a New Light: Regulation of Light-Dependent Pathways by Non-coding RNAs*

Camila Sánchez-Retuerta, Paula Suárez-López and Rossana Henriques

Cristiane P. G. Calixto, Nikoleta A. Tzioutziou, Allan B. James, Csaba Hornyik, Wenbin Guo, Runxuan Zhang, Hugh G. Nimmo and John W. S. Brown

*Nuclear Speckle RNA Binding Proteins Remodel Alternative Splicing and the Non-coding Arabidopsis Transcriptome to Regulate a Cross-Talk Between Auxin and Immune Responses*

Jérémie Bazin, Natali Romero, Richard Rigo, Celine Charon, Thomas Blein, Federico Ariel and Martin Crespi

*Regulation of Plant Microprocessor Function in Shaping microRNA Landscape*

Jakub Dolata, Michał Taube, Mateusz Bajczyk, Artur Jarmolowski, Zofia Szweykowska-Kulinska and Dawid Bielewicz

José Á. Martín-Rodríguez, Alfonso Leija, Damien Formey and Georgina Hernández

*Respective Contributions of URT1 and HESO1 to the Uridylation of 5′ Fragments Produced From RISC-Cleaved mRNAs*

Hélène Zuber, Hélène Scheer, Anne-Caroline Joly and Dominique Gagliardi

# Editorial: Plant RNA Biology

Dóra Szakonyi<sup>1</sup>\*†, Ana Confraria<sup>1</sup>\*†, Concetta Valerio<sup>1</sup>, Paula Duque<sup>1</sup> and Dorothee Staiger<sup>2</sup>

<sup>1</sup> Instituto Gulbenkian de Ciência, Oeiras, Portugal, <sup>2</sup> RNA Biology and Molecular Physiology, Faculty of Biology, Bielefeld University, Bielefeld, Germany

Keywords: posttranscriptional regulation, RNA processing, RNA stability, non-coding RNAs, RNA structure, transcriptomics

### Edited by:

Diane C. Bassham, Iowa State University, United States

### Reviewed by:

Marie-Theres Hauser, University of Natural Resources and Life Sciences Vienna, Austria

### \*Correspondence:

Dóra Szakonyi dszakonyi@igc.gulbenkian.pt

Ana Confraria aaugusto@igc.gulbenkian.pt

†These authors have contributed equally to this work

### Specialty section:

This article was submitted to Plant Cell Biology, a section of the journal Frontiers in Plant Science

Received: 02 May 2019 Accepted: 21 June 2019 Published: 09 July 2019

### Citation:

Szakonyi D, Confraria A, Valerio C, Duque P and Staiger D (2019) Editorial: Plant RNA Biology. Front. Plant Sci. 10:887. doi: 10.3389/fpls.2019.00887

### **Editorial on the Research Topic**

### **Plant RNA Biology**

Initially regarded as mere intermediates between coding DNA sequences and proteins, RNAs have been ascribed a plethora of novel roles in the past decades. Today, they are recognized as active regulatory molecules, influencing processes such as gene expression, chromatin organization or genome stability that affect all aspects of a plant's life. This Research Topic aims at highlighting the broad biological significance of RNAs in plant systems and contains eight original research articles, one review and four mini-reviews, covering various RNA-based mechanisms in higher plants.

As they are being synthesized, RNAs fold into secondary structures, which can influence RNA metabolism at multiple levels, including transcript processing and stability, or protein translation. Yang et al. not only bring attention to the significance of RNA structures, but also summarize the currently available methods to study the folding of specific RNA molecules or profile genome-scale RNA structuromes.

After transcription, RNAs require proper maturation through a multi-step process integrating capping, splicing, and polyadenylation. Pieczynski et al. describe the importance of accurate splicing of the precursor mRNA (pre-mRNA) encoding the small subunit of the Arabidopsis nuclear cap-binding complex, CBP20, which contains seven introns. Of these, intron 4 belongs to the U12 class of introns and separates the gene into an N-terminal half encoding the RNA-binding moiety and a C-terminal half encoding a nuclear localization signal. Using intron swapping and mutagenesis, the authors demonstrate that the U12 intron is crucial for correct CBP20 splicing. Mitochondrial and chloroplastidial transcripts also undergo processing, which in the chloroplast is largely carried out by imported nuclear-encoded RNA-binding proteins and RNA-binding factors encoded by the organellar genome. Chloroplasts of land plants retain a single plastid-encoded splicing factor, intron maturase MatK. Qu et al. ectopically expressed MatK in tobacco chloroplasts and observed variegation of the cotyledons, indicating that adequate MatK levels are required for proper chloroplast development. Interestingly, the splicing pattern of selected plastid genes was not strongly affected in these lines, but rather translation was inhibited, leading to reprogramming of chloroplast gene expression.

Alternative splicing, which results from differential splicing of the same pre-mRNA, represents a major source of transcriptome and proteome diversity and an important regulation mechanism for development and stress responses in plants. Szakonyi and Duque scrutinize the role of alternative splicing during early plant development. They review the current knowledge on how alternative splicing changes during seed maturation, germination, and seedling formation in the dark and light. Furthermore, they compile the splicing factors with crucial roles during these initial steps of development and the alternatively-spliced genes with putative functional roles for different splice forms.

Alternative splicing or mutations can introduce premature termination codons (PTCs) that target mRNAs to degradation by an evolutionarily-conserved process termed nonsense-mediated decay (NMD). The work presented by Capitao et al. focuses on SMG7 proteins, conserved components of the NMD machinery in eukaryotes. Plants lost several core NMD genes during evolution, but SMG7 genes expanded and acquired novel functions. In contrast to previous findings, the authors report that SMG7 is not an essential gene in Arabidopsis and fulfills an additional, NMD-unrelated function in meiosis. Moreover, SMG7-like (SMG7L), a close paralog arising from gene duplication in dicots, is not functionally redundant with SMG7 though its biological role remains unclear.

Research in the last couple of decades has also uncovered several different types of non-coding RNAs, grouped according to size into long non-coding (lnc) RNAs and a highly diverse pool of small RNAs. The latter can be further classified depending on their biogenesis pathway and include microRNAs (miRNAs), encoded by MIR genes, as well as several classes of short interfering RNAs (siRNAs), generated from double-stranded RNA (Borges and Martienssen, 2015). Non-coding RNAs exert different regulatory functions in plants, playing important roles in developmental programs but also integrating environmental cues and participating in stress responses (Chekanova, 2015; D'ario et al., 2017; Li et al., 2017; Sun et al., 2018).

In this context, Sánchez-Retuerta et al. summarize the current knowledge on light regulation of miRNAs and lncRNAs and their functions in photomorphogenesis, circadian clock regulation, and photoperiod-dependent flowering. The review by Cho, in turn, highlights the potential role of transposon-derived lncRNAs and small RNAs, which suppress harmful effects of transposition, but can also act on non-transposon transcripts affecting development and stress responses. Furthermore, Calixto et al. analyze the effect of cold stress on the expression and processing of regulatory non-coding RNAs, using ultradeep sequencing of a diel time-series of Arabidopsis plants. The authors identified specific lncRNAs and primary miRNAs differentially expressed and/or alternatively spliced in response to cold stress that potentially contribute to acclimation and freezing tolerance. Bazin et al. explore the role of Nuclear Speckle RNA-binding proteins (NSRs), which are known alternative splicing modulators, through a regulatory loop involving the lncRNA Alternative Splicing Competitor RNA (ASCO) (Bardou et al., 2014). Using RNA immunoprecipitation coupled with RNA-seq, the authors showed that NSRs directly interact with lncRNAs, many of which are upregulated in the nsra nsrb mutant. This suggests that NSRs regulate the steady-state abundance of lncRNAs, which in turn may contribute to splicing regulation.

In the plant cell nucleus, miRNAs are processed from primary miRNA transcripts by the Microprocessor complex, which is composed of three core proteins in Arabidopsis: Dicer-Like 1 (DCL1), Hyponastic Leaves 1 (HYL1), and Serrate (SE) (Song et al., 2019). Dolata et al. review how the Microprocessor complex is regulated, emphasizing the posttranslational regulation of its components, but also covering posttranscriptional regulatory mechanisms.

Chitarra et al. reveal miRNA changes associated with development, photosynthesis, jasmonate signaling, and disease resistance in Vitis vinifera infected with the phytoplasma Flavescence dorée. Importantly, the authors provide a valuable tool for the research community working on Vitis spp. (miRVIT, available at http://mirvit.ipsp.cnr.it/); they assemble a comprehensive database of novel putative grapevine miRNAs, uniformly reannotating and aligning all described accessions to the latest version of the grape genome and listing their validated and predicted target transcripts.

In common bean, Martín-Rodríguez et al. explore the role of the nodule-expressed miR319d in the regulation of rhizobial infection and symbiotic nodule development. The authors identified the transcription factor TEOSINTE BRANCHED/CYCLOIDEA/PCF 10 (TCP10) as a target of miR319d and propose that the effect of miR319d-TCP10 on nodulation involves the modulation of jasmonate signaling through the transcriptional regulation of the jasmonate biosynthetic gene LIPOXYGENASE 2 (LOX2).

The 5′ fragments produced by ARGONAUTE 1 (AGO1)-mediated cleavage, guided by miRNAs and siRNAs, are uridylated at the 3′ end to facilitate further degradation. In Arabidopsis, uridylation of these fragments is carried out by two terminal uridylyltransferases (TUTases), HEN1 SUPPRESSOR 1 (HESO1) and URIDYLYLTRANSFERASE 1 (URT1), whose activity is explored in the study of Zuber et al. The authors optimize a 3′RACE-seq method to analyze quantitatively and qualitatively the uridylation at the 3′ ends of AGO1 5′ cleavage products. Showcasing the applicability of 3′RACE-seq, the 5′ cleavage fragments of the miR159 target MYB33 and the miR156/157 target SQUAMOSA-PROMOTER BINDING PROTEIN-LIKE 13 (SPL13) are compared in the wild type and mutants impaired in TUTase activity. The authors uncover different contributions of HESO1 and URT1 to the uridylation of the cleavage fragments, with HESO1 being the main TUTase.

The diverse collection of articles in this Research Topic demonstrates the vibrant and rapid progress of the plant RNA biology field. Current major challenges include, but are not limited to, identifying functional protein-RNA interactions (Foley et al., 2017), ascribing biological functions to different RNAs (Zhao et al., 2019), and dissecting tissue complexity using single cell transcriptomics (Ryu et al., 2019). Technological advances coupled with the increase in publicly available data are expected to allow the community to address these and other issues at a fast pace in upcoming years, bringing key novel insights into our understanding of the mechanisms through which RNA is regulated and regulates biological processes in plant cells.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### ACKNOWLEDGMENTS

We thank all authors who submitted their work to this Research Topic, the reviewers for their invaluable help in manuscript evaluation, and the professional editorial staff at Frontiers for their support. DS, AC, CV, and PD are funded by Fundação para a Ciência e a Tecnologia (grants: UID/Multi/04555/2013, PTDC/BIA-PLA/1084/2014, PTDC/BIA-FBT/31018/2017, and PTDC/BIA-BID/32347/2017). DSt is funded by the German Research Foundation through STA653/9, STA653/13, and STA653/15.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Szakonyi, Confraria, Valerio, Duque and Staiger. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# New Era of Studying RNA Secondary Structure and Its Influence on Gene Regulation in Plants

Xiaofei Yang<sup>1</sup>†, Minglei Yang<sup>1</sup>†, Hongjing Deng<sup>1,2,3</sup>† and Yiliang Ding<sup>1</sup>\*

<sup>1</sup> Department of Cell and Developmental Biology, John Innes Centre, Norwich, United Kingdom, <sup>2</sup> State Key Laboratory of Plant Genomics and National Center for Plant Gene Research, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China, <sup>3</sup> College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China

### Edited by:

Dora Szakonyi, Instituto Gulbenkian de Ciência (IGC), Portugal

### Reviewed by:

Cuncong Zhong, University of Kansas, United States Carmen Hernandez, Instituto de Biología Molecular y Celular de Plantas (IBMCP), Spain Aleksandar Spasic, University of Rochester, United States

### \*Correspondence:

Yiliang Ding yiliang.ding@jic.ac.uk

†These authors have contributed equally to this work.

### Specialty section:

This article was submitted to Plant Cell Biology, a section of the journal Frontiers in Plant Science

Received: 28 February 2018 Accepted: 02 May 2018 Published: 22 May 2018

### Citation:

Yang X, Yang M, Deng H and Ding Y (2018) New Era of Studying RNA Secondary Structure and Its Influence on Gene Regulation in Plants. Front. Plant Sci. 9:671. doi: 10.3389/fpls.2018.00671

The dynamic structure of RNA plays a central role in post-transcriptional regulation of gene expression, including RNA maturation, degradation, and translation. With the rise of next-generation sequencing, the study of RNA structure has been transformed from in vitro low-throughput RNA structure probing methods to in vivo high-throughput RNA structure profiling. The development of these methods enables further studies on the function of RNA structure, revealing new insights into regulatory mechanisms of RNA structure in plants. Genome-wide RNA structure profiling allows us to investigate general RNA structural features over tens of thousands of mRNAs and to compare RNA structuromes between plant species. Here, we provide a comprehensive and up-to-date overview of: (i) RNA structure probing methods; (ii) the biological functions of RNA structure; (iii) genome-wide RNA structural features corresponding to their regulatory mechanisms; and (iv) RNA structurome evolution in plants.

Keywords: RNA structurome, gene regulation, regulatory RNAs, RNA structure and function, plant RNA biology

RNA secondary structure plays many essential roles in RNA synthesis, metabolism, and regulatory pathways (Bevilacqua et al., 2016; Vandivier et al., 2016). Previous efforts to determine RNA structure depended on classical and time-consuming techniques, such as nuclear magnetic resonance spectroscopy (NMR), X-ray crystallography, and cryo-electron microscopy (**Table 1**) (Lengyel et al., 2014). However, these methods yielded data limited to a few key RNAs with comparatively short length (less than 200 nt) and high abundance (∼1 µmol).

More recently, enzymatic and chemical structure probing methods have been developed to routinely and efficiently obtain structural information of individual RNAs. Ribonucleases (RNases) cleave either single-stranded (ss) RNA regions or double-stranded (ds) RNA regions to indicate RNA base-pairing status. The most commonly used enzymatic probing reagents include RNase V1 (for dsRNA), RNase S1 (for ssRNA), RNase A (for C/U in ssRNA), and RNase T1 (for G in ssRNA) (Knapp, 1989). RNase-based probing has been used extensively because the reagents are less toxic, but the enzymes do not readily enter cells, which limits their use in vivo (Kwok, 2016).

For chemical probing, two main types of chemical reagent can be used. One modifies the Watson-Crick base-pairing face of the nucleobase, as a direct measure of single-strandedness. Dimethyl sulfate (DMS) is one of the most commonly used nucleobase probing reagents as it easily penetrates the cell, a pre-requisite for in vivo chemical probing (Zaug and Cech, 1995). Another example is 1-cyclohexyl-(2-morpholinoethyl) carbodiimide metho-p-toluene sulfonate (CMCT), which targets the unpaired N3 position of uracil and the unpaired N1 position of guanine (Incarnato et al., 2014), while 3-ethoxy-1,1-dihydroxy-2-butanone (kethoxal) attacks the unpaired N1 and unpaired exocyclic amine positions of guanine (Noller and Chaires, 1972). Among these reagents, DMS is predominantly used to probe RNA structures in different organisms (Ding et al., 2014; Rouskin et al., 2014; Talkish et al., 2014; Deng et al., 2018).

The other type of chemical reagent modifies the ribose by selective 2′-hydroxyl acylation, which can be analyzed by primer extension (SHAPE) (Mortimer and Weeks, 2007; Spitale et al., 2013). A particular advantage of SHAPE is that it generates structural information for all four nucleotides at the same time.
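The reagent specificities listed above can be summarized as a small lookup table. The following is a minimal sketch, not a reference implementation: the names `PROBES` and `probes_for` are hypothetical, SHAPE is simplified to "unpaired" although it reports on nucleotide flexibility, and the DMS entry (adenine and cytosine) is an assumption drawn from general knowledge of DMS chemistry rather than from the text above.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical lookup: reagent -> (structural state reported, nucleotides reported).
PROBES: Dict[str, Tuple[str, Set[str]]] = {
    "RNase V1": ("paired", set("ACGU")),
    "RNase S1": ("unpaired", set("ACGU")),
    "RNase A": ("unpaired", set("CU")),
    "RNase T1": ("unpaired", set("G")),
    "DMS": ("unpaired", set("AC")),       # assumption: A/C specificity is not stated in the text
    "CMCT": ("unpaired", set("GU")),
    "kethoxal": ("unpaired", set("G")),
    "SHAPE": ("unpaired", set("ACGU")),   # simplification: SHAPE reports backbone flexibility
}


def probes_for(base: str, state: str) -> List[str]:
    """Return the reagents that report on `base` in the given state ('paired' or 'unpaired')."""
    base = base.upper().replace("T", "U")  # accept DNA-style input
    return sorted(
        name
        for name, (probe_state, bases) in PROBES.items()
        if probe_state == state and base in bases
    )


if __name__ == "__main__":
    print(probes_for("G", "unpaired"))
    print(probes_for("A", "paired"))
```

A table of this kind makes it easy to see why a single nucleobase reagent leaves some positions uninformative, whereas SHAPE covers all four nucleotides.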



Polyacrylamide gel electrophoresis (PAGE) assays were traditionally used to measure the modification pattern of both enzymatic and chemical reactions (Noller and Chaires, 1972; Knapp, 1989; Zaug and Cech, 1995). However, these gel-based assays were limited to highly abundant and short (less than 200 nt) RNAs. The application of capillary electrophoresis (CE) improved the detection limits of both the length (up to 400 nt) and the abundance of RNA (**Table 1**) (Watts et al., 2009). A recent application of CE to the Arabidopsis thaliana long non-coding RNA (lncRNA) COOLAIR revealed the remarkable complexity of RNA structure up to 750 nt (Hawkes et al., 2016). A method with further improved probing sensitivity, DMS/SHAPE-LMPCR, was developed in Arabidopsis thaliana (**Table 1**). This method achieved "attomole" sensitivity, allowing RNA structure probing of low abundance RNAs in living cells (Kwok et al., 2013). By subsequently combining the action of DMS with next-generation sequencing, high-depth RNA structural information of very long RNAs was achieved (Lucks et al., 2011; Smola et al., 2016). For instance, the structural information of the lncRNA Xist, which is over 18 kb long, was obtained in a single experiment (Smola et al., 2016). The development of these approaches has significantly improved the sensitivity and resolution for probing individual RNA structure both in vitro and in vivo. The capability for quantitative measurements at single-nucleotide resolution on any RNA down to 1 attomole and up to 18 kb enables efficient functional investigation of RNA structure in biological processes.

Genome-wide RNA structure profiling was initially achieved by coupling enzymatic probing with next-generation sequencing in PARS (parallel analysis of RNA structure) (**Table 1**). It was developed in yeast by measuring the catalytic activity of two enzymes, RNase V1 (for dsRNA) and S1 (for ssRNA) (Kertesz et al., 2010). This method was extended to Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens (Li et al., 2012a,b; Wan et al., 2014). An enhanced method, PIP-seq (protein interaction profiling sequencing), complements RNA–protein interaction information with in vitro RNA structure profiling (**Table 1**) (Foley et al., 2015; Gosai et al., 2015; Foley and Gregory, 2016). A further improvement extended genome-wide RNA structure profiling to living cells and addressed native RNA folding status. By harnessing the cell permeability of DMS, the first genome-wide in vivo RNA structure profiling method, Structure-seq, was developed in Arabidopsis (Ding et al., 2014, 2015) in parallel with DMS-seq and Mod-seq in yeast (Rouskin et al., 2014; Talkish et al., 2014) (**Table 1**). These methods reveal that in vivo RNA structures are more single-stranded than in vitro structures and in silico predictions. Use of the Structure-seq method was recently extended to rice (Deng et al., 2018). A follow-up genome-wide in vivo RNA structure profiling method, icSHAPE (in vivo click SHAPE), was developed in mouse using a SHAPE chemical reagent, which probes all four nucleotides (**Table 1**) (Spitale et al., 2015). In addition to measuring reverse transcription stopping, chemical modification can also be determined by mutational profiling (**Table 1**) (Siegfried et al., 2014; Smola et al., 2016; Zubradt et al., 2017).
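The reverse-transcription-stop readout used by these sequencing-based methods can be illustrated with a short sketch. It assumes per-nucleotide counts of reverse transcription stops from a reagent-treated and an untreated library, and it uses a deliberately simple scheme (library-size normalization, subtraction of the control, clipping at zero, scaling to the maximum). The function name `rt_stop_reactivity`, the toy counts, and this normalization are illustrative assumptions, not the published Structure-seq, DMS-seq, or icSHAPE pipelines.

```python
from typing import List


def rt_stop_reactivity(treated: List[int], control: List[int]) -> List[float]:
    """Simplified per-nucleotide reactivity from RT-stop counts (illustrative only)."""
    if len(treated) != len(control):
        raise ValueError("count vectors must have the same length")

    # Normalize each library by its total number of stops (guard against empty libraries).
    treated_total = sum(treated) or 1
    control_total = sum(control) or 1

    # Subtract background stops seen without reagent; negative values are clipped to zero.
    raw = [
        max(t / treated_total - c / control_total, 0.0)
        for t, c in zip(treated, control)
    ]

    # Scale so that the most reactive position has reactivity 1.0.
    peak = max(raw) if raw else 0.0
    return [r / peak if peak > 0 else 0.0 for r in raw]


if __name__ == "__main__":
    # Hypothetical counts for a 5-nt window: positions 2 and 4 stop far more often when treated.
    treated_counts = [0, 40, 0, 20, 0]
    control_counts = [1, 1, 1, 1, 1]
    print(rt_stop_reactivity(treated_counts, control_counts))
```

High values mark positions that the reagent modified, which under this simplified model are read as single-stranded; mutational profiling would replace the stop counts with mismatch rates but follows the same treated-versus-control logic.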

These powerful genome-wide methods can provide an accurate and quantitative in vivo RNA structure map over tens of thousands of RNAs at single-nucleotide resolution. These technological advances create an unprecedented scale for the in-depth study of the global impact of RNA structure in gene regulation. For example, regulatory RNAs are able to act as master regulators of gene expression. In general, these regulatory RNAs directly turn gene expression on or off by altering RNA secondary structure. Recent RNA structure characterization of a range of regulatory RNAs in Arabidopsis is illustrated below (**Figure 1**).

A riboswitch is a type of regulatory RNA that contains specific RNA structure segments, which can change conformation depending on specific ligand binding, e.g., metabolites. A well-studied example is the riboswitch for the vitamin B1 derivative thiamin pyrophosphate (TPP), which resides in the 3′ UTR region of the thiamin biosynthetic gene THIC (Wachter et al., 2007) (**Figure 1A**). With a low TPP concentration, the 3′ end processing of THIC mRNA results in a short 3′ UTR that permits high expression of the THIC gene. Conversely, with a high TPP concentration, TPP binds directly to the 3′ end of the RNA and induces a structural change that alters splicing within the 3′ UTR. This results in a long 3′ UTR inducing RNA degradation, subsequently reducing THIC gene expression (Wachter et al., 2007). Unlike riboswitches in bacteria that control translation through a structural change in the 5′ UTR, plants may have evolved a diverse and more complicated alternative 3′ end processing mechanism in order to cope with a large number of metabolites (Wachter et al., 2007).

Metabolites are not the only ligands of regulatory RNA structures: some proteins also bind specific RNA structural elements to regulate expression levels. A plant conserved pre-mRNA of transcription factor IIIA (TFIIIA) contains a 5S rRNA mimic structural element in one of its exons (Hammond et al., 2009) (**Figure 1C**). When ribosomal protein L5 binds to this 5S rRNA mimic, it triggers exon skipping in TFIIIA mRNA to control TFIIIA levels (Hammond et al., 2009). This ribosomal protein–mRNA interaction represents a newly found class of RNAs regulating alternative splicing to control the protein level.

Furthermore, specific RNA structural motifs such as G-quadruplexes (GQS) also play an important role in gene expression regulation. RNA GQSs are typically more stable in the presence of potassium or sodium. Tens of thousands of putative GQSs were identified in Arabidopsis and other plant species (Mullen et al., 2010). A recent study reported the first highly-conserved plant RNA GQS, located in the 5′ UTR of ATAXIA TELANGIECTASIA-MUTATED AND RAD3-RELATED (ATR), which inhibits ATR translation when it forms a stable GQS structure (Kwok et al., 2015) (**Figure 1E**). Interestingly, potassium concentrations in plant cells can dramatically increase under drought stress (Mullen et al., 2010). Thus, GQS structural motifs in plants may specifically act as regulators in response to abiotic stress, such as drought and salinity.
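Putative GQSs of the kind counted in such surveys are usually located by sequence pattern matching. The sketch below assumes a commonly used motif definition, four tracts of at least three guanines separated by loops of 1 to 7 nt; the pattern, the function name `find_putative_gqs`, and the example sequence are assumptions for illustration and may differ from the criteria applied by Mullen et al. (2010).

```python
import re
from typing import List, Tuple

# Assumed motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (loops may contain any nucleotide).
GQS_PATTERN = re.compile(r"G{3,}(?:[ACGU]{1,7}G{3,}){3}")


def find_putative_gqs(sequence: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, motif) for non-overlapping putative G-quadruplex sequences.

    Coordinates are 0-based and end-exclusive. DNA input is converted to RNA.
    """
    rna = sequence.upper().replace("T", "U")
    return [(m.start(), m.end(), m.group()) for m in GQS_PATTERN.finditer(rna)]


if __name__ == "__main__":
    # Hypothetical 5' UTR fragment containing four GGG tracts.
    utr = "AUGGGAGGGCUGGGAUGGGCA"
    for start, end, motif in find_putative_gqs(utr):
        print(start, end, motif)
```

A sequence match only flags a candidate: whether the motif folds into a stable quadruplex in the cell depends on cation concentration and the surrounding structure, which is why in vivo probing data are needed to confirm it.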

Long non-coding RNAs (lncRNAs) have also been shown to be important regulatory RNAs involved in various biological processes. The study of lncRNA structures has been limited in the past by their long length and low abundance. Advances in probing methods have enabled the structure of the highly conserved plant lncRNA COOLAIR to be determined by chemical profiling with CE (Hawkes et al., 2016). COOLAIR is a key regulator of a major plant developmental gene, FLC (FLOWERING LOCUS C), in response to vernalisation. The distal COOLAIR isoform in Arabidopsis (**Figure 1**) is highly structured, with numerous secondary structural motifs, an intricate multi-way junction, and two unusual asymmetric 5′ internal loops (Hawkes et al., 2016) (**Figure 1B**). Interestingly, a single nucleotide polymorphism (SNP) in the natural variation accession, Var2-6, is able to change the structure and thereby affect RNA stability, resulting in a late-flowering phenotype in Var2-6 (Hawkes et al., 2016). RNA secondary structure determination has thus advanced our understanding of the structure–function relationship of lncRNAs for the first time in plants.

in response to different TPP concentrations, resulting in different 3′ end processing to control gene expression. (B) The highly conserved plant lncRNA COOLAIR shows a highly complex structure that links to its biological function in flowering. (C) A 5S ribosomal RNA mimic regulates alternative splicing of transcription factor IIIA pre-mRNAs. (D) Several studies show that RNA structure determines miRNA biogenesis and processing. (E) An RNA G-quadruplex was reported to be able to regulate its own translation.

The other well-known regulatory RNAs, miRNAs, also rely heavily on RNA structure for their regulatory functions (Herr and Baulcombe, 2004). The double-stranded region of miRNA precursors (pri-miRNAs) is recognized and processed by Dicer protein, an RNase III-like enzyme (Herr and Baulcombe, 2004). Previous studies in plants on both individual miRNA precursors and genome-wide assessment of pri-miRNA processing products confirmed that different structure determinants within pri-miRNAs compete for the processing machinery (Song et al., 2010; Bologna et al., 2013) (**Figure 1D**). A recent RNA structure characterization study by NMR shows that the upper stem of a double-stranded region of pri-miR156 is important for Dicer processing at different temperatures, which substantiates the structure-determined Dicer processing feature (Kim et al., 2017). After Dicer processing, an Argonaute (AGO) protein recognizes the processed miRNA duplex and targets mRNAs containing a complementary sequence for either RNA cleavage or translational inhibition (Valencia-Sanchez et al., 2006). Genome-wide in vitro RNA structure profiling in Arabidopsis revealed a less structured pattern at miRNA target sites, indicating a relationship between miRNA targeting efficiency and single-strandedness (Li et al., 2012b).

Apart from these studies on regulatory RNAs, recent genome-wide research also reveals the general role of mRNA structure in a variety of post-transcriptional regulatory processes such as RNA maturation, RNA stability, RNA localization and translation.

Alternative splicing is an important process in RNA maturation. More than 40% of Arabidopsis genes possess alternatively spliced isoforms (Filichkin et al., 2010). The first in vivo RNA structure profiling in Arabidopsis revealed a significantly less structured pattern in the 40 nt region upstream of the 5′ splice site for unspliced events (including exon skipping and intron retention) (Ding et al., 2014). PIP-seq further revealed that this kind of structural pattern results in more RNA-protein interactions in Arabidopsis nuclei (Gosai et al., 2015; Foley and Gregory, 2016). Interestingly, PIP-seq also found that the robust structure at the 3′ splice site is responsible for more protein interactions (Gosai et al., 2015; Foley and Gregory, 2016). Thus, these RNA structural features indicate an important role of RNA structure in regulating alternative splicing.

Another RNA maturation process is alternative polyadenylation (APA), which is found in over 60% of mRNAs in Arabidopsis (Loke et al., 2005). In vivo RNA structure profiling shows a strong structural pattern in the U- and A-rich region upstream of the cleavage site as well as a single-stranded region at the cleavage site (Ding et al., 2014). These patterns may correlate with recognition by the endonucleases that regulate APA. A further study using PIP-seq shows more protein bound up- and downstream of the APA cleavage site than at constitutive polyadenylation sites (Gosai et al., 2015). However, APA sites do not exhibit altered in vitro RNA secondary structure compared to constitutive sites (Gosai et al., 2015). This suggests that RNA structure may have different effects on protein binding and on cleavage activity, which warrants closer investigation.

In addition to RNA maturation, the relationship between RNA structure and RNA degradation has also been uncovered by in vitro RNA structurome analysis in Arabidopsis. Unlike in yeast, highly structured mRNAs are more likely to be degraded in Arabidopsis, probably via specific siRNA processing (Li et al., 2012b).

An interesting study on RNA mobility in plants shows that a stem-bulge-stem-loop tRNA-derived structural motif is sufficient to mediate mRNA transport. A large number of mRNAs containing this motif can move across graft junctions (Zhang et al., 2016). Thus, RNA structure might also affect intercellular communication within plants.

Another major impact of RNA structure is its regulatory role in translation. Both in vitro and in vivo RNA structure profiling show a single-stranded region upstream of the start codon that might facilitate ribosome initiation (Li et al., 2012b; Ding et al., 2014). Moreover, a triplet periodic trend is observed in the CDS region but not in UTRs. These structure patterns are obvious in mRNAs with high translation efficiency and are absent in those with low translation efficiency (Ding et al., 2014). This implies that ribosomes may recognize RNA secondary structure as an additional layer of information alongside sequence content.

Additionally, RNA structure is strongly associated with RNA methylation sites and RNA binding protein (RBP) sites. For example, N<sup>6</sup>-methylation of adenosine alters the stability of the A·U pair (Harcourt et al., 2017). Cellular RNAs show a decrease in base pairing around m6A sites when they are methylated (Harcourt et al., 2017). Recent genome-wide studies indicate that N<sup>6</sup>-methyladenosine (m6A) prefers single-stranded conformations rather than double-stranded structures (Zhao et al., 2017). A genome-wide study in Arabidopsis shows an enrichment of m6A around the start codon, the stop codon and the 3′ UTR (Luo et al., 2014). Interestingly, these m6A-enriched regions correlate well with the single-stranded regions identified in RNA structure profiling (Luo et al., 2014; Zhao et al., 2017). A study of the RNA structurome in rice also confirmed that sites with higher m6A modification tend to have less RNA structure (Deng et al., 2018). This indicates that m6A may shift RNA structure toward single-strandedness to facilitate gene regulation.

Another key player in post-transcriptional regulations is RBP. Unlike DNA binding protein, RBP associates not only with the primary sequence motifs, but also RNA structural patterns. A recent study combining genome-wide RBP profiling and RNA secondary structure profiling shows that RBP binding sites tend to be more single-stranded (Gosai et al., 2015). Interestingly, a nuclear PIP-seq study confirms that both RNA secondary structure and RBP binding sites show quite different patterns between hair and non-hair cells in plants (Foley et al., 2017). This suggests that cell-type-specific RNA structure and RBP binding may be a new regulatory mechanism during plant development.

From an evolutionary perspective, the conservation and diversity of the RNA structurome between species remain poorly understood. A recent study compared, for the first time, the conservation and divergence of the in vivo RNA structurome between plant species, to assess the evolutionary adaptation of RNA structure (Deng et al., 2018). This study found that in vivo RNA secondary structure conservation does not correlate with sequence conservation between rice and Arabidopsis. The conservation and divergence in both sequence and RNA secondary structure are highly relevant to specific biological processes (Deng et al., 2018). This indicates that evolutionary selection not only modifies sequence, but also alters RNA structure to regulate gene expression. This in turn suggests that RNA secondary structure may serve as a layer of selection distinct from sequence in plants.

Recent methodological advances have overcome the previous limitations of low throughput and in vitro conditions for studying RNA secondary structure. These new methods, with single-nucleotide resolution, genome-wide scale and high sensitivity, significantly accelerate the study of in vivo RNA structure and associated biological functions. Plants are more sensitive than animals to varying environmental conditions, such as changes in temperature, salinity, acidity, and heavy metal concentrations (Schleuning et al., 2016). These factors are able to affect RNA folding (Tan and Chen, 2011; Wan et al., 2012; Sun et al., 2017). By applying these new RNA structure analysis methods under a range of environmental conditions, we will be able to determine how RNA structure changes in response to them. By integrating RNA structure profiling with mutagenesis assays and phenotypic analysis, the relationship between RNA structure and biological function can be investigated in greater detail. Extending analyses to other plant species provides scope for exploring evolutionary selection at the RNA structure level. Notably, this new era of studying RNA secondary structure provides an unprecedented opportunity for discovering novel regulatory mechanisms of gene expression in plants.

# AUTHOR CONTRIBUTIONS

XY, MY, HD, and YD wrote the manuscript. MY produced the figure. XY, HD, and YD made the corrections. All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### FUNDING

This study was supported by the Biotechnology and Biological Sciences Research Council (Grant No. BB/L025000/1 to XY, HD, and YD) and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (Grant No. 680324 to MY and YD).

### ACKNOWLEDGMENTS

We apologize to authors whose work has not been cited here, because of the limitations associated with the review format.

### REFERENCES

animal extinction under climate change. Nat. Commun. 7:13965. doi: 10.1038/ncomms13965


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Yang, Yang, Deng and Ding. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Role of U12 Intron in Proper Pre-mRNA Splicing of Plant Cap Binding Protein 20 Genes

Marcin Pieczynski<sup>1</sup>† , Katarzyna Kruszka<sup>1</sup>† , Dawid Bielewicz<sup>1</sup> , Jakub Dolata<sup>1</sup> , Michal Szczesniak<sup>2</sup> , Wojciech Karlowski<sup>3</sup> , Artur Jarmolowski<sup>1</sup> \* and Zofia Szweykowska-Kulinska<sup>1</sup> \*

<sup>1</sup> Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University in Poznan, Poznan, Poland, <sup>2</sup> Department of Integrative Genomics, Institute of Anthropology, Faculty of Biology, Adam Mickiewicz University in Poznan, Poznan, Poland, <sup>3</sup> Department of Computational Biology, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University in Poznan, Poznan, Poland

### Edited by:

Dora Szakonyi, Instituto Gulbenkian de Ciência (IGC), Portugal

### Reviewed by:

Song Li, Duke University, United States Hunseung Kang, Chonnam National University, South Korea

### \*Correspondence:

Artur Jarmolowski artjarmo@amu.edu.pl Zofia Szweykowska-Kulinska zofszwey@amu.edu.pl

†These authors have contributed equally to this work.

### Specialty section:

This article was submitted to Plant Cell Biology, a section of the journal Frontiers in Plant Science

Received: 12 January 2018 Accepted: 27 March 2018 Published: 16 April 2018

Citation:

Pieczynski M, Kruszka K, Bielewicz D, Dolata J, Szczesniak M, Karlowski W, Jarmolowski A and Szweykowska-Kulinska Z (2018) A Role of U12 Intron in Proper Pre-mRNA Splicing of Plant Cap Binding Protein 20 Genes. Front. Plant Sci. 9:475. doi: 10.3389/fpls.2018.00475

The nuclear cap-binding complex (CBC) is composed of two cap-binding proteins: CBP20 and CBP80. The CBP20 gene structure is highly conserved across land plant species. All studied CBP20 genes contain eight exons and seven introns, with the fourth intron belonging to the U12 class. This highly conserved U12 intron always divides the plant CBP20 gene into two parts: one part encodes the core domain containing the RNA binding domain (RBD), and the second part encodes the tail domain with a nuclear localization signal (NLS). In this study, we investigate the importance of the U12 intron in the Arabidopsis thaliana CBP20 gene by moving it to different intron locations of the gene. Relocation of the U12 intron resulted in a significant decrease in the U12 intron splicing efficiency and the accumulation of wrongly processed transcripts. These results suggest that moving the U12 intron to any other position of the A. thaliana CBP20 gene disturbs splicing, leading to substantial downregulation of the level of properly spliced mRNA and CBP20 protein. Moreover, the replacement of the U12 intron with a U2 intron leads to undesired alternative splicing events, indicating that the proper localization of the U12 intron in the CBP20 gene secures correct CBP20 pre-mRNA maturation and CBP20 protein levels in a plant. Surprisingly, our results also show that the efficiency of U12 splicing depends on intron length. In conclusion, our study emphasizes the importance of proper U12 intron localization in plant CBP20 genes for correct pre-mRNA processing.

Keywords: U12 introns, U2 introns, mRNA splicing, CBP20, Arabidopsis thaliana

# INTRODUCTION

The cap-binding complex (CBC) is a nuclear heterodimer that binds to the 5′ cap of all RNA polymerase II transcripts. It is composed of two cap-binding proteins: cap-binding protein 20 (CBP20) and cap-binding protein 80 (CBP80, also known as ABA Hypersensitive 1, ABH1). In plants, CBC is important for pre-mRNA and pri-miRNA first intron splicing and regulation of pre-mRNA alternative splicing (Szarzynska et al., 2009; Raczynska et al., 2010). The CBC is also involved in miRNA biogenesis (Kim et al., 2008; Laubinger et al., 2008; Ren and Yu, 2012). Amino acid sequence comparison of Arabidopsis thaliana and Oryza sativa CBP20 and CBP80 proteins revealed high sequence conservation of the small subunit of CBC (CBP20) and a considerably lower level of conservation of its large subunit (CBP80) (Kmieciak et al., 2002). Interestingly, in comparison to animal CBP20 proteins, plant CBP20s contain an additional carboxy-terminal fragment. The structure of the CBP20 protein in higher plants can be divided into two parts. The core part, built mostly of 138 amino acids, contains the conserved RNA binding domain (RBD), which plays a crucial role in cap structure recognition and binding (Izaurralde et al., 1994), and the tail part, built of approximately 120 amino acids, contains a nuclear localization signal (NLS). In animals, it is the CBP80 protein that carries a functional NLS and is responsible for the import of CBC into the nucleus (Dias et al., 2009). In A. thaliana, two functionally independent NLSs are located in the tail part of the CBP20 protein (Kmieciak et al., 2002). In contrast, plant CBP80s do not contain any NLS; thus, the CBP20 protein targets the whole plant CBC to the nucleus. Moreover, in A. thaliana, the CBP20 protein is stabilized by CBP80 (Kierzkowski et al., 2009). The down-regulation of nuclear cap-binding proteins in Arabidopsis leads to mild developmental abnormalities, such as serrated rosette leaves and delayed development. Interestingly, loss of CBC function in Arabidopsis plants confers hypersensitivity to abscisic acid (ABA) during seed germination and a significant reduction of stomatal conductance, and in consequence greatly enhances the tolerance of the cbp20 and cbp80 mutants to drought (Hugouvieux et al., 2001; Papp et al., 2004; Jäger et al., 2011; Pieczynski et al., 2013) and salinity (Kong et al., 2014).

Arabidopsis and rice CBP20 gene structures are conserved and contain eight exons. The core part of the CBP20 protein is encoded by the first four exons of the gene, while the tail part is encoded by the last four exons. Six out of seven introns of the CBP20 gene belong to the classical and abundant U2 introns, while intron no. 4 represents the much rarer U12 introns. Computational analyses identified 246 U12 introns in the A. thaliana genome, constituting 0.17% of all predicted introns in this species (Alioto, 2007). Transcriptome analyses, however, identified about eight times more U12 introns (2069 vs. 246) than previously found using the computational approach (Marquez et al., 2012). The majority of U12 introns (89.4%) contain GT-AG terminal dinucleotides; however, a small portion (10.6%) carry other terminal dinucleotides, with AT-AC accounting for almost half of these non-GT-AG introns (4.8%) (Marquez et al., 2012). U12 introns contain a very characteristic and highly conserved branch point region (TTCCTTRAY), and unlike U2 introns, they do not have any polypyrimidine tract (Lewandowska et al., 2004; Simpson and Brown, 2008; Marquez et al., 2012; Turunen et al., 2013). It has been shown that several proteins of the U11/U12 di-snRNP are indispensable for the correct splicing of U12 intron-containing genes, which is crucial for the normal development of A. thaliana (Kim et al., 2010; Kwak et al., 2012; Jung and Kang, 2014; Xu et al., 2016). Moreover, the Arabidopsis quatre-quart1 (QQT1) gene is an indispensable U12 intron-containing gene whose correct splicing is necessary for the wild-type phenotype and development of Arabidopsis plants (Kwak et al., 2017).
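The two sequence features named above, the terminal dinucleotides and the TTCCTTRAY branch point consensus, can be checked with a few lines of code. The following is a minimal sketch only: the function names, the 40-nt search window near the 3′ end and the toy intron are assumptions made for illustration, and real U12 intron annotation relies on scoring models rather than an exact consensus match.

```python
import re

# Branch point consensus quoted in the text: TTCCTTRAY (R = A or G, Y = C or T).
BRANCH_POINT = re.compile(r"TTCCTT[AG]A[CT]")

def terminal_dinucleotides(intron):
    """Return the boundary class of an intron, e.g. 'GT-AG' or 'AT-AC'."""
    intron = intron.upper()
    return f"{intron[:2]}-{intron[-2:]}"

def has_u12_branch_point(intron, window=40):
    """Check for the consensus branch point within `window` nt of the 3' end.

    `window` is an illustrative assumption, not a value taken from the text.
    """
    tail = intron.upper()[-window:]
    return BRANCH_POINT.search(tail) is not None

if __name__ == "__main__":
    # Hypothetical AT-AC intron built only to exercise the two checks.
    toy_intron = "ATATCCTTT" + "A" * 30 + "TTCCTTAAC" + "TTTTTTTTAC"
    print(terminal_dinucleotides(toy_intron), has_u12_branch_point(toy_intron))
```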

The phylogenetic distribution of U12 introns shows that the minor (U12) splicing pathway appeared very early in eukaryotic evolution, but during the course of evolution, most U12 introns systematically changed to U2 introns (Bartschat and Samuelsson, 2010; Lin et al., 2010). Despite this process, a few U12 introns were retained in selected genes and remain very stable in some taxa (Basu et al., 2008). CBP20 genes from both Arabidopsis and rice were identified as belonging to this small number of genes containing U12 introns with AT-AC terminal dinucleotides. Moreover, the U12 intron in CBP20 gene in both plant species is located between exons no. 4 and no. 5, splitting the CBP20 coding sequence into the core and tail CBP20 protein parts in both species studied. Surprisingly, the exons encoding the core part of the protein have identical lengths (18t, 224, 139, and 34 nt), but the exons encoding the tail part of CBP20s differ considerably in length.

In this paper, we show that the CBP20 gene structure is conserved across land plant species from liverworts to higher plants. All studied CBP20 genes contain eight exons and seven introns, with the fourth intron belonging to the U12 class. In addition, the length of the first four exons is conserved, while the length of the exons encoding the CBP20 tail part varies considerably in all plant species studied. Furthermore, we show that the CBP20 U12 introns in plants may be as short as 76 nt and as long as 2733 nt. Our experiments demonstrate that the efficiency of U12 splicing depends on intron length. The experiments carried out to explain the conserved localization of the CBP20 U12 intron show that the exchange of the CBP20 U12 intron with the U2 intron leads to undesired alternative splicing events and that the proper localization of the U12 intron in the CBP20 gene secures correct CBP20 pre-mRNA maturation and CBP20 protein levels in a plant.

### MATERIALS AND METHODS

### Plant Material and Growth Conditions

The experiments were performed using A. thaliana (L.) Columbia-0 wt plants (Lehle Seeds, Round Rock, TX, United States) and a homozygous T-DNA insertion line cbp20 (Papp et al., 2004). Solanum tuberosum ssp. tuberosum var. Sante, Nicotiana tabacum var. Xanthi, Hordeum vulgare var. Sebastian and liverwort Pellia endiviifolia sp B were used for the CBP20 gene sequencing.

Arabidopsis plants were grown in 'Jiffy-7 42mm' soil (Jiffy Products International BV, Moerdijk, Netherlands) in an MLR-350H Versatile Environmental Test Chamber (Sanyo, Loughborough, Leicestershire, United Kingdom) with a 16 h light/8 h dark photoperiod (approx. 80 µmol m<sup>−2</sup> s<sup>−1</sup>), a constant temperature of 22°C and humidity of 70%. Potato plants were grown in sterile conditions in a greenhouse (22°C with constant light, approximately 80 µmol m<sup>−2</sup> s<sup>−1</sup>) on ½ Murashige-Skoog medium pH 5.5–5.6. N. tabacum and H. vulgare plants were grown in a greenhouse (22°C with a 12 h light/12 h dark photoperiod, approx. 120 µmol m<sup>−2</sup> s<sup>−1</sup>) on soil irrigated with mineral nutrients. P. endiviifolia sp B was grown as described by Sierocka et al. (2011).

Arabidopsis thaliana transformation was performed using the floral-dip method according to Clough and Bent (1998).

### DNA and RNA Isolation

Total genomic DNA, RNA and plasmid DNA were isolated using a DNeasy Plant Mini Kit (Qiagen), RNeasy Plant Mini Kit (Qiagen), and QIAprep Spin Miniprep Kit, respectively, according to protocols supplied by the manufacturers. Purity and amounts of DNA and RNA were determined using NanoDrop Spectrophotometer (Thermo Scientific).

### DNA Sequencing

DNA sequencing was performed with a BigDye v3.1 sequencing kit (Applied Biosystems, Foster City, CA, United States) on an ABI Prism 3130XL Analyzer (Applied Biosystems) in the Molecular Biology Techniques Laboratory, Faculty of Biology, Adam Mickiewicz University in Poznan, Poland.

### cDNA Synthesis, PCR and DNA Cloning

Four micrograms of total RNA was reverse-transcribed using Superscript III Reverse Transcriptase (Invitrogen) and oligo(dT)<sup>15</sup> primer (Novazym). cDNA of CBP20 genes from potato, tobacco, barley and liverwort was amplified by PCR using primers designed according to ESTs from NCBI database (Benson et al., 2012). In the case of potato, we designed primers according to the EST 706129 sequence. Tobacco CBP20 gene cDNA was amplified using primers designed according to the contig sequence assembled on the basis of nine different EST sequences (accession numbers: AM815125, CV017334, AM818189, EB679251, EB440185, AM827799, AM808999, EB444475, and EB678046). Barley CBP20 cDNA was assembled on the basis of the following EST sequences: TC111197 and TC78828 from TIGR database and BU991417, BJ480285, AJ463125, and AJ475973 from NCBI. Primers for cDNA CBP20 amplification from P. endiviifolia were designed according to the results of whole transcriptome sequencing.

PCR was carried out as described in Szarzynska et al. (2009) and Sierocka et al. (2011). Genomic sequence of the CBP20 gene from potato and tobacco was amplified using the Expand Long Template PCR System (Roche). PCR products were separated on a 1% agarose gel, purified using a QIAquick PCR Purification Kit (Qiagen), cloned into the pGEM-T Easy vector (Promega) and sequenced. Primers used for amplification of genomic and cDNA sequences of the CBP20 gene from different plant species are shown in Supplementary Table S1. Amplification of CBP20 cDNAs as well as genes from potato, tobacco, barley and liverwort is shown in Supplementary Figure S1.

### Expression Construct Preparation

Mini- and midi-gene constructs numbers 1, 8, and 9 were prepared by PCR amplification of appropriate A. thaliana and Physcomitrella patens CBP20 gene fragments. Mini- and midi-gene constructs numbers 2–7 were prepared as presented in Supplementary Figure S2. Each construct contains exons no. 4 and no. 5 from the A. thaliana CBP20 gene. Depending on the construct, these two exons are separated by a U12 or U2 intron originating from different plant species. Mini-gene constructs were prepared in 3- or 4-step PCR. In PCR-1, the U2 or U12 intron was amplified using a forward primer containing at its 5′ end 20 nucleotides complementary to exon no. 4. The reverse primer used in PCR-1 contained at its 5′ end 20 nucleotides complementary to exon no. 5. In PCR-2, the sequence of exon no. 5 was amplified using a forward primer complementary to the last 20 nucleotides of the U2 or U12 intron, depending on the individual mini-gene construct. The final mini-gene sequence was amplified in PCR-3, in which PCR-1 and PCR-2 products were used as templates. In mini-gene constructs containing the U12 intron derived from the S. tuberosum or Vitis vinifera CBP20 gene, an additional PCR-0 was performed in which the U12 intron sequence was amplified.
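The tailed-primer design used in PCR-1 can be illustrated with a short sketch. It assumes that the 20-nt tails correspond to the exon sequence immediately flanking the intron (the end of exon no. 4 and the start of exon no. 5); the function names and the `anneal` length for the intron-specific part are hypothetical, and the primers actually used are those listed in Supplementary Table S1.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def pcr1_primers(exon4, intron, exon5, overlap=20, anneal=20):
    """Sketch of tailed primers that add exon-derived ends to an intron amplicon.

    overlap: length of the exon-derived 5' tail (20 nt in the text).
    anneal:  length of the intron-specific 3' part (an assumption).
    """
    # Forward primer: end of exon 4 (tail) followed by the start of the intron.
    forward = (exon4[-overlap:] + intron[:anneal]).upper()
    # Reverse primer: reverse complement of intron end plus start of exon 5,
    # so that the exon 5 tail sits at the primer's 5' end.
    reverse = reverse_complement(intron[-anneal:] + exon5[:overlap])
    return forward, reverse

if __name__ == "__main__":
    # Toy sequences with short overlaps, used only to show the layout.
    print(pcr1_primers("AAAACCCC", "ATGGGTTTAC", "GGGGTTTT", overlap=4, anneal=3))
```

The shared 20-nt ends are what allow the PCR-1 and PCR-2 products to anneal to each other and be extended into a single template in PCR-3.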

Each mini-gene construct was cloned into the pDH515 vector at a unique BamHI restriction site within an intronless zein gene encoding a maize seed storage protein. Additionally, the vector contains the 35S promoter of Cauliflower mosaic virus (CaMV 35S) and terminator sequence (Lewandowska et al., 2004).

The maxi-gene wt transgene construct was prepared by PCR amplification of the Arabidopsis CBP20 gene sequence encompassing its native promoter (1228 bp) (Kmieciak et al., 2002). The PCR product was cloned into the NotI restriction site in the pENTR vector. The other five maxi-gene constructs were obtained performing a series of PCRs using the wt transgene construct or A. thaliana genomic DNA as a template. Specific sub-fragments of each maxi-gene construct were obtained in the PCR, followed by the consecutive cohesive ends annealing and final PCR amplification (Supplementary Figure S3). The individual maxi-gene constructs were transferred from the pENTR cloning vector to the pEarleyGate302 expression vector using Gateway LR Clonase II Enzyme Mix (Thermo Fisher Scientific). All constructs were confirmed by DNA sequencing. Primers used for construct preparation are listed in Supplementary Table S1.

### Protoplast Isolation, Transfection and Splicing Analysis

Splicing of mini- and midi-gene constructs was analyzed in protoplasts isolated from leaves of N. tabacum var. Xanthi (Lewandowska et al., 2004). The protoplast suspension was transfected with each mini- or midi-gene construct as described by Lewandowska et al. (2004). After overnight incubation, total RNA was isolated from the protoplasts. The cDNA template was synthesized using the zein3′-R reverse primer complementary to the zein sequence flanking mini- and midi-gene constructs in the pDH515 vector (Simpson and Filipowicz, 1996). Splicing analyses were carried out by RT-PCR using the following primers: zeinF-FAM, labeled with the fluorescent phosphoramidite 6-FAM at its 5′ end, and zeinR (see Supplementary Table S1).

6-FAM-labeled RT-PCR products were quantitatively analyzed using capillary electrophoresis on an ABI 3130xl Genetic Analyzer. The length of the labeled products was calculated using Peak Scanner Software v1.0 (Applied Biosystems) by comparison with the GeneScan-350 TAMRA size standards (Applied Biosystems). Quantification of RT-PCR products was carried out after the 23rd amplification cycle by measurement of the fluorescent peak areas of the detected fragments. RT-PCR was carried out in three biological replicates. The same RT-PCR products were separated in parallel on a 1% agarose gel. 6-FAM-labeled fragments were eluted from the gel, cloned into the pGEM T-Easy (Promega) vector and sequenced to confirm specific splicing products.
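Quantification from peak areas amounts to expressing each product as a share of the total fluorescent signal and averaging over replicates. The sketch below shows one plausible way to do this; the function names, product labels and numbers are hypothetical and are not data from this study.

```python
def product_fractions(peak_areas):
    """Convert fluorescent peak areas into each product's share of the total."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: area / total for name, area in peak_areas.items()}

def mean_fractions(replicates):
    """Average per-product fractions over biological replicates.

    Assumes every replicate reports the same set of products.
    """
    fractions = [product_fractions(rep) for rep in replicates]
    names = fractions[0].keys()
    return {n: sum(f[n] for f in fractions) / len(fractions) for n in names}

if __name__ == "__main__":
    # Made-up peak areas for two replicates.
    reps = [
        {"spliced": 300.0, "unspliced": 100.0},
        {"spliced": 100.0, "unspliced": 100.0},
    ]
    print(mean_fractions(reps))
```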

Splicing analysis of maxi-gene transcripts that were introduced into the A. thaliana cbp20 mutant was also carried out using capillary electrophoresis, as described above. Primers used for RT-PCR amplification hybridized to the sequence of exons no. 3 and no. 6 of the CBP20 gene transcript (Supplementary Table S1). The forward primer Splice-FAM-F was labeled at its 5′ end with the fluorescent phosphoramidite 6-FAM.

### RT-qPCR

Real-time RT-qPCR was performed with Power SYBR® Green PCR Master Mix (Applied Biosystems, Warrington, United Kingdom) on a 7900HT Fast Real-Time PCR System (Applied Biosystems) in 10-µl reaction volumes in 384-well plates. To assess the splicing efficiency of mini-genes, an absolute quantification approach was applied. Standard curves were prepared using plasmids containing individual mini-genes. Splicing efficiency was calculated by comparison of spliced and unspliced transcript levels to the total transcript amount. To estimate the total level of the CBP20 transcript in maxi-gene transgenic lines, the fold change was calculated using the 2<sup>−ΔΔCt</sup> method. The mRNA fragment of elongation factor 1-alpha (EF1-alpha, TAIR locus: At1g07930) was amplified as a reference gene in A. thaliana.
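The calculations named in this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis script: the standard curve is assumed to be linear in log10(copies), the total transcript amount is assumed to be the sum of spliced and unspliced copies, the 2<sup>−ΔΔCt</sup> formula assumes equal and near-perfect amplification efficiency for target and reference, and all function names and numbers are hypothetical.

```python
def copies_from_ct(ct, slope, intercept):
    """Absolute quantification from a plasmid standard curve.

    Assumes the curve Ct = slope * log10(copies) + intercept.
    """
    return 10 ** ((ct - intercept) / slope)

def splicing_efficiency(spliced_copies, unspliced_copies):
    """Fraction of transcripts that are spliced (total = spliced + unspliced)."""
    total = spliced_copies + unspliced_copies
    if total <= 0:
        raise ValueError("total transcript amount must be positive")
    return spliced_copies / total

def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method (reference gene, e.g. EF1-alpha)."""
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    return 2.0 ** -(d_ct_sample - d_ct_control)

if __name__ == "__main__":
    # Made-up values for illustration only.
    print(copies_from_ct(24.0, slope=-3.0, intercept=30.0))
    print(splicing_efficiency(75.0, 25.0))
    print(fold_change_ddct(24.0, 18.0, 22.0, 18.0))
```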

### Western Blot

Protein extracts were separated by 10% SDS-PAGE, transferred to a polyvinylidene fluoride membrane (PVDF; Immobilon®-P, Millipore), and analyzed by western blot using antibodies at the indicated dilutions: anti-Actin (691001; MP Biomedicals) at 1:5000 and anti-CBP20 (AS09 530; Agrisera) at 1:1000.

### Bioinformatics Tools

Comparison between DNA and protein sequences was carried out using bioinformatics tools: ClustalW2<sup>1</sup> and BoxShade<sup>2</sup> . Sequencing results were assembled together using the ContigExpress program from Vector NTI (Invitrogen).

### Sequence Accession Numbers

Arabidopsis thaliana, Gene ID: 834443; Populus trichocarpa, Gene ID: 7469563; V. vinifera, Gene ID: 100265980; S. tuberosum, GU046516.1; N. tabacum, GU058037.1; O. sativa, Gene ID: 4329963; H. vulgare, FJ548567.1; S. moellendorffii – Selaginella moellendorffii v1.0, Scaffold 3333196:1<sup>3</sup> ; P. patens – P. patens V1.2\_genome scaffold\_105<sup>4</sup> .

## RESULTS

### Plant CBP20 Gene Structure Is Evolutionarily Conserved and Reveals Preserved Localization of a U12 Intron

The analysis of CBP20 genes from A. thaliana and O. sativa revealed high sequence and structural conservation. To gain deeper insight into the evolutionary conservation of the CBP20 gene, we decided to compare CBP20 sequences from various plant species across the plant kingdom. The CBP20 gene and cDNA sequences from P. trichocarpa, V. vinifera, Selaginella moellendorffii, and P. patens were found in publicly available databases (**Figure 1**). In addition, full-length CBP20 cDNA sequences from S. tuberosum var. Sante, N. tabacum var. Xanthi, H. vulgare var. Sebastian and the liverwort P. endiviifolia subspecies B were established in our laboratory. Based on the obtained CBP20 cDNA sequences, we determined the full-length CBP20 gene sequences in potato, tobacco, barley and liverwort genomes.

We compared the CBP20 gene structure from ten plant species, including representatives of Bryophyta (liverworts, mosses), Lycophyta (lycophyte), and Spermatophyta (monocots and dicots) (**Figure 1A**). In all species analyzed, the CBP20 genes contain 8 exons and 7 introns. Interestingly, their first four exons, coding for the core part of the protein, are of the same length with one exception, S. moellendorffii, whose second exon is three nucleotides shorter than that in the other CBP20 genes studied. In all CBP20 genes, a U12 intron was found separating the evolutionarily conserved core part that encodes a canonical RBD from the carboxy-terminal fragment (tail) that encodes the NLS and generally exhibits a much lower degree of evolutionary conservation. All these U12 introns contain the canonical AT-AC terminal dinucleotides and a branch point sequence typical of U12 introns (**Figure 1B**). The CBP20 U12 intron length varies from 76 nt in S. moellendorffii to 2733 nt in V. vinifera. In general, plant species can be divided into two classes: those carrying short U12 introns (from 76 to 284 nt) and those carrying long U12 introns (from 1619 to 2733 nt) (**Figure 1A**).

There are considerable differences between different plant CBP20 gene lengths. The S. moellendorffii CBP20 gene (1245 bp) is 10 times shorter than that from S. tuberosum (11430 bp). In the potato CBP20 gene, the fifth intron is extremely long, approximately 6050 bp, which largely accounts for the unusual length of this gene. Computational analysis of this intron sequence has revealed that it contains characteristic signatures of the LINE1 retrotransposon. Since we could not identify any full open reading frame for reverse transcriptase, we assumed that this retroelement is defective.

Comparison of all plant CBP20 amino acid sequences revealed the presence of conserved residues within the core domain that were reported to be responsible for binding to the cap structure and for recognition of RNA (Supplementary Figure S4) (Mazza et al., 2002). In the tail fragment of all CBP20 proteins, we identified potential NLS sequences. In the majority of plant species, the CBP20 gene contains two NLS motifs, as was shown before for A. thaliana, while in P. trichocarpa and P. patens, we identified only one motif: the proximal NLS. Generally, sequence homology of full-length CBP20 proteins in plants ranges from 61 to 79% identity, and that of the highly conserved core part ranges from 80 to 89% identity. The homology within the tail part is rather low (39–69% identity), but the distal part of this tail region shows again a relatively high degree of conservation, suggesting that this part of the protein may play a structural or functional role.

<sup>1</sup>www.ebi.ac.uk

<sup>2</sup>www.ch.embnet.org

<sup>3</sup>https://genome.jgi.doe.gov

<sup>4</sup>www.cosmoss.org

Brown, 2008). Dashed lines represent individual nucleotide sequences between the 5′ splice site and branch point. Dots mark alignment gaps. Nucleotides that differ from the consensus sequence in the branch point region are marked in black. The conserved adenine nucleotides in the branch point are marked in red.

It was shown that U12 introns over the course of evolution were usually exchanged with U2 introns (Burge et al., 1998; Bartschat and Samuelsson, 2010). The evolutionarily conserved localization of the plant CBP20 U12 intron separating the conserved core domain from the tail part of the protein in all CBP20 plant genes characterized encouraged us to investigate the role of this intron in CBP20 pre-mRNA maturation.

### Efficiency of U12 Splicing Depends on the Intron Length

As we have shown, the length of CBP20 U12 introns in the studied plant species varies considerably. To test whether U12 intron length may influence U12 intron splicing efficiency, we prepared five mini-gene constructs. These mini-genes consist of A. thaliana CBP20 exon no. 4 (34 bp) and exon no. 5 (111 bp) separated by the following: construct no. 1 – the original A. thaliana CBP20 U12 intron (134 bp), construct no. 2 – the P. patens CBP20 U12 intron (153 bp), construct no. 3 – the O. sativa CBP20 U12 intron (239 bp), construct no. 4 – the S. tuberosum CBP20 U12 intron (1619 bp) and construct no. 5 – the V. vinifera CBP20 U12 intron (2733 bp) (**Figure 2A**). The mini-genes were cloned into the pDH515 expression vector (Lewandowska et al., 2004) and sequenced. These recombinant plasmids were then used for tobacco mesophyll protoplast transfection. Total RNA was isolated from the transfected protoplasts incubated overnight, and cDNA derived from the mini-gene transcripts was prepared. Splicing analyses were carried out applying RT-PCR with fluorescent-labeled primers. The splicing efficiency of each construct was determined as the mean of three independent experiments. All U12 introns were spliced correctly; no alternative splicing events were observed (**Figure 2B**). However, the splicing efficiency of different mini-genes varied considerably. We found a correlation between the U12 intron length and splicing efficiency as revealed by RT-qPCR using the absolute quantification method (**Figure 2C** and Supplementary Table S2). The splicing efficiency of the 134 nt long U12 intron from A. thaliana was assumed in this comparison to be 1. The U12 introns derived from O. sativa, S. tuberosum and V. vinifera CBP20 genes were spliced 1.28, 7.17, and 34.65 times more efficiently than that from A. thaliana, respectively (**Figure 2C**).

An exception to this rule was observed in the splicing efficiency of the U12 intron derived from P. patens. Despite its almost identical length with the U12 intron from A. thaliana, the splicing efficiency of this U12 intron was five times lower.
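The relationship between intron length and relative splicing efficiency can be illustrated with a rank correlation. The following is a rough sketch rather than the statistical analysis performed in the study: intron lengths and relative efficiencies are the values quoted in this section, the P. patens value of 0.20 is an approximation derived from "five times lower" and is not a reported number, and the helper names are hypothetical.

```python
# Intron lengths (bp) and splicing efficiency relative to A. thaliana (= 1).
introns = {
    "A. thaliana": (134, 1.00),
    "P. patens": (153, 0.20),   # approximation from "five times lower"
    "O. sativa": (239, 1.28),
    "S. tuberosum": (1619, 7.17),
    "V. vinifera": (2733, 34.65),
}


def ranks(values):
    """Return 1-based ranks; assumes there are no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Spearman rank correlation coefficient for untied data."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


lengths = [length for length, _ in introns.values()]
efficiencies = [eff for _, eff in introns.values()]
rho = spearman_rho(lengths, efficiencies)
print(f"Spearman rho = {rho:.2f}")
```

With only five constructs such a coefficient is descriptive at best; the P. patens intron is the single pair of values that breaks the otherwise monotonic ordering.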

### The Replacement of a U12 Intron With a U2 Intron in the CBP20 Mini-Gene Improves Splicing Efficiency but Leads to Deleterious and Undesired Improper Splicing Events

To learn more about the role of the U12 intron in CBP20 plant genes, we constructed a series of mini-genes containing A. thaliana exons no. 4 and no. 5 separated by the original CBP20 U12 intron (134 bp – construct no. 1), the U2 intron derived from the A. thaliana CBP80 gene (intron no. 11, 146 bp – construct no. 6) and the U2 intron derived from the P. sativum legumin gene (intron no. 1, 138 bp – construct no. 7) (**Figure 3A**). Both U2 introns were chosen because of their similar length to that of the original A. thaliana CBP20 U12 intron. The selected U2 introns have canonical GT-AG splice site dinucleotides and a polypyrimidine tract near the 3′ splice site and show a low GC content, 26% and 29% for the pea legumin and Arabidopsis CBP80 genes, respectively. The U12 intron from the A. thaliana CBP20 gene also has a low GC content (36%).

Splicing of the mini-gene transcripts was studied in the tobacco protoplast system, as described in the case of mini-genes containing various U12 introns. The original U12 intron was spliced with only 4.77% efficiency, whereas both studied U2 introns, from the Arabidopsis CBP80 gene and pea legumin gene, were spliced, reaching 79.66 and 94.4% efficiency, respectively (**Figure 3B**). These results are supported by data that have been previously obtained in our laboratory (Lewandowska et al., 2004), showing that A. thaliana U12 introns are spliced less efficiently than are U2 introns in tobacco protoplasts. However, in the mini-gene containing the U2 intron derived from the A. thaliana CBP80 gene, an alternative splicing event was observed: two alternative 5′ splice sites were recognized by the tobacco U2 splicing machinery, leading to the proper splicing of the CBP20 mini-gene transcript (42.61%) or to the inclusion of an additional 38 nt from the 5′ intron end into the spliced mRNA (37.05%) (**Figures 3C,D**). The 38 nt long insertion introduces a premature stop codon resulting in premature translation termination. To test whether this alternative splicing event also occurs naturally in the CBP80 gene, we analyzed the splicing of this intron in the CBP80 transcript.

We were able to detect only constitutively, properly spliced CBP80 mRNA, without any additional alternative splicing events (Supplementary Figure S5).
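How an intron-derived insertion can lead to a premature stop codon can be shown with a short Python sketch. It makes two assumptions that are labeled here and in the code: the sequences are toy examples, not CBP20 or CBP80 sequences, and the text does not state whether the stop codon lies inside the 38-nt insertion or downstream of it. What can be said with certainty is that 38 is not a multiple of three, so such an insertion shifts the reading frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def shifts_frame(insert_length):
    """True if an insertion of this length changes the reading frame."""
    return insert_length % 3 != 0


def first_inframe_stop(cds):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


# Toy sequences (hypothetical): a 2-nt insertion after the first codon.
reference = "ATGCTGAAGGCTTGGTAA"
with_insert = reference[:3] + "GT" + reference[3:]

print(shifts_frame(38))                 # 38 nt is not a multiple of 3
print(first_inframe_stop(reference))    # stop only at the end of the toy ORF
print(first_inframe_stop(with_insert))  # stop appears earlier after the shift
```

In the toy example, the shifted frame exposes a TGA triplet that was out of frame in the reference sequence, which is one way a short insertion can truncate the encoded protein.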

### Arabidopsis CBP20 U12 Intron Surrounded by Additional Exons and U2 Introns Is Not Efficiently Recognized by the U12 Splicing Machinery

It has been shown that the extension of mini-genes composed of two exons from GSH2 interrupted by a U12 intron with neighboring exons and U2 introns improves U12 splicing efficiency (Lewandowska et al., 2004). To test whether this is also true for other genes containing U12 introns, we prepared two midi-genes that were based on A. thaliana and P. patens CBP20 genes. A. thaliana (construct no. 8) and P. patens (construct no. 9) midi-genes consist of E3-i3(U2)-E4-i4(U12)-E5-i5(U2)-E6 from the A. thaliana and P. patens CBP20 genes, respectively (**Figure 4A**). The midi-genes were studied in tobacco protoplasts as described before in the case of mini-genes.

The splicing patterns of A. thaliana and P. patens midi-transcripts were similar (**Figure 4B**). We obtained one main and many additional minor products. Sequencing of these products has shown that the main band (265 bp in A. thaliana and 241 bp in P. patens) represents a spliced product consisting of exons no. 3 and no. 6. Thus, splicing in that case leads to an exon skipping event in which all three introns and exons no. 4 and no. 5 are removed. The remaining minor bands represent various spliced products listed in **Figure 4C**, among which a fully and properly spliced transcript containing all four exons (E3-E4-E5-E6) was also detected. Quantitative analysis shows that the main improperly spliced product accounted for approximately 50% of all spliced products in both Arabidopsis and Physcomitrella midi-genes (**Figures 4C,D**). The fully spliced transcript containing all four exons (E3-E4-E5-E6) accounted for only 12.47% of all splicing products in Arabidopsis and 3.75% in Physcomitrella. Thus, in contrast to the GSH2 midi-gene, the neighboring U2 introns and exons do not enhance the recognition of the U12 intron in the case of Arabidopsis and moss CBP20 midi-genes. Instead, the U12 intron signals are mainly not recognized properly in these constructs.
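The isoform percentages reported here are expressed relative to the sum of all detectable products and averaged over three independent experiments. A minimal Python sketch of that calculation follows; the peak areas are hypothetical placeholders for capillary electrophoresis signals, and the function names are illustrative rather than taken from any software used in the study.

```python
from statistics import mean, stdev


def isoform_percentages(peak_areas):
    """Express each isoform's signal as a percentage of all detected products."""
    total = sum(peak_areas.values())
    return {name: 100 * area / total for name, area in peak_areas.items()}


def summarize(replicates):
    """Mean and sample SD of each isoform's percentage across replicates."""
    per_replicate = [isoform_percentages(r) for r in replicates]
    return {
        name: (mean([p[name] for p in per_replicate]),
               stdev([p[name] for p in per_replicate]))
        for name in per_replicate[0]
    }


# Hypothetical peak areas from three independent experiments.
replicates = [
    {"E3-E6": 510, "E3-E4-E5-E6": 120, "other": 370},
    {"E3-E6": 480, "E3-E4-E5-E6": 130, "other": 390},
    {"E3-E6": 500, "E3-E4-E5-E6": 125, "other": 375},
]
for name, (avg, sd) in summarize(replicates).items():
    print(f"{name}: {avg:.1f} ± {sd:.1f} %")
```

Normalizing within each replicate before averaging keeps differences in total signal between experiments from distorting the isoform proportions.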

### The Proper Localization of the U12 Intron in the CBP20 Gene Secures Correct Pre-mRNA Maturation

To study the influence of the U12 intron on the splicing efficiency of the full CBP20 pre-mRNA in plants, we prepared constructs composed of the natural A. thaliana CBP20 gene promoter and the whole wild-type (wt) gene sequence or its variants (the maxi-genes series) (**Figure 5B**). In addition to the wt maxi-gene (wt transgene), we constructed CBP20 gene variants in which (i) the U12 intron was replaced with the U2 intron derived from the A. thaliana CBP80 gene (intron no. 11; U12→U2), (ii) the U12 intron was removed (ΔU12), and (iii) exons no. 4 and no. 5 were swapped (exon swap). Additionally, we prepared two constructs based on the CBP20 gene variant U12→U2. In the first construct, the U12 intron was moved upstream in the CBP20 gene and replaced the U2 intron no. 3 (U12 in core); by this, we moved the U12 intron into the core-encoding part of the CBP20 gene. In the second construct, the U12 intron replaced the original U2 intron localized between exons no. 5 and no. 6 (U12 in tail); in this construct, the U12 intron was moved into the tail-encoding part of the CBP20 gene. A. thaliana cbp20 mutant plants were transformed with these constructs, and three independent homozygous lines of each gene variant were further tested.

Qualitative RT-PCR-based analyses of the CBP20 mRNA levels revealed the presence of one dominant product representing fully spliced transcripts in the case of wt plants as well as transgenic plants carrying the wt CBP20 gene construct (wt transgene) and the U12→U2 construct. A small amount of partially unspliced products was also detected (**Figure 5A**, upper panel). In transgenic plants carrying mutated CBP20 genes in which the U12 intron was deleted (ΔU12), we also observed the presence of fully and properly spliced CBP20 mRNA. Moreover, the transgenic plants exhibited a wt phenotype, indicating that the CBP20 protein is produced in the transgenic lines analyzed (**Figure 5C**, upper panel).

Only transgenic plants carrying the CBP20 gene with swapped exons (exon swap) gave mRNAs in which the CBP20 coding sequence was disrupted. This impaired mRNA accumulated at a lower level than in wt and wt transgene plants containing a wt copy of the CBP20 gene (**Figure 5A**, upper panel). The exon swap mRNA contains a premature stop codon (in exon no. 5) that might cause the synthesis of putative shorter proteins with a disturbed core fragment. As expected, the phenotype of these mutant plants is very similar to that of the null cbp20 mutant, exhibiting serrated rosette leaves and growth retardation (**Figure 5C**, lower panel), confirming that CBP20 is indeed not produced in these plants. In the case of mutants in which the U12 intron was moved into the core- or tail-encoding parts of the gene, additional splice products were observed (**Figure 5A**, upper panel; the lower panel represents a zoomed-in part of the upper one).

Quantitative RT-qPCR analyses using fluorescent-labeled primers and capillary electrophoresis were carried out to measure the levels of individual spliced products. Cloning and sequencing of these products were performed to identify particular splicing isoforms. In the case of ΔU12, apart from the fully spliced product, we did not detect any additional mRNA isoforms. In the wt, wt transgene and U12→U2 plants, the fully spliced mRNA represented 96–97% of all spliced products, while U12 or U2-containing mRNA isoforms were present in 3–4% of all spliced products (**Figure 5B**). The exchange of exons no. 4 and no. 5 within the CBP20 gene resulted in the presence of properly spliced mRNA (approximately 95% of all isoforms), the isoform still containing the U12 intron (approximately 3% of all splicing isoforms) and an additional mRNA isoform in which the 3′ alternative U12 splice site was selected within exon no. 4 (approximately 2.5% of all splicing isoforms).

Two A. thaliana lines in which the U12 intron was relocated into the core- or tail-encoding part of the CBP20 gene (U12 in core and U12 in tail) produced complex patterns of mRNA isoforms. In the case of the U12 in core variant, we identified and quantitatively measured the levels of five mRNA isoforms. The fully and correctly spliced CBP20 mRNA isoform represented only 26% of all splicing products (**Figure 5B**). Surprisingly, the most abundant mRNA isoform showed an improperly spliced U2 intron replacing the original U12 intron. The alternative 5′ splice site was selected, leading to the inclusion of an additional 38 nt from the 5′ intron end into the processed mRNA (58.1%) (**Figure 5B**). However, these transgenic plants exhibit the wt phenotype, suggesting that the lower level of properly spliced CBP20 mRNA is sufficient to fulfill plant requirements for the CBP20 protein (**Figure 5C**). Interestingly, an identical improper splicing event was observed in the mini-gene containing the same U2 intron instead of the original U12 intron (**Figure 3B**). Three additional mRNA isoforms were observed in small amounts: (1) the isoform in which the U12 intron as well as the U2 intron that replaced the original U12 intron were retained (8.16%), (2) the isoform in which the U2 intron that replaced the original U12 intron was retained (4.0%), and (3) the isoform in which the U12 intron was retained and the U2 intron located between exons no. 3 and no. 4 was incorrectly spliced (3.8%).

In the U12 in tail maxi-gene variant we identified and measured the levels of three mRNA isoforms. The fully and correctly spliced CBP20 mRNA isoform represented almost 72% of all mRNA isoforms (**Figure 5B**). A minor portion of spliced products were identified as an isoform still containing the U2 intron that replaced the original U12 intron (4.9%). We also detected an mRNA isoform identical to that in the U12 in core maxi-gene, which contained the improperly spliced U2 intron that replaced the original U12 intron (23.2%) (**Figure 5B**). As expected, all these plants exhibit a wt phenotype (**Figure 5C**). These results suggest that shifting the U12 intron into other intron positions in the Arabidopsis CBP20 gene disturbs splicing events, leading to substantial downregulation of the properly spliced mRNA level.

FIGURE 4 | Plant CBP20 U12 intron surrounded by four exons and two U2 introns is not efficiently recognized by the U12 splicing machinery. (A) A table showing structures of two midi-gene constructs containing A. thaliana or P. patens CBP20 gene fragment of the following composition: exon 3 – U2 intron 3 – exon 4 – U12 intron 4 – exon 5 – U2 intron 5 – exon 6. (B) Qualitative RT-PCR analysis of splicing of midi-gene transcripts in tobacco protoplasts. Arrows point to the improperly spliced products in which exons no. 4 and no. 5 are skipped. Green squares represent properly spliced midi-gene CBP20 transcripts. The rest of the bands above the main amplification products represent a variety of partially spliced transcripts. (C) Quantitative analysis of midi-gene transcript RT-PCR products using capillary electrophoresis. Splicing efficiency is calculated as a percentage of the sum of all detectable products. Values are shown as the mean ± SD (n = 3) from three independent experiments. Green color depicts properly spliced CBP20 mRNA isoform. (D) Structure of midi-genes and fully spliced transcript representing the most abundant splicing event.

FIGURE 5 | The proper localization of the U12 intron in the CBP20 gene secures correct pre-mRNA maturation in plants. (A) Qualitative RT-PCR analysis of splicing of CBP20 full-length maxi-gene transcripts. For each construct, two independent transgenic lines were analyzed. Electrophoretic separation of RT-PCR products was carried out in a 1.2% agarose gel. Abbreviations: K- – negative control (no template); cbp20 – mutant line; Wt – wild-type plants; wt transgene – depicts wild-type CBP20 gene; U12→U2 – the CBP20 gene in which the original U12 intron was exchanged by the U2 intron derived from the Arabidopsis CBP80 gene; ΔU12 – the CBP20 gene with U12 intron deletion; exon swap – the CBP20 gene mutant in which exons no. 4 and no. 5 flanking the U12 intron have been exchanged; U12 in core – construct derived from the U12→U2 construct in which the U12 intron has been moved between exons no. 3 and no. 4 (core part of the CBP20 gene); and U12 in tail – construct derived from the U12→U2 construct in which the U12 intron has been moved between exons no. 5 and no. 6 (tail part of the CBP20 gene). All constructs are under a native promoter. Green arrow points to the properly spliced CBP20 transcript (E/E/E/E). Red bars show additional detectable partially spliced or improperly spliced products. A panel below shows a zoomed-in fragment of the above gel; blue arrows depict sequenced and quantitatively analyzed products. Actin – loading control. M – molecular DNA marker. (B) A scheme presenting all detected CBP20-gene derived mutant splicing isoforms. Capital A, B, and C letters represent constitutive splicing events for introns no. 3, no. 4, and no. 5, respectively. b – depicts improper splicing events of intron no. 4. Quantitative analysis of mini-gene transcript RT-PCR products was performed using capillary electrophoresis. Splicing efficiency is calculated as a percentage of the sum of all detectable products. Values are shown as the mean ± SD (n = 3) from three independent experiments. (C) Arabidopsis vegetative rosettes of wt and CBP20 maxi-gene lines after 30 days of growth.

To quantify the total level of CBP20 transcript in the wt and maxi-gene transgenic lines analyzed, we performed real-time RT-PCR using primers designed to recognize the 3′ ends of the transcripts. No significant differences in the CBP20 mRNA levels between wt and transgenic plants were observed (**Figure 6A**). However, we did detect changes between maxi-gene transgenic lines when the CBP20 protein level was analyzed by western blotting (**Figure 6B**). A similar protein level was observed for the wt transgene and ΔU12 lines. As expected, no CBP20 protein was detected in the exon swapping maxi-gene line due to the premature STOP codon existing in exon no. 5 of this construct. Interestingly, the decreased level of CBP20 was observed in the U12→U2 transgenic line as well as in the U12 in core and U12 in tail transgenic lines.

## DISCUSSION

The evolutionary conservation from liverworts to higher plants of the U12 intron localization within the plant CBP20 genes encouraged us to study the role of this intron in the CBP20 pre-mRNA maturation. It has been reported that U12 intron positions are more strongly conserved between animals and plants than are the positions of U2 introns (Basu et al., 2008). Similar to plant CBP20 genes, human, chicken, and starlet sea anemone CBP20 genes also contain U12 introns. However, the animal CBP20 gene structures and positions of U12 introns differ significantly from those of plant CBP20 genes (Supplementary Figure S6). In contrast to their animal orthologs, plant CBP20 genes contain an additional coding sequence representing the so-called tail part of CBP20. In each plant CBP20 gene, a U12 intron separates the core part of the protein from the tail fragment. The tail part of plant CBP20s usually contains two NLSs, which is characteristic of plant small CBC subunits. Our in silico analyses revealed, however, that the Chlamydomonas reinhardtii CBP20 gene does not have the tail-encoding part (GenBank: EDP01640.1). Interestingly, the Chlamydomonas CBP20 gene also does not contain a U12 intron. Moreover, we have not found U12 introns in fungal CBP20 genes that also lack the plant-specific CBP20 tail. Therefore, it can be speculated that the U12 intron in CBP20 genes appeared during land plant evolution and was introduced into the gene coding sequence together with the tail-encoding part of CBP20.

Comparison of the CBP20 U12 introns of various plants revealed their broad spectrum of length: they can be as short as 76 bp (S. moellendorffii) and as long as 2733 bp (V. vinifera). Browsing the U12 database (U12DB) for U12 introns present in the A. thaliana genome revealed the presence of 246 U12 introns (Alioto, 2007). The length of these introns varies between 68 and 3731 bp. However, the number of U12 introns exceeding 1000 bp in length is rather low (only 6 such introns were described). The number of Arabidopsis U12 introns presented in the U12DB is underestimated, because more recent studies based on transcriptomic data revealed 2069 U12 introns in Arabidopsis (Marquez et al., 2012). Our data show that plant CBP20 U12 introns derived from different plant species, when inserted into the mini-genes containing two flanking A. thaliana exons, exhibit differential impact on their splicing efficiency. Surprisingly, the longer the U12 intron was, the higher splicing efficiency was observed. We have previously reported that the splicing efficiency of plant U12 introns depends on a combination of factors, including TA content, exon splicing enhancer sequences (ESEs), and the presence of an adenosine at the upstream purine position in the plant U12 branch point consensus TCCTTRATY (Lewandowska et al., 2004). All CBP20 U12 introns studied in this paper exhibit a similar high TA content (approximately 65%), and almost all (except P. endiviifolia) contain an adenosine at the upstream position in their branch point sequences (**Figure 1**). Since the flanking exons are identical in all constructs, one can suggest that putative ESEs should influence all studied U12 intron splicing in the same way. Thus, the only evident difference between short and long CBP20 U12 introns is that the longer U12 introns contain more TA-rich sequences, which may help in recognition of these introns by the U12 splicing machinery. Therefore, it is possible that one of the yet-unidentified hnRNP proteins recognizing TA-rich sequences within U12 introns plays an important role in plant minor spliceosome recruitment.
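Two of the sequence features discussed in this paragraph, the branch point consensus TCCTTRATY and the TA content, are straightforward to compute. The Python sketch below is only an illustration: it expands the IUPAC codes R (A or G) and Y (C or T) into a regular expression and scans for exact matches, whereas real branch points may deviate from the consensus, and the toy intron is a made-up sequence rather than a CBP20 intron. All function names are hypothetical.

```python
import re

# IUPAC codes used in the consensus: R = A or G, Y = C or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}
BRANCH_POINT_CONSENSUS = "TCCTTRATY"


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


def ta_content(seq):
    """Fraction of T and A nucleotides in a sequence."""
    seq = seq.upper()
    return (seq.count("T") + seq.count("A")) / len(seq)


def find_branch_points(intron):
    """Return (start position, matched sequence) for each exact consensus hit."""
    pattern = consensus_to_regex(BRANCH_POINT_CONSENSUS)
    return [(m.start(), m.group()) for m in pattern.finditer(intron.upper())]


# Toy intron (hypothetical sequence with AT-AC termini).
toy_intron = "ATATCCTTTTAATTTCCTTAATCTTTTAC"
print(find_branch_points(toy_intron))
print(f"TA content: {ta_content(toy_intron):.2f}")
```

A match with A at the R position corresponds to the adenosine at the upstream purine position mentioned above; a scoring matrix would be a more tolerant alternative to exact matching.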

We noticed that the CBP20 U12 intron from P. patens was extremely inefficiently spliced in tobacco protoplasts (**Figure 2**). Previously we also observed very low splicing efficiency of mini-genes containing U12 introns originating from different plant genes (Lewandowska et al., 2004). We showed that U12 intron splicing was enhanced by increased TA-richness of the intron. In this study we found that the branch point of the P. patens CBP20 U12 intron has an additional C nucleotide in the 5′ part of the consensus sequence compared to the CBP20 U12 introns from other plant species (**Figure 1**). We analyzed all U12 introns from the P. patens genome and compared their branch points to the consensus sequence. Only four P. patens U12 introns containing the additional C nucleotide in the 5′ part of the branch point were found (124 P. patens U12 introns were analyzed). Therefore, the low efficiency of P. patens CBP20 U12 intron splicing could be due to the specific sequence of its branch point. Searching the U12DB for this additional C in the 5′ part of all A. thaliana U12 branch point sequences revealed the presence of only six such examples (Alioto, 2007). This shows that the presence of an additional cytidine in the branch point sequence is acceptable for both Physcomitrella and Arabidopsis U12 splicing machineries but may be a reason for the inefficient splicing of introns containing such branch point sequence. Further investigations are needed to uncover the role of this additional C in plant U12 intron splicing. On the other hand, the low splicing efficiency of the P. patens CBP20 U12 intron may also be associated with the compatibility of specific intron/exon sequences as well as ESEs that are present or absent in different exonic sequences. Comparison of P. patens and other plant species exon sequences that surround CBP20 U12 introns revealed a high similarity between exon no. 4, but in the case of exon no. 5, four insertions (3, 6, 14, and 5 bp long) and a single nucleotide deletion were found in P. patens (Supplementary Figure S7). Therefore, we cannot exclude the possibility that these exon differences between Arabidopsis and Physcomitrella may cause the incompatibility in the proper recognition of P. patens U12 intron-containing mini-gene transcripts by the tobacco U12 minor spliceosome.

To explain a role of the U12 intron in CBP20 pre-mRNA maturation, several mini-, midi- and maxi-gene constructs were prepared. This approach allowed us to dissect the effect of given CBP20 transcript fragments on U12 intron splicing. The shortest constructs (the mini-genes series) contained two CBP20 exons that originally flank the U12 intron. The U12-containing mini-transcripts were correctly but poorly spliced (4.77%). In a similar assay, the pea U2 legumin intron was spliced correctly and very efficiently (94.41%). The CBP80 U2 intron was spliced more efficiently (42.61%) than the original U12 intron but less efficiently than the legumin gene-derived U2 intron. In addition, alternatively spliced products were frequently observed (37.05%) (**Figure 3C**). These results have shown that the replacement of the U12 intron with a U2 intron increases splicing efficiency but may lead to undesired improper splicing events. Our observation agrees with the published data showing that U12 splicing slows down pre-mRNA processing (Lewandowska et al., 2004; Simpson and Brown, 2008). We demonstrated, however, that U2 introns when inserted between exons normally flanking a U12 intron are spliced improperly, which may lead to the downregulation of CBP20 protein expression (**Figure 4**). It has been shown in animal cells that U12 intron splicing is a limiting step in pre-mRNA processing (Patel et al., 2002; Niemelä and Frilander, 2014).

Since the CBP20 U12 intron splicing efficiency of the Arabidopsis mini-gene transcript (construct no. 1, **Figure 2**) was very low (0.34%), we decided to extend the construct, and we modified it by the introduction of two U2 introns and two flanking exons of the Arabidopsis CBP20 gene (the midi-gene series). Indeed, we improved the proper splicing efficiency almost three times (from 4.77 to 12.47%, **Figure 4**). Unexpectedly, this correct splicing was predominated by an extensive skipping event resulting in the production of fused exons no. 3 and no. 6 (approximately 50% of all spliced products). In addition, a plethora of minor alternatively spliced products were observed. A similar effect was observed in the case of midi-gene transcripts containing the Physcomitrella CBP20 intron, suggesting the general character of the results obtained (**Figure 4**). In our earlier studies, we used a similar approach to study analogous mini- and midi-gene constructs of the Arabidopsis GSH2 gene containing a U12 intron (Lewandowska et al., 2004). Unlike the results of the experiment presented in this paper, in the case of GSH2, we detected only constitutive splicing events, and as a result, we obtained the fully spliced products consisting of two or four exons when the mini- and midi-genes were analyzed, respectively. In addition, the transcript derived from the midi-gene (containing additional exon and intron sequences that flanked the U12 intron) was spliced more efficiently. This suggested that the U2 introns located upstream and downstream of U12 participated in the mechanism of definition of the U12 intron (Lewandowska et al., 2004). Interestingly, overexpression of the RBP45 protein, an hnRNP-like RNA binding protein with affinity for U-rich sequences (Lorković et al., 2000), resulted in a complete abolishment of the normal splicing pattern of the GSH2 midi-gene transcript. RBP45 increased the overall splicing efficiency of the transcript, but caused an extensive skipping event resulting in the accumulation of RNA molecules that were composed of fused two terminal exons (Lewandowska et al., 2004). In the majority of CBP20 midi-gene constructs analyzed in this paper, the presence of the U12 intron was ignored, and the most proximal 5′ as well as distal 3′ U2 splice sites were predominantly selected, without overexpression of RBP45. This can suggest that within the CBP20 midi-gene transcripts are sequences with high affinity to RBP45 or other yet-unidentified tobacco RBP45-like proteins that can stimulate selection of the most external 5′ and 3′ splice sites.

Interestingly, the skipping effects that were observed during splicing of the midi-gene constructs were not detected when full-length CBP20 gene variants (the maxi-gene series) were tested in transgenic plants. The wt transgene transcripts were correctly spliced, and removing the U12 intron (the ΔU12 CBP20 gene version) did not change the correct splicing pattern. The splicing pattern did change, however, when the position of the U12 intron was altered (moved to the core or tail part of the CBP20 gene) and a U2 intron replaced the original Arabidopsis U12 intron. In transgenic lines expressing these two constructs, incorrect splicing of the U2 intron, which served in the tested gene variants as intron no. 4, was detected. This is interesting since the same intron surrounded only by two U2 introns (a midi-gene) was spliced correctly, without any alternative events. Thus, in the context of the whole CBP20 gene, the U12 intron present upstream or downstream of its natural location induced selection of alternative splice sites. Moreover, the exon-swapping version of the CBP20 gene (exons no. 4 and no. 5 were swapped) caused alterations in splice site selection of the U12 intron. These results show that the sequences of both exons no. 4 and no. 5, as well as the location of the U12 intron, are necessary for the proper maturation of A. thaliana CBP20 pre-mRNA. However, in all maxi-gene constructs analyzed, the levels of their transcripts were comparable to those of the wt (**Figure 6A**), suggesting that the alternative splicing events detected in some CBP20 gene variants do not dramatically influence the levels of primary transcripts produced from the transgenes used in this study. Surprisingly, the protein level of CBP20 originating from the U12→U2 construct was lower than that expressed from the wt transgene. This low level of CBP20 was observed in two independent transgenic lines. Less CBP20 was also observed in the U12 in core and U12 in tail transgenic lines, and as expected, no CBP20 was detected in the exon-swapping line. These differences in the levels of CBP20, which are not accompanied by alterations in CBP20 gene expression at the mRNA level, suggest that mature mRNAs originating from some of our constructs are partially retained in the nucleus and are not exported to the cytoplasm to be used as templates in translation. Further studies must be carried out to fully uncover the connection between U12 intron splicing, mRNA export and translation in the cytoplasm.

### AUTHOR CONTRIBUTIONS

MP and KK designed and performed the experiments, analyzed the data, and wrote the manuscript. DB and JD performed the experiments and contributed to data analysis. MS and WK conducted the bioinformatic analyses. AJ advised on experiments and data analysis and assisted in drafting the manuscript. ZS-K conceived the study, contributed to data interpretation, and participated in the manuscript writing. All authors read and approved the final manuscript.

### FUNDING

This work was supported by the National Science Centre project UMO-2013/11/B/NZ1/02099 and the Faculty of Biology at Adam Mickiewicz University in Poznan, Poland (Badania Statutowe). Funding for open access charge: Polish Ministry of Science and Higher Education [01/KNOW2/2014].

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.00475/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Pieczynski, Kruszka, Bielewicz, Dolata, Szczesniak, Karlowski, Jarmolowski and Szweykowska-Kulinska. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Ectopic Transplastomic Expression of a Synthetic MatK Gene Leads to Cotyledon-Specific Leaf Variegation

Yujiao Qu<sup>1</sup>, Julia Legen<sup>1</sup>, Jürgen Arndt<sup>1</sup>, Stephanie Henkel<sup>1</sup>, Galina Hoppe<sup>1</sup>, Christopher Thieme<sup>1</sup>, Giovanna Ranzini<sup>1</sup>, Jose M. Muino<sup>1</sup>, Andreas Weihe<sup>1</sup>, Uwe Ohler<sup>2</sup>, Gert Weber<sup>1,3</sup>, Oren Ostersetzer<sup>4</sup> and Christian Schmitz-Linneweber<sup>1</sup>\*

<sup>1</sup> Institut für Biologie, Humboldt-Universität zu Berlin, Berlin, Germany, <sup>2</sup> Computational Regulatory Genomics, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, Berlin, Germany, <sup>3</sup> Helmholtz-Zentrum Berlin für Materialien und Energie, Joint Research Group Macromolecular Crystallography, Berlin, Germany, <sup>4</sup> Department of Plant and Environmental Sciences, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel

### Edited by:

Paula Duque, Instituto Gulbenkian de Ciência (IGC), Portugal

### Reviewed by:

Gorou Horiguchi, Rikkyo University, Japan; Xiao-Ning Zhang, St. Bonaventure University, United States

### \*Correspondence:

Christian Schmitz-Linneweber smitzlic@rz.hu-berlin.de

### Specialty section:

This article was submitted to Plant Cell Biology, a section of the journal Frontiers in Plant Science

Received: 29 March 2018 Accepted: 12 September 2018 Published: 04 October 2018

### Citation:

Qu Y, Legen J, Arndt J, Henkel S, Hoppe G, Thieme C, Ranzini G, Muino JM, Weihe A, Ohler U, Weber G, Ostersetzer O and Schmitz-Linneweber C (2018) Ectopic Transplastomic Expression of a Synthetic MatK Gene Leads to Cotyledon-Specific Leaf Variegation. Front. Plant Sci. 9:1453. doi: 10.3389/fpls.2018.01453

Chloroplasts (and other plastids) harbor their own genetic material, with a bacterial-like gene-expression system. Chloroplast RNA metabolism is complex and is predominantly mediated by nuclear-encoded RNA-binding proteins. In addition to these nuclear factors, the chloroplast-encoded intron maturase MatK has been suggested to act as a splicing factor for a subset of chloroplast introns. MatK is essential for plant cell survival in tobacco, and thus null mutants have not yet been isolated. We therefore attempted to over-express MatK from a neutral site in the chloroplast, placing it under the control of a theophylline-inducible riboswitch. This ectopic insertion of MatK led to a variegated cotyledon phenotype. The addition of the inducer theophylline exacerbated the phenotype in a concentration-dependent manner. The extent of variegation was further modulated by light, sucrose and spectinomycin, suggesting that the function of MatK is intertwined with photosynthesis and plastid translation. Inhibiting translation in the transplastomic lines has a profound effect on the accumulation of several chloroplast mRNAs, including the accumulation of an RNA antisense to rpl33, a gene coding for an essential chloroplast ribosomal protein. Our study further supports the idea that MatK expression needs to be tightly regulated to prevent detrimental effects and establishes another link between leaf variegation and chloroplast translation.

Keywords: chloroplast, variegation, MatK, splicing, transplastomic, Nicotiana tabacum

# INTRODUCTION

Land plant chloroplast RNAs are heavily processed post-transcriptionally. Besides trimming and RNA editing, a prominent post-transcriptional processing step is RNA splicing of group II introns. Group II intron splicing is mediated predominantly by nuclear-encoded RNA-binding proteins, of which about a dozen have been described in maize or Arabidopsis (de Longevialle et al., 2010; Stern et al., 2010). In addition to this nuclear complement of splicing factors, there is a single chloroplast-encoded splicing factor named intron maturase K, or MatK for short. The MatK protein is related to prokaryotic intron maturases (Neuhaus and Link, 1987). Bacterial intron maturases are encoded within group II introns, i.e., their "home introns." The home intron of MatK is located in the trnK gene.

Group II introns are characterized by six secondary structure elements named domains (D) I–VI that fold into a globular tertiary structure (Michel and Ferat, 1995). The maturase reading frame is always found inside DIV, and the maturases are typically required for splicing of their host intron RNAs (Schmitz-Linneweber et al., 2015). Maturases are generally identified by several conserved motifs. These include a region with sequence similarity to retroviral-type reverse transcriptases (i.e., the RT domain), and a conserved sequence motif similar to the thumb domain of retroviral RTs, denoted domain X (Mohr et al., 1993). Biochemical studies have indicated that maturases make direct contacts with selected intron sequence elements and that these contacts are essential for the splicing reaction. For example, the bacterial maturase LtrA interacts with sequence stretches within DI, DII and DIV. These contacts help to attain a splicing-competent intron conformation (Matsuura et al., 2001; Rambo and Doudna, 2004). A major advance in the understanding of the roles of maturases in splicing has recently been made through structural analyses of bacterial maturases (MATs) bound to their cognate group II intron RNA targets (Piccirilli and Staley, 2016; Qu et al., 2016; Zhao and Pyle, 2016). These include the crystal structures of the RT domains of MATs from Roseburia intestinalis and Eubacterium rectale (Zhao and Pyle, 2016), and a cryo-EM analysis of the ribonucleoprotein complex of the Lactococcus lactis intron-encoded LtrA maturase bound to its host ltrB intron RNA (Qu et al., 2016). The structures of the spliced ltrB intron (at 4.5 Å resolution) and of the ltrB intron in its ribonucleoprotein complex with LtrA (at 3.8 Å resolution) further reveal functional coordination between the intron RNA and its cognate maturase protein. Remarkably, these structures reveal close relationships between the RT catalytic domain and telomerases, whereas the 'active splicing centers' resemble those of the Prp8 protein (Dlakic and Mushegian, 2011; Galej et al., 2013; Yan et al., 2015), which also resides at the core of the spliceosome.

Aside from their essential role in splicing, intron maturases are also required for genetic mobility of the intron (Lambowitz and Zimmerly, 2004). Bacterial group II introns can spread to other genomic positions in a process called retrohoming that depends on the maturase protein. Chloroplast group II introns of embryophytes are no longer mobile, and MatK has lost the protein domains required for intron mobility (Mohr et al., 1993; Barthet and Hilu, 2007). By contrast, it has retained the so-called domain X, which was shown to be required for RNA splicing activity of bacterial introns (Lambowitz and Zimmerly, 2004).

A role for MatK in splicing is strongly supported by studies on chloroplasts devoid of a translational apparatus. In these mutants, the trnK precursor RNA is not spliced (Vogel et al., 1997). In addition, an entire subgroup of chloroplast introns, termed group IIA introns, fails to splice as well (Hess et al., 1994; Vogel et al., 1999). The only conceivable factor that would require functioning chloroplast translation for splicing is MatK, which led to the proposal that MatK serves in the splicing of all group IIA introns. Indeed, direct association of MatK with intron RNA was demonstrated in vitro (Liere and Link, 1995) and, for seven group IIA introns, also in vivo (Zoschke et al., 2010).

The matK reading frame is found in all known autotrophic land-plant chloroplast genomes that contain group II introns, and is also present in basal streptophyte algae (Turmel et al., 2006). In the streptophyte alga Zygnema, in the fern Adiantum capillus-veneris and also in the parasitic land plants Epifagus virginiana, Cuscuta exaltata, and Cuscuta reflexa, matK is present as a stand-alone reading frame, while the trnK gene has been lost (Wolfe et al., 1992; Turmel et al., 2005; Funk et al., 2007; McNeal et al., 2007). This suggests a function of matK "in trans," most likely in the splicing of pre-RNAs other than its cognate trnK intron. Among all embryophytes analyzed, only parasitic species have lost matK, among them species in the genera Cuscuta, Cytinus, Rafflesia, Pilostyles, as well as the orchid Rhizanthella and other orchid genera (Funk et al., 2007; McNeal et al., 2009; Delannoy et al., 2011; Molina et al., 2014; Bellot and Renner, 2015; Roquet et al., 2016). These non-autotrophic plants have also lost their group IIA intron sequences, with the exception of the structurally derived and 'evolutionarily younger' clpP-2 intron (Turmel et al., 2006; Funk et al., 2007; McNeal et al., 2009). Together, these phylogenetic analyses further support the notion that MatK is required for more than just splicing of its home intron. Also, its continuous presence in chloroplast genomes starting from early streptophytes suggests that its retention inside the chloroplast is not a chance event. Unlike matK, many other chloroplast and also mitochondrial genes have been transferred multiple times in different land plant lineages to the nucleus (Kleine et al., 2009). This includes several genes coding for maturases that have been found in the Arabidopsis nuclear genome (Mohr and Lambowitz, 2003). These are all targeted to the mitochondria (one is dually targeted to the mitochondria and chloroplasts), and at least three of them serve in the splicing of multiple mitochondrial group II introns (Nakagawa and Sakurai, 2006; Keren et al., 2009, 2012; Cohen et al., 2014; Zmudjak et al., 2017). These data demonstrate that maturases can be transferred to the nucleus and still function in splicing in trans. An important question therefore is: why is matK evolutionarily maintained on the chloroplast chromosome? To address this, functional genetic analyses are required. Unfortunately, the matK gene has been recalcitrant to reverse genetic tampering. Mutagenesis of the chloroplast matK reading frame by transplastomic approaches has failed so far, which has been taken as evidence that matK is an essential gene (Drescher, 2003; Zoschke et al., 2010). We therefore decided to study matK using a gain-of-function approach, i.e., by ectopic over-expression from the chloroplast genome in addition to the endogenous copy.

# RESULTS

### Introducing an Additional Copy of Chloroplast MatK Into a Neutral Integration Site Leads to Homoplastomic Tobacco Lines With Variegated Cotyledons

Given that matK is considered to be an essential gene, we opted for engineering an inducible over-expressor of MatK to avoid lethal effects after constitutive over-expression. Therefore, we used a transformation vector that allows the expression of transgenes based on the theophylline-responsive riboswitch (Verhounig et al., 2010). In brief, the riboswitch utilized in this system is a translational on-switch, i.e., it modulates translation initiation through the formation of an alternative structure that sequesters the Shine-Dalgarno and initiation codon (AUG) sequences in the absence of the inducer. The addition of theophylline leads to conformational changes in the 5′-UTR, which serves as an entry point for chloroplast ribosomes, allowing the translation of the transgene (Winkler and Breaker, 2005). This system was successfully used for the induced expression of GFP, using the chloroplast transformation vector pAV (Verhounig et al., 2010). The pAV vector also contains a selectable marker, i.e., a chimeric spectinomycin resistance gene, aadA, that facilitates the selection of plants with transgenic chloroplast genomes. The vector includes flanking sequences that target the transgenes to a neutral intergenic spacer region in the chloroplast genome of tobacco by homologous recombination. Here, we replaced the GFP construct within the pAV vector with a matK reading frame that has only 72% nucleotide sequence similarity with the endogenous tobacco matK sequence, while maintaining the encoded amino acid sequence (resulting vector: pRSAmatK, see also **Figure 1A** and **Supplementary Figure S1**). This prevents undesirable homologous recombination events within the endogenous matK sequence. In addition, we added a C-terminal triple HA tag to allow detection of the transplastomic MatK protein. Previously, we showed that C-terminal tagging of the endogenous MatK does not interfere with intron splicing and does not entail any detectable macroscopic phenotype (Zoschke et al., 2010).
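The idea of lowering nucleotide similarity while keeping the encoded protein unchanged can be illustrated with a minimal Python sketch. This is not the procedure used to design the synthetic matK gene; the toy coding sequence, the function names and the "pick the most divergent synonymous codon" rule are assumptions made purely for illustration. The codon table is the standard genetic code, whose amino-acid assignments are assumed here to apply to plastid genes as well.

```python
import itertools

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the canonical TCAG codon ordering.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO)
}


def translate(cds):
    """Translate a coding sequence (length divisible by 3) into amino acids."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def recode(cds):
    """Replace each codon by the synonymous codon that differs most from it.

    Hypothetical design rule for illustration only; codons without synonyms
    (Met, Trp) are left unchanged.
    """
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        out.append(max(synonyms, key=lambda c: sum(x != y for x, y in zip(c, codon))))
    return "".join(out)


# Toy coding sequence (hypothetical, not taken from matK).
native = "ATGGCTAAAGGTTTATGGTAA"
synthetic = recode(native)
print(translate(native), translate(synthetic))
print(f"nucleotide identity: {identity(native, synthetic):.2f}")
```

In this sketch the protein translation is unchanged while the nucleotide identity drops below 1, which is the property that makes homologous recombination with the endogenous copy less likely.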

The pRSAmatK construct was introduced into tobacco plants by biolistic transformation of the chloroplast genome, followed by selection of spectinomycin-resistant cell lines. We isolated a total of 11 putative transplastomic lines. All lines were subjected to additional rounds of regeneration under antibiotic selection. This eliminated residual wild-type copies of the plastid genome, i.e., produced homoplastomic lines. We further tested for homoplastomy and correct integration by restriction fragment length polymorphism analyses (**Figure 1B**). We concluded that the lines were homoplastomic. We named these lines AmatK to reflect the use of a synthetic matK reading frame. When growing F1 plants from various AmatK lines, we noticed the mottled, variegated appearance of cotyledons in AmatK seedlings (**Figure 1C**). By contrast, the primary leaves were indistinguishable from those of the wild-type plants (**Figure 1C** and **Supplementary Figure S2**; in some cases, pale-green tissue was also observed in early primary leaves). The effect on the plants in the absence of the riboswitch inducer, theophylline, can be attributed to the known leakiness of the construct (Emadpour et al., 2015). We next asked whether addition of theophylline to AmatK lines would modulate the observed variegated phenotype. Indeed, increasing concentrations of theophylline in the growth medium exacerbated cotyledon bleaching, while the control plants did not respond visibly to the addition of theophylline (**Figure 1D**). Together, these phenotypic analyses indicate that the AmatK insertion (i) interferes with chloroplast biogenesis in a cell-autonomous fashion, (ii) exerts an effect that is restricted to cotyledon tissue, and (iii) responds to riboswitch activation.

### AmatK Leaf Variegation Is Modulated by Light, Sugar and the Inhibition of Chloroplast Translation

Variegation of genetically homogeneous tissue has been observed before, and often was shown to be modulated by external signals. For example, variegation of the immutans mutant is strongly affected by light and temperature signals (Rodermel, 2002). We noticed that variegation varied with the growth cabinet and the conditions under which we grew the plants. Both the extent of the pale areas and the degree of paleness differed. We therefore systematically tested the effects of different growth conditions on the extent of variegation. Specifically, we applied conditions that are known to affect photosynthesis – the core function of chloroplast genetic information. First, we tested different light conditions. At a higher light fluence of 200 µE m<sup>−2</sup>·s<sup>−1</sup>, AmatK cotyledons are paler than at 75 µE m<sup>−2</sup>·s<sup>−1</sup>, while wt plants tolerate both light conditions with no visible alteration in leaf coloration (**Figure 2A**). This indicates that the ectopic insertion of AmatK leads to sensitivity to higher light intensities, i.e., potentially to a compromised photosynthetic apparatus and thus eventually to photoinhibition. Notably, plants grown on soil show less severe variegation and paleness than plants grown in vitro on MS medium. This indicates that differences in growth conditions between the petri-dish-grown plants and the soil-grown plants, such as nutrient supply, gas exchange and/or light, affect variegation as well.

Sugar is known to be a key regulator of plant metabolism, affecting the expression of many different plant genes (Pego et al., 2000). Accordingly, photosynthesis is known to be affected by the availability of exogenous sugars, which are thought to limit photosynthesis and prevent the proper development of the photosynthetic apparatus (Hdider and Desjardins, 1994; Koch, 1996; Van Huylenbroeck and Debergh, 1996; Rybczyński et al., 2007). AmatK plants grown on sugar show paler cotyledons, with larger patches of reduced chlorophyll, while wt plants were unaffected, suggesting that the effects of sugar and the AmatK transgene are additive (**Figure 2B**).

MatK has been suggested to be important for splicing of RNAs with essential roles for the translational apparatus, including several tRNAs and mRNAs for ribosomal proteins. Therefore, the effect of AmatK might well act via partially compromised translational activity in plastids. This should in turn affect the expression of the selectable marker introduced into AmatK plants, the aadA cassette. aadA confers resistance against spectinomycin. We tested spectinomycin resistance by germinating AmatK seeds on spectinomycin-containing medium. As expected, wt seedlings are albino, devoid of chloroplast development. AmatK lines show yellowish cotyledons with almost no green sectors remaining (**Figure 2C**). We conclude that an inhibition of translation exacerbates variegation of AmatK cotyledons.

### Transgene-Derived mRNA, but No Full-Length MatK:HA Protein Is Detected in AmatK Lines

By design, the AmatK transgene was supposed to be translationally silent in the absence of the inducer theophylline. Yet, even in the absence of theophylline, AmatK plants showed a variegated cotyledon phenotype. These effects may be due to basal expression in the absence of inducer (Emadpour et al., 2015). Possibly, low expression of AmatK already impacts chloroplast development, at least in cotyledons. We therefore tested expression of the AmatK gene at the RNA and protein levels.

RNA gel blot analyses were carried out with total RNA from AmatK lines grown on MS medium with 75 µE m<sup>−2</sup>·s<sup>−1</sup> of light at 25 °C (16 h light, 8 h darkness). Plants were harvested 12 days after imbibition. The oligonucleotide used for probe preparation is complementary to the junction region of the AmatK reading frame and the triple HA-tag, ensuring that the signal obtained from radioactive probe hybridization is specific to transcripts of the AmatK sequence. Two biological replicates were analyzed for each of two AmatK lines. In addition, we extracted RNA from two independent wt samples and two plants from the previously described pRB70 line, which carries an aadA cassette without the additional AmatK gene within the intergenic region between trnfM and trnG (Ruf et al., 2001). This controls for unwanted signals from the strongly expressed selectable marker gene. The AmatK lines display two signals at 1.8 and 2.2 kb, while no signal was observed for wt or pRB70 control samples (**Figure 3**). The expected size for a complete transcript of the synthetic matK gene spanning the region from the Prrn promoter to the rps16 terminator sequence is approximately 1.8 kb. Termination of transcription in chloroplasts is poorly understood, and transcriptional read-through across so-called terminators of transcription in transplastomes has been seen before (Schmitz-Linneweber et al., 2001). Therefore, the longer, 2.2 kb transcript possibly represents read-through transcription across the rps16 terminator that ends at the downstream trnfM gene. The absence of both signals from wt RNA samples demonstrates that the signals are not due to endogenous RNA species but represent mRNAs derived specifically from the introduced transgene.

We next tested accumulation of the transgene-derived MatK protein using an antibody against the C-terminal HA epitope. We prepared total plant protein and/or stroma from AmatK and wt lines. Some AmatK plants were grown in the presence of the riboswitch inducer theophylline. We used the following controls: (1) As controls for potential signals from the aadA cassette, we used the pRB70 lines mentioned above. (2) To control for effects caused by the pale cotyledon tissue, we used a phenocopy generated by treating wt plants with low doses of spectinomycin. This treatment leads to bleached cotyledons, but still allows greening of primary leaves, i.e., mimics the phenotype of AmatK seedlings (**Supplementary Figure S3**). (3) As a positive control, we added a line that carries an HA-tagged plastome-encoded subunit of the plastid RNA polymerase named RpoA:HA (Finster et al., 2013). (4) Total protein preparations from plants that carry an N- or C-terminally HA-tagged MatK, designated HA:MatK or MatK:HA (Zoschke et al., 2010), were used as positive controls for the detection of MatK in transformed tobacco plants.

FIGURE 3 | Detection of synthetic matK transcripts from AmatK, wt and aadA leaf tissue using RNA gel blot hybridization. 5 µg of total plant RNA was extracted and analyzed by RNA gel blot hybridization using a probe directed against the junction of the AmatK and HA sequence, which is specific to the transgene. The wt lines -1 and -2 as well as the AmatK lines -1 and -2 represent biological replicates.

Our immunoblots show that MatK can be detected in stroma preparations from the tagged endogenous locus, confirming previous analyses (Zoschke et al., 2010) (**Figure 4**). The protein runs at approximately 55 kDa, well below the calculated molecular weight of 64 kDa, but in line with previous analyses of MatK (Zoschke et al., 2010). We could not detect MatK in any sample from total leaf protein preparations. Also, no signal corresponding to the size of the MatK:HA protein was observed in AmatK lines, including lines treated with theophylline, and independent of sample preparation (stroma or total protein).

FIGURE 4 | … controls. Top: Total leaf proteins were denatured and separated on a 12% polyacrylamide gel, blotted and probed with a rabbit anti-HA antibody. pRB70 = control lines containing an aadA cassette, but no riboswitch cassette; HA:MatK = line with an N-terminally HA-tagged endogenous matK gene; MatK:HA = line with a C-terminally HA-tagged matK gene; theo = theophylline-treated AmatK plant; phenocopy = wt plants treated with 17 mg/L spectinomycin; RpoA:HA = line with a C-terminally HA-tagged rpoA gene. Bottom: Ponceau S stain of the blot shown above.

However, all stroma preparations from AmatK lines had a faint signal of about 47 kDa, possibly a degradation product (**Figure 4**). This signal is specific to the AmatK lines and is found consistently in different probings (**Supplementary Figure S4**). The signal must represent a C-terminal fragment of MatK, since it is detected via the C-terminal tag. In sum, our analysis indicates that protein accumulation from the synthetic transgene is aberrant and that the AmatK lines thus do not represent over-expressors. While the transgene is expressed at the RNA level, no full-length MatK was detected at the protein level. Still, AmatK leads to the accumulation of a well-defined N-terminally truncated MatK fragment and a tissue-specific chloroplast defect. This variegated tissue was used for all further experiments, without further transgene induction by theophylline.

### Chloroplast RNA Splicing Is Unaffected in AmatK Lines

There is as yet no formal proof that MatK is a splicing factor, but the biochemical and phylogenetic evidence is strong. Splicing defects could explain the altered chloroplast development in AmatK, in particular since MatK associates with transcripts of four tRNA genes (Zoschke et al., 2010), which are essential for translation and thus chloroplast development (Alkatib et al., 2012a,b). We therefore analyzed total RNA from young seedlings at the cotyledon stage for the accumulation of spliced and unspliced RNAs that are considered MatK targets based on previous RNA-co-immunoprecipitation assays (Zoschke et al., 2010). We included phenocopies generated by mild spectinomycin treatment to control for secondary effects caused by failed chloroplast development. Probes complementary to intron as well as exon sequences were used to detect both spliced and unspliced RNAs. A shift in the accumulation of unspliced relative to spliced RNAs would be indicative of a splicing effect. However, no such differential accumulation was observed in AmatK lines versus controls (**Figure 5**). The stronger signals for unspliced precursors seen for trnK and trnV in AmatK samples versus wt samples are also seen for the phenocopy control and appear to be caused by differences in loading (see methylene blue stains). In sum, RNA splicing of the analyzed target RNAs is not impaired, arguing against an RNA splicing effect leading to the observed variegation.
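The reasoning that a stronger precursor band alone does not indicate a splicing defect when loading differs can be made concrete with a small Python sketch. It assumes quantified band intensities, which are not reported in the text (the blots were assessed by comparison with controls and loading stains); the function name and all numbers are hypothetical.

```python
def splicing_efficiency(spliced, unspliced):
    """Fraction of the detected transcript pool that is spliced (0 to 1)."""
    total = spliced + unspliced
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return spliced / total


# Hypothetical band intensities in arbitrary units.
# Lane B carries twice as much RNA as lane A, so every band is twice as strong.
lane_a = {"spliced": 800.0, "unspliced": 200.0}
lane_b = {"spliced": 1600.0, "unspliced": 400.0}

eff_a = splicing_efficiency(**lane_a)
eff_b = splicing_efficiency(**lane_b)
print(f"lane A: {eff_a:.2f}, lane B: {eff_b:.2f}")
```

Because the ratio is taken within a lane, it is unaffected by loading: the unspliced band in lane B is twice as intense as in lane A, yet the spliced fraction is identical.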

### AmatK Plants Accumulate RNA Antisense to rpl33 When Treated With Spectinomycin

In the absence of specific splicing defects, we wanted to examine other parts of chloroplast RNA metabolism in AmatK lines. Therefore, we analyzed global chloroplast RNA accumulation by microarray analysis. To this end, we extracted RNA from AmatK seedlings as well as from spectinomycin-treated phenocopy controls. The AmatK and phenocopy control RNAs were labeled with Cy3 and Cy5 fluorescent dyes, respectively, and hybridized competitively on a microarray that represents the chloroplast genome of tobacco in a tiling fashion (Finster et al., 2013). However, none of the probes showed a significant change (**Figure 6A**, **Supplementary Table S1**, and **Supplementary Figure S5**). Mild reductions were seen for several tRNAs, and a mild but non-significant increase in steady-state level was observed for two mRNAs (marked in **Figure 6A**). Of these, we randomly picked the rpl33-clpP region for validation by RNA gel blot hybridization. We probed four genes located on opposite strands, using strand-specific probes (**Figures 6B,C**). The steady-state levels of these RNAs were found to be similar between wild-type and AmatK plants, further supporting the microarray data.
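For readers unfamiliar with two-color arrays, the comparison described above reduces to a per-probe log ratio of the two dye signals. The following Python sketch shows that calculation with median centering as a simple correction for global dye bias. It is not the analysis pipeline used in this study; probe names and intensities are hypothetical, and a real analysis would add replicates and a statistical test.

```python
import math
import statistics


def log2_ratios(cy3, cy5):
    """Per-probe log2(Cy3/Cy5), median-centered across all probes.

    Positive values mean more signal in the Cy3-labeled sample
    (here assumed to be AmatK), negative values more signal in the
    Cy5-labeled sample (here assumed to be the phenocopy control).
    """
    raw = {probe: math.log2(cy3[probe] / cy5[probe]) for probe in cy3}
    center = statistics.median(raw.values())
    return {probe: value - center for probe, value in raw.items()}


# Hypothetical background-corrected intensities for three probes.
cy3 = {"probe_a": 2000.0, "probe_b": 1000.0, "probe_c": 4000.0}
cy5 = {"probe_a": 1000.0, "probe_b": 500.0, "probe_c": 1000.0}

ratios = log2_ratios(cy3, cy5)
for probe, value in ratios.items():
    print(probe, round(value, 2))
```

In this toy data set two probes share the same raw ratio, so centering sets them to zero and only the third probe stands out with a one-unit (twofold) difference.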

Given that spectinomycin exacerbates the bleaching phenotype (**Figure 2C**), we also tested RNA accumulation on spectinomycin-containing medium. Indeed, transcript levels increase dramatically for rpl33, rps18 and clpP under these conditions (**Figure 6B**). An increase can also be observed for rps12, in particular when considering that less total RNA is loaded in this lane. As an intended negative control, we also probed for RNA accumulating antisense to rpl33. To our surprise, we obtained strong signals for AmatK lines treated with spectinomycin, whereas for all other samples, signals were barely above background.

In sum, plastid RNA levels in AmatK plants are very similar to RNA levels in control plants, suggesting that the phenotype is not caused by altered RNA accumulation in the mutants. The situation changes drastically when plants are treated with spectinomycin, which is also reflected in the more pronounced cotyledon bleaching: spectinomycin induces over-accumulation of several mRNAs in AmatK lines and also leads to strongly increased antisense transcripts to the rpl33 gene. In conclusion, inhibition of plastid translation in AmatK insertion lines leads to reprogramming of chloroplast gene expression, at least for the rpl33-clpP genomic region.

# DISCUSSION

### AmatK-Induced Plastome-Based Leaf Variegation Is Modulated by External and Internal Cues

The differentiation of chloroplasts in land plants depends foremost on light and development-dependent signals (Lopez-Juez and Pyke, 2005). A multitude of mutants perturbed in chloroplast development have been isolated by forward and reverse genetic approaches (Leister, 2003). Such mutants usually display uniform bleaching phenotypes, and are often embryo-lethal or seedling-lethal. Some mutants have variegated leaves. Leaf variegation is based on the occurrence of tissue sectors that contain either fully developed chloroplasts or pale, aberrant plastids (Kirk and Tilney-Bassett, 1978; Sakamoto, 2003; Aluru et al., 2006). Recessive mutants that cause variegation have been isolated in monocot as well as in dicot plants (e.g., Hagemann and Scholz, 1962; Walbot and Coe, 1979; Carol et al., 1999; Wang et al., 2000, 2004). These are typically caused by nuclear loci. By contrast, AmatK plants represent a chloroplast genetic alteration that is maternally inherited. Plastome-based variegation can be caused by segregation of two genetically distinct plastids, one of which does not develop into fully green chloroplasts (Baur, 1909; Correns, 1909; Börner and Sears, 1986; Greiner and Bock, 2013). It has also been described for the segregation of two genetically distinct plastid chromosomes, for example segregation of chromosomes lacking a trnN-GUU gene versus wt chromosomes (Legen et al., 2007). Depletion of trnN-GUU pools during segregation of homoplastomic tissue was considered to cause leaf variegation. This cannot be the case with AmatK plants, since these lines are homoplastomic according to Southern analyses. Thus, since AmatK plants are genetically uniform, variegation must be linked to a different, variable parameter. In AmatK cells, this parameter drops below a hypothetical threshold, which prevents chloroplast development. Three external factors (light, sugar and the antibiotic spectinomycin) were shown to modulate the extent of variegation and are thus informative in terms of identifying the putative threshold. Light has been noted before as a booster of variegation (Rosso et al., 2009). Light intensity during growth correlates with pale leaf area in the variegation mutants immutans, spotty, var1, var3, and thf1 (Rosso et al., 2009). This effect has been attributed to increased excitation pressure and the resulting photo-oxidative damage under higher light fluences (Rosso et al., 2009; Huang et al., 2013). By analogy, we hypothesize that the AmatK insertion makes developing chloroplasts more vulnerable to increased excitation pressure. Given the role of MatK as a splicing factor, altered gene expression in AmatK plants might interfere with the correct expression of the photosynthetic apparatus and thus lead to detrimental changes in electron transport that are exacerbated under high light conditions. The nature of the primary defect remains to be determined.

Sugar, the second modifier of the AmatK phenotype, has been shown to lead to the repression of nuclear genes for photosynthetic functions, e.g., the light harvesting proteins (Krapp et al., 1991; Price et al., 2004; Li et al., 2006). In addition, sugar represses plastid gene expression in liquid culture (Price et al., 2004; Gonzali et al., 2006; Osuna et al., 2007) and also in seedlings grown on sugar-containing MS media (Van Dingenen et al., 2016). Interestingly, the effect on chloroplast gene expression was observed as early as 3 h following sucrose treatment – roughly half of all repressed genes are from the chloroplast genome (Van Dingenen et al., 2016). Notably, matK and rps18 are the most strongly reacting non-photosynthetic chloroplast genes after sugar treatment (Table 1 in Van Dingenen et al., 2016). We hypothesize that the detrimental effect of AmatK on chloroplast gene expression adds to the repressive effects of sugars on chloroplast gene expression (including on the endogenous matK), which ultimately results in increased variegation. This implies a threshold of plastid gene expression during chloroplast development, below which no differentiation of chloroplasts is possible. This is further supported by our results from spectinomycin treatments, which showed an even more pronounced effect on variegation than light and sugar application. Spectinomycin blocks chloroplast translation, but should be detoxified by the expression of the aadA cassette in AmatK plants. This function is compromised given the almost complete bleaching of cotyledons on spectinomycin-containing medium. In conclusion, the expression of the aadA cassette is hampered in the AmatK background. Expression of the aadA cassette is driven by a PEP promoter and requires an active and intact plastid translational machinery. In summary, our data suggest that the AmatK insertion negatively affects plastid gene expression, which in some cotyledon cells is lowered below the level necessary for chloroplast development.

# AmatK Is Involved in Cotyledon-Specific Chloroplast Development

The main difference between cotyledons and primary leaves is their developmental origin. While all primary leaves are derived from the shoot apical meristem (SAM) post-germination, cotyledons emerge already at the late globular stage of embryo development. This is a time in development when seedlings are submerged in the soil and live heterotrophically. The plastids in cotyledons develop into etioplasts, and only when the cotyledons emerge from the soil and are irradiated do etioplasts transform into chloroplasts (Mansfield and Briarty, 1996). This is a very different trajectory from the one plastids take in the SAM, where they differentiate from proplastids directly into chloroplasts during leaf primordia development. The genetic basis of chloroplast development in cotyledons is mostly unknown, but a small number of mutants have been described that display pale or even albino cotyledons, while the primary leaves are able to undergo normal greening. This includes white cotyledons (wco; Yamamoto et al., 2000), and snowy cotyledon 1, 2, and 3 (sco1,2,3; Albrecht et al., 2006, 2008, 2010; Ruppel and Hangarter, 2007; Shimada et al., 2007; Zagari et al., 2017). Other mutants, including sig6 and dg1 mutants, exert a dominant effect on cotyledons, but also lead to measurable alterations in chlorophyll content in primary leaves (Ishizaki et al., 2005; Chi et al., 2008). Only one mutant has been described to date that leads to cotyledon-specific variegation, caused by a disruption of the spd1 gene (Ruppel et al., 2011). All of the listed mutants are recessive and affect nuclear genes. AmatK is distinct in eliciting cotyledon variegation via a plastome modification. It is interesting that – like MatK – several of the mutated nuclear genes underlying variegation phenotypes are involved in plastid gene expression: WCO and SCO1 have been implicated in rRNA maturation and translation elongation, respectively. SIG6 and DG1 interact with each other and play a role in plastid transcription.
This demonstrates that defects in chloroplast gene expression can selectively affect chloroplast development in cotyledons – a possible scenario also for the AmatK lines. SCO2 encodes a DNAJ-like protein and is required for the accumulation of photosystem II-LHCII complexes (Zagari et al., 2017). This mechanism for variegation is likely distinct from AmatK-induced variegation in that it does not involve plastid gene expression and since recent data indicate functions for SCO2 beyond cotyledons (Zagari et al., 2017). The functions of SPD1 and SCO3 are unknown. In sum, while the molecular mechanisms behind cotyledon-specific variegation remain unknown, it is intriguing that of the few factors known, most play a role in plastid gene expression. Thus, chloroplasts in cotyledons depend on a yet to be determined threshold of chloroplast gene expression that is no longer crossed in all plastids during development in variegation mutants.

# A Gain of Function in AmatK Lines Impairs Chloroplast Development

The initial aim of our study was to generate a strong overexpressor of MatK. However, while AmatK mRNA is readily detected, we did not find any full-length MatK protein derived from the transgene, but instead only a weakly accumulating degradation product. Since the tag used for the detection of the degradation product is located at MatK's C-terminus, we conclude that translation proceeds to the stop codon, and that the protein is degraded co- or post-translationally. This is surprising, since the amino acid sequence of AmatK is identical to the stable, endogenously HA-tagged version of MatK (Zoschke et al., 2010). We speculate that translation dynamics of the AmatK mRNA are different from those of the endogenous matK mRNA since the two genes have massively different codon usage (**Supplementary Figure S1**). Changes in translation speed can lead to different folding kinetics of the nascent protein and thus to alternative degradation routes (Kimchi-Sarfaty et al., 2007; Gloge et al., 2014). We speculate that such a truncated protein could act as a dominant negative factor in the chloroplast. However, the truncated AmatK protein levels do not rise after theophylline treatment, while at the same time variegation increases. Thus, the N-terminally truncated AmatK cannot be the cause of increased defective chloroplast development after theophylline addition. We cannot, however, rule out that an AmatK protein fragment representing the N-terminal part of AmatK exerts the effect. Such a fragment would not be detectable, because it would lack the C-terminal HA tag, and thus remains a speculation at present. Alternative explanations, i.e., that the AmatK gene or the AmatK mRNA cause the defect, cannot be ruled out, but are also not in line with the theophylline-dependence of the phenotype. While this needs to be tested in the future, our data suggest that the downstream effects represent a gain of function rather than a loss of function of MatK. 
First, splicing of MatK target introns appears unaffected in AmatK plants: if MatK is taken to be a splicing factor (Zoschke et al., 2010), these data suggest that AmatK does not compete or interfere with the endogenous MatK. Second, we discovered a number of effects on the accumulation of RNAs that are dependent on inhibiting translation in AmatK plants. These RNAs are not associated with MatK in RIP-chip assays. For instance, the rpl33 operon accumulates more RNA in AmatK plants than in phenocopy controls, including the accumulation of antisense RNA. In the green alga Chlamydomonas reinhardtii, chloroplast antisense RNAs are stabilized in the absence of translation by forming double-stranded RNA (Cavaiuolo et al., 2017). In general, over-accumulation of RNAs has been described for mutants with global defects in chloroplast translation and is caused at least in part by an increased activity of selected chloroplast RNA polymerases in albino tissue (Emanuel et al., 2004). We tried to control for such secondary effects by analyzing phenocopy controls, but in fact cannot be sure whether the extent of translational inhibition in the control truly mimics the effect seen in AmatK plants. Therefore, whether the RNA defects seen in AmatK tissue are primary or secondary effects cannot be decided at present. Still, a defect in the expression of rpl33 and its neighboring genes, rps18 and rpl20, could very well explain the bleaching phenotype. rps18 and rpl20 encode essential ribosomal proteins (Rogalski et al., 2006; Rogalski et al., 2008). The antisense RNA detected here is long enough to also form hybrids with rps18 and rpl20 mRNAs. This could interfere with sense translation of both mRNAs and thus eventually lead to reduced ribosome biogenesis and translation capacity. That reduced translational capacity can lead to bleached tissue is exemplified by spectinomycin treatment and several mutants in genes for translational components (e.g., Bellaoui and Gruissem, 2004; Legen et al., 2007; Meurer et al., 2017). Future transplastomic expression of antisense RNA to selected mRNAs could test this hypothesis. Importantly, the option to modulate the level of variegation in cotyledons of AmatK lines makes these plants a valuable tool to determine the time-window and unravel the threshold at which chloroplast gene expression exerts its effect on plastid differentiation.
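The codon-usage difference between AmatK and the endogenous matK invoked above can be made concrete with a simple per-position comparison of synonymous codons. The following Python sketch is illustrative only: the two short sequences are invented toy examples that encode the same peptide (M-E-E-I-Q-R-Y), not the actual matK or AmatK sequences, and the function names are hypothetical.

```python
from collections import Counter

def codons(cds: str) -> list[str]:
    """Split a coding sequence into complete codons, reading from position 0."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]

def codon_usage(cds: str) -> dict[str, float]:
    """Relative frequency of each codon within one coding sequence."""
    counts = Counter(codons(cds))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

def recoded_fraction(cds_a: str, cds_b: str) -> float:
    """Fraction of aligned codon positions at which the two sequences differ."""
    pairs = list(zip(codons(cds_a), codons(cds_b)))
    return sum(a != b for a, b in pairs) / len(pairs)

# Toy sequences (assumption): same peptide, different synonymous codons.
native_like = "ATGGAAGAAATTCAAAGATAT"  # ATG GAA GAA ATT CAA AGA TAT
recoded = "ATGGAGGAGATCCAGCGTTAC"      # ATG GAG GAG ATC CAG CGT TAC

usage_native = codon_usage(native_like)
fraction_changed = recoded_fraction(native_like, recoded)
print(f"{fraction_changed:.2f} of codon positions differ")
```

Applied to full-length coding sequences, such a comparison only describes how strongly two genes differ in codon choice; it says nothing by itself about translation speed or protein folding.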

# MATERIALS AND METHODS

### Plant Material and Growth Conditions

Tobacco plants (N. tabacum cv. Petit Havana) were grown on agar medium containing 30 g/L sucrose for plastid transformation assays. Transplastomic AmatK or pRB70 plants were grown on the same medium or on medium supplemented with 500 mg/L spectinomycin or supplemented with various concentrations of theophylline. wt plants grown on medium with 17 mg/L spectinomycin served as phenocopy controls. For light treatments, plants were grown under standard conditions in a walk-in chamber (25°C, humidity 55%, 16 h light/8 h dark). Homoplastomy of the regenerates and of seedlings of the T1 generation was tested by Southern blot analysis and/or germination assay on a medium containing 500 mg/L spectinomycin.

## Construction of the Synthetic matK (AmatK) Overexpression Vector

The tobacco plastid rRNA operon promoter Prrn and riboswitch RS were amplified from plasmid pAV6 (Verhounig et al., 2010) with primer pair SacIf 5′-GAACAAAAGCTGGAGCTCG-3′ (SacI site underlined) and AmatKRSr 5′-GTAACGCTGGATTTCTTCCATATGATCCTCTCCACGAGAG-3′ (sequence overlap with AmatK underlined). The PCR product was ligated to the AmatK fragment via overlap PCR with primer pair SacIf and HA5AmatKr 5′-CATAATCAGGAACATCATAAGGATACTGGTAGTTTGCCAGGTC-3′ (sequence overlap with HA underlined). The resulting PCR product was ligated to the 3XHA tag (Zoschke et al., 2010) via overlap PCR with primer pair SacIf and pAVHAr 5′-CCTTAATTGAATTTCTCTAGAGCCTCATTAAGCATAATCAGGTACATC-3′ (XbaI site underlined). The resulting PCR product and pAV6 vector were digested with SacI and XbaI and ligated, resulting in the pRSAmatK vector.
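As a quick consistency check on the cloning strategy described above, one can confirm that the two outer primers carry the recognition sequences of the enzymes used for the final digest (SacI, GAGCTC; XbaI, TCTAGA). The Python sketch below does this by plain string search. The primer sequences are copied from the text with line-break spaces removed; the helper name `find_site` is hypothetical, and the check does not replace a proper in silico digest of the full construct.

```python
# Outer primers of the overlap PCR, 5'->3', as given in the text.
PRIMERS = {
    "SacIf": "GAACAAAAGCTGGAGCTCG",
    "pAVHAr": "CCTTAATTGAATTTCTCTAGAGCCTCATTAAGCATAATCAGGTACATC",
}

# Recognition sequences of the two restriction enzymes used for cloning.
SITES = {"SacI": "GAGCTC", "XbaI": "TCTAGA"}

def find_site(primer: str, site: str) -> int:
    """Return the 0-based position of `site` in `primer`, or -1 if absent."""
    return primer.upper().find(site.upper())

sac_pos = find_site(PRIMERS["SacIf"], SITES["SacI"])
xba_pos = find_site(PRIMERS["pAVHAr"], SITES["XbaI"])

for name, pos in (("SacI in SacIf", sac_pos), ("XbaI in pAVHAr", xba_pos)):
    status = f"found at position {pos}" if pos >= 0 else "not found"
    print(f"{name}: {status}")
```

Both sites are palindromic, so searching one strand is sufficient for this purpose.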

# Plastid Transformation

Stable transformation of plastids was performed by particle bombardment followed by spectinomycin selection. The transformation protocol was carried out according to the protocols of Svab (Svab and Maliga, 1993) and Okuzaki (Okuzaki and Tabei, 2012). Two rounds of transformation were carried out; in the first round, two transplastomic lines with variegated cotyledons were retrieved, and in the second round more than a dozen independent transplastomic lines with cotyledon variegation were isolated. Briefly, sterile tobacco leaves from wild-type plants cultured on MS were cut into 0.5 cm × 0.5 cm squares. The leaf pieces were placed onto MS plates with the lower leaf side up and cultured overnight. The particle bombardment was carried out using the Biolistic PDS-1000/He Particle Delivery System (1,100 psi, L2 = 6 cm, 10 shots per construct, Bio-Rad) with gold particles (0.6 µm, Bio-Rad) loaded with the transformation plasmids. Spectinomycin-resistant calli were regenerated on RMOP medium. Several putative transplastomic lines were obtained and further rounds of selection were carried out to generate homoplastomic lines. The correct integration of AmatK was verified by genotyping PCR and Southern blot. The PCR was performed with the primer pair AmatKseqf 5′-GTTCTCGCATTTGGATCTC-3′ and trnMf 5′-TTCAAATCCTGTCTCCGCAA-3′.

# Isolation of DNA and Southern Blot

Genomic DNA was isolated from leaves according to standard protocols (Murray and Thompson, 1980). For Southern blot analysis, HindIII-digested total plant DNA was separated on a 0.8% agarose gel, blotted to a Nylon membrane and probed with a PCR product that was body-labeled using α-<sup>32</sup>P-CTP. The Decaprime DNA labeling Kit (ThermoFisher, Berlin, Germany) was used to label PCR-generated probe DNA fragments (see **Supplementary Table S1** for primers). Hybridization was carried out at 55°C overnight using standard protocols (Sambrook et al., 1989).

## Isolation of RNA and RNA Gel Blot Hybridization

Total cellular RNA was extracted using TriZol (ThermoFisher, Berlin, Germany). 5 µg of total cellular RNA was separated on formaldehyde-containing 1.2% agarose gels and blotted to Hybond N membrane (GE Healthcare, Munich, Germany). Strand-specific RNA probes were synthesized using T7 Polymerase (ThermoFisher, Berlin, Germany) with radioactive γ-<sup>32</sup>P-UTP (Hartmann Analytic) using PCR products as templates (see **Supplementary Table S1** for primer sequences). Strand-specific DNA probes were generated with the DecaLabel kit and PCR products according to the manufacturer's instructions (ThermoFisher, Berlin, Germany; see **Supplementary Table S1** for primer sequences). Alternatively, oligonucleotides were end-labeled using T4 polynucleotide kinase (ThermoFisher, Berlin, Germany) and γ-<sup>32</sup>P-UTP (Hartmann Analytic) and used as probes (see **Supplementary Table S1** for oligonucleotide sequences). Hybridizations were done according to standard procedures (Sambrook et al., 1989). Signals were detected using a PMI Imaging system (Bio-Rad, Munich, Germany).

### Immunoblot Analyses

fpls-09-01453 October 1, 2018 Time: 14:37 # 11

Protein extraction was performed according to standard protocols (Schagger and von Jagow, 1987). Protein fractions were separated on polyacrylamide gels and subsequently transferred in Towbin buffer (Towbin et al., 1979) to a PVDF membrane (0.2 µm pore size; GE Healthcare, Munich, Germany). Integrity and equal loading of proteins were verified by Ponceau S staining of the membrane. MatK:HA detection was performed with a rat HA-specific antibody (Roche, Mannheim, Germany). Chemiluminescent signals were detected using a ChemiDoc system (Bio-Rad, Munich, Germany).

### Microarray Analysis

2 µg of total cellular RNA from 10-day-old AmatK and phenocopy control plants was labeled with red- (Cy5) and green-fluorescing (Cy3) dyes, respectively, using the ULS RNA Fluorescent Labeling Kit according to the manufacturer's instructions (Kreatech, Amsterdam, Netherlands). The samples were hybridized to a whole-genome tobacco chloroplast tiling microarray (Finster et al., 2013). Hybridization, array washings and signal detection were carried out as described previously (Kupsch et al., 2012).

Microarray analysis was done in R using the packages marray (version 1.52) and multtest (version 2.28). Expression values were loaded into R using marray. Loess normalization at the probe level was then applied to the two biological replicates using the maNorm function from the marray package. The array contains 12 copies of each probe located at different spots to reduce any bias regarding its location on the array; the median log2 fold-change of the 12 copies was used to calculate the log2 fold-change of the region represented by the probe for each biological replicate independently. A t-test was applied to identify regions with an average log2 fold-change different from zero. Multiple-hypothesis correction was applied using the Benjamini & Hochberg (BH) method implemented in the multtest package.
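The probe-level aggregation and the multiple-testing correction described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the R code used in the study: the log2 fold-change values and p-values are invented, the normalization and t-test steps are omitted, and the function names are hypothetical. The adjustment shown is the standard BH step-up procedure.

```python
from statistics import median

def region_log2fc(copy_values: list[float]) -> float:
    """Summarize the replicate spots of one probe by their median log2 fold-change."""
    return median(copy_values)

def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Invented values for the 12 copies of a single probe on one array.
copies = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 1.0, 1.2, 0.9, 1.1]
probe_value = region_log2fc(copies)

# Invented raw p-values for four regions.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
print(probe_value, [round(p, 4) for p in adj_p])
```

Using the median rather than the mean makes the per-probe summary less sensitive to a single damaged or contaminated spot among the 12 copies.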

### AUTHOR CONTRIBUTIONS

YQ designed the transformation vectors, performed chloroplast transformation and selection for homoplastomic lines, and performed genetic crosses. JA and AW analyzed transgene expression and splicing efficiency by RNA gel blot hybridization. JL performed genotyping PCRs and Southern analyses. SH and GH performed RNA gel blot analyses and microarray analyses. CT performed immunological analyses. GR participated in the screening for transplastomic plants. JM and UO evaluated the microarrays and contributed to manuscript preparation. GW supported the design of the transformation vector and immunological analyses. OO contributed to project design and manuscript preparation. CS-L conceived the study, prepared the figures, and wrote the manuscript.

### FUNDING

Generous funding by the DFG to CS-L (TR175; project A02) and by GIF to CS-L and OO (I-1213-319.13/2012) is gratefully acknowledged. YQ was supported by a fellowship from the China Scholarship Council.

### ACKNOWLEDGMENTS

Technical support by T. Börner (HU Berlin) is gratefully acknowledged. We are grateful to Ralph Bock and Stephanie Ruf for making available the pAV6 vector and pRB70 plants.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.01453/full#supplementary-material






**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Qu, Legen, Arndt, Henkel, Hoppe, Thieme, Ranzini, Muino, Weihe, Ohler, Weber, Ostersetzer and Schmitz-Linneweber. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Alternative Splicing as a Regulator of Early Plant Development

Dóra Szakonyi\* and Paula Duque\*

Instituto Gulbenkian de Ciência, Oeiras, Portugal

Most plant genes are interrupted by introns and the corresponding transcripts need to undergo pre-mRNA splicing to remove these intervening sequences. Alternative splicing (AS) is an important posttranscriptional process that creates multiple mRNA variants from a single pre-mRNA molecule, thereby enhancing the coding and regulatory potential of genomes. In plants, this mechanism has been implicated in the response to environmental cues, including abiotic and biotic stresses, in the regulation of key developmental processes such as flowering, and in circadian timekeeping. The early plant development steps – from embryo formation and seed germination to skoto- and photomorphogenesis – are critical to both execute the correct body plan and initiate a new reproductive cycle. We review here the available evidence for the involvement of AS and various splicing factors in the initial stages of plant development, while highlighting recent findings as well as potential future challenges.

### Edited by:

Simon Gilroy, University of Wisconsin–Madison, United States

### Reviewed by:

Ligeng Ma, Capital Normal University, China Lydia Gramzow, Friedrich-Schiller-Universität Jena, Germany

### \*Correspondence:

Dóra Szakonyi dszakonyi@igc.gulbenkian.pt; Dora.Szakonyi@gmail.com Paula Duque duquep@igc.gulbenkian.pt

### Specialty section:

This article was submitted to Plant Cell Biology, a section of the journal Frontiers in Plant Science

Received: 03 April 2018 Accepted: 23 July 2018 Published: 15 August 2018

### Citation:

Szakonyi D and Duque P (2018) Alternative Splicing as a Regulator of Early Plant Development. Front. Plant Sci. 9:1174. doi: 10.3389/fpls.2018.01174

Keywords: alternative splicing, early seedling development, embryogenesis, photomorphogenesis, seed dormancy, seed maturation, seed germination, splicing factors

# mRNA PROCESSING AND ALTERNATIVE SPLICING

Accurate processing of precursor mRNAs (pre-mRNAs) is a major step in gene expression crucial for performing everyday housekeeping functions, executing developmental programs, and responding to intrinsic and environmental cues. It involves modification steps to remove noncoding sequences as well as add the cap and the poly(A) tail to the 5′ and 3′ ends of the mRNA, respectively (reviewed in Proudfoot, 2011; Shi and Manley, 2015; Ramanathan et al., 2016). Pre-mRNA splicing, the excision of introns followed by joining of exons, is catalyzed by the spliceosome, a large ribonucleoprotein complex. The spliceosomal subunits assemble at conserved nucleotides at the exon-intron boundaries also known as the 5′ (or donor) and 3′ (or acceptor) splice sites (SS), the branch point and the polypyrimidine tract. In addition to the core spliceosomal components, many RNA-binding proteins play key roles in mRNA processing, SS selection and splicing (reviewed in Meyer et al., 2015). In higher eukaryotes, intron-containing genes frequently give rise to multiple mRNAs through alternative splicing (AS) (**Figure 1A**), during which differential recognition of SS can lead to intron retention, exon skipping and/or alternative 5′/3′ SS selection. AS can significantly enhance a genome's coding capacity by producing protein variants with altered function. It also often affects mRNA stability by introducing premature stop codons in the coding sequence, thus targeting these transcripts to degradation by nonsense-mediated decay (NMD). Furthermore, AS can modify gene expression by modulating transcription elongation and/or translation efficiency (reviewed in Reddy et al., 2013; Laloum et al., 2017). It is hence not surprising that AS fulfills important biological functions. 
In plants, it has been found to control key processes like the circadian clock or flowering time as well as the response to environmental cues, including abiotic stress or pathogen attack (reviewed in Staiger and Brown, 2013; Yang et al., 2014; Laloum et al., 2017; Shang et al., 2017).

# EARLY PLANT DEVELOPMENT

The first stages of a plant's life are essential to establish the basic body pattern, develop different tissue types and initiate a new reproductive cycle (**Figure 1B**). Sexual reproduction of land plants involves the alternation of haploid and diploid stages. Angiosperms have a dominant diploid sporophyte and a relatively short haploid phase consisting of a few microscopic cells. Seeds are produced by double fertilization. One sperm cell fuses with the egg cell to form the diploid embryo, while a second sperm cell fertilizes the diploid central cell to give rise to the endosperm (reviewed in Raghavan, 2003; Berger et al., 2008). During embryogenesis, the one-cell zygote undergoes a tightly regulated developmental program to form a mature embryo. In dicots such as Arabidopsis thaliana (arabidopsis), this process includes distinct morphological stages, called globular, heart, torpedo, and bent cotyledon, leading to the establishment of the basic body plan and main tissue/organ initials including the shoot and root apical meristems (reviewed in Palovaara et al., 2016). Embryo morphogenesis is followed by seed maturation, which involves the accumulation of reserves, acquisition of desiccation tolerance, reduction of metabolic activities and induction of dormancy to enable survival of the embryo until favorable environmental conditions allow germination (reviewed in Graeber et al., 2012). Fresh seeds usually show high dormancy that gradually decreases over time in a process called after-ripening. The release from dormancy depends on environmental factors (e.g., light quality, day length, temperature, water availability, exposure to cold) and internal regulators (e.g., hormones, regulatory proteins, chromatin status) (reviewed in Kucera et al., 2005; Nee et al., 2017). Germination starts with water uptake (imbibition) and rapid expansion of the embryo, leading to rupture of the seed coat and emergence of the radicle. 
Seedlings growing in the dark display skotomorphogenic development (etiolated growth), characterized by elongated hypocotyls, apical hook, pale cotyledons and short roots. When exposed to light, the seedling undergoes photomorphogenesis to activate vegetative growth, displaying shorter and thicker hypocotyls as well as green and expanded cotyledons (reviewed in Wu, 2014). Hormones are important regulators of early plant development. Embryo formation is governed by auxins and cytokinins, while abscisic acid (ABA) is important for the completion of seed maturation and building up dormancy. ABA is also the major inhibitor of seed germination, with its effect being counteracted by gibberellic acid, ethylene, and brassinosteroids (reviewed in Palovaara et al., 2016).

# GLOBAL ALTERNATIVE SPLICING CHANGES DURING EARLY PLANT DEVELOPMENT

Next-generation sequencing has revolutionized transcriptomic studies. The latest RNA-seq data gathered in higher plants showed that traditional approaches largely underestimated the proportion of genes undergoing AS. Current assessments indicate that up to 70% of plant multiexon genes generate more than one transcript via this mechanism, with intron retention representing the predominant mode of AS (Lu et al., 2010; Zhang et al., 2010; Marquez et al., 2012; Shen et al., 2014; Thatcher et al., 2014; Chamala et al., 2015; Sun and Xiao, 2015; Iniguez et al., 2017; Zhang et al., 2017). In fact, increased sequencing coverage revealed a large number of non-annotated AS events and splice variants (Marquez et al., 2012; Zhang et al., 2017). Most of the plant AS events map to coding regions, thereby altering protein sequence and potentially function or compromising mRNA stability. Indeed, a significant proportion of intron-containing genes are potentially regulated by NMD (Zhang et al., 2010; Kalyna et al., 2012; Drechsel et al., 2013). Although thousands of alternatively spliced mRNAs are detected in genome-wide analyses, detailed genetic and molecular studies will be required to identify functionally relevant AS events.

Numerous plant large-scale studies have focused on gene expression and AS patterns in different tissues and during development, identifying many novel organ- or stage-specific mRNAs with dynamic expression changes and a stage-dependent switch in isoform dominance for many genes (Zhang et al., 2010; Thatcher et al., 2014; Klepikova et al., 2016; Vaneechoutte et al., 2017). Notably, genes encoding alternatively spliced transcripts are not necessarily differentially expressed during developmental transitions, suggesting that AS shapes the transcriptome independently from transcriptional regulation (Srinivasan et al., 2016). These findings are confirmed by deep-sequencing studies tracking expression and AS changes during the first stages of plant development (Aghamirzaie et al., 2013; Lu et al., 2013; Sun and Xiao, 2015; Qu et al., 2016; Thatcher et al., 2016; Narsai et al., 2017). The detection of prominent AS switches and of development-specific splice variants corroborates an important regulatory layer of early plant development at the splicing level. Interestingly, RNA-processing factors themselves undergo AS resulting in a potential autoregulatory feedback loop.

During embryogenesis in soybean (Aghamirzaie et al., 2013), AS of 47,331 genes produced 217,371 different transcripts, most of which had not been previously identified. Nearly one third of the genes showed variations in transcript levels during embryo development, including those encoding enzymes involved in carbon or nitrogen metabolism and hormone-mediated signaling pathways. Most AS events were detected during the later stages of embryogenesis, i.e., embryo maturation, dehydration, establishment of dormancy, and at the quiescent state. This induction of AS may be explained by the striking clustering of both splicing-related and ABA-associated factors observed at the late phases of seed development. Seed maturation and desiccation, which involve very specific developmental, hormonal, and biochemical processes, were also examined in arabidopsis (Srinivasan et al., 2016), where RNA-seq profiling was performed on developing and mature seeds. Interestingly, transcription and AS showed opposite trends, with transcription declining during seed maturation, while AS increased. Over a quarter of the loci undergoing AS expressed stage-specific splice variants or showed a marked isoform switch, with a striking 88% of the detected AS events being absent from the TAIR10 genome annotation. Again, there were no significant changes in total transcript levels of many alternatively spliced genes, pointing to AS as an important regulatory mechanism operating independently from transcription. Most of the genes exhibiting differential splicing were involved in RNA processing, potentially amplifying the AS regulatory effect in preparation for seed germination.

FIGURE 1 | Alternative splicing and early plant development. (A) Constitutive and alternative splicing. Nascent multi exonic mRNAs need to undergo pre-mRNA splicing. Constitutive splicing removes the non-coding introns, producing a mature mRNA that encodes the full-length protein or transcript with biological functions. The same pre-mRNA molecule can undergo alternative splicing (AS) and produce different transcript variants. AS events occurring in non-coding sequences often impact gene expression, but will result in a protein identical to the full-length isoform. In numerous cases, coding regions are affected by AS, thus originating markedly different mRNAs and potentially distinct proteins that can vary in virtually all functional aspects. (B) Embryogenesis, seed maturation and germination, and early seedling development. After fertilization, the zygote undergoes a rapid succession of highly coordinated cell divisions to form globular stage embryos, which show establishment of the apical–basal axis and a first distinction between outer and inner cells. The embryonic cells further differentiate during the heart stage, when many of the basic cell types (provasculature, endodermis, cortex, and protoderm) and organ primordia (cotyledons, hypocotyl, and primary root) are formed and a bilateral body pattern appears. Expanding cotyledons give the embryo a torpedo shape, and the formation of the shoot and root apical meristems is completed. The next steps involve further cell growth and divisions until the embryo reaches its final shape and size. Seed maturation is completed with the accumulation of reserves and the establishment of desiccation tolerance and seed dormancy. Dry seeds are released from dormancy in response to a combination of environmental cues and internal signals. After water uptake, key biochemical and molecular processes are restored, followed by the rupture of the seed coat and emergence of the radicle, marking the completion of germination. Under darkness, the buried seedling undergoes skotomorphogenesis characterized by a short root, elongated hypocotyl, apical hook and absence of photosynthetic pigments. Upon light exposure, photomorphogenesis is activated leading to inhibition of hypocotyl elongation, opening and expansion of the cotyledons, and initiation of photosynthesis after chloroplast maturation.

Two recent studies addressed the AS contribution during seed germination. In barley embryos, 14–20% of multiexon genes expressed multiple mRNA isoforms, some of which displayed clear changes during early germination (Zhang et al., 2016). Surprisingly, the most prominent AS event was alternative 3′ SS selection, and there were no substantial alterations in total transcript levels for most genes. Assessment of the biological functions of the genes undergoing AS during germination indicated involvement in protein synthesis, energy and carbon metabolism as well as RNA transport and splicing. Overall, seed germination appears to require expression of a specific set of genes, with AS playing a widespread role. The regulatory potential of AS during germination is underscored by a subsequent report in arabidopsis (Narsai et al., 2017) confirming the expression of time- and tissue-specific mRNA variants, the occurrence of dynamic changes in isoform abundance, and that splicing regulators are major AS targets during this developmental process.

AS regulation during early plant growth is also relevant in the context of environmental responses. Light, which is perceived by various photoreceptors, strongly impacts the life cycle of plants, regulating among others early developmental steps such as seed germination and the transition to autotrophic growth. Genome-wide effects of light on plant AS were recently analyzed by RNA-seq (Wu et al., 2014; Mancini et al., 2016), including in very young seedlings (Shikata et al., 2014; Hartmann et al., 2016). Shikata et al. (2014) reported that, during the initial response of etiolated seedlings to red light, the number of genes showing phytochrome-mediated differential gene expression or changed AS pattern is comparable, while later transcription becomes the dominant regulatory mechanism. In the phytochrome-dependent AS dataset, splicing-related genes were overrepresented, including SR proteins and the U1 and U2 spliceosomal subunits, while transcription factors comprised the major group of differentially expressed genes. AS seemed to play a significant role in light-induced chloroplast differentiation, as photosynthesis- and plastid-related genes were also enriched in the differential AS sets. When Hartmann et al. (2016) analyzed the response of etiolated arabidopsis seedlings exposed to blue, red, or white light treatments, ∼20% of genes were found to be differentially expressed, with ∼700 AS events being detected, most of which mapped to coding sequences. Again, gene ontology analysis revealed overrepresentation of the RNA-binding category, including many splicing factors. A link between light-induced AS and mRNA stability was also uncovered, with 77.2% of the detected mRNA isoforms more abundant in the dark samples being potential NMD targets. Remarkably, in most AS events, an isoform switch from a putative unstable mRNA variant to a protein-coding alternative occurred upon light exposure. Moreover, mutants lacking the major red or blue light receptors showed impaired AS mainly when subjected to monochromatic red or blue light, indicating that additional signaling pathways influence AS under white light. The authors suggested that metabolic signals, sugars in particular, are implicated in light-mediated AS regulation.

### SPLICING FACTORS REGULATING EARLY PLANT DEVELOPMENT

Compelling evidence from large-scale analyses pointing to an important role for AS during early plant development is being substantiated by accumulating in vivo genetic studies (**Table 1**). Overexpression or complete abrogation of splicing function often causes embryo lethality, indicating that the corresponding genes are essential for viability and development of a functional plant (Kalyna et al., 2003; Schmitz-Linneweber et al., 2006; Liu et al., 2009; Kim et al., 2010; Fouquet et al., 2011; Swaraz et al., 2011; Perea-Resa et al., 2012; Shikata et al., 2012; Sasaki et al., 2015; Tsugeki et al., 2015). Some studies have established a hormonal basis for the embryo and early seedling development defects caused by altered expression of splicing factors (Kalyna et al., 2003; Casson et al., 2009; Tsugeki et al., 2015), with abnormal spatial distribution of auxin arising from erroneous splicing and expression of auxin biosynthesis, transport, and signaling genes. A link between mRNA splicing and auxin signaling was also uncovered in flowers, where subcellular compartmentation of an auxin biosynthetic gene is regulated by AS (Kriechbaumer et al., 2012).

Seed dormancy and germination are also strongly affected in mRNA processing mutants. These effects were mostly reported to relate to splicing (Dolata et al., 2015) and polyadenylation (Cyrek et al., 2016) of the DOG1 gene, a key seed dormancy regulator and known AS target, and to changes in ABA signaling (Xiong et al., 2001; Sugliani et al., 2010; Jiang et al., 2012). Early seedling development can be affected as a manifestation of wider pleiotropic defects (Liu et al., 2010; Swaraz et al., 2011; Perea-Resa et al., 2012; Shikata et al., 2012; Hsieh et al., 2015; Yap et al., 2015) or in weak alleles of embryo lethal mutants (Kalyna et al., 2003; Gutierrez-Marcos et al., 2007; Fouquet et al., 2011; Tsugeki et al., 2015). Observed phenotypes include disturbed cotyledons, hypocotyls, vasculature patterning, roots and/or seedling viability and growth. Notably, mRNA splicing in plastids and mitochondria appears to be crucial for seed development and plant growth in both arabidopsis and maize (Schmitz-Linneweber et al., 2006; Gutierrez-Marcos et al., 2007; Liu et al., 2010; Hernando et al., 2015; Hsieh et al., 2015; Yap et al., 2015; Chen et al., 2017).

Genetic and molecular analyses have confirmed a role for splicing factors in photomorphogenesis, particularly in red-light responses. Phytochrome-dependent light signaling influences AS through specific splicing components, with additional splicing factors such as SR proteins being differentially processed in loss-of-function mutants of these effectors under various light conditions (Shikata et al., 2012; Hernando et al., 2015; Xin et al., 2017). Interestingly, Xin et al. (2017) demonstrated red light-dependent direct interaction and colocalization of a splicing factor and phytochrome B.

### ALTERNATIVE SPLICING TARGETS AFFECTING EARLY PLANT DEVELOPMENT

Despite massive transcriptome changes imposed by AS during early plant development, only a handful of alternatively spliced transcripts have had their functional significance analyzed in detail (**Table 1**). While one group, including the arabidopsis DOG1 gene as well as the OsABI5 and ABI3/VP1 transcription factors, plays roles in seed maturation, dormancy, and ABA responses (McKibbin et al., 2002; Wilkinson et al., 2005; Bentsink et al., 2006; Fan et al., 2007; Zou et al., 2007; Gagete et al., 2009; Sugliani et al., 2010; Gao et al., 2013; Nakabayashi et al., 2015; Wang et al., 2018), another is important for light signaling and includes COP1, HYH, and SPA3 (Zhou et al., 1998; Sibout et al., 2006; Shikata et al., 2014; Li et al., 2017). Moreover, PIF6 regulates both seed dormancy and light responses (Penfield et al., 2010). Despite the few individual events studied, AS is known to act via diverse mechanisms, as illustrated below.

### TABLE 1 | Splicing factors and targets functioning in early plant development.

In agreement with results from large-scale studies, the expression of numerous individual mRNA variants was found to be development- or tissue-specific (Zhou et al., 1998; Fan et al., 2007; Gagete et al., 2009; Sugliani et al., 2010; Gao et al., 2013; Wang et al., 2018), with some turning out to be non-functional, either because they did not produce an active protein or no phenotypic consequence was observed as a result of ectopic expression (Wang et al., 2018). In another study, genetic complementation tests indicated that the different splice variants perform functions equivalent to the constitutive form, even when lacking crucial amino acid sequences or domains (Li et al., 2017). Similarly, AS did not fundamentally influence DNA-binding or protein-protein interaction ability of the ABI3 and ABI5 transcription factors from different plant species, though the binding strength appeared to differ among the various isoforms (Zou et al., 2007; Gagete et al., 2009; Gao et al., 2013).

Alternative splice variants can also fulfill similar or distinct functions depending on developmental stage. The constitutive and alternative PIF6 mRNA variants similarly influenced light responses in seedlings, while only the short isoform displayed evident functions during seed germination (Penfield et al., 2010). In the case of COP1 and SPA3, ectopic overexpression of alternative splice forms phenocopied knock-out mutant phenotypes, indicating that some alternative forms can interfere with the function of the full-length protein (Zhou et al., 1998; Shikata et al., 2014). Strikingly, co-expression and direct protein interactions were found to be necessary for full DOG1 function (Nakabayashi et al., 2015). In genetic complementation assays, independent expression of individual DOG1 isoforms driven by the native promoter did not restore seed dormancy, whereas transgenic lines carrying two or more DOG1 variants showed improved dormancy. Detailed analysis of these results supported the hypothesis that, although single isoforms are active, the presence of multiple isoforms is required for adequate DOG1 function. On the other hand, AS-induced changes in protein sequence may lead not only to diminished biological function but, as demonstrated for an HYH isoform lacking a protein interaction domain for proteasomal degradation, also to a more stable and hence more active protein isoform (Sibout et al., 2006).

Subcellular targeting provides specialized locations for intracellular processes and can interfere with the regulatory and biochemical potential of proteins. AS of a pumpkin hydroxypyruvate reductase (HPR) acting in photorespiration affected the C-terminal targeting sequence, with one splice form localizing in the peroxisome and another in the cytosol (Mano et al., 1999). The two mRNAs were expressed at similar levels in darkness, while light promoted the production of the shorter, cytosol-localized variant. Most recently, retention of an mRNA variant of the arabidopsis SR30 splicing regulator in the nucleus was shown to influence mRNA stability by preventing the degradation of a potential NMD target in the cytoplasm and its association to the translation machinery (Hartmann et al., 2018).

### CONCLUSION AND PERSPECTIVES

Recent transcriptome-wide, genetic and molecular studies have demonstrated that regulation of the complex developmental steps from embryogenesis to establishment of a functional plant includes posttranscriptional control via AS. Seed maturation, establishment and maintenance of seed dormancy, and young seedling responses to light stand out as significant AS-regulated processes. The detection of time- and tissue-specific mRNA variants and of notable switches in splicing patterns substantiate crucial roles for AS in other early development processes. Further large-scale analyses in different tissue types using the latest sequencing technologies and single-cell approaches will be key to understand the full extent of AS events occurring during the initial stages of plant development. Improved standardization of data processing and analysis along with more meticulous experimental set-ups should also allow for more reliable comparative studies. Comprehensive publicly available databases, providing a detailed and up-to-date view of AS in plants are still lacking. These will be pivotal in pinpointing promising novel splice forms and assist in functional studies to distinguish biologically relevant AS contributing to proteomic diversity or gene expression regulation from nonfunctional AS events and splicing noise. Importantly, state-of-the-art methodology such as iCLIP is proving successful in plant systems and should allow identification of the mRNAs targeted directly by splicing factors to control early plant development.

### REFERENCES

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### FUNDING

DS was supported by a Postdoctoral Fellowship (SFRH/BPD/94796/2013) from Fundação para a Ciência e a Tecnologia (FCT), which also finances research in our lab through Grant PTDC/BIA-PLA/1084/2014. Funding from the GREEN-it research unit (UID/Multi/04551/2013) is also acknowledged.

### ACKNOWLEDGMENTS

Our apologies to the authors whose work was not cited due to space restrictions.



multiple isoforms generated by alternative splicing. PLoS Genet. 11:e1005737. doi: 10.1371/journal.pgen.1005737



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Szakonyi and Duque. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Functional Characterization of SMG7 Paralogs in Arabidopsis thaliana

Claudio Capitao<sup>1</sup>† , Neha Shukla<sup>2</sup>† , Aneta Wandrolova<sup>2</sup> , Ortrun Mittelsten Scheid<sup>1</sup> and Karel Riha<sup>2</sup> \*

<sup>1</sup> Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria, <sup>2</sup> Central European Institute of Technology, Masaryk University, Brno, Czechia

SMG7 proteins are evolutionarily conserved across eukaryotes and primarily known for their function in nonsense-mediated RNA decay (NMD). In contrast to other NMD factors, SMG7 proteins underwent independent expansions during evolution, indicating their propensity to adopt novel functions. Here we characterized SMG7 and SMG7-like (SMG7L) paralogs in Arabidopsis thaliana. SMG7 retained its role in NMD and additionally appears to have acquired another function in meiosis. We inactivated SMG7 by CRISPR/Cas9 mutagenesis and showed that, in contrast to our previous report, SMG7 is not an essential gene in Arabidopsis. Furthermore, our data indicate that the N-terminal phosphoserine-binding domain is required for both NMD and meiosis. Phenotypic analysis of SMG7 and SMG7L double mutants did not indicate any functional redundancy between the two genes, suggesting neofunctionalization of SMG7L. Finally, protein sequence comparison together with phenotyping of T-DNA insertion mutants identified several conserved regions specific for SMG7 that may underlie its role in NMD and meiosis. This information provides a framework for deciphering the non-canonical functions of SMG7-family proteins.

Edited by: Dora Szakonyi, Instituto Gulbenkian de Ciência (IGC), Portugal

Reviewed by: Daniel Silhavy, Agricultural Biotechnology Institute, Hungary; Uener Kolukisaoglu, Eberhard Karls Universität Tübingen, Germany

\*Correspondence: Karel Riha, karel.riha@ceitec.muni.cz

†Shared first co-authorship

### Specialty section:

This article was submitted to Plant Cell Biology, a section of the journal Frontiers in Plant Science

Received: 31 May 2018 Accepted: 17 October 2018 Published: 06 November 2018

### Citation:

Capitao C, Shukla N, Wandrolova A, Mittelsten Scheid O and Riha K (2018) Functional Characterization of SMG7 Paralogs in Arabidopsis thaliana. Front. Plant Sci. 9:1602. doi: 10.3389/fpls.2018.01602

Keywords: nonsense mediated RNA decay, meiosis, SMG7, gene duplication, Arabidopsis

### INTRODUCTION

In eukaryotic cells, gene expression is controlled by several surveillance mechanisms that assure accurate and robust production of functional proteins. One such control mechanism is nonsense-mediated RNA decay (NMD), which degrades aberrant RNAs (He and Jacobson, 2015; Lykke-Andersen and Jensen, 2015; Hug et al., 2016; Raxwal and Riha, 2016). NMD typically targets transcripts containing premature termination codons (PTC) that arise as a consequence of a nonsense mutation or alternative splicing and would otherwise lead to the translation of a truncated protein. NMD is an evolutionarily highly conserved pathway that may have co-evolved with the acquisition of introns (Farlow et al., 2010). Although NMD is one of the most well-studied RNA surveillance pathways, its exact molecular mechanism is not yet fully resolved. The consensus model, based on studies across a range of organisms, suggests that NMD is induced by the premature translation termination at stop codons in the absence of a full complement of canonical translation termination signals (He and Jacobson, 2015; Lykke-Andersen and Jensen, 2015).

The most detailed insights into the NMD mechanism were obtained in mammals (He and Jacobson, 2015; Lykke-Andersen and Jensen, 2015; Hug et al., 2016). The key step is the phosphorylation and activation of the RNA helicase UPF1 by the SMG1 kinase in response to a ribosome stalled at a PTC. Mammalian NMD primarily targets transcripts harboring introns in their 3′ UTR, and UPF1 activation is facilitated by the presence of an exon–exon junction 50–55 nucleotides downstream of the PTC. This is sensed via interaction with the exon junction complex, and is mediated by UPF3B and UPF2. Once phosphorylated, UPF1 is recognized by the phosphoserine-binding proteins SMG5, SMG6, and SMG7 that induce degradation of the aberrant mRNA. SMG5, SMG6, and SMG7 belong to the same protein family, which is characterized by a phosphoserine-binding domain that structurally resembles 14-3-3 proteins (Fukuhara et al., 2005). Their association with UPF1 defines two major RNA degradation pathways. SMG6 itself acts as an endonuclease cleaving mRNA in the vicinity of the PTC. SMG5 and SMG7 bind phosphorylated UPF1 as a heterodimer (Jonas et al., 2013; Nicholson et al., 2018). SMG7 recruits the CCR4-NOT deadenylase complex and induces deadenylation-dependent decapping, which is followed by 5′–3′ mRNA degradation (Unterholzner and Izaurralde, 2004; Loh et al., 2013). NMD in human cells mainly relies on SMG6-mediated degradation, while the SMG5-SMG7 dimer seems to act as a back-up pathway.
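The exon-junction rule described above can be illustrated with a minimal Python sketch. It assumes transcript coordinates on the spliced mRNA and uses a 50-nucleotide cutoff taken from the lower end of the 50–55 nucleotide range mentioned in the text; the function name and the example coordinates are hypothetical and are not drawn from any dataset or published tool.

```python
def has_downstream_junction(stop_codon_end, exon_junctions, min_distance=50):
    """Return True if any exon-exon junction lies at least `min_distance`
    nucleotides downstream of the stop codon (coordinates on the spliced mRNA).

    Illustrative heuristic only: real NMD prediction depends on more
    features than junction position (e.g. 3' UTR length).
    """
    return any(j - stop_codon_end >= min_distance for j in exon_junctions)


# Hypothetical coordinates: stop codon ends at position 300.
candidate = has_downstream_junction(300, [120, 250, 420])      # junction 120 nt downstream
not_candidate = has_downstream_junction(300, [120, 250, 330])  # junction only 30 nt downstream
print(candidate, not_candidate)
```

A transcript flagged by such a rule would only be a candidate NMD target; experimental validation, as in the assays described later, remains necessary.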

The NMD mechanism described in mammals appears to be conserved in plants, although some aspects differ (Kerenyi et al., 2008; Shaul, 2015). NMD in plants acts efficiently on transcripts containing 3′ UTR introns, but also on transcripts that possess long 3′ UTRs without introns (Kertesz et al., 2006; Kurihara et al., 2009; Kalyna et al., 2012; Drechsel et al., 2013; Degtiar et al., 2015). This mechanism, known as the faux 3′ UTR model, was described in budding yeast and postulates that the long distance between the PTC and poly(A)-tail fails to provide the proper context for translation termination and sensitizes transcripts to NMD (He and Jacobson, 2015; Hug et al., 2016). Functional studies in tobacco showed that both NMD pathways require UPF1, UPF2, and SMG7, while components of the exon junction complex are only involved in the intron-based mechanism (Kerenyi et al., 2008). As in mammals, phosphorylation of UPF1 is critical for late steps of plant NMD (Merai et al., 2012; Kerenyi et al., 2013). The SMG1 kinase is present in the majority of plants, but has been repeatedly lost in the Brassicaceae family, suggesting that its function can be substituted by an as yet unknown kinase (Lloyd and Davies, 2013; Causier et al., 2017). Phosphorylated UPF1 recruits SMG7 through its terminal 14-3-3 domain, which in turn mediates RNA degradation and UPF1 relocalization to P-bodies (Merai et al., 2012). Plants do not possess SMG5 and SMG6 paralogs, but the genomes of dicots encode an additional SMG7-like (SMG7L) protein (Riehs et al., 2008). Nevertheless, functional studies in grapevine indicate that SMG7L has lost its NMD activity (Benkovics et al., 2011).

Downregulation of NMD in Arabidopsis leads to a strong immune response that is caused by derepression of TIR-NB-LRR immune receptors (Jeong et al., 2011; Rayson et al., 2012; Riehs-Kearnan et al., 2012; Gloggnitzer et al., 2014; Raxwal and Riha, 2016). UPF1 and SMG7 are understood to act in subsequent steps of a linear pathway; therefore, their inactivation should have similar biological effects. However, the phenotypes of UPF1 and SMG7 mutants in Arabidopsis differ substantially. A null upf1-3 mutation causes seedling lethality due to massive activation of the immune response (Yoine et al., 2006; Riehs-Kearnan et al., 2012). This lethality can be rescued by the inactivation of PAD4, a key component of pathogen signaling, but the upf1-3 pad4 double mutants still exhibit retarded growth and pleiotropic developmental defects. Analysis of plants carrying different T-DNA insertions in the SMG7 gene showed a range of different phenotypes. The most N-terminal disruption in the 14-3-3 domain (smg7-5) was proposed to cause embryonic lethality, as no mutant plants could be recovered (Riehs et al., 2008). The smg7-1 and smg7-3 mutants, which carry insertions in the central region of SMG7, are viable but exhibit retarded growth caused by pathogen response activation and are infertile due to abortive meiosis (Riehs et al., 2008; Bulankova et al., 2010). While the vegetative growth defects are fully rescued in smg7-1 pad4 double mutants, meiosis is still defective. Furthermore, the meiotic defects are not observed in upf1-3 pad4 mutants, suggesting that the role of SMG7 in meiosis is not linked to NMD (Riehs-Kearnan et al., 2012).

Here, we aimed to clarify whether Arabidopsis SMG7 is an essential gene by characterizing SMG7 loss-of-function mutants generated by CRISPR/Cas9 mutagenesis. Furthermore, we investigated the requirements for the 14-3-3 domain in meiosis and the redundancy between SMG7 and SMG7L paralogs. Finally, we used phylogenetic analysis to search for motifs that may define SMG7 function in NMD and meiosis.

### MATERIALS AND METHODS

### Plant Material

Arabidopsis thaliana plants were grown in soil in phytotrons at 20–22 °C with a photoperiod of 16 h light/8 h dark. All wild type and mutant plants were derived from A. thaliana ecotype Col-0. The smg7-1 and smg7-6 mutants used in this study were characterized previously (Riehs et al., 2008; Riehs-Kearnan et al., 2012) and the smg7l-1 mutation was obtained from the Arabidopsis Stock Centre (SAIL\_634H06).

### Vectors and Transgenic Plants

We used the CRISPR/Cas9 system developed for targeted mutagenesis in plants (Fauser et al., 2014). The CRISPR sequence guiding the Cas9 nuclease was designed to target the second exon of the SMG7 gene downstream of and near to the ATG codon. Briefly, the guide RNA was created by cloning the duplex oligonucleotide 5′-CTTGGCTCGCTCCCATGAAG-3′ into the BbsI site of pEN-Chimera (Fauser et al., 2014). The region coding for the gRNA was transferred into pDe-CAS9 using the Gateway™ cloning system, creating the binary vector pCCC40. Transgenic plants were generated by Agrobacterium-mediated floral-dip transformation of A. thaliana ecotype Col-0. Transgenic plants were screened in the T2 generation by high resolution melting (HRM) analysis for the presence of sequence polymorphisms at the target site. The CRISPR/Cas9 transgene was segregated out in the T3 generation, and plants heterozygous for the smg7-7 allele were identified by HRM analysis and sequencing. Plants were characterized in the T4 and T5 generations. The complementation constructs for testing the functional significance of the 14-3-3 domain of SMG7 were generated as follows: 6.9 kb of SMG7 genomic sequence including the 1.5 kb promoter and 3′ UTR was PCR amplified using primers SMG7::SMG7-SbfI and SMG7::SMG7-SacI (**Supplementary Table S1**) and cloned into the pJET1.2/blunt Cloning Vector (Thermo Fisher Scientific). For site-directed mutagenesis, a smaller fragment of the SMG7 gene was amplified with the primers F-SMG7-SpeI and R-SMG7-HpaI and the PCR product was subcloned into the pJET1.2/blunt Cloning Vector. Site-directed mutagenesis of K77E and R185E was done in two rounds of PCR, first with primers F-SMG7-K77E and R-SMG7-K77E, and second with primers F-SMG7-R185E and R-SMG7-R185E. The mutated fragment was subcloned back into the SpeI/HpaI sites of the SMG7 gene construct. Wild type and mutated versions of the SMG7 gene were subcloned into the PstI/SacI sites of the binary vector pCBK06 (Riha et al., 2002) and the resulting constructs were transformed into Agrobacterium tumefaciens GV3101. Complementation lines were generated by transforming Arabidopsis heterozygous for the smg7-1 allele by the floral-dip technique.
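As a small illustration of handling the guide sequence given above, the following Python sketch computes its length, GC fraction, and reverse complement (the bottom strand of a duplex oligonucleotide). Only the 20-nucleotide sequence comes from the text; the helper names are hypothetical, and the cloning overhangs required for the BbsI site are deliberately not modeled because they are not specified here.

```python
# Guide sequence as stated in the text (5' to 3').
GUIDE = "CTTGGCTCGCTCCCATGAAG"

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (uppercase A/C/G/T only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


print(len(GUIDE))                 # guide length in nucleotides
print(gc_fraction(GUIDE))         # GC fraction of the guide
print(reverse_complement(GUIDE))  # complementary strand, 5' to 3'
```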

### Fertility Assays

Pollen viability was determined by Alexander staining (Alexander, 1969). Meiosis in pollen mother cells was analyzed by staining whole anthers with DAPI followed by confocal microscopy according to Brownfield et al. (2015).

### Gene Expression Analysis

Total RNA was isolated from plant leaf tissue using RNA Blue (Top-Bio) following the manufacturer's protocol. Quality and quantity of isolated RNA were checked on a 1.2% denaturing agarose gel and with the NanoDrop 2000C spectrophotometer (Thermo Fisher Scientific), respectively. For each sample, 10 µg of total RNA was treated with DNase I (Roche) to remove genomic DNA and 2 µg DNA-free RNA was reverse transcribed using Superscript IV (Invitrogen) following the manufacturer's protocol. For qRT-PCR reactions, cDNA was diluted twofold and all reactions were carried out using EvaGreen dye (Biotium), GoTaq polymerase (Promega, United States), and gene-specific primers on a LightCycler 96 (Roche). Expression was normalized to At2G28390 and relative fold change was calculated using the delta-delta Ct method (Livak and Schmittgen, 2001). All qRT-PCR reactions were performed on at least three biological replicates and 2–3 technical replicates. The sequences of all primers used in this study are listed in **Supplementary Table S1**.
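The delta-delta Ct calculation referred to above can be sketched in Python as follows. This is a generic illustration of the method, assuming an amplification efficiency of 100% (a doubling per cycle); the function name and the Ct values are hypothetical and do not represent data from this study.

```python
def relative_fold_change(ct_target_sample, ct_ref_sample,
                         ct_target_control, ct_ref_control):
    """Relative expression by the delta-delta Ct method.

    Assumes ~100% amplification efficiency for both target and
    reference amplicons (an assumption, not verified here).
    """
    delta_sample = ct_target_sample - ct_ref_sample      # normalize mutant to reference gene
    delta_control = ct_target_control - ct_ref_control   # normalize wild type to reference gene
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target amplifies two cycles earlier in the
# mutant than in wild type, while the reference gene is unchanged.
fold = relative_fold_change(24.0, 20.0, 26.0, 20.0)
print(fold)
```

With these illustrative values, a two-cycle shift corresponds to a fourfold higher relative abundance in the mutant.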

### Protein Alignment

We downloaded the full-length protein sequences of SMG7 and SMG7L from 13 plant species (Physcomitrella patens, Selaginella moellendorffii, Amborella trichopoda, Medicago truncatula, Arabidopsis thaliana, Brassica oleracea, Cucumis sativus, Nicotiana attenuata, Helianthus annuus, Oryza sativa, Zea mays, Sorghum bicolor, Hordeum vulgare) from Ensembl Plants (release 38) and of human (Homo sapiens) from Ensembl (release 91). The following sequences were used for alignment: HsSMG7 (ENSP00000425133), PpSMG7a (PP1S80\_14V6.1), PpSMG7b (PP1S311\_73V6.1), SmSMG7a (EFJ27061), SmSMG7b (EFJ21470), AmtSMG7 (ERN18017), MtSMG7a (KEH28378), MtSMG7b (KEH16467), AtSMG7 (AT5G19400.1), AtSMG7L (AT1G28260.1), BoSMG7a (Bo9g153800.1), BoSMG7b (Bo2g018020.1), BoSMG7La (Bo5g054690.1), BoSMG7Lb (Bo3g143280.1), CsSMG7 (KGN66550), CsSMG7L (KGN64688), NaSMG7 (OIS97991), NaSMG7L (OIT28005), HaSMG7a (OTG27135), HaSMG7b (OTG30173), HaSMG7L (OTF97490), OsSMG7 (Os08t0305300-01), ZmSMG7a (Zm00001d019920\_P002), ZmSMG7b (Zm00001d005502\_P002), SbSMG7 (KXG35214), HvSMG7a (HORVU5Hr1G050800.5), HvSMG7b (HORVU0Hr1G029520.1). Protein alignment was performed with the MegAlign Pro function of DNASTAR Navigator (v12.2.0.80) using the MUSCLE (Multiple Sequence Comparison by Log-Expectation) method. The multiple sequence alignment (MSA) for selected sequence features of the SMG7 protein across the plant kingdom was used to generate a sequence logo using the online version of WebLogo3 (Crooks et al., 2004). SMG7 specific regions were identified by visually inspecting multiple sequence alignment of plant SMG7/SMG7L proteins for regions that are conserved in SMG7 paralogs of monocots and dicots, but diverged in SMG7L.
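To illustrate what a sequence logo summarizes, the sketch below computes the per-column information content (in bits) of a protein alignment. It is a simplified version of the quantity plotted by logo tools: gaps are ignored and no small-sample correction is applied, both of which are assumptions made for brevity. The toy alignment is hypothetical and unrelated to the SMG7 sequences listed above.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    residue frequencies; gap characters ('-') are ignored.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def alignment_information(aligned_seqs):
    """Information content for every column of equal-length aligned sequences."""
    return [column_information(col) for col in zip(*aligned_seqs)]


# Hypothetical toy alignment: column 1 is invariant, the others vary.
toy = ["MKLE", "MKIE", "MRLD", "MK-E"]
for position, bits in enumerate(alignment_information(toy), start=1):
    print(position, round(bits, 3))
```

Fully conserved columns reach the maximum of log2(20) bits, while variable columns score lower, which is the basis for spotting conserved regions by eye in a logo.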

### RESULTS

### Full Inactivation of Arabidopsis SMG7 Does Not Result in Embryonic Lethality

In our attempts to genetically complement the smg7-5 mutation with a functional SMG7 gene, we failed to recover plants homozygous for the smg7-5 allele. This led us to ask whether the embryonic lethality observed in the smg7-5 line (Riehs et al., 2008) is indeed linked to the T-DNA insertion in the SMG7 gene. To clarify this issue, we used CRISPR/Cas9 targeted mutagenesis to generate another mutation that disrupts the conserved 14-3-3 domain. We targeted the Cas9 nuclease to the second exon of the SMG7 gene and identified an allele that contains a frameshift mutation 13 amino acids downstream of the ATG codon and hence likely represents a full loss-of-function allele, which we call smg7-7 (**Figure 1A**). We were able to recover viable plants homozygous for the smg7-7 allele that phenotypically resembled the growth-retarded smg7-1 mutants (**Figure 1B**). These plants were sterile and did not produce any pollen (**Figure 1C**). As the infertility of smg7-1 was caused by arrest of meiotic progression in anaphase of the second meiotic division (Riehs et al., 2008), we performed cytogenetic analysis of pollen mother cells in anthers of smg7-7 mutants. This revealed meiocytes with irregularly distributed chromosomes, which is typical for this anaphase II arrest (**Figure 1D**). To test the effect of smg7-7 on NMD, we used an assay that quantifies two alternatively spliced variants of the same transcript, one of which contains a PTC (Gloggnitzer et al., 2014; **Figure 1E**). We observed comparable upregulation of two different PTC-containing transcripts in smg7-1 and smg7-7, suggesting that NMD is impaired to a similar extent in both alleles. Thus, smg7-1 and smg7-7 lead to the same effect on NMD

indicated. (B) Approximately 6-week-old wild type and mutant plants. (C) Anthers with viable pollen visualized by Alexander staining. (D) Developing pollen mother cells within an anther stained by DAPI. Tetrads are apparent in wild type, while late smg7 meiocytes contain randomly distributed chromatids. (E) Effect of smg7 alleles on the relative abundance of two alternatively spliced variants of the same transcript as determined by real time RT-PCR. Error bars represent SEM of three biological replicates. Asterisks indicate statistical significance of difference from wild type (∗P < 0.05, ∗∗P < 0.01, ∗∗∗P < 0.001, two-tailed t-test).

and meiosis, demonstrating that the smg7-1 allele used in our previous studies likely represents a full loss of function mutation and that inactivation of SMG7 is not lethal, in contrast to the conclusion previously reported (Riehs et al., 2008).

### Arabidopsis SMG7L Does Not Compensate for the Loss of SMG7

Our observation that complete loss of SMG7 does not cause embryonic lethality implies that the consequence of SMG7 inactivation is milder than that of UPF1. This led us to ask whether SMG7L can partially compensate for the loss of SMG7 in NMD. To address this question, we generated plants with mutations in both genes. We combined smg7l-1, which carries a T-DNA insertion in the N-terminal 14-3-3 like domain, with either the severe smg7-1, or the hypomorphic smg7-6 (**Figure 2A**). smg7-6 contains a T-DNA insertion in the C-terminus and was reported to be proficient for NMD but partially impaired in meiosis (Riehs-Kearnan et al., 2012). Quantitative RT-PCR analysis showed that mRNA encoding the N-terminal portion of the SMG7 protein is expressed at the same level in smg7-6 as in wild type plants, suggesting the production of a truncated, partially functional protein (**Figure 2B**). Plants deficient for SMG7L did not show any obvious phenotype and were indistinguishable from wild type plants (**Figure 2C**).

We anticipated that if SMG7L acts redundantly with SMG7, its loss should exacerbate the phenotypes of both smg7 alleles. However, neither smg7l-1 smg7-1 nor smg7l-1 smg7-6 double mutants exhibited any difference from the respective smg7 single mutants with regard to growth defects and fertility (**Figures 2C**, **3**). While smg7-1 is infertile, smg7-6 plants produce a reduced amount of viable pollen and are partially fertile (**Figures 3A,B**). Usually, the first 15–20 flowers on the main inflorescence bolt are infertile, while later flowers produce viable seeds (**Figure 3C**). Quantification of pollen and seed production did not reveal any difference between smg7-6 and smg7l-1 smg7-6, arguing that SMG7L does not compensate for the meiotic function of SMG7. We assessed the efficiency of NMD by quantifying three different endogenous transcripts known to be degraded by NMD (**Figure 4**; Gloggnitzer et al., 2014). The smg7-1 null mutation displayed increased amounts of all transcripts targeted by NMD (**Figure 4**). Although we have previously reported that the hypomorphic smg7-6 allele is NMD proficient (Riehs-Kearnan et al., 2012), we observed a slight but reproducible increase of the three NMD-targeted transcripts in this study (**Figure 4**), suggesting a very mild NMD defect. Importantly, the smg7l-1 smg7-1 and smg7l-1 smg7-6 double mutants did not show any additional increase in NMD-regulated transcripts. Together, these data allow the conclusion that SMG7L does not act redundantly with SMG7 either in meiosis or in NMD.

### The N-Terminal 14-3-3 Domain of SMG7 Is Important for Meiosis

The genetic separation of NMD and meiotic functions in smg7-6 mutants, together with the fact that UPF1- and UPF3-deficient plants lack a meiotic phenotype (Riehs-Kearnan et al., 2012), indicates that SMG7 acts in meiosis through a different mechanism than in NMD. Thus, we wanted to know whether the conserved 14-3-3 domain that mediates interaction with phosphorylated UPF1 in NMD is also required for meiosis. We complemented smg7-1 with a copy of the Arabidopsis SMG7 gene carrying mutations in the conserved K77 and R185 residues that form the phosphoserine-binding pocket and are required for NMD in tobacco (**Figure 5A**; Fukuhara et al., 2005; Merai et al., 2012). We transformed plants heterozygous for smg7-1 with constructs carrying either wild type SMG7 or the mutant SMG7K77E R185E and identified transformants homozygous for smg7-1 in segregating T2 populations. Three independent lines were analyzed for each construct. While the wild type construct readily complemented the growth phenotype, fertility, and pollen production, SMG7K77E R185E plants were indistinguishable from smg7-1 mutants despite a higher level of SMG7 mRNA expression (**Figures 5B,C**). As expected, the SMG7K77E R185E plants were deficient in NMD, but they also failed to produce any pollen (**Figures 5C,D**). This indicates that the phosphoserine-binding activity mediated by the conserved N-terminal 14-3-3 motif is important not only for NMD, but also for the meiotic function of SMG7.

### The Meiotic Function of SMG7 May Depend on Conserved Motifs in Its C-Terminus

Studies in grapevine using virus induced gene silencing (Benkovics et al., 2011) together with our phenotypic characterization of Arabidopsis mutants demonstrated that SMG7 and SMG7L paralogs functionally diverged in dicots. While SMG7 retained its function in NMD, SMG7L apparently acquired a novel function(s) unrelated to NMD and meiosis. We hypothesized that this neofunctionalization would be accompanied by amino acid sequence divergence in regions exclusively important for NMD. According to this hypothesis, SMG7 should have retained motifs involved in NMD whereas some of these motifs may have been lost in SMG7L.

We performed a sequence comparison of multiple plant SMG7 and SMG7L proteins to identify regions that are conserved across the plant kingdom but which diverged during the evolution of SMG7L in dicots. Five such regions were identified in the evolutionarily conserved N-terminal half of SMG7, and three additional SMG7-specific motifs were found at the C-terminus (**Figure 6**). While all six amino acid residues that form the phosphoserine-binding pocket of the 14-3-3 domain are retained in both SMG7 and SMG7L, two regions in the 14-3-3 domain diverged between these paralogs. When mapped onto the structure of human SMG7 (Fukuhara et al., 2005), the plant SMG7-specific regions 1 and 2 correspond to alpha helices α2 and α4 with the connecting loop to α5, respectively. The α2 and α4 helices are aligned on the convex surface of the 14-3-3 domain, and α4 together with the extended loop was reported to form an interaction interface with SMG5 (Jonas et al., 2013). Thus, it is likely that these regions form a protein interaction surface and their divergence in SMG7L reflects altered protein-binding specificity. The SMG7-specific region 3 is located in the loop that connects the 14-3-3 domain with the helical domain and is larger in SMG7 than in SMG7L paralogs (**Figure 6**). The SMG7-specific regions 4 and 5 span helices α16 and α18 at the end of the helical domain. The C-terminal portion of SMG7 is generally less conserved than the N-terminus. Nevertheless, we found three motifs at the C-terminus of SMG7 proteins (motifs 6, 7, and 8) that are shared among monocots, dicots, and moss, but were lost in dicot SMG7L (**Figure 6**).

# DISCUSSION

SMG7 is an ancient phosphoserine-binding protein that is present in most eukaryotes. Its primary function is linked to NMD as its physical and functional interactions with UPF1 are conserved from plants to animals (Unterholzner and Izaurralde, 2004; Kerenyi et al., 2013). Nevertheless, proteins of the SMG7 family have acquired additional functions during evolution. The best known example is Est1p in budding yeast that recruits telomerase to chromosome ends via binding to phosphorylated Cdc13 (Li et al., 2009; Chen et al., 2018). Hence, SMG7 acts as an adaptor protein that can recruit different molecular machines to the sites of their action in a phosphorylation-dependent manner. Arabidopsis SMG7 is involved in at least two independent molecular processes, NMD and meiotic progression. Here we show that its N-terminal 14-3-3 domain is required for both NMD and meiosis. While UPF1 is the substrate that defines the role of SMG7 in NMD (Kerenyi et al., 2013), the meiotic function is likely mediated by another binding protein as Arabidopsis deficient for UPF1 does not exhibit the meiotic defects described for smg7 mutants (Riehs-Kearnan et al., 2012).

Proteins of the SMG7 gene family underwent independent multiplications during evolution that were accompanied by functional diversification. In vertebrates, SMG5, SMG6, and SMG7 paralogs play distinct roles in NMD that define two separate RNA degradation pathways. Multiple copies of SMG7 have also independently arisen in plants including maize, poplar, grapevine, and the moss P. patens (Benkovics et al., 2011; Li et al., 2015). In the majority of cases, these are relatively recent species-specific duplications that only occasionally span larger phylogenetic units. SMG7L represents the most ancient plant duplication, originating at the root of dicots (Riehs et al., 2008). Here we demonstrate that SMG7L does not act redundantly with SMG7 in either NMD or meiosis, implying that it evolved a novel, as yet unknown function. Domain swapping experiments indicated that the N-terminal portion of SMG7L retained its capability to bind UPF1 and complement NMD, whereas the C-terminus lost its capability to trigger RNA degradation in a tethering assay (Benkovics et al., 2011). Nevertheless, it is unlikely that UPF1 is a physiological substrate of SMG7L; sequence divergence in the loops and helices that form the surface interfaces of the conserved N-terminal domains rather suggests that SMG7L binds other proteins.

Phylogenetic comparison of SMG7 and SMG7L protein sequences revealed several regions that were conserved in SMG7, but diverged in SMG7L. Positions of the SMG7-specific motifs with respect to mutations that have been functionally characterized (**Figure 6A**) may provide clues to their function. Arabidopsis smg7-1 and smg7-3 represent T-DNA insertions that truncate SMG7 at amino acid positions 355 and 675, and both alleles are deficient in NMD and meiosis (Riehs et al., 2008; Riehs-Kearnan et al., 2012). The smg7-6 disruption is only 43 amino acids downstream from smg7-3 at position 718 yet is only very mildly impaired in NMD, implying a functional significance of this region. Nevertheless, the region between amino acids 675 and 718 is highly divergent among plant SMG7 homologs (**Supplementary Figure S1**), which speaks against a specific role for this region in NMD. Since smg7-3 is disrupted shortly after a motif conserved in both SMG7 and SMG7L paralogs, this mutation may affect the proper folding or stability of the truncated protein. Experiments using transient expression in tobacco showed that the Arabidopsis SMG7 C-terminal domain is required for NMD and that tethering a C-terminal fragment (amino acids 517–1059) to RNA is sufficient to trigger its degradation (Benkovics et al., 2011). Thus, the region responsible for RNA degradation must be located between amino acids 517 and 718, which overlaps with the SMG7-specific region 5 (**Figure 6**). The impaired fertility of smg7-6 indicates a requirement for the SMG7 C-terminus in meiosis. It is likely that the meiotic function of SMG7 is mediated additionally through conserved regions 6, 7, and 8. In support of this notion, Arabidopsis smg7-4 plants, which carry a T-DNA disruption at amino acid position 1013 immediately after the SMG7-specific region 8, are fully fertile (Riehs-Kearnan et al., 2012).

Whether the conserved function of these motifs is specifically linked to reproduction, or whether they underlie a general molecular function, for which deficiency in Arabidopsis is manifested as a meiotic defect, remains to be seen.
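The positional argument in the preceding paragraph, which places the RNA-degradation determinant between amino acids 517 and 718, amounts to intersecting two residue ranges. The following is only a minimal sketch of that reasoning: the helper `overlap` and the variable names are hypothetical, and treating the smg7-6 product as residues 1–718 is a simplifying assumption based on the truncation position given in the text.

```python
def overlap(a, b):
    """Return the residue range shared by two inclusive (start, end) ranges, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None

# Positions quoted in the text (amino acid coordinates of Arabidopsis SMG7).
tethered_fragment = (517, 1059)  # C-terminal fragment sufficient to trigger RNA degradation
retained_in_smg7_6 = (1, 718)    # assumption: portion encoded upstream of the smg7-6 insertion

# smg7-6 is only mildly impaired in NMD, so the determinant should lie in the
# part of the tethered fragment that smg7-6 still retains.
candidate_region = overlap(tethered_fragment, retained_in_smg7_6)

# Distance between the smg7-3 (675) and smg7-6 (718) truncation points.
distance_smg7_3_to_smg7_6 = 718 - 675
```

Under these assumptions `candidate_region` is `(517, 718)` and the two truncation points are 43 residues apart, matching the figures given in the text.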

# AUTHOR CONTRIBUTIONS

CC, NS, KR, and OMS designed the experiments. CC performed the experiments shown in **Figures 1–5**. NS performed the experiments shown in **Figures 1**, **6**. AW generated the smg7-7 mutant plants. KR and NS wrote the manuscript.

## FUNDING

The authors are grateful for financial support by the Austrian Science Fund (FWF W 1238, DK "Chromosome Dynamics" to KR and OMS), by the Ministry of Education, Youth and Sports of the Czech Republic, European Regional Development Fund-Project "REMAP" (No. CZ.02.1.01/0.0/0.0/15\_003/0000479 to KR), and by the Czech Science Foundation (16-18578S to KR).

# ACKNOWLEDGMENTS

We acknowledge the Plant Science core facility and the core facility CELLIM of CEITEC supported by the Czech-BioImaging large RI project (LM2015062 funded by MEYS CR) for their support with obtaining scientific data presented in this paper.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.01602/full#supplementary-material

FIGURE S1 | Sequence alignment of the following SMG7 proteins: HsSMG7 (ENSP00000425133), PpSMG7a (PP1S80\_14V6.1), PpSMG7b (PP1S311\_73V6.1), SmSMG7a (EFJ27061), SmSMG7b (EFJ21470), AmtSMG7 (ERN18017), MtSMG7a (KEH28378), MtSMG7b (KEH16467), AtSMG7 (AT5G19400.1), AtSMG7L (AT1G28260.1), BoSMG7a (Bo9g153800.1), BoSMG7b (Bo2g018020.1), BoSMG7La (Bo5g054690.1), BoSMG7Lb (Bo3g143280.1), CsSMG7 (KGN66550), CsSMG7L (KGN64688), NaSMG7 (OIS97991), NaSMG7L (OIT28005), HaSMG7a (OTG27135), HaSMG7b (OTG30173), HaSMG7L (OTF97490), OsSMG7 (Os08t0305300-01), ZmSMG7a (Zm00001d019920\_P002), ZmSMG7b (Zm00001d005502\_P002), SbSMG7 (KXG35214), HvSMG7a (HORVU5Hr1G050800.5), HvSMG7b (HORVU0Hr1G029520.1).

TABLE S1 | Primers used in this study.

throughout eukaryotic evolution. Sci. Rep. 7:16692. doi: 10.1038/s41598-017-16942-w



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Capitao, Shukla, Wandrolova, Mittelsten Scheid and Riha. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Under a New Light: Regulation of Light-Dependent Pathways by Non-coding RNAs

### Camila Sánchez-Retuerta<sup>1</sup>, Paula Suaréz-López<sup>1</sup> and Rossana Henriques<sup>1,2,3</sup>\*

<sup>1</sup> Centre for Research in Agricultural Genomics, CSIC-IRTA-UAB-UB, Barcelona, Spain, <sup>2</sup> School of Biological, Earth and Environmental Sciences, University College Cork, Cork, Ireland, <sup>3</sup> Environmental Research Institute, University College Cork, Cork, Ireland

The biological relevance of non-protein coding RNAs in the regulation of critical plant processes has been firmly established in recent years. This has been mostly achieved with the discovery and functional characterization of small non-coding RNAs, such as small interfering RNAs and microRNAs (miRNAs). However, recent next-generation sequencing techniques have widened our view of the non-coding RNA world, which now includes long non-coding RNAs (lncRNAs). Small and long non-coding RNAs seem to diverge in their biogenesis and mode of action, but growing evidence highlights their relevance in developmental processes and in responses to particular environmental conditions. Light can affect MIRNA gene transcription, miRNA biogenesis, and RNA-induced silencing complex (RISC) activity, thus controlling not only miRNA accumulation but also their biological function. In addition, miRNAs can mediate several light-regulated processes. In the lncRNA world, few reports are available, but they already indicate a role in the regulation of photomorphogenesis, cotyledon greening, and photoperiod-regulated flowering. In this review, we will discuss how light controls MIRNA gene expression and the accumulation of their mature forms, with a particular emphasis on those miRNAs that respond to different light qualities and are conserved among species. We will also address the role of small non-coding RNAs, particularly miRNAs, and lncRNAs in the regulation of light-dependent pathways. We will mainly focus on the recent progress made in understanding the interconnection between these non-coding RNAs and photomorphogenesis, circadian clock function, and photoperiod-dependent flowering.

Keywords: microRNAs, long non-coding RNAs, light, circadian clock, photoperiod

### Edited by:

Dora Szakonyi, Instituto Gulbenkian de Ciência (IGC), Portugal

### Reviewed by:

Santiago Mora-Garcia, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina

Umesh K. Reddy, West Virginia State University, United States

### \*Correspondence:

Rossana Henriques rossana.henriques@ucc.ie

### Specialty section:

This article was submitted to Plant Cell Biology, a section of the journal Frontiers in Plant Science

Received: 23 March 2018 Accepted: 14 June 2018 Published: 26 July 2018

### Citation:

Sánchez-Retuerta C, Suaréz-López P and Henriques R (2018) Under a New Light: Regulation of Light-Dependent Pathways by Non-coding RNAs. Front. Plant Sci. 9:962. doi: 10.3389/fpls.2018.00962

# INTRODUCTION

Light is fundamental for plant life. It not only provides the energy necessary for photosynthesis but also functions as an indicator of time and place. Light quality and duration inform a plant about its neighbors in a process described as shade avoidance, which ultimately shapes its architecture and development. Light cues also function as indicators of time of the day and season [e.g., short days (SDs) in autumn and winter versus long days (LDs) in spring and summer]. Last but not least, light, together with temperature, is an input to the circadian clock, providing circadian entrainment and helping to maintain rhythmicity and robustness of the clock. Proper clock resetting is critical for the adequate timing of biological processes, which is in turn reflected in plant fitness.

In parallel with seasonal and positional information, light also provides important developmental cues. For instance, during photomorphogenesis, when the young seedling emerges through the soil reaching for light, hypocotyl elongation is inhibited, while roots grow, cotyledons open and expand, and the first true leaves develop (Leivar and Monte, 2014). These morphological changes are accompanied by the development of chloroplasts and chlorophyll accumulation. Later in development, light will also regulate the time of flowering. Depending on each species' requirements, specific photoperiods (the relative duration of the daily light and dark periods) will provide molecular cues, such as the accumulation of specific regulators, required for the transition of shoot apical meristems into inflorescence meristems (Andrés and Coupland, 2012).

Therefore, considering all light-dependent and/or light-regulated processes, it is not surprising that light signals promote massive transcriptional changes affecting both protein-coding genes and non-protein-coding transcripts (López-Juez et al., 2008; Wang H. et al., 2014). Although light regulation of (protein-coding) gene expression has been studied in extreme detail, the role of light in the regulation of non-protein-coding RNAs is less well known.

Non-coding RNAs have been divided into different functional groups based on the size of their active forms. Non-coding RNAs shorter than 200 base pairs (bp) fall within the category of small non-coding RNAs, whereas longer ones constitute long non-coding RNAs (lncRNAs). Historically, small non-coding RNAs were the first identified and characterized in great detail. Within this group, microRNAs (miRNAs) are transcribed from MIRNA genes as longer precursors (primary miRNAs, pri-miRNAs) by RNA polymerase II. Plant pri-miRNAs fold into stem-loop structures and are processed by a complex containing DICER-LIKE1 (DCL1), HYPONASTIC LEAVES1 (HYL1), and SERRATE (SE), which first releases the stem-loop (pre-miRNA) structure and then cleaves it, giving rise to a mature miRNA:miRNA<sup>∗</sup> duplex, normally 20–22 nucleotides (nt) long, more rarely 23–25 nt, with 2-nt 3′ overhangs (Voinnet, 2009; Borges and Martienssen, 2015; Yu et al., 2017). This duplex is methylated at its 3′ ends by the methyltransferase HUA ENHANCER1 (HEN1) and this methylation is necessary to stabilize miRNAs (Yu et al., 2005). The mature miRNA strand is then selectively bound by ARGONAUTE1 (AGO1) and loaded into the RNA-induced silencing complex (RISC), which recognizes transcripts with full or partial complementarity to the miRNA and cleaves and/or represses translation of these miRNA target transcripts, therefore silencing them (Voinnet, 2009; Yu et al., 2017) (**Figure 1**). miRNAs act in particular developmental processes and/or in response to certain environmental cues, sometimes in specific tissues where they accumulate at lower levels than their protein-coding target transcripts.

lncRNAs, on the other hand, can exert their biological function without further processing and accumulate to similar levels as miRNAs (Liu et al., 2012). lncRNAs are classified according to their genomic location into intergenic (expressed in between coding regions), intronic (expressed from introns), or natural antisense transcripts (NATs, expressed from the complementary strand to a protein-coding gene or another lncRNA; Liu et al., 2012; Wang H. et al., 2014). Comprehensive screening approaches have revealed thousands of lncRNAs that display spatial-, temporal-, and developmental-specific patterns of expression. Interestingly, among the few lncRNAs whose biological function is known, two are involved in light-regulated processes. HIDDEN TREASURE 1 (HID1) is involved in photomorphogenesis and seedling greening (Wang Y. et al., 2014; Wang et al., 2018), whereas CDF5 LONG NON-CODING RNA (FLORE) mediates the circadian regulation of flowering time (Henriques et al., 2017).
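The size- and location-based classification described above can be summarized as a small decision rule. The sketch below is illustrative only: the `Transcript` fields, the function name, and the order in which the location checks are applied are assumptions made for this example, not part of any published annotation pipeline; only the 200-unit size cutoff and the three lncRNA classes come from the text.

```python
from dataclasses import dataclass

SMALL_RNA_MAX_LENGTH = 200  # size cutoff quoted in the text

@dataclass
class Transcript:
    name: str                       # hypothetical identifier
    length: int                     # length of the active form
    antisense_to_gene: bool = False # on the strand complementary to another gene or lncRNA
    inside_intron: bool = False     # expressed from an intron

def classify(t: Transcript) -> str:
    """Assign a non-coding transcript to one of the classes named in the text."""
    if t.length < SMALL_RNA_MAX_LENGTH:
        return "small non-coding RNA"
    if t.antisense_to_gene:
        return "lncRNA (natural antisense transcript)"
    if t.inside_intron:
        return "lncRNA (intronic)"
    return "lncRNA (intergenic)"

# Hypothetical transcripts, not real annotations.
examples = [
    Transcript("example_small", 21),
    Transcript("example_nat", 450, antisense_to_gene=True),
    Transcript("example_intronic", 800, inside_intron=True),
    Transcript("example_intergenic", 1200),
]
labels = [classify(t) for t in examples]
```

A transcript of exactly 200 units is treated here as long, because the text defines small non-coding RNAs as those shorter than 200.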

Here, we review the role of miRNAs and lncRNAs associated with, or regulated by light-dependent pathways. We will discuss light regulation of miRNA expression, biogenesis, processing, and function, as well as their role in light-related processes, such as photomorphogenesis and photoperiod-dependent flowering. We will also discuss the available reports characterizing lncRNA function in light-dependent pathways.

# MicroRNAs

Light regulation of plant miRNAs can occur through at least three different mechanisms: regulation of MIRNA gene transcription and its levels, regulation of miRNA processing via effects on components of the miRNA biogenesis machinery, and regulation of miRNA function through control of RISC factors. In turn, miRNAs can target genes encoding components of the light signaling pathways, thereby affecting light responses such as photomorphogenesis and photoperiod-dependent flowering. Below we discuss the recent progress made in these fields of research. Due to the interconnection between light and the circadian clock, we will also discuss the available reports on circadian-regulated ncRNAs.

### Light Regulation of miRNA Levels

Light, photoperiod, and the circadian clock modulate the expression levels of MIRNA genes. This regulation is achieved by the presence of light-responsive elements in their promoters, and can be triggered by specific light qualities (red, blue, far-red, UV-A, and UV-B). To identify the miRNA families under this control, several screening approaches have been developed, resulting in extensive listings of light-responsive miRNAs.

For instance, the comparison of miRNAs between wild-type rice (Oryza sativa) plants and a phytochrome B (phyB) mutant identified 135 differentially expressed miRNAs, of which 97 were upregulated and 38 were downregulated. These miRNAs include conserved and novel miRNAs, with sizes varying between 21 nt and 24 nt, such as miR156, miR166, miR171, and miR408 (Sun et al., 2015). In contrast to a previous report showing that miR172 is regulated by PHYB in potato (Martin et al., 2009), Sun et al. (2015) did not find any miR172 family member that responded to PHYB in rice, suggesting that PHYB-mediated regulation of miRNAs may differ between plant species. Using degradome sequencing, it was found that 70 rice genes were targeted by 32 miRNAs differentially expressed between the wild type and the phyB mutant (Sun et al., 2015). A large proportion of them (42%) encode transcription factors, suggesting that regulation of gene expression by miRNA target genes may play an important role in PHYB-mediated light signaling. Members of the transcription factor families identified by Sun et al. (2015) are involved in light signaling in Arabidopsis. It is worth noting that one of the identified miRNA target genes is a homolog of the Arabidopsis circadian-regulated TANDEM ZINC KNUCKLE PROTEIN (TZP) gene, which is involved in blue light-associated growth (Loudet et al., 2008). However, further research is required to fully understand the role of these rice miRNAs and target genes in light-regulated processes (Sun et al., 2015).

complementarity to the miRNA and represses the translation (8) or cleaves (9) these miRNA targets.

Moreover, ELONGATED HYPOCOTYL 5 (HY5), a major regulator of photomorphogenesis that acts downstream of several photoreceptors, was shown to bind to the promoters and regulate the expression of several MIRNA genes, such as MIR156D, MIR402, MIR408, and MIR858A in Arabidopsis (Zhang et al., 2011). This, together with the effect of PHYB on miRNA levels (Sun et al., 2015), strongly suggests that light affects miRNA abundance. In support of this, small RNA sequencing indicates that the miRNA population of the seedling apical hook changes after treating dark-grown soybean (Glycine max) seedlings with far-red light. The levels of several miRNAs (miR166, miR167, miR168, miR394, miR396, miR530, miR1507, miR1508, miR1509, and miR2218) respond to far-red light in different parts of the soybean seedling, in most cases showing upregulation by far-red light (Li et al., 2014). Therefore, light seems to upregulate miRNAs in the apical hook, leading to downregulation of target genes that presumably repress apical hook and cotyledon opening.

Light has strong effects on plant metabolism, especially in the regulation of fundamental plant processes such as photosynthesis (Kooke and Keurentjes, 2012). To understand the involvement of miRNAs in light regulation of potato metabolism, Qiao et al. (2017) used high-throughput sequencing to study the expression levels of miRNAs and mRNAs in dark-treated and red-light-treated leaves and detached tuber skin in potato. Among the 69 known and novel miRNAs that were differentially regulated, most were upregulated in leaves, whereas in tuber skin, about half of them were upregulated and half downregulated (Qiao et al., 2017). Notably, the miR399 family and the novel miRNA miRn55 showed opposite light responses in leaves (upregulation) and tuber skin (downregulation). General gene expression analyses revealed that most differentially expressed mRNAs (67%) were upregulated in tuber skin, whereas in leaves, about half were upregulated and half downregulated. Only 14% of differentially expressed miRNAs and 12% of differentially expressed genes were common to leaves and tuber skin, indicating that the molecular response to red light differs between these tissues. Nevertheless, there was an effect of light on genes involved in primary and secondary metabolism in both tissues, especially in carbohydrate metabolism, which is downregulated, and alkaloid metabolism, which is upregulated. Repression of carbohydrate metabolism was supported by anti-correlation between miRNAs and mRNAs, which suggests downregulation of putative miRNA targets involved in this process (Qiao et al., 2017). Although confirmation of these findings using additional methods is required, this work points to a role of miRNAs in the regulation of metabolic pathways affected by red light in potato.

Additional support to the effect of red light on miRNA levels has been obtained in Arabidopsis. In this species, the most differentially expressed miRNAs in red light versus darkness are miR160, miR163, miR319, miR394, miR779, miR851, miR854, and miR2111 (Sun et al., 2018). At least part of this effect is mediated by PHYTOCHROME-INTERACTING FACTOR4 (PIF4), a transcription factor that is a negative regulator of PHYB-mediated red light signaling. In a pif4 mutant, 22 mature miRNAs, representing 16 miRNA families, show altered levels relative to wild-type plants under red light. PIF4 promotes the expression of genes encoding miR156/157, miR160, miR165/166, miR167, miR170/171, and miR394 and represses the expression of genes encoding miR172 and miR319 by binding to the promoters of several of these genes. In addition to acting as a transcription factor for MIRNA genes, PIF4 regulates miRNA processing, as discussed below (Sun et al., 2018).

Besides light exposure, photoperiod can also control miRNA accumulation. Li et al. (2015) used high-throughput sequencing to identify miRNAs regulated by photoperiod in soybean. Comparing miRNA levels in seedlings grown under LDs and SDs revealed 37 miRNA families to be day-length-responsive. Five members of the miR156 family were induced under LD conditions and this correlated with downregulation of four miR156-target SQUAMOSA PROMOTER-BINDING PROTEIN-LIKE (SPL) genes. Given that miR156 overexpression delays flowering in soybean (Cao et al., 2015), lower miR156 levels under SD than LD may explain the fact that soybean flowering is induced by SD. Conversely, the expression of six members of the miR172 family was induced under SD conditions and this was coordinated with the repression of ten predicted targets belonging to the APETALA2 (AP2)-like family (Li et al., 2015). In addition to miR156 and miR172, other conserved miRNAs, such as miR159/319, miR166, miR167, miR395, and miR408 were also shown to be photoperiod regulated in soybean.

The simplified model of the circadian clock places light as an input that confers clock entrainment. Like light, the clock also controls the expression of a vast set of genes, especially at the dark/light transition (Michael et al., 2008), thus allowing plants to anticipate certain daily processes. Interestingly, the circadian clock can also regulate the expression of protein-coding as well as non-coding transcripts (Hazen et al., 2009). To uncover circadian-regulated miRNAs, Siré et al. (2009) compared mature and pri-miRNA transcript levels under light/dark cycles or under free-running (continuous light) conditions for two consecutive days. It was found that miR167, miR168, miR171, and miR398 levels oscillated under light/dark cycles, peaking around the light/dark transition. However, for certain miRNAs, the waving pattern was not accompanied by a clear antiphasic expression of their targets but rather by a phase delay. This was the case for the miR171/SCARECROW-LIKE 6 (SCL6, also named SCL6-IV, HAM3, and LOM3) and miR398/Cu/Zn SUPEROXIDE DISMUTASE 2 (CSD2) pairs, where target gene expression is reduced as the miRNA levels increase. However, in the case of the miR168/AGO1 and miR167/AUXIN RESPONSE FACTOR 6 (ARF6) pairs, an in-phase pattern of accumulation was found, suggesting that miRNAs could regulate their targets by incorporating feedback loops. However, when these modules were tested under free-running conditions, a clear oscillation was not found, an indication that these miRNAs are most likely light regulated.
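The distinction drawn above between in-phase, phase-delayed, and antiphasic miRNA/target pairs can be made concrete by estimating the time shift between two oscillating profiles. The sketch below is not the analysis used in the cited study; it is a minimal illustration on synthetic cosine curves, and the function names, the 2-h sampling step, and the 24-h period are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def phase_lag_hours(mirna, target, step_hours):
    """Delay of `target` relative to `mirna`, taken as the circular shift that
    maximises their correlation (assumes the samples cover one full cycle)."""
    n = len(mirna)

    def corr_at(lag):
        shifted = [target[(i + lag) % n] for i in range(n)]
        return pearson(mirna, shifted)

    return max(range(n), key=corr_at) * step_hours

# Synthetic 24-h profiles sampled every 2 h (illustrative values only).
STEP = 2
mirna = [math.cos(2 * math.pi * i / 12) for i in range(12)]
in_phase = list(mirna)                                           # same timing
delayed = [math.cos(2 * math.pi * (i - 2) / 12) for i in range(12)]  # peaks 4 h later
antiphase = [-v for v in mirna]                                  # peaks 12 h later

lag_in_phase = phase_lag_hours(mirna, in_phase, STEP)
lag_delayed = phase_lag_hours(mirna, delayed, STEP)
lag_antiphase = phase_lag_hours(mirna, antiphase, STEP)
```

On a 24-h cycle, a lag of 0 h corresponds to in-phase accumulation, a lag of 12 h to antiphase, and intermediate values to a phase delay of the target relative to the miRNA.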

In another report, Li et al. (2016) addressed the mechanism by which the circadian clock regulates carbon and nitrogen metabolism in rice plants grown in field conditions. Using a comprehensive approach, the authors investigated how light/dark cycles control certain biological processes. They investigated metabolic and enzymatic activities, as well as gene expression and alternative splicing events that could show rhythmic behavior. In their search for oscillatory miRNAs associated with carbon and/or nitrogen metabolism, Li et al. (2016) uncovered miR1440b, miR1877, miR2876-5p, and miR5799 as possible rice miRNA candidates. These miRNAs oscillate under light/dark cycles and negatively correlate with their targets at the majority of time points analyzed. However, it still remains to be seen whether these are bona fide circadian miRNAs oscillating under free-running conditions.

### UV-Responsive miRNAs

Since different light qualities can trigger diverse developmental responses, some of which can be detrimental, several screening approaches have been designed to understand how light, especially high light, UV-A, and UV-B radiation can modulate accumulation of miRNAs and their targets. Zhou et al. (2016), using seedlings from Brassica rapa exposed to blue light and UV-A, identified 15 conserved and 226 novel differentially expressed miRNAs that could target genes encoding regulators of plant growth, development, and photomorphogenesis (Zhou et al., 2016). Blue light and UV-A exposure slightly downregulated miR156/157 expression, which correlated with upregulation of their targets, SPL9 (Bra004674) and SPL15 (Bra003305), which encode transcription factors involved in the regulation of many processes, including the juvenile-to-adult transition, flowering, and secondary metabolism, especially anthocyanin biosynthesis (Wu et al., 2009; Gou et al., 2011; Wang and Wang, 2015). In Arabidopsis, miR156 and SPLs are involved in complex feedback loops (Wu et al., 2009; Yant et al., 2010). As plants age, miR156 levels decrease and SPLs (e.g., SPL9) accumulate, inhibiting anthocyanin biosynthesis (Gou et al., 2011). Zhou et al. (2016) hypothesize that blue and UV-A light activate genes involved in light signaling and some of them promote anthocyanin biosynthesis. When anthocyanin levels increase, then the regulatory feedback loops involving miR156/157 would act to balance its metabolism.

Similar to UV-A, UV-B radiation affects plant growth, physiology, and metabolism (Dotto and Casati, 2017). In one of the first reports on the effect of light on miRNAs, Zhou et al. (2007) used a computational approach to predict MIRNA genes induced by UV-B light in Arabidopsis. miRNAs belonging to 11 miRNA families were identified as putatively upregulated by UV-B light (Zhou et al., 2007). Although these results have not been experimentally confirmed in Arabidopsis, several reports have shown the effect of UV-B on miRNA levels in aspen (Populus tremula), maize (Zea mays), and wheat (Triticum aestivum). Jia et al. (2009) found that the level of eight miRNA families increased and that of seven families decreased in response to UV-B light stress in aspen. In maize, from the 16 UV-B-regulated miRNA families detected, 6 were upregulated and 10 downregulated (Casati, 2013). UV-B-treated wheat plants showed upregulation of three and downregulation of another three known miRNAs, in addition to a novel wheat miRNA, which was slightly upregulated shortly after UV-B exposure and then was downregulated (Wang et al., 2013). Thirteen UV-B-responsive miRNA families, i.e., miR156/157, miR159, miR160, miR164, miR165/166, miR167, miR169, miR170/171, miR172, miR393, miR395, miR398, and miR399, are common to at least two plant species, indicating substantial conservation in the miRNAs that respond to UV-B light. Although the direction of the effect (upregulation or downregulation) on several miRNAs differs between species, others are consistently promoted (miR165/166, miR167, and miR398) or repressed (miR395) by UV-B light in three species. This conservation probably reflects important roles of these miRNAs in plant responses to UV-B radiation.

The upstream regulatory regions of the UV-B-regulated MIRNA genes contain numerous light-related motifs, such as GT-1 sites, G-boxes, I-boxes, CCAAT-boxes, and GATA-boxes (Zhou et al., 2007; Jia et al., 2009; Wang et al., 2013), consistent with the effect of light on these genes. Stress-responsive cis-elements were also found in several of those promoters (Jia et al., 2009; Wang et al., 2013). Given that some of these miRNAs are involved in responses to abiotic and biotic stresses (Casati, 2013; Li et al., 2017), we hypothesize that these miRNAs might coordinate adaptive responses to diverse threats.

Among the targets of the UV-B-regulated miRNAs are transcription factors, several factors involved in auxin signaling, and factors involved in responses to other stresses (Zhou et al., 2007; Jia et al., 2009; Casati, 2013). In most cases, as would be expected, there is an inverse correlation between the effect of UV-B on miRNAs and on their corresponding target transcripts (Zhou et al., 2007; Jia et al., 2009; Casati, 2013; Wang et al., 2013). For instance, in maize, an increase in miR165/166 correlated with inhibition of their target gene, encoding an HD-ZIP III transcription factor, ROLLED LEAF1 (RLD1; Casati, 2013). In agreement with the role of these miRNAs in delimiting HD-ZIP III transcripts to the adaxial side of leaf primordia (Juarez et al., 2004), UV-B-dependent repression of RLD1 was associated with leaf arching, suggesting a role for miR165/166 in this UV-B response (Casati et al., 2006). In aspen and wheat, miR395 downregulation is associated with upregulation of its targets encoding ATP sulfurylases (APS), enzymes involved in inorganic sulfate assimilation (Jia et al., 2009; Wang et al., 2013). miR395 and APS are inversely regulated by sulfate starvation, and miR395 overexpression reduces APS transcript levels and affects the response of Arabidopsis to abiotic stress conditions (Jones-Rhoades and Bartel, 2004; Kim et al., 2010), linking again a UV-B-regulated miRNA with other stress responses.

This is also the case of miR164, whose downregulation in UV-B-treated maize leaves is associated with increased levels of two stress-responsive NAC-DOMAIN PROTEIN target transcripts, as well as EXOSTOSIN PROTEIN-LIKE and an ASPARTYL PROTEASE (Hegedus et al., 2003; Casati, 2013). Interestingly, other miRNAs (miR171, miR172, miR156, and miR529) that target genes involved in critical developmental transitions (juvenile-to-adult transition and/or flowering) were also downregulated by UV-B light (Lauter et al., 2005; Xie et al., 2006; Chuck et al., 2008; Cuperus et al., 2011). This suggests that UV-B exposure promotes miRNA-mediated responses that delay certain developmental transitions, allowing plants either to repair UV-B-induced damage or to adapt to these stressful conditions.

Overall, the analysis of UV-B-regulated miRNAs in several plant species indicates that there is a core set of conserved UV-B-responsive miRNAs. Nevertheless, the findings also suggest that different species may recruit distinct miRNAs in order to cope with UV-B stress. Understanding the role of these miRNAs in UV-B-regulated processes awaits further investigation.
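The idea of a "core set" shared across species can be made concrete with a small counting exercise. The sketch below is illustrative only: the function name `shared_families` and the placeholder species and family labels are hypothetical and do not reproduce the data of any study cited here. It simply returns the families reported as responsive in at least a chosen number of species.

```python
from collections import Counter


def shared_families(responsive_by_species, min_species=2):
    """Return families reported as responsive in at least `min_species` species."""
    counts = Counter(
        family
        for families in responsive_by_species.values()
        for family in set(families)  # count each family once per species
    )
    return sorted(f for f, n in counts.items() if n >= min_species)


# Hypothetical placeholder data (not taken from the cited reports).
example = {
    "species_1": {"famA", "famB", "famC"},
    "species_2": {"famB", "famC", "famD"},
    "species_3": {"famC", "famE"},
}

core = shared_families(example)                      # present in >= 2 species
strict_core = shared_families(example, min_species=3)  # present in all 3 species
```

Raising `min_species` narrows the result toward families shared by every species surveyed, which mirrors the distinction drawn above between families common to at least two species and those that respond consistently in three.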

### miRNAs Accumulated in High Light Conditions

Besides UV light, high light can also be detrimental for the development of different plant species. To address how light, especially high light, controls miRNA expression in ma bamboo (Dendrocalamus latiflorus) grown under LD conditions, Zhao et al. (2013) generated small RNA libraries and determined how miRNA families were regulated. Several conserved miRNAs were identified, miR168 being the most abundant family, followed by miR156/157, miR535, miR165/166, and miR167. Interestingly, miR172 was not found, probably due to the unique flowering cycle of ma bamboo. Using stem-loop RT-qPCR, Zhao et al. (2013) determined the level of novel miRNAs in plants grown under regular white light, high light, or dark. However, the role of these miRNAs in high-light-mediated processes still remains to be determined.

A global analysis of the light-regulated miRNAs reported so far reveals that numerous miRNA families are affected by diverse light conditions (**Table 1**). Some of them, such as miR156/157, miR159/319, miR164, miR165/166, miR167, miR170/171, miR172, miR395, miR398, miR399, and miR408, have been identified as responsive to several light conditions in at least five plant species, strongly suggesting that they may play roles in light responses (**Table 1**). However, light could also modulate miRNA processing and/or activity toward its targets. Below we present the most recent reports addressing this type of regulation.

## Regulation of miRNA Biogenesis, Processing, and Function by Light

Besides regulating MIRNA gene expression, light can also modulate the levels and the activity of mature miRNAs. This can be achieved by a direct connection between light and regulators of the miRNA biogenesis pathway. HYL1 is an RNA-binding protein involved in miRNA processing (Yu et al., 2017). In Arabidopsis, HYL1 protein levels are regulated by CONSTITUTIVE PHOTOMORPHOGENIC 1 (COP1), an E3 ubiquitin ligase that mediates the proteasomal degradation of light signaling factors, such as photoreceptors (Cho et al., 2014). It was found that the reduction of miRNA levels in cop1 mutants relative to wild-type plants greatly overlapped with that observed in the hyl1 mutant. Further analyses showed that COP1 regulates HYL1 protein levels but not through direct ubiquitination. Combining protein synthesis blockers with either autophagy or 26S proteasome inhibitors revealed that HYL1 was degraded by an unknown protease, which removed its N-terminal fragment, where two RNA-binding domains, essential for miRNA processing, are located (Cho et al., 2014). Light regulates COP1 nucleocytoplasmic shuttling (Jang et al., 2010) and, therefore, during the day, light stabilizes HYL1 due to cytoplasmic accumulation of COP1, which would target the protease acting on HYL1 degradation (**Figure 2**). This regulation could control miRNA processing and thus help explain the light/dark accumulation pattern of several mature miRNAs, such as miR167, miR168, miR171, and miR398, previously described by Siré et al. (2009).

### TABLE 1 | miRNA families differentially regulated by light.

In this table, we present a summary of the different miRNAs that respond to two or more light treatments. <sup>1</sup>miRNA families reported to be regulated by light in at least two papers have been included. miRNA families are ordered according to the number of papers reporting their light regulation. <sup>2</sup>W, white light; FR, far-red light; R, red light; B, blue light; L/D, light/dark; LD/SD, long days/short days. <sup>3</sup>Ath, Arabidopsis thaliana; Gma, Glycine max (soybean); Mdo, Malus x domestica (apple); Osa, Oryza sativa (rice); Ppy, Pyrus pyrifolia (Chinese sand pear); Ptr, Populus tremula (aspen); Stu, Solanum tuberosum (potato); Tae, Triticum aestivum (wheat); Zma, Zea mays (maize).

HYPONASTIC LEAVES1 (HYL1) protein levels are also controlled by PIF4, another protein involved in photomorphogenesis (Sun et al., 2018). Both HYL1 and DCL1 interact with PIF4, which destabilizes them under red light and stabilizes them in the dark. The mechanism by which PIF4 controls the levels of HYL1 and DCL1 is unknown, but it does not involve transcriptional regulation or the ubiquitin-proteasome pathway. Given that PIF4 interacts with the photoreceptor PHYB and the miRNA processing factors DCL1 and HYL1, the question arises as to whether DCL1 and HYL1 affect light responses (**Figure 2**). Indeed, dcl1 and hyl1 mutants show shorter hypocotyls than wild-type plants under red light, indicating that DCL1 and HYL1 play negative roles in photomorphogenesis (Sun et al., 2018).

Another miRNA processing factor regulated by light signaling pathways is HEN1. Light induction of HEN1 expression results from the coordinated action of HY5 and HY5 homolog (HYH; **Figure 2**), which positively regulate HEN1 downstream of several photoreceptors (Tsai et al., 2014). Consistent with this regulation, HEN1 and HYL1 are negative regulators of photomorphogenesis; in the case of HEN1, this is due to its role as a repressor of key transcription factors of this process (Tsai et al., 2014). In addition, alternative splicing of transcripts encoding several miRNA biogenesis and processing factors, such as DCL1, HYL1, and HEN1, is affected by different light qualities, adding a new layer to the regulation of miRNAs by light (reviewed in Hernando et al., 2017).

The de-etiolation process can also affect miRNA activity toward its targets. Lin et al. (2017) matched small RNA profiling with degradome sequencing in Arabidopsis and found that, although light exposure did not dramatically change the level of most miRNAs, it could modulate their target cleavage activity. In fact, with the exception of miR163, most miRNAs showed little fluctuation in their amount upon light transition. This did not match the degradome signatures, which indicated that de-etiolation especially promoted the degradation of miR156/157 and miR396 targets, thus suggesting higher cleavage activity for these families (Lin et al., 2017). However, although Lin et al. (2017) report that miR168 regulates AGO1 levels under light and could modulate RISC activity and the efficiency of miRNA target cleavage, the mechanisms underlying this regulation by light still need to be demonstrated.

Interestingly, another report has shown AGO1 involvement in light regulation of adventitious rooting and hypocotyl elongation. Arabidopsis ago1 mutants are hypersensitive to light, probably due to deregulation of the PHYA-dependent light signal transduction pathway (Sorin et al., 2005). Moreover, de-etiolation of an Arabidopsis hypomorphic ago1-27 mutant in far-red light is impaired, indicating that normal photomorphogenesis requires AGO1 (Li et al., 2014). Given that AGO1 is an essential component of the RISC, these findings suggest that miRNA function is involved in far-red light responses.

Therefore, several components of the miRNA machinery are involved in certain light responses. On the other hand, light modulates MIRNA gene expression, accumulation of mature miRNAs by regulating components of the biogenesis pathway, as well as miRNA activity. How miRNAs participate in light as well as photoperiod signaling events will be discussed next.

# miRNAs Involved in Light and Photoperiod Signaling

Our analysis revealed several miRNA families that accumulate upon certain light treatments in different species. Most of these miRNAs mediate critical biological processes necessary for plants to adjust to these changes. They can regulate metabolism, plant development, and certain responses to stress.

### miR396 and UV-B Regulation of Cell Proliferation

For instance, UV-B radiation affects cell proliferation in several species (Dotto and Casati, 2017). In Arabidopsis, UV-B light upregulates miR396 levels, leading to repression of the miR396 target genes GROWTH REGULATING FACTOR1 (GRF1), GRF2, and GRF3, with consequent inhibition of cell proliferation in leaves (Casadevall et al., 2013). In fact, either reducing miR396 activity or expressing miR396-resistant forms of GRF3 or GRF2 results in reduced sensitivity of leaf growth to UV-B irradiation, confirming that GRFs mediate this phenotype. Nevertheless, downregulation of GRF3 by UV-B light also seems to occur through miR396-independent mechanisms. In addition, miR396-mediated repression of leaf cell proliferation by UV-B radiation is dependent on the mitogen-activated protein kinase MPK3, known to be involved in the response to UV-B stress (Casadevall et al., 2013). Interestingly, in UV-B-treated maize leaves, miR396 inhibition had the opposite effect on GRFs, which were upregulated (Casati, 2013). This could be due to the different developmental stages analyzed in Arabidopsis and maize, although further studies are necessary to clarify the role of miR396 in the regulation of UV-B-dependent responses in dicots and monocots.

### miRNAs Regulating Metabolism

As mentioned before, light controls many metabolic pathways in plants. Within this context, light-regulated miRNAs were shown to modulate metabolic events, such as methylation of specific signaling molecules (e.g., hormones), nutrient allocation, and pigment synthesis/accumulation.

miR163 as well as pri-miR163 were recently shown to be highly induced by light in Arabidopsis (Chung et al., 2016; Lin et al., 2017). Moreover, miR163 may target PXMT1 (At1g66700), a neighboring gene encoding a putative 1,7 paraxanthine methyltransferase that has been implicated in the methylation of natural chemicals such as hormones (Chung et al., 2016). Whereas pri-miR163 is induced in red, blue, and white light, and PXMT1 expression is inhibited upon light exposure, light does not affect pri-miR163 maturation and processing. Both pri-miR163 and its target PXMT1 accumulate in roots, where miR163 accumulation increases root length, especially in the elongation/differentiation zone, most likely by inhibiting PXMT1. Interestingly, the miR163/PXMT1 pair also seems to regulate germination, probably by controlling the early stages of plant development from radicle emergence to seedling de-etiolation. However, to fully understand the role of miR163 in this process, a more extensive functional characterization of PXMT1 would be necessary.

Plant metabolism requires proper nutrient allocation, especially in the case of nutrients that act as co-factors and are required by different proteins and biological processes. This is the case for copper, which is required for electron transport in photosynthesis, defense against reactive oxygen species, and ethylene perception (Zhang et al., 2014). Maintenance of copper homeostasis is critical to photosynthesis efficiency and thus plant growth. This is achieved by the SPL7 transcription factor, which functions as a copper sensor able to adjust target gene expression upon changing copper levels. Surprisingly, light and copper seem to regulate gene expression of many shared biological processes, suggesting a direct connection between the transcriptional regulators SPL7 and HY5 (Zhang et al., 2014). Using chromatin immunoprecipitation and RNA sequencing, Zhang et al. (2014) characterized the SPL7 regulon and found it to overlap with that of HY5. This analysis revealed MIR408 as a common SPL7/HY5 target, with SPL7 being the predominant transcription factor determining MIR408 levels, thus confirming previous work showing that HY5 regulates MIR408 expression (Zhang et al., 2011). Nevertheless, HY5/SPL7 coordinated regulation allowed miR408 accumulation upon copper deficiency and high light. Confirming this regulation, MIR408 silencing phenotypes were similar to those of the hy5 spl7 double mutant, whereas miR408 constitutive expression partially rescued them, suggesting that miR408 is a critical component of the SPL7/HY5 network (Zhang et al., 2014). Further analysis of the expression levels of the miR408 targets, LACCASE12 (LAC12) and LAC13, showed that the SPL7/HY5/miR408 module allows differential regulation of very closely related genes. Therefore, this signaling network constitutes a molecular mechanism that integrates light and copper signals to maintain copper homeostasis and consequently efficient photosynthetic rates that promote plant growth.

### **miRNAs and pigment synthesis**

Flavonoids are a group of phenolic compounds that include flavonols, flavones, isoflavones, and anthocyanins (Sharma et al., 2016). Due to their molecular diversity, flavonoids can confer protection against biotic and abiotic stress, as well as regulate plant growth and development. Anthocyanins, for instance, include a wide range of pigments (blue, red, and purple), which can not only protect against UV radiation but can also attract pollinators (Wang et al., 2016). Due to their relevance to plant life, the biosynthesis and regulation of all these compounds have been extensively studied. The transcription factors MYB-like 2 (MYBL2) and SPL9 are negative regulators of anthocyanin biosynthesis in Arabidopsis (Dubos et al., 2008; Matsui et al., 2008; Gou et al., 2011). On the other hand, overexpression of miR156 increases anthocyanin levels in Arabidopsis, since some miR156 target genes, including SPL9, repress anthocyanin biosynthesis (Gou et al., 2011; Cui et al., 2014). In Chinese sand pear (Pyrus pyrifolia) fruits, bagging and subsequent re-exposure to light causes anthocyanin accumulation in fruit peel, giving red color to the fruits (Qian et al., 2013). Qian et al. (2017) hypothesized that miR156 and SPLs would also be involved in anthocyanin accumulation in pears in response to bagging treatments. They detected miR156 in peels of bagging-treated pear fruits and found that the promoters of two pear PpMIR156 genes contained light-responsive elements. After bag removal, miR156 levels increased, whereas PpSPL transcripts decreased. This was accompanied by upregulation of homologs of genes encoding transcription factors that control anthocyanin biosynthesis, and followed by upregulation of anthocyanin biosynthesis genes (Qian et al., 2017). Although functional analyses of miR156 and PpSPL genes in pear have not yet been reported, these results point to a possible role of miR156 and PpSPL proteins in light-induced anthocyanin biosynthesis in bagging-treated pear fruits.

In a related report, Qu et al. (2016) investigated the miRNA profile of "Granny Smith" apple peels upon re-exposure to sunlight after fruit bagging, using high-throughput sequencing of small RNA libraries. They found that over 67% of differentially expressed miRNAs were downregulated in the bagged group as compared to the unbagged control. These findings suggest the existence of a light-regulated pool of miRNAs in apple peel. miR156, miR160, miR171, miR172, miR395, and miR398 were differentially expressed upon sunlight re-exposure. Interestingly, these miRNA families had previously been identified as UV-B-regulated miRNAs in several plant species (see Section "UV-Responsive miRNAs"). In Arabidopsis, in addition to miR156, miR858 positively regulates anthocyanin biosynthesis by targeting specific repressors of this process (Gou et al., 2011; Qian et al., 2013; Cui et al., 2014; Wang et al., 2016). In tomato, miR858 also promotes anthocyanin biosynthesis (Jia et al., 2015). Interestingly, miR156, miR828, and miR858 expression levels varied upon apple fruit debagging, suggesting their involvement in anthocyanin biosynthesis in apple peels. In fact, miR156 correlated with anthocyanin accumulation only in "Granny Smith," whereas miR5072 did so only in the "Starkrimson" cultivar. These results highlight the high specificity of certain miRNA families to control particular responses, such as pigment accumulation to protect against light stress.

As previously mentioned, miR858 is a positive regulator of anthocyanin biosynthesis (Wang et al., 2016). In Arabidopsis, MIR858A expression is induced by light, especially high light, in a HY5-dependent manner. Upon transfer from dark to high light, MIR858A level oscillates, suggesting also circadian regulation. Interestingly, miR858a overexpression partially complements the long hypocotyl phenotype of the hy5 mutant and rescues the anthocyanin levels of this mutant. miR858a and miR858b can promote anthocyanin biosynthesis in two ways: (1) by indirectly inducing the expression of genes encoding components of the anthocyanin biosynthetic pathway and (2) by inhibiting protein accumulation of their target gene encoding the repressor MYBL2 protein, most likely at the translational level. HY5 directly binds and represses the MYBL2 promoter probably by a mechanism that involves the loss of H3K9Ac and H3K4me3-positive histone marks (Wang et al., 2016). Therefore, HY5 has a dual effect in promoting anthocyanin biosynthesis: it binds and represses MYBL2, and it promotes MIR858A expression in response to light (e.g., high light) signals.

It has been shown that hen1 mutants, which have reduced levels of mature miRNAs, accumulate high anthocyanin levels (Tsai et al., 2014), a result somewhat at odds with miR156 and miR858 promoting anthocyanin biosynthesis. However, Sharma et al. (2016) propose a different function for miR858 in the regulation of flavonoid biosynthesis. They found that miR858 targets the positive regulators of flavonoid biosynthesis MYB11, MYB12, and MYB111. Moreover, transcriptional profiling of miR858-overexpressing lines (miR858-OX) revealed MYB20, MYB42, and MYB83 as new targets of miR858. Since these transcription factors regulate several growth responses, the authors characterized growth phenotypes of miR858-OX and miR858a target mimic lines (MIM858), in which miR858a activity is silenced. While miR858-OX plants showed enhanced growth, early flowering, and an increase in seed size, MIM858 plants displayed reduced growth, late flowering, and smaller seeds (Sharma et al., 2016). The metabolic profiling of these lines confirmed their opposite effect on flavonoid accumulation, since miR858 depletion resulted in the accumulation of major flavonoids, whereas they were significantly reduced in miR858-OX. This was accompanied by an opposite effect on lignin biosynthesis, since miR858-OX showed increased lignification and upregulation of lignin biosynthetic genes. These results indicate that miR858 could differently regulate anthocyanin, other flavonoids, and lignin biosynthesis. This could be achieved by targeting different MYB transcription factors in response to distinct light conditions (white light versus low or high light). This differential regulation could then help maintain a metabolic flux balance between the different biosynthetic pathways.

Chlorophylls are critical plant pigments involved in light absorption and energy transfer during photosynthesis. However, this fundamental process for plant life also generates reactive oxygen species, which are detrimental for plant growth and development. Considering that chlorophyll biosynthesis precursors are also sources of reactive oxygen species, it is of paramount importance to tightly regulate this pathway. One such mechanism includes miR171 and miR171-targeted SCL genes, also known as HAIRY MERISTEM (HAM) or LOST MERISTEMS (LOM). In Arabidopsis, miR171 positively regulates chlorophyll biosynthesis in the light through downregulation of miR171-targeted SCLs, leading to upregulation of protochlorophyllide oxidoreductase (POR), a key enzyme in chlorophyll biosynthesis (Ma et al., 2014). MIR171C-overexpressing plants and scl6 scl22 scl27 mutants show higher chlorophyll and POR levels than wild-type plants, whereas plants expressing a miR171-resistant form of SCL27 (rSCL27) show reduced chlorophyll and POR levels (Wang et al., 2010; Ma et al., 2014). On the other hand, downregulation of POR expression reduces the chlorophyll content of wild-type, MIR171C-OX, and scl triple mutants (Ma et al., 2014), indicating that the role of miR171c and SCLs in chlorophyll regulation is mediated by PORC. SCL27 represses PORC expression by directly binding and inhibiting its promoter.

Moreover, the miR171-SCL module also mediates gibberellin-dependent effects on chlorophyll biosynthesis due to its regulation of DELLA proteins and PORC expression in the light, but not in the dark. In fact, SCL27 interacts with the DELLA protein RGA and this interaction reduces the ability of SCL27 to bind to the PORC promoter. Finally, SCLs induce the expression of MIR171 genes, revealing the existence of a regulatory feedback loop (Ma et al., 2014). Ma et al. (2014) propose that this feedback loop may also contribute to maintaining the diurnal oscillation of miR171.

Taken together, these findings emphasize the role of miRNAs in diverse light-regulated processes. In addition, miRNAs can also directly target major regulators of the light signaling pathways and/or the photoperiod pathway, thus modulating photomorphogenesis or the time to flower, respectively. The most recent findings on miRNAs involved in these processes are discussed below.

### Photomorphogenesis

Within the group of light-responsive miRNAs, the miR156/157 family occupies a prominent position (**Table 1**). In Arabidopsis, miR157d and miR319 promote the degradation of their target transcripts, encoding the positive and negative key photomorphogenesis regulators HY5 and TCP (TEOSINTE BRANCHED1, CYCLOIDEA, and PCF) transcription factors, respectively. Interestingly, both induction and stabilization of miR157d and miR319 were shown to depend on HEN1, with HEN1 accumulation increasing miR157d and miR319 levels in de-etiolating seedlings (Tsai et al., 2014). In turn, HY5 induces HEN1 expression in a light-dependent manner. Therefore, HEN1 and HY5 constitute a negative feedback regulatory loop that is mediated by miR157, since this miRNA will ultimately target HY5 transcripts for cleavage. In a hen1 mutant, both HY5 transcript and protein levels are increased, probably due to the decrease in miR157d, resulting in a light-hypersensitive phenotype. Conversely, miR157 constitutive expression reduced HY5 transcript and protein levels, resulting in seedlings displaying a light-hyposensitive phenotype. Nevertheless, the light-hypersensitive phenotype displayed by hen1 mutants may also be dependent on miR319, which promoted the cleavage of TCP mRNAs and repressed hypocotyl elongation (Tsai et al., 2014). The role of miR319 in photomorphogenesis has been confirmed using a loss-of-function mutant, which shows longer hypocotyls than the wild type under red light (Sun et al., 2018). The HEN1/HY5 feedback loop may help explain the inconsistencies found in the effect of HEN1, miR156, and miR858 on anthocyanin accumulation.

Three additional miRNAs, miR160, miR167, and miR848, affect Arabidopsis hypocotyl elongation under red light (Sun et al., 2018). It is worth mentioning that miR160 and miR167 respond to different light conditions in several species (**Table 1**). Further understanding of the role of these miRNAs in light responses will undoubtedly provide novel insights into how miRNAs contribute to adaptation to different light environments.

Besides the early stages of photomorphogenesis, when the seedling reaches for light and hypocotyl elongation is critical, miRNAs regulate other light responses during adult plant life, such as the perception of surroundings and neighbors or the time to flower. When in isolation, plants receive light with a high red to far-red (R:FR) ratio. However, when they grow under a canopy or in close proximity to neighboring plants, the R:FR ratio is lower and this triggers diverse morphological and physiological responses that allow the plants to adapt to these light conditions (Ballaré and Pierik, 2017). When Arabidopsis plants grown under light/dark cycles are treated with a pulse of far-red light at the end of the light period (end-of-day far-red, EOD-FR), they show earlier flowering, increased petiole elongation, and a reduction in the number of rosette branches in comparison with plants grown under normal white-light/dark cycles (WL; Xie et al., 2017). EOD-FR treatment causes a decrease in mature miR156 levels by downregulating the expression of several MIR156 genes. Consistent with this, there is upregulation of several miR156-target SPL genes in EOD-FR-treated plants (Xie et al., 2017). This suggests that the miR156-SPL module might be involved in EOD-FR responsiveness. Indeed, when plants with reduced miR156 activity (MIM156 plants) are grown under WL, they show a constitutive EOD-FR response (Xie et al., 2017). These findings suggest that a reduction in miR156 levels is required for this response, and place miR156 as a repressor of this process.

PHYTOCHROME-INTERACTING FACTORS (PIFs) mediate the effect of phytochromes on numerous light responses (Leivar and Monte, 2014). PIF protein abundance increases under low R:FR light conditions (Lorrain et al., 2008; Leivar et al., 2012; Xie et al., 2017), and PIFs act as positive regulators of EOD-FR responses, therefore showing the opposite behavior to miR156. Further substantiating a previous report suggesting that MIR156E is a direct target gene of PIF5 (Hornitschek et al., 2012), Xie et al. (2017) showed that several PIFs repress transcription of five MIR156 genes, including MIR156E, by directly binding to PIF-binding sites present in their promoters. This causes a reduction in mature miR156 levels and a concomitant increase in miR156-target SPL transcript abundance. Moreover, genetic analyses show that miR156 acts downstream of PIF5 (Xie et al., 2017). Altogether, these results demonstrate that miR156 responds to EOD-FR treatments and mediates, probably through downregulation of SPL genes, the effect of PIFs on at least part of the low R:FR light responses. It will be interesting to determine whether these findings can be reproduced under low R:FR conditions more closely resembling canopy or proximity shade. Therefore, not only do miR156/157 respond to light, but they also affect several light-regulated processes.

### miRNAs and Photoperiod-Regulated Processes

Photoperiod affects many plant metabolic, physiological, and developmental processes. It also affects the levels of some miRNAs, as we have already discussed (Li et al., 2015). An example is miR156, which is regulated by photoperiod in potato and soybean, although not in Arabidopsis (Jung et al., 2012; Bhogale et al., 2014; Li et al., 2015). miR156, via its target genes, downregulates miR172 in several plant species (Chuck et al., 2007; Wu et al., 2009; Bhogale et al., 2014). In Arabidopsis, mature miR172 levels are higher under LD than SD and this difference does not seem to be driven by transcription, given that the abundance of at least two miR172 primary transcripts is lower under LD than SD (Jung et al., 2007). Rather, it probably results from regulation of miR172 processing. A gigantea (gi) mutant, which shows a reduced response to photoperiod and reduced expression of two genes involved in miRNA processing, DCL1 and SE, shows lower mature miR172, but not pri-miR172, levels than the wild type. However, additional factors must contribute to the photoperiodic regulation of miR172, given that this miRNA still responds to photoperiod in the gi mutant (Jung et al., 2007). Potato and soybean also show different miR172 levels under LD and SD (Martin et al., 2009; Li et al., 2015), suggesting that the photoperiodic regulation of this miRNA may be evolutionarily conserved.

Interestingly, at least four photoreceptors, phytochrome A, phytochrome B, and cryptochrome 1 and 2, regulate miR172 levels (Jung et al., 2007; Martin et al., 2009; Zhao et al., 2015). In agreement with this, red light downregulates miR172 and blue light upregulates it in Arabidopsis. Therefore, both light quality and duration affect miR172 abundance. In addition, TIMING OF CAB EXPRESSION1 (TOC1), a component of the circadian clock, reduces mature miR172 levels, but despite this, mature miR172 levels do not show daily oscillations in Arabidopsis (Jung et al., 2007). Further research is still needed to determine whether the circadian clock regulates miR172.

In Arabidopsis, flowering is regulated by photoperiod, such that plants flower earlier under LD than SD. By contrast, soybean flowering and potato tuber formation are induced by SD. Overexpression of miR172 promotes flowering in Arabidopsis and tuberization in potato and reduces the photoperiodic tuberization response (Aukerman and Sakai, 2003; Chen, 2004; Jung et al., 2007; Martin et al., 2009), strongly suggesting that miR172 is involved in the regulation of photoperiodic processes. miR172 targets a subfamily of AP2-like genes, including AP2 itself, which partially redundantly repress flowering (Zhu and Helliwell, 2011). In Arabidopsis, and probably in soybean, alteration of miR172-target gene expression or function leads to a reduced response of flowering to photoperiod, confirming the role of the miR172/AP2-like module in the photoperiodic regulation of flowering (Mathieu et al., 2009; Yant et al., 2010; Zhao et al., 2015). In Arabidopsis plants grown under LD, GI, and probably several photoreceptors, upregulate miR172, which negatively regulates its target genes; these in turn directly repress the expression of several genes promoting flowering, including FLOWERING LOCUS T (FT), and promote the expression of genes repressing flowering (**Figure 3**) (Jung et al., 2007; Mathieu et al., 2009; Yant et al., 2010). Downregulation of miR172 target genes thus accelerates flowering under LD conditions. This photoperiodic flowering pathway is independent of the photoperiodic flowering regulator CONSTANS (CO) in Arabidopsis (Jung et al., 2007; Mathieu et al., 2009; Yant et al., 2010), although it has been proposed to involve a CO-like gene in soybean (Zhao et al., 2015).

Given that miR156 responds to light in several species, to photoperiod in potato and soybean, is involved in flowering control, and negatively regulates miR172 (Chuck et al., 2007; Wang et al., 2009; Wu et al., 2009; Bhogale et al., 2014; Li et al., 2015) (**Table 1**), it seems likely that miR156 may play a role in photoperiodic flowering. However, whether this is the case still needs to be proved. In addition to light quality and photoperiod, miR156 and miR172 levels respond to other environmental and endogenous signals and play roles in processes other than flowering and tuberization (Zhu and Helliwell, 2011; Han et al., 2013; Nova-Franco et al., 2015; Ripoll et al., 2015; Wang and Wang, 2015; Huo et al., 2016; Li et al., 2017; Díaz-Manzano et al., 2018; Luan et al., 2018), indicating that miR156 and miR172 influence many aspects of plant biology.

The miR170/171 family has been shown to be responsive to different light conditions in several species (**Table 1**). In addition, the rice OsMIR171C promoter contains several elements putatively involved in light responses (Fan et al., 2015). Consistent with this, OsMIR171C transcript levels show a diurnal oscillation with a peak early in the morning, similar to the oscillation of mature miR171 in Arabidopsis (Fan et al., 2015). Conversely, transcript levels of four rice miR171 target genes, OsHAM1 to OsHAM4, accumulate from the evening until early morning under light/dark cycles. Although it is not clear whether these rice oscillation patterns are regulated by light or by the circadian clock, miR171c levels respond to photoperiod, with higher levels under LD than SD (Fan et al., 2015). miR171c upregulation in the delayed heading (dh) mutant, which carries a T-DNA insertion in the OsMIR171C promoter, leads to late flowering (heading), strongly suggesting that miR171c represses flowering (Fan et al., 2015). Moreover, the upregulation of miR171c by LD, which are non-inductive conditions for rice flowering, fits with a role in delaying flowering. However, further research is still required to determine if miR171 affects the photoperiodic regulation of flowering in rice. Nevertheless, the expression pattern of miR171c and OsHAMs in the shoot apex of wild-type plants is consistent with a role in regulating the floral transition. This is also supported by the downregulation of three key positive regulators of flowering (Ehd1, Hd3a, and RFT1) in the dh mutant in comparison with the wild type (Fan et al., 2015). Interestingly, the dh mutant also shows upregulation of miR156 (Fan et al., 2015), a flowering repressor miRNA in several species (Wang, 2014), and of OsPHYC, which encodes a photoreceptor that also delays flowering in rice (Takano et al., 2005; Fan et al., 2015). 
Therefore, miR171 is regulated by light and, in turn, it may be involved in regulating light responses mediated by OsPHYC.

Another miRNA that responds to day length and regulates photoperiodic flowering is miR5200, a miRNA specific to the Pooideae subfamily. In a Brachypodium distachyon accession that flowers much earlier under LD than SD, mature miR5200, as well as pri-miR5200a and pri-miR5200b, is present at much higher levels under SD than LD, and its pri-miRNAs follow a daily rhythm under SD (Wu et al., 2013). The expression pattern of MIR5200A and MIR5200B correlates with active histone marks in these genes under SD and repressive histone marks under LD. Another five Pooideae species accumulate more miR5200 under SD than LD, revealing conservation of this photoperiodic regulation among these grass species (Wu et al., 2013). In B. distachyon, miR5200 targets two FT-like genes, FTL1 and FTL2, whose transcripts are more abundant under long than short photoperiods. B. distachyon plants overexpressing this miRNA flower much later than wild-type plants under LD, whereas plants with reduced miR5200 activity (MIM5200) flower earlier than the wild type under SD and like the wild type under LD, therefore showing a reduced response to photoperiod. Consistently, FTL1 and FTL2 mRNA levels are increased in MIM5200 plants under SD, but unchanged under LD, in comparison with wild-type plants (Wu et al., 2013). Therefore, miR5200 plays an important role in the photoperiodic regulation of flowering in B. distachyon by delaying flowering under SD.

# LONG NON-CODING RNAs

Next-generation sequencing techniques have unraveled a whole new transcriptional landscape that includes lncRNAs. Several screening approaches have been designed to identify novel plant lncRNAs that would accumulate in response to certain environmental cues, in specific organs/tissues or at particular developmental stages. However, although lncRNA listings are available, the functional characterization of most of these transcripts is yet to be done. Most of the knowledge on plant lncRNA function is associated with FLOWERING LOCUS C (FLC), a negative regulator of flowering. In fact, several lncRNAs arising from this locus were shown to repress FLC transcript levels upon vernalization and thus promote flowering. COLD OF WINTER-INDUCED NON-CODING RNA FROM THE PROMOTER (COLDWRAP) is transcribed from the FLC promoter region, whereas COLD-ASSISTED INTRONIC NON-CODING RNA (COLDAIR) is expressed from the first intron. ANTISENSE LONG (ASL) and COLD-INDUCED LONG ANTISENSE INTRAGENIC RNA (COOLAIR) both originate from the FLC 3′ UTR region and show partial overlap. The combined action of the three lncRNAs was shown to silence FLC expression, and therefore to promote flowering, through the accumulation of repressive histone marks (reviewed in Chekanova, 2015; Whittaker and Dean, 2017). Although most of the research on these lncRNAs pertains to vernalization-regulated flowering, and therefore is outside the scope of this review, there are reports connecting FLC with photoperiod and circadian regulation. FLC lengthens the circadian period, especially at high temperature (Swarup et al., 1999; Edwards et al., 2006), but shortening of the circadian period by vernalization is not mediated by FLC (Salathia et al., 2006). Therefore, whether the regulation of FLC by vernalization-associated lncRNAs is linked to its role in circadian periodicity is yet to be determined.

Several FLC clade members were shown to regulate flowering under both LD and SD. Protein levels of one of them, MAF3, oscillated throughout the day, suggesting regulation by photoperiod or the circadian clock (Gu et al., 2013). In fact, FLC clade members could form multi-protein complexes and regulate flowering depending on photoperiod and temperature. However, further research is needed to determine whether vernalization-associated lncRNAs also participate in these processes. Therefore, in this section, we will discuss the available reports focused on light- and circadian-responsive lncRNAs.

## lncRNAs as Regulators of Light-Dependent Processes

One of the screening approaches that have allowed the identification of novel lncRNA transcriptional units used a reproducibility-based tiling-array analysis strategy (RepTAS) previously described by Liu et al. (2012). With this strategy, Wang H. et al. (2014) identified a total of 37,238 light-responsive lncNAT pairs (NAT pairs where one of the components is an lncRNA) in Arabidopsis. These NAT pairs were afterward verified by different methods, including the screening of a novel ATH (Arabidopsis thaliana) NAT custom array, strand-specific RNA-seq (ssRNA-seq), and quantitative RT-PCR (RT-qPCR). To identify bona fide light-regulated lncRNAs, the expression levels of these antisense and sense transcripts were determined in etiolated seedlings and in seedlings undergoing de-etiolation that were exposed to continuous white light for 1 and 6 h. Not only did this approach allow the identification of light-responsive lncNATs, but it also revealed their preferential accumulation in cotyledons compared to hypocotyls and roots. Interestingly, the number of light-responsive lncNAT pairs was much higher after 6 h of treatment than at 1 h, indicating spatial-, developmental-, and temporal-specific patterns of accumulation. When the expression levels of both the sense and antisense components of each NAT pair were determined, these lncNAT pairs were classified into two groups: light-responsive concordant NAT pairs and light-responsive discordant NAT pairs. In order to further characterize their expression pattern, a parallel analysis was performed matching specific histone modifications with their transcriptional changes. This approach showed that transcription changes of these light-regulated NATs were preferentially correlated with histone H3 acetylation (H3K9ac and/or H3K27ac), H3K9ac being the predominant positive histone mark, suggesting its involvement in regulating light-responsive NAT pairs (Wang H. et al., 2014).
Although the functional characterization of most of these candidates is yet to be done, these results provide the first evidence that light is a major shaper of the plant non-coding transcriptional landscape, including both long and small non-coding RNAs.

HIDDEN TREASURE 1 (HID1) was the first lncRNA described as a positive regulator of photomorphogenesis under continuous red light. This is achieved through the ability of HID1 to downregulate the expression of PIF3, a negative regulator of this process (Wang Y. et al., 2014). In agreement with this, PIF3 mRNA levels were increased in hid1 mutants, which displayed elongated hypocotyls under red light, a typical phenotype of PIF accumulation. Functional characterization of HID1 revealed that this lncRNA assembles in nuclear protein–RNA complexes, where it is able to associate with the PIF3 5′ UTR and repress its expression. Therefore, HID1 would act through PIF3 to modulate hypocotyl elongation. Sequence homology searches revealed that HID1 is present in other plant species. Confirming its conservation, Arabidopsis hid1 mutants expressing OsHID1, a rice homolog, were rescued from their elongated hypocotyl phenotype (Wang Y. et al., 2014). These results suggest that lncRNAs may be conserved between species and regulate similar biological processes, as is the case for the families of conserved miRNAs.

Interestingly, besides its role as a positive regulator of photomorphogenesis in red light, HID1 can also act as a negative regulator of cotyledon greening during de-etiolation of Arabidopsis in white light (Wang et al., 2018). Within the several steps of this process, the reduction of protochlorophyllide (Pchlide) to chlorophyllide via POR is fundamental for cotyledon greening to occur. Confirming its repressor role in this process, dark-grown hid1 mutants transferred to white light presented a higher rate of cotyledon greening compared to Arabidopsis wild-type plants. Moreover, HID1 knockdown led to a reduction of Pchlide content in the dark, as well as higher POR mRNA levels, indicating that HID1 plays a role in cotyledon greening by promoting Pchlide accumulation and repressing POR transcription (Wang et al., 2018). To uncover the HID1/PIF3 relationship in cotyledon greening, hid1 pif3 double mutants were generated. PIF3 was previously shown to regulate seedling greening by preventing Pchlide over-accumulation (Leivar and Monte, 2014). However, the greening rate of hid1 pif3 double mutants was comparable to that of hid1, suggesting that HID1 would act downstream of PIF3 (Wang et al., 2018). Since PIF3 does not regulate HID1 expression, the exact mechanism by which HID1 and PIF3 regulate greening still remains to be determined (Wang et al., 2018). Together, these results suggest that, depending on the light-mediated developmental process, HID1 could associate with different factors, allowing it to perform different functions.

## Circadian-Regulated lncRNAs in Photoperiod-Dependent Flowering

Long non-protein coding RNAs can also display circadian regulation. This is characterized by their ability to maintain an oscillatory pattern of expression even in the absence of environmental cues (e.g., under constant conditions). Several screening approaches allowed the identification of circadian ncRNAs in fungi, mammals, and Arabidopsis, some of them being lncRNAs that were also NATs (Kramer et al., 2003; Hazen et al., 2009; Coon et al., 2012; Xue et al., 2014). However, even if non-coding NATs were reported for components of the Arabidopsis oscillator (Hazen et al., 2009), their function had not been investigated. Recently, we characterized a circadian-regulated Arabidopsis lncRNA, FLORE, which is a NAT to the CYCLING DOF FACTOR 5 (CDF5) transcriptional regulator (Henriques et al., 2017). FLORE and CDF5 displayed an antiphasic pattern of expression that resulted in part from a mutual inhibitory function, as well as an opposite biological function (**Figure 3**). CDFs have been characterized as repressors of photoperiodic flowering due to their ability to repress CO and FT transcription (Song et al., 2015). FLORE inhibits CDF5 accumulation and contributes to maintain its proper oscillatory pattern. Therefore, due to downregulation of CDF5, FLORE overexpression in the vascular tissue promotes flowering. On the other hand, CDF5 will also regulate FLORE transcript levels, thus contributing to its oscillation. These findings indicate that the circadian clock is able to control the expression of both coding and noncoding transcripts, in this case, CDF5 and FLORE (Henriques et al., 2017). However, the mutual regulation within this NAT pair seems to be required to ensure the timely accumulation of each transcript. Further research is still necessary to uncover the molecular mechanisms that account for this regulation.

# FINAL CONCLUSIONS

Both small RNAs and lncRNAs have been associated with specific biological processes that can occur at particular developmental stages, often in very precise locations within certain organs and tissues. Light is one of the major environmental cues for plants and is essential for their survival. Therefore, it is not surprising that light modulates the expression, processing, and activity of these non-coding RNAs. In the case of miRNAs, a pattern emerges in which the accumulation of specific families is associated with particular light conditions in different species. Moreover, this seems to be a two-way regulation, since miRNAs can also target basic components of light and photoperiod signaling and are thus also required for the proper functioning of these pathways. Although some of the miRNAs that are consistently revealed as light-regulated in diverse plant species have not been shown to play functional roles in light responses so far, they are excellent candidates for performing such roles. lncRNAs have recently become the subject of scrutiny, but the few reports available prevent the establishment of any specific trend. Sequencing and bioinformatics efforts led to the construction of a plant-specific lncRNA database (Jin et al., 2013). However, other approaches combining transcriptomics, epigenetic, and functional assays would provide further insights into the biological role of plant lncRNAs. Nevertheless, it is worth mentioning that, of the small group of lncRNAs studied so far, two are associated with light-dependent processes. These findings show that light is a massive re-programmer of non-coding transcription, able to set in motion specific fine-tuning mechanisms with the ultimate goal of adjusting plant development to the surroundings.

# AUTHOR CONTRIBUTIONS

All the authors read the cited publications, prepared summaries, and wrote the manuscript.

# FUNDING

RH group was supported by grants from the Spanish "Ministerio de Economía y Competitividad" (MINECO) (BIO2015-72161-EXP, BIO2015-70812-ERC, and RYC-2011-09220) and by the European Commission (PCIG2012-GA-2012-334052). PS-L acknowledges financial support from the MINECO (Grant BFU2015-64409-P). We also acknowledge the financial support of the "Severo Ochoa Program for Centres of Excellence in R&D" (2016-2019, SEV-2015-0533) and the Generalitat de Catalunya (CERCA Program).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Sánchez-Retuerta, Suárez-López and Henriques. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

fpls-09-00600 April 30, 2018 Time: 15:31 # 1

# Transposon-Derived Non-coding RNAs and Their Function in Plants

### Jungnam Cho\*

The Sainsbury Laboratory, University of Cambridge, Cambridge, United Kingdom

Transposable elements (TEs) are often regarded as harmful genomic factors and indeed they are strongly suppressed by epigenetic silencing mechanisms. On the other hand, the mobilization of TEs brings about genome and transcriptome variability, which is essential for the survival and evolution of the host species. The vast majority of such controlling TEs influence neighboring genes in cis by either promoting or repressing their transcriptional activities. Although TEs are highly repetitive in genomes and are transcribed in specific stress conditions or developmental stages, the trans-acting regulatory roles of TE-derived RNAs have rarely been studied. It was only recently that TEs were investigated for their regulatory roles in the form of RNA. Particularly in plants, TEs are an ample source of small RNAs such as small interfering (si) RNAs and micro (mi) RNAs. Those TE-derived small RNAs have the potential to affect non-TE transcripts by sequence complementarity, thereby generating novel gene regulatory networks, including those involved in stress resistance and the hybridization barrier. Apart from small RNAs, a number of long non-coding RNAs (lncRNAs) originate from TEs in plants. For example, a retrotransposon-derived lncRNA expressed in rice root acts as a decoy RNA or miRNA target mimic which negatively controls miRNA171. The post-transcriptional suppression of miRNA171 in roots ensures the stabilization of the target transcripts encoding SCARECROW-LIKE transcription factors, the key regulators of root development. In this review article, the recent discoveries of the regulatory roles of TE-derived RNAs in plants will be highlighted.

Keywords: transposable elements, domestication, small RNA, long non-coding RNA, microRNA target mimic

### Edited by:

Dora Szakonyi, Instituto Gulbenkian de Ciência (IGC), Portugal

### Reviewed by:

German Martinez, Swedish University of Agricultural Sciences, Sweden Deqiang Zhang, Beijing Forestry University, China

### \*Correspondence: Jungnam Cho

jungnam.cho@slcu.cam.ac.uk

### Specialty section:

This article was submitted to Plant Cell Biology, a section of the journal Frontiers in Plant Science

Received: 28 February 2018 Accepted: 16 April 2018 Published: 03 May 2018

### Citation:

Cho J (2018) Transposon-Derived Non-coding RNAs and Their Function in Plants. Front. Plant Sci. 9:600. doi: 10.3389/fpls.2018.00600

# INTRODUCTION

Transposable elements (TEs) are the major constituent of many eukaryotic genomes. Especially in the cereal crops (e.g., barley, wheat, and maize), more than 80% of their genomes are made up of transposons (Tenaillon et al., 2010). TEs are classified into two major classes depending on their modes of transposition: class I and class II (Feschotte et al., 2002; Wicker et al., 2007). Class I TEs, also known as retrotransposons, move through RNA intermediates that are later converted to cDNAs, creating extra copies in the genome. The long terminal repeat (LTR) retrotransposons and the long interspersed nuclear elements (LINEs) are the two main types of retrotransposon. Both LTR retrotransposons and LINEs are autonomous elements since they encode the proteins necessary for transposition, while those that depend on the autonomous elements, such as large retrotransposon derivatives (LARDs), terminal repeat retrotransposons in miniature (TRIMs) and short interspersed nuclear elements (SINEs), are non-autonomous retrotransposons. Unlike class I, class II transposons, or DNA TEs, are excised from one location and inserted into another genomic position by the transposase protein, which is encoded within DNA TEs. Class II TEs include another subclass, Helitrons, which replicate through rolling circle amplification. In many plant genomes, retrotransposons are more abundant than class II elements. In particular, the LTR retrotransposons are the predominant families of TEs in many plants (Vitte et al., 2017). The replication cycle of the LTR retrotransposons initiates with transcription of a genomic copy by the host's RNA polymerase (Pol) II. The mRNAs of LTR retrotransposons are subjected to both translation and reverse-transcription (Grandbastien, 1998). Autonomous LTR retrotransposons produce multiple proteins, including GAG, aspartic protease, reverse-transcriptase, RNase H and integrase, which are required for the completion of the retrotransposition cycle. As a result of reverse-transcription, a linear, double-stranded DNA is produced, which is known as extrachromosomal DNA (ecDNA). The ecDNAs are then transported back to the nucleus and integrated into chromosomal DNA by the integrase protein.

Since TE mobilization can be mutagenic, the host genomes have evolved elaborate mechanisms to suppress their activities (Matzke and Mosher, 2014). TEs are primarily repressed by the epigenetic silencing pathways including histone modification and DNA methylation. In plants, the RNA-directed DNA methylation (RdDM) pathway plays a central role in TE silencing. Genomic regions marked by DNA methylation are recognized by the plant-specific RNA polymerase, RNA PolIV, which transcribes relatively short stretches of RNAs (Blevins et al., 2015; Li et al., 2015; Zhai et al., 2015). The transcribed RNAs are then converted to duplexes by the RNA-dependent RNA polymerase (RDR) 2 and subsequently sliced to 24-nucleotide (nt) small interfering (si) RNAs by DICER-like (DCL) 3. These 24-nt siRNAs are bound by the ARGONAUTE (AGO) 4 proteins and interact with the nascent RNA transcribed by RNA PolV. AGO4 then recruits multiple proteins including SU(VAR)3-9 HOMOLOG (SUVH) 4/5/6 and DOMAINS REARRANGED METHYLASE (DRM) 1/2 that mediate repressive histone modification (H3K9me2) and DNA methylation, respectively, thus contributing to reinforcement of the silenced state of TE chromatin (Zilberman et al., 2004; Tran et al., 2005; Zhong et al., 2014). TEs that have escaped silencing or have been newly introduced to the genome are recognized by the RDR6-RdDM pathway, which post-transcriptionally degrades TE mRNAs. RNA PolII-transcribed TE mRNAs are processed to 21- or 22-nt siRNAs by RDR6 and DCL2/4 (Creasey et al., 2014). These 21- or 22-nt siRNAs associate with AGO1 and target TE mRNAs for degradation. Intriguingly, TE-associated siRNAs can also interact with non-TE target transcripts, exerting regulatory roles in various biological processes. In mammals, PIWI-interacting RNAs regulate a large number of mRNAs and long non-coding RNAs (lncRNAs) in testis, suggesting widespread regulatory roles of TE-derived small RNAs in both plants and animals (Watanabe et al., 2015).

In addition to siRNAs, many plant miRNAs have been suggested to have evolved from TEs (Piriyapongsa and Jordan, 2008; Li et al., 2011). Although miRNAs are distinct from siRNAs in origin and biogenesis by definition (Borges and Martienssen, 2015), the categorization of small RNAs identified by deep sequencing has not been done with sufficient precision. In fact, a considerable number of siRNAs are mis-annotated as miRNAs (McCue et al., 2012). Nonetheless, multiple lines of evidence indicate that TEs have been co-opted as miRNAs, since the repeated sequences associated with TEs can readily form RNA hairpin structures that can be subsequently processed by miRNA biogenesis pathways (Piriyapongsa and Jordan, 2008; Li et al., 2011). Moreover, the vast majority of lncRNAs originate from TEs in mammalian as well as plant genomes (Kelley and Rinn, 2012; Liu et al., 2012; Kapusta et al., 2013), suggesting dynamic evolutionary exaptation of TEs in the form of RNA. In the following two sections, several examples of TE domestication into functionally relevant regulatory RNAs in plants will be explained.

# TRANSPOSON-DERIVED SMALL RNAs

## TE-siRNA in Stress Response

In the mutant of Decreased DNA methylation 1 (DDM1), a gene encoding an ATP-dependent chromatin remodeler in Arabidopsis, the global DNA methylation level is dramatically reduced and numerous TEs are thereby reactivated (Vongs et al., 1993; Creasey et al., 2014). The reactivation of a large fraction of those TEs is accompanied by the production of 21- or 22-nt siRNAs, known as epigenetically activated siRNAs (easiRNAs) (Creasey et al., 2014). The easiRNAs target TE transcripts for cleavage, ensuring the silencing of TEs at the post-transcriptional step. Interestingly, a subset of the easiRNAs in ddm1 mutants can interact with genic mRNAs, reducing their expression levels. For example, siRNA854 is one of the easiRNAs generated in the ddm1 mutant and is produced from the Athila6A TE. It targets the 3′ UTR of the UBP1 transcript, which encodes a stress granule protein (**Figure 1A**; McCue et al., 2012). Using multiple reporter gene constructs containing the 3′ UTR of UBP1, it was demonstrated that siRNA854 represses non-TE targets as well. The expression levels of Athila6A and siRNA854 increase progressively over multiple generations of ddm1 mutation. For example, plants homozygous for the ddm1 mutation for six generations (ddm1 F6) have higher levels of siRNA854 than ddm1 F2 plants. The ubp1 mutants show strong susceptibility to osmotic stress, and a similar phenotype was also observed in ddm1 F6 but not in ddm1 F2 plants. Therefore, the targeting of Athila6A-derived siRNA854 to the UBP1 transcript and the resulting alteration of resistance to abiotic stresses provide insight into how TEs adapted to changing environments in plants. In addition to UBP1, Athila6A-derived easiRNAs can target other genic mRNAs including AMS and HHP2 (McCue et al., 2013). Several of those targets were experimentally validated for easiRNA-mediated repression by the short tandem target mimic (STTM) transgenic approach; however, the biological relevance of this regulation is yet to be determined.

A more recent paper by Zhang et al. (2016) suggested that TE-siRNA815 in rice can induce de novo DNA methylation at the target gene loci through the RdDM pathway (**Figure 1A**). Two allelic transcription factor genes, WRKY45-1 and WRKY45-2, were previously shown to have opposing effects on resistance to Xanthomonas oryzae pv. oryzae (Xoo). The WRKY45-1 allele produces TE-siRNA815 from a WANDERER\_OS-type DNA TE located in the intron. TE-siRNA815 then recognizes the complementary sequence and deposits DNA methylation through the RdDM pathway in the intron of the ST1 locus, which is critical for resistance against Xoo. On the other hand, the WRKY45-2 allele lacks such an siRNA-producing region, which ensures stable ST1 expression and contributes to pathogen resistance.

## TE-siRNA in Hybridization Barrier

Since easiRNAs are mainly produced in epigenetic mutants, it has been questioned whether easiRNAs have a function in natural conditions. Two recent studies answered this question by demonstrating the roles of easiRNAs in the hybridization barrier in Arabidopsis (Borges et al., 2018; Martinez et al., 2018). The vegetative nuclei of pollen grains have reduced DDM1 activity and numerous TEs are reactivated (Slotkin et al., 2009). Pollen-specific miRNA845b recognizes the conserved primer-binding site (PBS) sequence in the LTR retrotransposons activated in pollen and triggers the initial cleavage of LTR-TE mRNAs, followed by the production of easiRNAs (Borges et al., 2018). A higher dosage of the paternal genome brings a higher amount of easiRNAs at fertilization, which gives rise to unbalanced gametic siRNAs and ultimately seed failure (triploid block; Martinez et al., 2018). When the paternal easiRNA levels were suppressed by the poliv mutation, the triploid block phenotype was partly restored, indicating a critical role of the paternal easiRNAs in the hybridization barrier (Martinez et al., 2018). The exact mechanism of the easiRNA-mediated triploid block is still unclear; however, it was suggested that the excess paternal easiRNAs might interfere with the establishment of DNA methylation around the paternally expressed imprinted genes by hijacking PolV-transcribed nascent transcripts (**Figure 1A**).

## TE-Small RNA as Anti-silencing Factor

Transposable elements have often been domesticated into miRNA genes in plants (Li et al., 2011). One example is miRNA820 of rice. miRNA820 is 22 or 24 nt in size and originates from the internal region of a CACTA DNA TE (Nosaka et al., 2012). Interestingly, miRNA820 targets the transcripts of DRM2, which encodes a de novo DNA methyltransferase (**Figure 1A**). The targeting of DRM2 by miRNA820 is evolutionarily conserved in the Oryza genus, and the repression of the DRM2 gene results in a strong reduction in DNA methylation and transcriptional upregulation of many TEs. Therefore, miRNA820 can be seen as an anti-silencing factor encoded within a TE that works at the post-transcriptional level. Similarly, UBP1 is the Arabidopsis homolog of mammalian TIA-1, which is known to suppress the viral translation of Tick-Borne Encephalitis Virus (Albornoz et al., 2014). McCue et al. (2013) have also shown that the UBP1 protein forms cytoplasmic stress granules under abiotic stress conditions or when heterochromatic TE silencing is released, for instance in the ddm1 mutant. Just as TIA-1 inhibits viral translation in mammals, the level of the GAG protein encoded by Athila6A was elevated in ddm1 rdr6 ubp1 triple mutants (McCue et al., 2013). These data support the notion that UBP1 acts on the activated TE mRNAs to suppress their translation and that TE-siRNAs are therefore repressors of the host TE silencing mechanism.

# TRANSPOSON-DERIVED lncRNAs

There is emerging evidence of TE domestication into lncRNAs in both mammalian and plant genomes (**Figure 1B**; Kelley and Rinn, 2012; Kapusta et al., 2013; Johnson and Guigo, 2014; Liu et al., 2015; Wang et al., 2015, 2016; Quattro et al., 2017). A lncRNA can be defined as a transcript of at least 200 bp that has low protein-coding potential (Liu et al., 2015). It is well established that lncRNAs perform various cellular functions, including the recruitment of epigenetic regulators to target chromatin and the sequestration of miRNAs (Franco-Zorrilla et al., 2007; Heo and Sung, 2011; Csorba et al., 2014). In the past decade, transcriptomic analyses have dramatically expanded the catalog of lncRNAs in various tissues and stress conditions of many plant species. Despite the large number of plant lncRNAs identified so far, their biological roles are still largely unexplored. In this section, the current status of plant TE-lncRNA studies will be discussed.

### TE-lncRNA in Stress Response

Since many TEs in plants contain stress-responsive cis-acting elements (Paszkowski, 2015), TE-lncRNAs often appear under specific stress conditions (Liu et al., 2012; Quattro et al., 2017; Wang et al., 2017). In a recent report, Wang et al. (2017) interrogated the lncRNAs of Arabidopsis, rice and maize under various abiotic stresses. The TE families from which TE-lncRNAs originate differed markedly between species: RC/Helitron in Arabidopsis, MITEs in rice and LTR retrotransposons in maize were overrepresented in TE-lncRNAs (Wang et al., 2017). In particular, TE-lncRNA11195 in Arabidopsis contains an LTR-type retrotransposon and is activated after abiotic stress or abscisic acid (ABA) treatment. Deletion of the LTR sequence compromised the ABA responsiveness, suggesting that the LTR sequence confers stress responsiveness on TE-lncRNA11195 (Wang et al., 2017). The role of TE-lncRNA11195 in the stress response was tested using T-DNA insertional mutants. Interestingly, two independent mutant lines showed a marked increase in resistance to ABA in root elongation and shoot fresh weight (Wang et al., 2017). TE-lncRNAs in tomato were also described as responsive to both abiotic and biotic stresses (Wang et al., 2015); however, their biological function has not yet been investigated.

### TE-lncRNA and Development

In mammals, the majority of lncRNAs are derived from TEs and exhibit strongly tissue-specific expression patterns (Kelley and Rinn, 2012; Kapusta et al., 2013). Similarly, TE activation in plants is associated with specific developmental stages, potentiating the emergence of tissue-specific TE-lncRNAs (Hsieh et al., 2009; Slotkin et al., 2009; Baubec et al., 2014; Cho and Paszkowski, 2017). Recently, Cho and Paszkowski (2017) investigated the expression pattern of TEs in various rice tissues and identified a retrotransposon-derived transcript called MIKKI, which is specifically transcribed in rice roots. MIKKI contains multiple introns and has low coding potential, which is a strong sign of domestication into a lncRNA. Intriguingly, the fourth intron of MIKKI is derived from an independent family of retrotransposons, and the splicing of this intron generates a binding site for miR171 at the exon–exon junction. Despite the miR171-binding sequence, MIKKI transcripts are not cleaved by miR171. The miR171-binding site of MIKKI does not base-pair perfectly with its cognate miRNA but has two mismatches at the positions where cleavage is expected to occur. It is well known that mismatches at the cleavage positions attenuate miRNA-directed cleavage, and they are regarded as the signature of a miRNA target mimic (Franco-Zorrilla et al., 2007; Yan et al., 2012; Reichel et al., 2015). Indeed, knock-out mutants of MIKKI that had lost the target-mimicking sequence showed higher levels of miR171, while overexpression of MIKKI resulted in the downregulation of miR171. miR171 targets the mRNAs encoding SCARECROW-Like (SCL) transcription factors, which are critical regulators of root development (Wang et al., 2010). Therefore, MIKKI evolved from retrotransposons in rice and was positively selected to suppress miR171 in roots. This in turn stabilizes the mRNAs of SCLs, which are essential for root development. Similarly, Quattro et al. (2017) identified multiple TE-derived lncRNAs in the Brachypodium genome that are able to interact with miRNAs; however, their target mimicry activities are yet to be confirmed.
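The mismatch signature of a target mimic described above can be illustrated with a short sketch. It assumes, as a simplification, that cleavage occurs opposite miRNA positions 10 and 11 (counted from the miRNA 5′ end), treats G:U wobble pairs as mismatches, ignores bulges, and uses a made-up sequence rather than the real miR171 or MIKKI sequences. All function names are hypothetical.

```python
# Toy check for a miRNA target-mimic signature: mismatches opposite the
# presumed cleavage positions (10 and 11 from the miRNA 5' end; an assumption).

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def mismatch_positions(mirna, target_site):
    """Return 1-based miRNA positions that do not Watson-Crick pair.

    Both sequences are written 5'->3' and must have equal length (no bulges).
    The target is read 3'->5' so that it aligns antiparallel to the miRNA.
    """
    if len(mirna) != len(target_site):
        raise ValueError("sketch assumes an ungapped, equal-length duplex")
    target_3_to_5 = target_site[::-1]
    return [
        i + 1
        for i, (m, t) in enumerate(zip(mirna, target_3_to_5))
        if COMPLEMENT[m] != t
    ]


def looks_like_target_mimic(mirna, target_site, cleavage_positions=(10, 11)):
    """True if every presumed cleavage position is mismatched."""
    mismatched = set(mismatch_positions(mirna, target_site))
    return all(p in mismatched for p in cleavage_positions)


def introduce_mismatches(target_site, mirna_positions):
    """Swap the target base that pairs with each given miRNA position."""
    bases = list(target_site)
    for p in mirna_positions:
        idx = len(target_site) - p  # target index pairing with miRNA position p
        bases[idx] = "A" if bases[idx] != "A" else "C"
    return "".join(bases)


# Hypothetical 21-nt miRNA (an artificial repeat, not a real miRNA sequence).
toy_mirna = "ACGUACGUACGUACGUACGUA"
perfect_site = reverse_complement(toy_mirna)
mimic_site = introduce_mismatches(perfect_site, (10, 11))

print(mismatch_positions(toy_mirna, perfect_site))
print(mismatch_positions(toy_mirna, mimic_site))
print(looks_like_target_mimic(toy_mirna, mimic_site))
```

A real analysis would also need to consider bulges and wobble pairing, which this equal-length comparison deliberately leaves out.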

# CONCLUDING REMARKS

Transposon-derived RNAs have been underestimated for a long time, and their biological functions have only just started to be unveiled. The repetitive nature of TEs in the genome has significantly hampered the investigation of transposons so far. For example, the short length of next-generation sequencing reads causes considerable ambiguity and imprecision in mapping TE reads. Recent advances in long-read sequencing by PacBio (Disdero and Filée, 2017) and Oxford Nanopore (Debladis et al., 2017) are expected to overcome this shortcoming. In addition, due to the multiplicity of TEs in the genome and possible redundancy between them, genetic analyses of TEs have been challenging. CRISPR-mediated mutagenesis has become more efficient, and even a large deletion of an entire TE can be made by triggering double-strand breaks in the flanking regions of the targeted TE (Gao et al., 2016; Ordon et al., 2017). Another option worth considering is the population genetics approach. An increasing number of genome sequences are available, and the number will grow exponentially as sequencing costs drop. A large-scale analysis of genome resequencing data from Arabidopsis natural accessions revealed that the TE landscape is very dynamic and that transcriptomic, epigenomic and phenotypic variations are attributed to TEs (Quadrana et al., 2016; Stuart et al., 2016). Taken together, now seems to be the best time to explore the hidden roles of TEs in plants by applying recently developed technologies.


### AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and approved it for publication.

### FUNDING

This work was supported by the European Research Council (EVOBREED) [322621] and the Gatsby Charitable Foundation [AT3273/GLE].

### ACKNOWLEDGMENTS

I thank Dr. Jayne Griffiths for critical reading.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Cho. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Cold-Dependent Expression and Alternative Splicing of Arabidopsis Long Non-coding RNAs

Cristiane P. G. Calixto<sup>1</sup>, Nikoleta A. Tzioutziou<sup>1</sup>, Allan B. James<sup>2</sup>, Csaba Hornyik<sup>3</sup>, Wenbin Guo<sup>1,4</sup>, Runxuan Zhang<sup>4</sup>, Hugh G. Nimmo<sup>2</sup> and John W. S. Brown<sup>1,3</sup>\*

<sup>1</sup> Plant Sciences Division, School of Life Sciences, University of Dundee, Dundee, United Kingdom, <sup>2</sup> Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom, <sup>3</sup> Cell and Molecular Sciences, The James Hutton Institute, Dundee, United Kingdom, <sup>4</sup> Information and Computational Sciences, The James Hutton Institute, Dundee, United Kingdom

### Edited by:

Mathew G. Lewsey, La Trobe University, Australia

### Reviewed by:

Alice Pajoro, Max Planck Institute for Plant Breeding Research, Germany; Anthony Gobert, UPR2357 Institut de Biologie Moléculaire des Plantes (IBMP), France; Yuichiro Watanabe, The University of Tokyo, Japan

\*Correspondence:

John W. S. Brown j.w.s.brown@dundee.ac.uk

### Specialty section:

This article was submitted to Plant Cell Biology, a section of the journal Frontiers in Plant Science

Received: 23 November 2018; Accepted: 12 February 2019; Published: 28 February 2019

### Citation:

Calixto CPG, Tzioutziou NA, James AB, Hornyik C, Guo W, Zhang R, Nimmo HG and Brown JWS (2019) Cold-Dependent Expression and Alternative Splicing of Arabidopsis Long Non-coding RNAs. Front. Plant Sci. 10:235. doi: 10.3389/fpls.2019.00235

Plants re-program their gene expression when responding to changing environmental conditions. Besides differential gene expression, extensive alternative splicing (AS) of pre-mRNAs and changes in expression of long non-coding RNAs (lncRNAs) are associated with stress responses. RNA-sequencing of a diel time-series of the initial response of Arabidopsis thaliana rosettes to low temperature showed massive and rapid waves of both transcriptional and AS activity in protein-coding genes. We exploited the high diversity of transcript isoforms in AtRTD2 to examine regulation and post-transcriptional regulation of lncRNA gene expression in response to cold stress. We identified 135 lncRNA genes with cold-dependent differential expression (DE) and/or differential alternative splicing (DAS) of lncRNAs including natural antisense RNAs, sORF lncRNAs, and precursors of microRNAs (miRNAs) and trans-acting small-interfering RNAs (tasiRNAs). The high resolution (HR) of the time-series allowed the dynamics of changes in transcription and AS to be determined and identified early and adaptive transcriptional and AS changes in the cold response. Some lncRNA genes were regulated only at the level of AS and using plants grown at different temperatures and a HR time-course of the first 3 h of temperature reduction, we demonstrated that the AS of some lncRNAs is highly sensitive to small temperature changes suggesting tight regulation of expression. In particular, a splicing event in TAS1a which removed an intron that contained the miR173 processing and phased siRNAs generation sites was differentially alternatively spliced in response to cold. The cold-induced reduction of the spliced form of TAS1a and of the tasiRNAs suggests that splicing may enhance production of the siRNAs. Our results identify candidate lncRNAs that may contribute to the regulation of expression that determines the physiological processes essential for acclimation and freezing tolerance.

Keywords: long non-coding RNA, primary microRNA, alternative splicing, diel time-course, high-resolution RNAseq, cold transcriptome

## INTRODUCTION

fpls-10-00235 February 26, 2019 Time: 15:9 # 2

Non-coding RNAs (ncRNAs) are a diverse set of RNAs which do not generally code for proteins. They include families of house-keeping ncRNAs and their precursors such as small nuclear ribonucleoprotein particle RNAs (snRNAs), small Cajal body RNAs (scaRNAs), ribosomal RNAs (rRNAs), transfer RNAs (tRNAs), and small nucleolar RNAs (snoRNAs) (Shaw and Brown, 2012; Cech and Steitz, 2014). The regulatory ncRNAs include small RNAs such as microRNAs (miRNAs) and short-interfering RNAs (siRNAs), and long non-coding RNAs (lncRNAs) which are expressed from intergenic regions or introns and include natural antisense transcripts (NATs). Some lncRNAs are precursors of small RNA production: primary transcripts are processed to miRNAs or siRNAs such as trans-acting siRNAs (tasiRNAs) or natural antisense siRNAs (nat-siRNAs) derived from double-stranded RNA molecules. The regulatory ncRNAs function in a wide range of cellular processes, from the regulation of transcription and splicing, to chromatin modification, gene inactivation, and translation (Liu et al., 2015). In plants and animals, tens of thousands of lncRNAs are now routinely detected in RNA-seq analyses demonstrating a new level of complexity of gene expression (Liu et al., 2012, 2015; Zhang and Chen, 2013; Yuan et al., 2016; Lu et al., 2017; Wang et al., 2017; Severing et al., 2018; Zhao et al., 2018).

In plants, the majority of lncRNAs are transcribed by RNA polymerase II (Pol II) but some are generated by Pol IV and/or Pol V (Liu et al., 2015). Like mRNAs, they are usually capped at the 5′ end, can be spliced and they form two classes which are either polyadenylated or non-polyadenylated (Liu et al., 2012; Di et al., 2014; Liu et al., 2015). Genome-wide analyses have identified thousands of lncRNAs in different plant species such as Arabidopsis, rice, maize, tomato, wheat, cucumber, poplar, cassava, and cotton and they are shown to be differentially expressed in response to stresses caused by abiotic environmental factors such as cold, heat, drought, and salt (Ben Amor et al., 2009; Liu et al., 2012; Di et al., 2014; Shuai et al., 2014; Zhang et al., 2014; Hao et al., 2015; He et al., 2015; Wang et al., 2015a,b; Chen et al., 2016; Forestan et al., 2016; Khemka et al., 2016; Lu et al., 2016; Yuan et al., 2016; Zhao et al., 2016; Li et al., 2017; Shumayla et al., 2017; Wang et al., 2017; Severing et al., 2018; Zhao et al., 2018). Infection of different plant species by pathogens also leads to differential expression (DE) of lncRNAs: for example, wheat with powdery mildew and stripe rust (Xin et al., 2011; Zhang et al., 2013), Arabidopsis with Fusarium oxysporum (Zhu et al., 2014), Brassica napus with stem rot (Joshi et al., 2016), and tomato with Tomato yellow leaf curl virus (Wang et al., 2018). The function of the vast majority of lncRNAs is unknown but, in general, they are primarily associated with regulation of gene expression via a range of different mechanisms and ultimately are important in differentiation, development, and abiotic and biotic stress responses (Ben Amor et al., 2009; Liu et al., 2012, 2015; Zhang and Chen, 2013; Wang et al., 2017; Severing et al., 2018; Zhao et al., 2018).
The functions of specific lncRNAs illustrate the breadth of regulation of different plant processes and their modes of action (Kim and Sung, 2012; Liu et al., 2012, 2015; Zhang and Chen, 2013; Qin et al., 2017; Wang et al., 2017; Kindgren et al., 2018; Zhao et al., 2018). For example, lncRNAs are involved in vernalization-mediated regulation of flowering (Swiezewski et al., 2009; Ietswaart et al., 2012), circadian regulation of flowering (Henriques et al., 2017), low temperature suppression of flowering (Zhao et al., 2018), lateral root development (Bardou et al., 2014), gametophyte development (Wunderlich et al., 2014), photoperiod-sensitive male sterility (Ding et al., 2012), phosphate homeostasis (Franco-Zorrilla et al., 2007), cold acclimation (Kindgren et al., 2018), and drought (Qin et al., 2017).

Low temperatures negatively affect plant growth and development and, in general, plants from temperate climatic regions can tolerate chilling temperatures (0–15°C) and increase their freezing tolerance by prior exposure to low, non-freezing temperatures. This process of acclimation involves complex physiological, biochemical, and molecular changes that protect cell integrity during exposure to freezing. The multiple, cold-responsive changes reflect reprogramming of gene expression involving chromatin modification, transcription, post-transcriptional processing, post-translational modification, and protein turnover (Thomashow, 2010; Knight and Knight, 2012; Zhu, 2016). For many years, the major focus of studies of transcriptome reprogramming in Arabidopsis in response to cold has been at the level of transcription and identifying gene targets of specific transcription factors (e.g., C-repeat binding factors – CBFs and the CBF regulon; Thomashow, 2010; Knight and Knight, 2012; Zhu, 2016). High-throughput RNA-sequencing now enables investigation of other aspects of expression control including alternative splicing (AS) and lncRNAs. AS of protein-coding pre-mRNAs is a regulated process and a major level of post-transcriptional regulation of expression; it has been associated with many different stress responses including cold (Leviatan et al., 2013; Staiger and Brown, 2013; Calixto et al., 2016; Hartmann et al., 2016; Laloum et al., 2017; Pajoro et al., 2017). Recently, the scale and dynamics of the contribution of AS to the cold response have been demonstrated by coincident waves of transcription and AS occurring in the first few hours of exposure to cold and the speed and sensitivity of some genes to small temperature changes (Calixto et al., 2018). The importance of AS in both stress responses and development is to increase proteome diversity by generating AS isoforms which encode different functional protein variants.
However, AS also can regulate expression of genes by modulating the proportion of protein-coding transcripts vs transcripts which contain premature termination codons (PTCs), many of which are targeted for degradation by non-sense-mediated decay (NMD; Kalyna et al., 2012; Drechsel et al., 2013; Fu and Ares, 2014; Lee and Rio, 2015).

Little is known about the extent of AS of plant lncRNAs per se or how AS can affect the levels or function of lncRNAs. Many lncRNAs contain introns which are spliced during lncRNA biogenesis (Liu et al., 2012) and a number of annotated (TAIR) lncRNAs have splice variants. The best-characterized AS of a plant lncRNA is the differential AS and alternative polyadenylation of COOLAIR (antisense to FLOWERING LOCUS C), which determine the levels of FLC (Swiezewski et al., 2009; Marquardt et al., 2014). The lncRNA FLORE is regulated by the circadian clock and promotes flowering by repressing CYCLING DOF FACTOR 5 (CDF5) to allow expression of FT (Henriques et al., 2017). FLORE is differentially expressed and alternatively spliced in response to cold but the function of FLORE AS is unknown. On the other hand, lncRNAs can affect the splicing/AS of target genes by binding and sequestering specific splicing factors (Bardou et al., 2014; reviewed in Romero-Barrios et al., 2018). Primary miRNA transcripts (pri-miRNAs) are a class of lncRNAs where splicing and AS have been well characterized and shown to be important to the production of some mature miRNAs (Brown et al., 2008; Yan et al., 2012; Bielewicz et al., 2013; Barciszewska-Pacak et al., 2015; Knop et al., 2017; Stepien et al., 2017). Recently, in humans, deep RNA-seq targeting RNAs expressed at very low levels found that virtually all lncRNAs are alternatively spliced, generating massive lncRNA diversity which may be important to the evolution of regulatory modules of expression (Deveson et al., 2018). The intrinsic potential of splicing/AS of lncRNAs to modify their function, and of lncRNAs to influence splicing/AS and expression of target genes through a variety of mechanisms, suggests a need to increase our understanding of the interplay between lncRNAs and AS.

We recently performed an ultra-deep RNA-seq analysis of a diel time-series of Arabidopsis plants transferred to cold (Calixto et al., 2018; **Supplementary Figure S1**). The analysis used AtRTD2, a new Arabidopsis transcriptome with greatly increased numbers of transcripts compared to the Arabidopsis TAIR10 and Araport11 databases (Zhang et al., 2017). The greater number and diversity of transcripts allowed the identification of nearly 2500 cold-regulated differentially alternatively spliced protein-coding genes, the majority of which were novel to the cold response. The time-series (1) showed massive peaks of transcription and AS of thousands of protein-coding genes in the first few hours of exposure to cold, (2) defined early and late, transient, and adaptive changes in expression and AS, and (3) demonstrated the speed and sensitivity of AS of particular genes (Calixto et al., 2018). We have interrogated this extensive time-series dataset to examine the transcriptional and post-transcriptional regulation of expression of lncRNAs and the relative contributions and dynamics of transcription and AS. We identify cold-responsive lncRNAs in terms of both significant DE and AS.

### MATERIALS AND METHODS

### Plant Material and Growth Conditions

Details of the plant material and growth conditions have been described previously (Calixto et al., 2018). Briefly, Arabidopsis thaliana Col-0 plants were grown hydroponically from seed for 5 weeks in Microclima environment-controlled cabinets (Snijders Scientific), maintaining 20°C, 55% relative humidity, and 12 h light:12 h dark (James et al., 2012). Arabidopsis rosettes (9–13) were harvested and pooled at each sampling time, forming one biological replicate. Harvesting occurred every 3 h beginning with the last 24 h at 20°C, and on days 1 and 4 after transfer to 4°C, giving 26 time-points in the time-series (**Supplementary Figure S1A**). Day 1 at 4°C represents the "transition" from 20 to 4°C when plants first begin to experience the temperature decrease; day 4 at 4°C represents "acclimation" where plants have been exposed to 4°C for 4 days (**Supplementary Figure S1A**). Three biological replicates were generated for each time-point in separate experiments (78 samples in total). The switch to 4°C from 20°C was initiated at dusk. In a temperature reduction, the cabinet used here typically takes 2 h to reach 4°C air temperature. To analyze the speed of change in expression and AS, three biological replicates, each consisting of 9–13 5-week-old rosettes, were harvested just before reducing the temperature (0 min, 20°C) and after 90 min (5°C) and 120 and 180 min (4°C). Tissue was rapidly frozen in liquid N<sub>2</sub> and stored at −80°C until isolation of RNA and preparation of cDNA.

## RNA Extraction, Library Preparation, and Sequencing

Total RNA was extracted from Arabidopsis tissue using the RNeasy Plant Mini Kit (Qiagen), followed by either on-column DNase treatment (for HR RT-PCR, see below), or the TURBO DNA-free™ Kit (Ambion) (for library preparation and RT-qPCR, see below). RNA-seq libraries were constructed by following instructions for a TruSeq RNA library preparation (Illumina protocol 15026495 Rev. B). In these preparations, polyA+ selection was used to enrich for mRNA, RNA was fragmented for 8 min at 94°C, and random hexamers were used for first-strand cDNA synthesis. The libraries had an average insert size of approximately 280 bp and each library was sequenced on the Illumina HiSeq 2500 platform, generating 100 bp paired-end reads. Residual adaptor sequences at both 5′ and 3′ ends were removed from raw reads using cutadapt version 1.4.2<sup>1</sup> with the quality score threshold set at 20 and the minimum length of the trimmed read kept at 20. The "--paired-output" option was used to keep the two paired read files synchronized and avoid unpaired reads. The sequencing files before and after the trimming were examined using FastQC version 0.10.0.
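The trimming thresholds above (quality cutoff 20, minimum length 20, pairs kept synchronized) can be illustrated with a minimal sketch. It is not the cutadapt implementation: it assumes a BWA-style running-sum rule for 3′ quality trimming, skips adapter removal entirely, and represents each read as a hypothetical `(sequence, qualities)` tuple.

```python
# Sketch of 3' quality trimming plus a paired minimum-length filter.
# Assumption: BWA-style running-sum trimming; adapter removal is not modeled.

def quality_trim_3prime(qualities, cutoff=20):
    """Return the index at which to cut the read (keep read[:index]).

    Walking from the 3' end, accumulate (cutoff - quality) and cut where this
    running sum is largest; stop once the sum becomes negative.
    """
    running_sum = 0
    best_sum = 0
    cut_at = len(qualities)
    for i in reversed(range(len(qualities))):
        running_sum += cutoff - qualities[i]
        if running_sum < 0:
            break
        if running_sum > best_sum:
            best_sum = running_sum
            cut_at = i
    return cut_at


def trim_pair(read1, read2, cutoff=20, min_length=20):
    """Trim both mates; return the trimmed pair, or None if either is too short.

    Discarding the whole pair is what keeps two output files synchronized.
    """
    trimmed = []
    for seq, quals in (read1, read2):
        end = quality_trim_3prime(quals, cutoff)
        trimmed.append((seq[:end], quals[:end]))
    if any(len(seq) < min_length for seq, _ in trimmed):
        return None
    return tuple(trimmed)


good = ("A" * 25, [35] * 25)              # high quality throughout
bad = ("A" * 25, [35] * 10 + [5] * 15)    # low-quality 3' tail
print(trim_pair(good, good) is not None)  # pair kept
print(trim_pair(good, bad))               # pair dropped
```

Dropping both mates when one fails the length filter is the simplest way to guarantee that read *n* in the first file still corresponds to read *n* in the second.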

### Quantification of Transcripts and AS

Arabidopsis transcript expression was quantified from the RNA-seq data using Salmon version 0.82 (Patro et al., 2017) in conjunction with AtRTD2-QUASI augmented by eight genes that were not originally present (Zhang et al., 2017). For indexing, we used the quasi-mapping mode to build an auxiliary k-mer hash over k-mers of length 31 (--type quasi -k 31). For quantification, the option to correct for sequence-specific bias ("--seqBias") was used. The number of bootstraps was set to 30 and all other parameters were left at their default settings. Transcript expression results are provided in Calixto et al. (2018) and expression profiles in the time-series data are available at https://wyguo.shinyapps.io/atrtd2\_profile\_app/.
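Transcript abundance from this kind of quantification is commonly reported in transcripts per million (TPM). As a rough illustration of what that unit means, the sketch below computes TPM from estimated read counts and effective transcript lengths and sums isoform TPMs to a gene-level value. The transcript and gene identifiers and the numbers are invented, and this is the textbook TPM definition rather than Salmon's internal code.

```python
# Textbook TPM calculation (illustrative; identifiers and values are made up).

def tpm(counts, effective_lengths):
    """Convert per-transcript read counts to transcripts per million."""
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("no reads assigned to any transcript")
    return [r / total * 1e6 for r in rates]


def gene_level_tpm(transcript_tpm, transcript_to_gene):
    """Sum the TPMs of all isoforms belonging to each gene."""
    totals = {}
    for transcript, value in transcript_tpm.items():
        gene = transcript_to_gene[transcript]
        totals[gene] = totals.get(gene, 0.0) + value
    return totals


ids = ["geneA.1", "geneA.2", "geneB.1"]           # hypothetical transcripts
values = tpm([100, 200, 300], [1000, 1000, 3000])
per_transcript = dict(zip(ids, values))
per_gene = gene_level_tpm(
    per_transcript,
    {"geneA.1": "geneA", "geneA.2": "geneA", "geneB.1": "geneB"},
)
print(per_transcript)
print(per_gene)
```

Because TPM divides by effective length before normalizing, a long transcript with many reads (geneB.1 here) can have the same TPM as a short transcript with fewer reads.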

# Differential Gene Expression (DE) and AS (DAS) Analysis of the RNA-seq Data

The pre-processing of the read data and DE and differential AS analyses are described in detail in Calixto et al. (2018). Briefly, transcript and gene read counts were generated from transcripts per million (TPM) data and low expressed transcripts that did not have ≥1 counts per million (CPM) in three or more samples were removed. At the gene level, if any transcript passed the expression level filtering step, the gene was included as an expressed gene and then the normalization factor was estimated using the weighted trimmed mean of M values method using edgeR version 3.12.1 (Robinson et al., 2010). Batch effects between biological replicates were estimated using RUVSeq R package version 1.4.0 with the residual RUVr approach (Risso et al., 2014). Normalized read counts in CPM were then log2 transformed and mean-variance trends were estimated and weights of variance adjustments were generated using the voom function in limma version 3.26.9 (Law et al., 2014, 2016; Ritchie et al., 2015). General linear models to determine DE at both gene and transcript levels were established using time and batch effects of biological replicates as factors and 18 contrast groups were set up where corresponding time-points in the day 1 and day 4 at 4°C blocks were compared to those of the 20°C block (e.g., block2.T1 vs block1.T1, block2.T2 vs block1.T2; Calixto et al., 2018). Genes were significantly DE at the gene level if they had at least two contrast groups at consecutive time-points with adjusted p < 0.01 and greater than twofold change in expression in each contrast group. Genes with significant DAS had at least two consecutive contrast groups with adjusted p < 0.01 and with these contrast groups having at least one transcript with ≥10% change in expression.

<sup>1</sup>https://pypi.python.org/pypi/cutadapt/1.4.2
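The expression filter and the "two consecutive contrast groups" rule for DE genes described in this section can be sketched as follows. This is only an illustration of the decision logic, not the edgeR/limma pipeline: the adjusted p-values and log2 fold changes are assumed to have been computed elsewhere, the function names are hypothetical, and the fold-change test is written as |log2FC| ≥ 1 with all contrast groups treated as one time-ordered list (both are assumptions about details not spelled out here).

```python
# Decision logic only; statistics (adjusted p, log2FC) are assumed inputs.

def cpm(count, library_size):
    """Counts per million for one feature in one sample."""
    return count * 1e6 / library_size


def passes_expression_filter(cpm_values, min_cpm=1.0, min_samples=3):
    """Keep a transcript with >= min_cpm in at least min_samples samples."""
    return sum(v >= min_cpm for v in cpm_values) >= min_samples


def is_significant_de(adj_p, log2_fc, p_cut=0.01, lfc_cut=1.0, run=2):
    """True if `run` consecutive contrast groups pass both thresholds.

    adj_p and log2_fc are ordered by contrast group (time-point).
    """
    streak = 0
    for p, lfc in zip(adj_p, log2_fc):
        if p < p_cut and abs(lfc) >= lfc_cut:
            streak += 1
            if streak >= run:
                return True
        else:
            streak = 0
    return False


print(cpm(50, 25_000_000))
print(passes_expression_filter([0.5, 1.0, 2.0, 3.0]))
print(is_significant_de([0.5, 0.001, 0.002, 0.3], [0.1, 1.5, 2.0, 3.0]))
```

Requiring consecutive significant contrasts guards against a gene being called DE because of a single noisy time-point.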

### Identification of lncRNA Genes

For accurate DE and differential AS analyses of lncRNAs, only lncRNA genes present in AtRTD2 were selected. AtRTD2 is a new Arabidopsis transcriptome containing over 82k unique transcripts and thereby far greater diversity of alternatively spliced isoforms than TAIR10 and Araport11 (Zhang et al., 2017). LncRNA gene lists from Liu et al. (2012), PLncDB (Plant Long non-coding RNA Database), lncRNAs from TAIR and Araport, and antisense lncRNAs from Araport were compared to AtRTD2, giving 379 lncRNA genes.
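Conceptually, this selection amounts to pooling the candidate lists and keeping only the genes that are also present in the reference transcriptome. A minimal sketch, assuming each source is available as a list of gene identifiers (the source labels and gene IDs below are placeholders, not real annotations):

```python
# Pool candidate lncRNA lists and keep those present in the reference set.
# Source labels and gene IDs are placeholders for illustration.

def lncrnas_in_reference(candidate_lists, reference_gene_ids):
    """Union of all candidate lists, intersected with the reference genes."""
    pooled = set().union(*candidate_lists.values())
    return pooled & set(reference_gene_ids)


candidates = {
    "published_list": ["geneA", "geneB", "geneC"],
    "database": ["geneB", "geneD"],
    "annotation_antisense": ["geneE"],
}
reference = ["geneA", "geneB", "geneE", "geneX"]

selected = lncrnas_in_reference(candidates, reference)
print(sorted(selected))
```

Using sets removes duplicates automatically when the same gene appears in more than one source list.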

# Quantitative Reverse Transcription RT-PCR (RT-qPCR)

Real-time RT-PCR was performed essentially as described previously (James et al., 2012). Complementary DNA (cDNA) was synthesized from 2 µg of total RNA using oligo dT primers and SuperScriptII reverse transcriptase (ThermoFisher Scientific). Each reaction (1:100 dilution of cDNA) was performed with Brilliant III SYBR Green QPCR Master Mix (Agilent) on a StepOnePlus (Fisher Scientific-UK Ltd., Loughborough, United Kingdom) real-time PCR system. The average Ct values for IPP2 (AT3G02780) were used as internal control expression levels. The delta–delta Ct algorithm (Livak and Schmittgen, 2001; James et al., 2012) was used to determine relative changes in gene expression. Primers TAS1a-ex1-fwd 5′-CTAAGCGGCTAAGCCTGACGTCA-3′ and TAS1a-ex2-ex1-rev 5′-CACCCATTACAAGCCTTTCTATCAGACAAGAC-3′ targeted spliced TAS1a transcripts where the latter primer bridged the spliced intron (between exonic nucleotides underlined in the primer sequence). Primers amplifying total TAS1a transcripts comprised the aforementioned TAS1a-ex1-fwd primer in combination with primer TAS1a-ex1-rev 5′-CAGACAAGACCATGACTCGATCTAAAGGC-3′.
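For readers unfamiliar with the delta–delta Ct calculation, a minimal sketch is given below. It assumes roughly 100% amplification efficiency (a doubling per cycle), uses a single reference gene, and the Ct values are invented for illustration rather than taken from this study.

```python
# Delta-delta Ct (2^-ddCt) sketch; Ct values below are invented.

def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of the target in the sample relative to the calibrator.

    Each Ct is first normalized to the reference gene (delta Ct), then the
    sample is compared with the calibrator condition (delta-delta Ct).
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2 ** -(delta_sample - delta_calibrator)


# Target crosses threshold two cycles later (relative to the reference gene)
# in the treated sample than in the control: a fourfold reduction.
print(relative_expression(26.0, 20.0, 24.0, 20.0))
```

A later Ct means less starting template, which is why the exponent carries a minus sign.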

# High-Resolution (HR) RT-PCR

High-resolution (HR) RT-PCR reactions were conducted as described previously (Simpson et al., 2008). Gene-specific primer pairs (**Supplementary Table S3**) were used for analyzing the expression and AS of different genes. For each primer pair, the forward primer was labeled with 6-carboxyfluorescein (FAM). cDNA was synthesized from 4 µg of total RNA using the Sprint RT Complete – Double PrePrimed Kit following manufacturer's instructions (Clontech Laboratories, Takara Bio Company, United States). The PCR reaction usually contained 3 µL of diluted cDNA (1:10) as a template, 0.1 µL of each of the forward and reverse primers (100 mM), 2 µL of 10 X PCR Buffer, 0.2 µL of Taq Polymerase (5 U/µL, Roche), 1 µL of 10 mM dNTPs (Invitrogen, Life Technologies), and RNase-free water (Qiagen) up to a final volume of 20 µL. For each reaction, an initial step at 94°C for 2 min was used followed by 24–26 cycles of (1) denaturation at 94°C for 15 s, (2) annealing at 50°C for 30 s, and (3) elongation at 70°C for either 1 min (for fragments smaller than 1000 bp) or 1.5 min (for fragments between 1000 and 1200 bp) and a final extension cycle of 10 min at 70°C. To separate the RT-PCR products, 1.5 µL of PCR product was mixed with 8.5 µL of Hi-Di™ formamide (Applied Biosystems) and 0.01 µL of GeneScan™ 500 LIZ™ dye or 0.04 µL of GeneScan™ 1200 LIZ™ dye size standard and run on a 48-capillary ABI 3730 DNA Analyser (Applied Biosystems, Life Technologies). PCR products were separated to single base-pair resolution and the intensity of fluorescence was measured and used for quantification in relative fluorescent units (RFUs). The different PCR products and their peak levels of expression were calculated using the GeneMapper® software (Applied Biosystems, Life Technologies).
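The reaction composition above implies a specific water volume and final reagent concentrations, which can be checked with a little arithmetic. The sketch below uses only the volumes stated in the text; the helper names are hypothetical.

```python
# Arithmetic check of the 20 uL PCR reaction described in the text.

components_ul = {
    "diluted cDNA (1:10)": 3.0,
    "forward primer": 0.1,
    "reverse primer": 0.1,
    "10 X PCR buffer": 2.0,
    "Taq polymerase (5 U/uL)": 0.2,
    "10 mM dNTPs": 1.0,
}


def water_volume(components, final_volume=20.0):
    """Volume of water needed to bring the reaction to final_volume (uL)."""
    used = sum(components.values())
    if used > final_volume:
        raise ValueError("components exceed the final reaction volume")
    return round(final_volume - used, 2)


def final_concentration(stock_conc, stock_volume, final_volume=20.0):
    """C1 * V1 = C2 * V2, solved for C2 (same units as stock_conc)."""
    return stock_conc * stock_volume / final_volume


print(water_volume(components_ul))         # uL of RNase-free water
print(final_concentration(10.0, 1.0))      # dNTPs, mM
print(final_concentration(5.0, 0.2))       # Taq, U/uL
```

With the listed components summing to 6.4 µL, 13.6 µL of water completes the 20 µL reaction, and 2 µL of 10 X buffer in 20 µL gives the expected 1 X working strength.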

# Small RNA Extraction and Detection of miRNAs and siRNAs

Total RNA was extracted from rosette material using TRI Reagent (Sigma–Aldrich, United States) according to the manufacturer's instructions. For RNA gel blot hybridization of small RNAs, 10 µg of total RNA was separated on a 15% polyacrylamide (19:1) gel with 8 M urea and 1 × MOPS (20 mM MOPS/NaOH, pH 7) buffer. RNA markers (Decade RNA markers, Ambion, United States) were end-labeled with [<sup>32</sup>P]γ-ATP according to the manufacturer's instructions. RNA was blotted onto Hybond-N membrane (Amersham, GE Healthcare, United Kingdom) using a Panther™ Semi-dry Electroblotter, HEP-1 (Thermo Scientific Owl Separation Systems, United States) and cross-linked by N-(3-dimethylaminopropyl)-N′-ethylcarbodiimide hydrochloride (EDC, Sigma–Aldrich, United States; Pall et al., 2007). DNA oligos (20 pmol) were end-labeled with [<sup>32</sup>P]γ-ATP using T4 polynucleotide kinase (NEB, United States) to visualize small RNAs. Hybridization was performed in PerfectHyb™ Plus Hybridization Buffer (Sigma–Aldrich, United States). After overnight incubation at 37°C, the membrane was washed twice in 2 X SSC and 0.1% SDS for 15 min at 37°C. After washing, signals were detected on a phosphorimager plate and visualized with the FLA-7000 Fluorescent Image Analyzing System (Fujifilm, United States). The scanned images were quantified with AIDA Image Analyzer software (Fujifilm, United States). The same membrane was re-hybridized with the different probes; it was stripped after each hybridization using 0.1 × SSC, 0.1% SDS at 65°C for 30 min and the efficiency was checked by overnight exposure. Student's t-test (p < 0.05) was used to identify differentially expressed small RNAs.

### RESULTS

### Identification of Cold-Induced Changes in Expression and Alternative Splicing

To examine changes in gene expression and AS in response to low temperature, we previously performed deep RNA-seq on a diel time-series of 5-week-old Arabidopsis Col-0 rosettes grown at 20°C and transferred to 4°C (**Supplementary Figure S1A**; Calixto et al., 2018). Briefly, rosettes were sampled at 3 h intervals for the last day at 20°C, the first day at 4°C, and the fourth day at 4°C (**Supplementary Figure S1A**) and each time-point consisted of three biological replicates. Over 360 million paired-end reads were generated for each of the 26 time-points and transcript abundances were quantified using Salmon (Patro et al., 2017) and AtRTD2-QUASI as the reference transcriptome (Zhang et al., 2017). The time-series data was analyzed at both the gene and individual transcript levels to identify genes with significant DE and significant differential alternative splicing (DAS). Briefly, this was achieved by generating read counts data using tximport (Soneson et al., 2015), normalizing data across samples with edgeR (Robinson et al., 2010), transforming using the voom function in limma (Law et al., 2014; Ritchie et al., 2015; Law et al., 2016), and establishing contrast groups in limma (for details see section "Materials and Methods" and Calixto et al., 2018). The experimental design allowed direct comparisons between equivalent time-points at 20°C and those in day 1 or day 4 at 4°C. This controlled for any effects of time-of-day variation in expression so that the changes detected were due to reduction in temperature. The statistical criteria for a DE gene were that it must have a log2-fold change ≥ 1 (≥2-fold change) in expression in at least two consecutive contrast groups with an adjusted p-value of <0.01. To detect DAS genes, the consistency of expression changes between the total expression of the gene and individual transcripts was examined using F-tests.
For a gene to be significantly differentially alternatively spliced, the log2-fold change value of at least one of the transcripts must differ significantly from the gene log2-fold change value with an adjusted p-value of <0.01, and show a Δ percent spliced (ΔPS) of ≥0.1 in at least two consecutive contrast groups (for details, see Calixto et al., 2018). Using these stringent criteria, we identified a total of 7302 DE genes and 2442 DAS genes. Of these, 795 genes were both significantly DE and DAS (regulated by both transcription and AS) in response to low temperature (**Supplementary Figure S1B**; Calixto et al., 2018).
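The "at least two consecutive contrast groups" rule is the part of these criteria that is easiest to misread, so a small sketch may help. It is an illustration only, not the limma-based pipeline used in the study. The function names are hypothetical, the use of the absolute fold change (to cover both up- and down-regulation) is an assumption, and so is the requirement that the p-value and ΔPS thresholds be met in the same contrast groups.

```python
def has_consecutive_run(flags, run_length=2):
    """True if `flags` contains at least `run_length` consecutive True values."""
    run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        if run >= run_length:
            return True
    return False


def is_de(log2_fc, adj_p, fc_cutoff=1.0, p_cutoff=0.01):
    """Gene-level DE call across an ordered list of contrast groups."""
    flags = [abs(fc) >= fc_cutoff and p < p_cutoff
             for fc, p in zip(log2_fc, adj_p)]
    return has_consecutive_run(flags)


def transcript_supports_das(adj_p, delta_ps, ps_cutoff=0.1, p_cutoff=0.01):
    """Transcript-level DAS support across the same ordered contrast groups.

    `adj_p` holds the adjusted p-values for the transcript-versus-gene
    comparison; `delta_ps` holds the change in percent spliced.
    """
    flags = [p < p_cutoff and abs(d) >= ps_cutoff
             for p, d in zip(adj_p, delta_ps)]
    return has_consecutive_run(flags)


# Hypothetical values for four consecutive contrast groups.
print(is_de([0.2, 1.5, 1.2, 0.1], [0.5, 0.001, 0.005, 0.9]))       # two adjacent hits
print(is_de([1.5, 0.2, 1.5, 0.3], [0.001, 0.001, 0.001, 0.001]))   # hits not adjacent
```

Under these assumptions a gene would be called DAS when at least one of its transcripts passes `transcript_supports_das`.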

### Cold-Induced Expression and AS of lncRNAs

The RNA-seq time-series data was analyzed using AtRTD2, a new transcriptome dataset for Arabidopsis (Zhang et al., 2017). DE and DAS analyses were therefore applied to the 379 lncRNA genes in AtRTD2. For the majority of these lncRNA genes, AtRTD2 contained novel alternatively spliced transcripts or extended some of the shorter/truncated transcript models currently in TAIR10/Araport11, thereby providing increased diversity of AS isoforms of the lncRNA genes.

The depth of sequencing and the resolution of the RNA-seq time-course here allowed transcript-specific expression profiles of non-protein-coding genes to be analyzed. To examine the effect of low temperature on the expression and AS of lncRNAs, we searched the DE, DE+DAS, and DAS gene lists for the 379 lncRNA genes (see section "Materials and Methods"). Just over a third of these genes (135) exhibited significant DE and/or AS in response to cold with 89 DE-only, 24 DE+DAS, and 22 DAS-only lncRNAs (**Figure 1A** and **Supplementary Table S1A**). Of these, 82 were NATs, eight encoded small open reading frames (sORFs; Hsu et al., 2016), two coded for tasiRNA precursors, and a third tasiRNA, TAS4, also encoded sORF27 (**Supplementary Table S1B**). The gene descriptions of the DE only, DE+DAS, and DAS only lncRNAs and of the genes overlapping the NAT lncRNAs are given in **Supplementary Tables S1C–H**. The expression profiles of DE lncRNAs showed a range of behaviors including cold-induced increase or decrease of expression, transient changes mainly seen in day 1 at 4°C, adaptive changes where the change in expression after transfer to cold persisted throughout the cold treatment, and late expression where changes were mainly observed in day 4 at 4°C (**Figure 2**). For example, AT2G15128 (DE) showed an increase in expression in day 4 at 4°C (**Figure 2A**) while AT5G59662 (DE) showed rhythmic expression at 20°C and expression decreased dramatically within the first 6–12 h after lowering the temperature (**Figure 2B**). AT2G15128 had one of the highest levels of expression increasing from around 200 TPM at 20°C to 450 TPM in day 4 at 4°C (**Figure 2A**). The DE gene, AT1G68568, had two transcripts which were expressed at relatively low levels at 20°C but the AT1G68568\_ID2 transcript increased immediately upon onset of cold and showed rhythmic expression with the maximal level of expression phased to around the middle of the dark period (**Figure 2C**). For DAS genes (which had no significant DE at the gene level), significant changes in AS again showed a variety of expression profiles at the level of transcripts. For example, AT1G22403 had two transcripts which were expressed at similar levels at 20°C but the AT1G22403.2 transcript increased with cold with a concomitant decrease in the levels of AT1G22403.1 (**Figure 2D**). AT2G31751 is a DE+DAS gene with three main transcripts of which AT2G31751\_c3 is the most abundant at 20°C, peaking around dusk. At 4°C, its expression is drastically reduced and it is replaced by AT2G31751\_c2, which becomes the most abundant transcript and has an altered expression pattern, peaking at dawn (**Figure 2E**). Two other lncRNAs (AT1G25098 – DE+DAS and AT1G34418 – DAS-only – encoding the sORF15 lncRNA) had two and three main transcripts, respectively (**Figures 2F,G**). In both cases, the most abundant transcript at 20°C decreased rapidly after the start of the cold treatment. Some lncRNA expression profiles were clearly rhythmic, peaking at different times of the day. For example, although not differentially expressed or differentially alternatively spliced, AT1G53233 had a clear rhythmic profile across cooling with maximal expression shortly after the onset of dark (**Figure 2H**). Interestingly, in some cases, rhythmic expression was either dampened or amplified in the cold (**Figures 2B,C**, respectively).

FIGURE 2 | Expression of lncRNA transcripts in response to cold. Transcripts below 1.85 TPM in all time-points are not shown (except in E, where we used a 0.5 TPM cut-off value). Transition to cold is represented by a vertical dashed blue line at time-point 9. (A) The single transcript of AT2G15128 is DE showing a significant upregulation upon long exposure to cold. (B) AT5G59662\_ID1 transcript is DE showing a significant downregulation upon cold and loss of a high amplitude rhythm is also observed. (C) The AT1G68568\_ID2 transcript is DE showing a significant upregulation throughout the cold treatment and a gain of a high amplitude rhythm is also observed. (D) AT1G22403 (DAS-only) undergoes splicing regulation rapidly in the cold and the relative abundance of the AT1G22403.2 transcript is maintained in day 4 at 4°C (adaptive) whereas the total gene level is not significantly affected by cold. (E) AT2G31751 and (F) AT1G25098 are DE and DAS showing a significant downregulation upon cold. (G) AT1G34418 (DAS-only) undergoes only significant splicing regulation showing a rapid decrease in abundance of AT1G34418.1 and increasing expression during the day. (H) Rhythmic expression of AT1G53233 (not affected by cold). Rapid and significant isoform switches detected by TSIS (Guo et al., 2017) are labeled with a red circle. NAT lncRNA genes (B,C,E,F,H); sORF (G); other RNA (A,D).

To investigate possible functions of the NAT lncRNAs, a GO enrichment analysis of the gene descriptions of the potential protein-coding targets of the 84 DE, DE+DAS, and DAS NATs was performed. The only enriched terms were biological process: flavonoid glucuronidation and molecular function: UDP-glycosyltransferase activity, quercetin 7-O-glucosyltransferase and quercetin 3-O-glucosyltransferase activity (FDR cut-off < 0.05). The time-course also allowed the identification of genes which showed the largest and quickest changes in expression or AS (Calixto et al., 2018). Thirteen of the NAT targets were in this group and included three UDP-glycosyltransferases and a flavone-3-hydroxylase (consistent with the GO analysis) and three transcription factors: AT1G69570 – CDF5, AT5G18240 – MYB-RELATED PROTEIN 1 (MYR1), and AT5G15850 – CONSTANS-LIKE 1 (COL1). We also identified AS in three of the sORF-encoding lncRNA transcripts (AT1G34418 – sORF15, AT3G26612 – sORF28, and AT5G24735 – sORF31). Translation of the AS isoforms of all three showed that AS did not affect the presence or integrity of the sORF (not shown).

### Cold-Induced Expression and AS of pri-miRNAs

To investigate the effect of low temperature on the expression of pri-miRNAs, we identified the miRNA host genes that were DE-only, DE+DAS, and DAS-only. Using the miREx miRNA gene list (Bielewicz et al., 2012; Zielezinski et al., 2015), 192 pri-miRNA genes were present in AtRTD2 (**Supplementary Table S2A**). For the majority of these genes, AtRTD2 again contained novel alternatively spliced transcripts increasing the number of AS isoforms or extended some of the shorter/truncated transcript models currently in TAIR10/Araport11. We detected expression of 68/192 pri-miRNA genes (**Supplementary Table S2B**). Of these, 31 were DE-only, three were DE+DAS, and two genes were DAS-only (**Figure 1B**). As with the lncRNAs, reducing the temperature caused increased or decreased expression at different time-points of the cold treatment. For example, the DE pri-miRNA gene, AT1G73687 (encoding miR159a) showed a concomitant rapid increase in expression and a diurnal waveform with a peak toward the end of the dark period in day 1 and day 4 at 4°C (**Figure 3A**). In contrast, the highly expressed pri-miRNA AT1G65960 (miR5014a) showed a transient increase in expression; levels doubled during day 1 at 4°C but returned to the 20°C levels by day 4 at 4°C (**Figure 3B**). The main transcript of the DE pri-miRNA gene AT1G05570 (miR5640) showed rhythmic expression peaking in the night but appeared to lose rhythmicity by day 4 at 4°C (**Figure 3C**). Although not significantly affected by cold, the transcripts of pri-miRNA AT1G67195 (miR414) illustrate rhythmic expression (**Figure 3D**). Two DE+DAS pri-miRNA genes, AT5G08185 (miR162a) and AT5G21100 (miR1888a), showed reduction in the expression levels of their main transcripts with cold (**Figures 3E,G**). The DAS-only pri-miRNA gene, AT5G52070, encodes miR4245 (**Figure 3F**).
At 20°C, AT5G52070\_P3 is the most abundant transcript but shows a rapid decrease after onset of cold with a concomitant increase in the other two transcripts, AT5G52070\_P1 and \_P2, with isoform switches in the first 3 h of cold (**Figure 3F**). Finally, the DAS-only pri-miRNA gene, AT5G22770 (miR3434), had three main transcripts; AT5G22770\_P1 and AT5G22770\_s3 increased their expression slightly with cold while AT5G22770\_P4 was decreased (**Figure 3H**). Of the 36 pri-miRNA genes with DE and/or differential AS, 13 were protein-coding genes containing an intronic miRNA (**Supplementary Table S2B**).

### Cold-Induced Alternative Splicing of TAS1a

TAS1a is an lncRNA that is regulated only by AS in response to cold (no significant DE at the gene level; **Figure 4B**). TAS1a RNA is initially targeted by miR173, converted to double-stranded RNA and subsequently cleaved into 21 nt phased small-interfering RNAs (phasiRNAs; Allen et al., 2005; Allen and Howell, 2010; **Figure 4A**). TAS1a and TAS2 have both been reported to contain an intron (Vazquez et al., 2004; Yoshikawa et al., 2005). In AtRTD2, we confirmed the presence of the introns in TAS1a and TAS2 and identified an intron in TAS1c which all contained the miRNA binding site and phasiRNAs (**Supplementary Figures S2**–**S5**). TAS1a produces two transcript variants: an unspliced transcript (AT2G27400.1) and a transcript where an intron is removed (AT2G27400\_ID1). The intron contains the entire region of TAS1a which has the miR173 binding site and tasiRNAs (**Figure 4A**). Splicing of the intron was confirmed by RT-PCR using primers in exons 1 and 2 (**Supplementary Figure S2A**). In RNA-seq data, both transcripts were expressed at similar levels at 20°C with higher levels of expression during the night (**Figure 4B**). Upon temperature reduction, there was a rapid decrease of the spliced isoform (AT2G27400\_ID1, **Figure 4B**) in the first 6 h after the start of cold application, while the unspliced AT2G27400.1 increased in the first 3 h and then showed a decrease over the next 12 h. In day 4 at 4°C, the divergent levels of the unspliced and spliced isoforms relative to the 20°C patterns were maintained. The change in splicing ratio (ΔPS – Δ percent spliced) was > 0.3 such that TAS1a is one of 137 DAS genes with the quickest and largest responses to cold (Calixto et al., 2018).

To investigate the sensitivity of AS of TAS1a to temperature reductions, plants were exposed to incremental reductions in temperature involving step-wise drops of between 2 and 16°C from the starting temperature of 20°C (**Figure 4C**). Isoform abundances were measured using isoform-specific RT-qPCR (**Figures 4C,D**). We observed that the spliced isoform (AT2G27400\_ID1) decreased rapidly with lower temperatures showing a significant change with only an 8°C reduction in temperature (20–12°C; **Figure 4D**). To examine the speed and sensitivity of AS of TAS1a, we used RT-qPCR to measure changes in AS in plants after 20, 40, 60, 90, 120, and 180 min of cold treatment (**Figure 4E**). Over this series of cooling time-points, we recorded the air temperature within the growth chamber such that after 40 and 60 min plants had experienced 11 and 8°C, respectively, and after 2 h, the temperature reached 4°C (**Figures 4E,F**). Similar to the previous experiment (**Figure 4B**), the spliced isoform (AT2G27400\_ID1) was sensitive to the gradual reduction in temperature from 20 to 4°C and showed a significant reduction within the first 90 min into the cold when the temperature had reached approximately 5°C (**Figure 4F**). This data indicates that rapid and sensitive temperature-dependent AS plays an important role in regulating the expression behavior of TAS1a as seen previously for some protein-coding genes (Calixto et al., 2018).

FIGURE 3 | Expression of pri-miRNA transcripts in response to cold. Transcripts below 1.85 TPM in all time-points are not shown (except in H, where we used a 1 TPM cut-off value). Transition to cold is represented by a vertical dashed blue line at time-point 9. (A) pri-miR159a is DE showing a significant upregulation upon cold and a gain of a high amplitude rhythm is also observed. (B) pri-miR5014a is an intronic miRNA in the GLUTAMATE DECARBOXYLASE 2 gene (AT1G65960); the profile of the pre-mRNA is DE showing a significant upregulation only in the first day of cold treatment. (C) pri-miR5640 is an intronic miRNA in CALLOSE SYNTHASE 1 (AT1G05570) which is DE in the cold, while upon longer exposure to cold (day 4) rhythmicity is lost/dampened. (D) pri-miR414 is rhythmically expressed and not significantly affected by cold. (E) pri-miR162a is DE and DAS showing a significant downregulation throughout the cold treatment. (F) pri-miR4245 is intronic in an AGENET DOMAIN-CONTAINING PROTEIN gene (AT5G52070) which undergoes splicing regulation with no significant expression change at the gene level. (G) pri-miR1888a is an intronic miRNA in an L-ASCORBATE OXIDASE gene (AT5G21100) that is downregulated at the transcriptional level and undergoes differential alternative splicing. (H) pri-miR3434 is an intronic miRNA in an ALPHA-ADAPTIN gene (AT5G22770) which undergoes only splicing regulation. Rapid and significant isoform switches detected by TSIS (Guo et al., 2017) are labeled with a red circle.

retention – IR) of TAS1a (AT2G27400\_ID1) is sensitive to reductions in temperature of 8°C. Student's t-tests were performed to compare each temperature reduction results against 20°C control. (E) 5-week-old Arabidopsis rosettes harvested rapidly after transfer to cold. The temperature was gradually reduced from 20°C at 0 h to 11°C at 40 min and eventually 4°C at 120 min into the cold treatment; rosettes were harvested across the first 3 h of cold at the times shown allowing the measurement of the speed of transcriptional and AS changes due to temperature reduction. (F) The unspliced (IR) transcript of TAS1a (AT2G27400\_ID1) responded rapidly to cold within 90 min, when the temperature reaches 5°C. RT-qPCR was used to measure relative expression levels for data presented in D and F, see Section "Materials and Methods." Student's t-tests were performed to compare each temperature reduction results against 20°C control. Significant differences are labeled with asterisks (∗∗p < 0.01; ∗∗∗p < 0.001).
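The change in splicing ratio reported for TAS1a can be illustrated with a short calculation. This sketch assumes that percent spliced (PS) is the abundance of one transcript divided by the summed abundance of all transcripts of the gene, and that ΔPS is the difference in PS between a cold time-point and its 20°C equivalent. The TPM values below are hypothetical and are not taken from the data.

```python
def percent_spliced(isoform_tpm, all_isoform_tpms):
    """Fraction of the gene's total abundance contributed by one isoform."""
    total = sum(all_isoform_tpms)
    return isoform_tpm / total if total > 0 else 0.0


def delta_ps(control_tpms, cold_tpms, isoform):
    """Change in percent spliced for `isoform` between two conditions.

    Both arguments map isoform names to TPM values.
    """
    ps_control = percent_spliced(control_tpms[isoform], control_tpms.values())
    ps_cold = percent_spliced(cold_tpms[isoform], cold_tpms.values())
    return ps_cold - ps_control


# Hypothetical TPMs: similar isoform levels at 20°C, spliced isoform reduced in the cold.
at_20c = {"spliced": 50.0, "unspliced": 50.0}
at_4c = {"spliced": 10.0, "unspliced": 40.0}

change = delta_ps(at_20c, at_4c, "spliced")
print(round(change, 2))
```

In this made-up case the spliced isoform falls from half to a fifth of the total, so the magnitude of ΔPS is 0.3.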

To investigate whether the changes in abundance of TAS1a transcripts by AS affect the levels of mature siRNAs, we isolated small RNAs from 5-week-old Arabidopsis rosettes grown at 20°C in 12 h:12 h dark:light and decreased the temperature at dusk (**Supplementary Figure S1A**). Plants were sampled (three biological replicates) at 0, 90, 120, and 180 min after the temperature was reduced to 4°C. RNAs were isolated and separated on denaturing polyacrylamide gels, transferred to membrane, and hybridized with a series of different [<sup>32</sup>P]-labeled oligonucleotides specific to miR173 and siRNAs: siRNA752, siRNA255, and siRNA477 (**Figures 5A–D**, respectively, and **Supplementary Figure S6**) and U6snRNA as control. The intensity of hybridization signals was quantified using AIDA Image Analyzer software using U6snRNA as a loading control. Over the short cold exposure treatments, there was a significant decrease in the abundance of miR173 and the siRNAs over the first 2 h (**Figure 5**). Thus, decreasing temperature causes a reduction in the abundance of siRNAs derived from TAS1a.
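The quantification described above has two steps: each hybridization signal is divided by the U6snRNA signal from the same lane, and the normalized values at a cold time-point are compared with those at 0 min by Student's t-test. The sketch below illustrates that logic with the Python standard library only. The band intensities are hypothetical, the function names are assumptions, and significance is judged against the two-sided 5% critical value of t for 4 degrees of freedom (2.776), which applies to three replicates per group.

```python
from math import sqrt
from statistics import mean, variance

# Two-sided 5% critical value of Student's t for 4 degrees of freedom.
T_CRIT_DF4_P05 = 2.776


def normalise(signal, loading_control):
    """Divide each band intensity by the loading-control intensity of its lane."""
    return [s / c for s, c in zip(signal, loading_control)]


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance, plus degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical intensities for three biological replicates per time-point.
sirna_0min, u6_0min = [820.0, 790.0, 805.0], [400.0, 395.0, 410.0]
sirna_120min, u6_120min = [450.0, 430.0, 470.0], [405.0, 400.0, 398.0]

t_stat, df = student_t(normalise(sirna_0min, u6_0min),
                       normalise(sirna_120min, u6_120min))
significant = df == 4 and abs(t_stat) > T_CRIT_DF4_P05
print(df, significant)
```

A statistics package would report an exact p-value; the fixed critical value is used here only to keep the example self-contained.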

### Dynamic Expression and AS of lncRNAs and pri-miRNAs

Besides TAS1a, other lncRNAs and pri-miRNA transcripts also showed rapid changes in expression and AS. As for TAS1a, the speed and sensitivity of such changes was also investigated using RNA from leaf samples collected during the first 3 h of cold treatment (see **Figure 4E**). The RNA-seq expression profile of the two main transcript isoforms of the NAT lncRNA, AT1G34844 (**Figure 6A**), showed that AT1G34844\_ID2 decreased and AT1G34844.1 increased rapidly in the cold with an isoform switch in the first 3 h of cold (**Figure 6C**). In the expanded time-course covering the first 3 h when the temperature decreased gradually to 4°C, the AT1G34844.1 transcript (introns 2 and 3 retained) and AT1G34844\_ID2 (fully spliced, protein-coding) showed significant increase and decrease, respectively, after only 40 min (11°C; **Figure 6E**, right panel) in comparison to the constant 20°C control (**Figure 6E**, left panel). AT3G26612 lncRNA (encoding sORF28) had three AS isoforms (**Figure 6B**) where AT3G26612.1 and AT3G26612\_c1 switched in response to cold (**Figure 6D**). In the 3 h time-course, the two main transcripts of AT3G26612 showed little difference in the 20°C control (**Figure 6F**, left panel) but significant changes in their abundance with temperature reduction. AT3G26612.1 decreased and AT3G26612\_c1 increased rapidly with the first significant differences being detected at 40 min when the temperature had reached only 11°C (**Figure 6F**, right panel).
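An isoform switch of the kind described above is a point in the time-course where the relative order of two isoforms reverses. The sketch below locates such intervals by looking for a sign change in the difference between two abundance series. It is a deliberately simplified illustration and is not the TSIS method used for the figures; the function name and the example values are hypothetical.

```python
def switch_intervals(times, iso_a, iso_b):
    """Return (t_start, t_end) intervals in which iso_a and iso_b swap rank.

    A switch is reported when the difference iso_a - iso_b changes sign
    between two consecutive time-points.
    """
    switches = []
    for i in range(1, len(times)):
        before = iso_a[i - 1] - iso_b[i - 1]
        after = iso_a[i] - iso_b[i]
        if before * after < 0:
            switches.append((times[i - 1], times[i]))
    return switches


# Hypothetical relative abundances over the first hour of cooling (minutes).
minutes = [0, 20, 40, 60]
fully_spliced = [10.0, 9.0, 4.0, 3.0]
intron_retained = [5.0, 6.0, 8.0, 9.0]

print(switch_intervals(minutes, fully_spliced, intron_retained))
```

A production method would also need to test whether the difference on either side of the crossing is statistically supported, which this sketch does not attempt.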

### DISCUSSION

Dynamic changes in expression at both the gene and transcript levels occur in response to lowering temperature in Arabidopsis (Calixto et al., 2018). AS makes a major contribution to changes in the transcriptome with over a quarter of genes whose expression changes significantly undergoing AS in the cold. The dynamic contribution of AS was illustrated by the rapid cold-induced wave of AS activity accompanying the transcriptional response and further AS changes throughout the period of cold exposure. The analysis of the RNA-seq cold response time-course at the transcript level identified hundreds of protein-coding genes which were only regulated by AS with no significant changes in expression at the gene level: the majority of these genes were novel cold-responsive genes (Calixto et al., 2018). Here, we have focussed on DE and differential AS of lncRNAs and pri-miRNAs. We identified 135 cold-responsive lncRNAs which were significantly differentially expressed and/or differentially alternatively spliced. Of these, a third involved changes in AS which have not been described previously. In particular, the transcript level expression analysis identified cold-responsive DAS-only lncRNAs which would not be detected by microarrays or gene level RNA-seq analyses. Different lncRNAs showed different responses to cold with, for example, transient or adaptive changes in expression/AS and the AS of some lncRNAs responded extremely rapidly and to small reductions in temperature showing that the AS of these genes is temperature-sensitive. Finally, we identified cold-induced AS of TAS1a and that the processing of siRNAs from the primary transcript may be splicing dependent. Therefore, cold-induced AS occurs in many lncRNAs/pri-miRNAs but there is little knowledge of the molecular mechanisms by which AS modulates levels of lncRNAs or miRNAs or their downstream biological functions.

Pre-mRNA processing such as splicing, AS, and alternative polyadenylation can affect gene expression levels by controlling mRNA export and stability, and the production of different functional variants (particularly at the protein level). The interactions between these typical pre-mRNA processing steps, environmental stress, and lncRNA production have been relatively well studied in Arabidopsis pri-miRNAs (Kruszka et al., 2012; Yan et al., 2012; Bielewicz et al., 2013; Szweykowska-Kulińska et al., 2013; Barciszewska-Pacak et al., 2016; Knop et al., 2017; Stepien et al., 2017). Plant miRNAs are mostly exonic and encoded by independent transcription units of which >50% contain introns (Szarzynska et al., 2009). Both splicing of introns downstream of the miRNA and the integrity of proximal 5′ splice sites have been shown to be required for efficient miRNA production (Bielewicz et al., 2013; Knop et al., 2017; Stepien et al., 2017). Mutations in various splicing factor genes including STA1 and the cold-induced GRP7 impact miRNA biogenesis from intron-containing pri-miRNAs (Ben Chaabane et al., 2013; Köster et al., 2014). GRP7, in particular, caused a reduction in the levels of many miRNAs along with accumulation of pri-miRNAs; direct binding of GRP7 was required to inhibit pri-miRNA processing. GRP7 also directly affected the splicing/AS of two pri-miRNAs (Köster et al., 2014). Furthermore, in Arabidopsis, 29 miRNAs are encoded within introns in protein-coding or non-coding host genes (Brown et al., 2008; Yan et al., 2012) and miRNA biogenesis is affected by splicing/AS and alternative polyadenylation via various mechanisms. For example, AS can remove regions of the pre-miRNA affecting its ability to fold correctly and be processed, and conversely, pre-miRNA secondary structure can affect splice site choice (Brown et al., 2008; Barciszewska-Pacak et al., 2016; Stepien et al., 2017). Production of miR400, an intron-located miRNA, was dependent on splice site choice: an AS event which removed part of the intron and left the miR400 in the mRNA of the host gene was induced by heat and led to an increase of the host gene mRNA containing the miRNA and a decrease in abundance of mature miR400 (Yan et al., 2012). The AS event effectively changed the position of the pre-miRNA from intronic to exonic (Yan et al., 2012; Szweykowska-Kulińska et al., 2013) suggesting that miRNA production was splicing-dependent and regulated by temperature. On the other hand, the intronic miRNA, miR402, is also induced by heat and correlates with selection of an intronic alternative polyadenylation site which competes with splicing and the miRNA is processed from the alternatively polyadenylated transcript (Knop et al., 2017). Thus, splicing and AS of pri-miRNAs impact the processing and levels of mature miRNAs and the AS of some pri-miRNAs is affected by cold.

FIGURE 6 | Rapid cold-induced alternative splicing of lncRNAs. (A) Transcript structures of AT1G34844 isoforms showing the fully spliced transcript (AT1G34844\_ID2), AT1G34844.1 which retains introns 2 and 3, and AT1G34844\_ID4 which retains intron 2. (B) Transcript structures of AT3G26612 isoforms showing the fully spliced transcript (AT3G26612.1), AT3G26612\_c1 which has an alternative 5′ splice site in exon 3, and AT3G26612\_ID4 which retains intron 3. (C) AT1G34844 is regulated by alternative splicing only. AT1G34844\_ID2 and AT1G34844.1 show a transient decrease and increase, respectively, between 20°C and day 1 at 4°C while AT1G34844\_ID4 remains unchanged throughout the experiment. (D) AT3G26612 is regulated by alternative splicing only. AT3G26612.1 significantly decreases in the first 6 h (day 1) and throughout the cold treatment, while the other two transcripts have similar levels of expression, with AT3G26612\_c1 slightly increasing and AT3G26612\_ID4 remaining unchanged. (E,F) High-resolution RT-PCR analysis of splicing ratios at 20°C (control – left panels) and decreasing to 4°C (right panels). (E) The fully spliced and I2R&I3R transcripts of AT1G34844 respond rapidly to changes in temperature within 40 min when the temperature reaches 11°C and onward as the temperature decreases further. (F) The fully spliced and Alt5′ssE3 transcripts of AT3G26612 respond rapidly to changes in temperature within 40 min when the temperature reaches 11°C and onward as the temperature decreases further, while I3R is mostly unresponsive. Student's t-tests were performed to compare each temperature reduction results against 20°C control. Significant differences are labeled with asterisks (∗∗p < 0.01; ∗∗∗p < 0.001).

Far less is known about AS in plant lncRNAs. Here, we exploited the increased number of transcript isoforms in AtRTD2 to identify novel AS events in Arabidopsis lncRNAs in the cold. In the analysis of the dynamics of AS in response to cold, we identified genes which showed the largest (ΔPS > 0.25) and quickest (0–6 h of cold) changes in cold-induced AS (Calixto et al., 2018). One of these genes encoded the tasiRNA lncRNA, TAS1a, which showed a rapid decrease in the level of the spliced isoform which correlated with a reduction in siRNAs. This suggested that splicing was required for the generation of the TAS1a-encoded siRNAs and was consistent with the earlier observation that a T-DNA insertion into the TAS1a intron led to greatly decreased siRNA production (Vazquez et al., 2004). Previously, reduced levels of TAS1a-derived siRNAs were observed in the cold while the levels of TAS1a were unaffected (Kume et al., 2010). Here, we also find reduced levels of the siRNAs in the cold but the transcript level analysis is able to distinguish the underlying AS. TAS1c and TAS2 also have introns that contain the miRNA binding site and siRNAs but while the TAS2 intron is efficiently spliced, TAS1c is spliced at very low frequency (**Supplementary Figures S2**–**S5**). The different organization of the TAS RNAs suggests that splicing may be important for efficient production of siRNAs for some genes but what determines why some genes undergo AS and the balance between splicing and processing are unknown.

The functions of the majority of Arabidopsis lncRNAs and, in particular, those with AS are still to be determined. The levels of TAS1a-derived siRNAs are affected in different environmental conditions and lower levels are detected in salt, drought, and cold stress (Sunkar and Zhu, 2004; Kume et al., 2010). The siRNAs derived from TAS1a have five gene targets of largely unknown function. However, the TAS1a siRNA target genes AT1G51670, AT5G18040, and AT4G29760 are upregulated at 4°C with the increase in AT1G51670 due to reduced silencing by tasiRNAs (Kume et al., 2010). Two of the targets, AT4G29770 and AT5G18040 (HEAT-INDUCED TAS1 TARGET1 and 2 – HTT1 and HTT2), are upregulated in heat stress and mediate thermotolerance (Li et al., 2014). Plants over-expressing TAS1a produced higher levels of TAS1a siRNAs, reduced expression of HTT1 and HTT2, and showed weaker thermotolerance than wild-type (Li et al., 2014). Here, we show cold-induced reduction in TAS1a-derived siRNAs and significant upregulation of HTT2. Thus, TAS1a-derived siRNAs may modulate expression of genes regulating the response of plants to both increased and decreased temperatures.

Other lncRNAs include natural antisense RNAs which generate dsRNA for silencing their target mRNAs, or lncRNAs which interact with protein splicing factors or chromatin modification factors to affect AS and expression of downstream genes (Romero-Barrios et al., 2018). We have identified many cold-responsive NAT lncRNAs with altered expression/AS and around half of the protein-coding genes that overlap these NATs were also differentially expressed and/or differentially alternatively spliced, suggesting that the regulation of at least some protein-coding targets involves NAT lncRNAs. The GO enrichment analysis of the NAT targets of the DE/DAS lncRNAs identified genes involved in biosynthesis of flavonoids, secondary metabolites with a range of functions involved in growth, physiology, detoxification, and possibly acting as scavengers of reactive oxygen species (Pourcel et al., 2007). Some of these genes also had the earliest detectable DE which may reflect the upregulation of many genes involved in tolerance to reactive oxygen species early in response to chilling temperatures (Knight and Knight, 2012). In addition, two transcription factor NAT targets with the largest and most rapid changes in expression were involved in flowering control: CDF5 and MYR1. The lncRNA, FLORE, is a natural antisense RNA which oscillates, has antiphasic expression to its CDF5 mRNA target, and regulates flowering time (Henriques et al., 2017). Interestingly, FLORE is alternatively spliced into four isoform variants but the role of this AS is unknown. MYR1 is a G2-like protein containing an N-terminal MYB-like domain, a central coiled-coil domain, and a C-terminal transactivation domain. MYR1 is a negative regulator of flowering time under low light (Zhao and Beers, 2013).
It has conserved alternative 3′ splice site events which affect a highly conserved sequence in the coiled-coil domain and homo-dimerisation properties or interactions with other transcription factors (Zhao and Beers, 2013). The MYR1 isoforms increase rapidly over the first 6 h of cold treatment and then decrease 12–15 h after onset of cold. The MYR1 NAT (AT5G18245) peaks in the dark at 20°C and decreases rapidly in the first 3–6 h of cold. Thus, the MYR1 NAT and MYR1 AS transcripts appear to be anti-phasic and the increase of isoforms may produce transcription factor complexes which act to suppress flowering. The regulation of CDF5 and MYR1, both involved in flowering, by NATs, parallels the recent report of the NAT lncRNA, MAS, which is induced by cold and required for activation of MAF4 expression and suppression of precocious flowering (Zhao et al., 2018). Therefore, one function of the changes in expression and/or AS of NAT lncRNAs in response to cold is suppression of flowering. The expression profiles of some Arabidopsis pre-mRNAs and lncRNAs showed altered patterns of rhythmic expression in response to cold (Calixto et al., 2018; **Figure 2**). A third NAT-associated transcription factor with the largest and fastest changes in expression, COL1, is involved in regulation of specific circadian rhythms (Ledger et al., 2001). Over-expression of COL1 shortened the circadian period by 2–3 h.

Here, both the COL1 NAT and COL1 showed a large transient increase in expression in day 1 at 4◦C with the peak of expression advanced by around 3 h compared to 20◦C. Thus, the increase in expression and altered timing of the NAT correlates with the cold response of COL1 and may contribute to altered circadian control of expression of other genes in response to cold. Finally, although not included in this study, transcription of the lncRNA SVALKA-asCBF1 which overlaps CBF1 suppresses CBF1 expression and impacts cold acclimation (Kindgren et al., 2018).

Plants experience constantly changing temperatures on hourly, daily, and seasonal scales and must have flexible regulatory systems to modulate gene expression. Gene regulation is complex, involving the interplay between transcription and various post-transcriptional processes. AS is a major contributor to the changes in expression in the cold response (Calixto et al., 2018). In this paper, we demonstrate cold-induced changes in expression and AS of pri-miRNAs and lncRNAs and that AS of some lncRNAs occurs very rapidly and is highly temperature sensitive. It is therefore likely that AS impacts the expression of target genes to contribute to both short term temperature responses and cold acclimation. A key question is the role of AS in regulating the processing, levels, and function of lncRNAs. The recent upsurge of interest in the role of intron retention in regulation of expression in plants and animals may provide possible explanations for splicing/AS of some lncRNAs. Intron retention can cause transcripts to be retained in the nucleus to wait for a splicing signal or be degraded, while splicing can activate and enhance export of the transcripts to the cytoplasm where they can interact with target mRNAs to affect translational efficiency or be degraded (reviewed by Jacob and Smith, 2017). Thus, splicing/AS of lncRNAs may be a mechanism for regulating the levels of lncRNAs in the nucleus or cytoplasm or the stability of the lncRNAs, which will impact the expression of target or downstream genes. Rapid changes in splicing factor activity, levels, or nuclear localization in response to cold can affect the efficiency of splicing/AS and determine whether different transcript isoforms are in the nucleus or cytoplasm, affecting their processing or translation (e.g., sORF lncRNAs). Such mechanisms will require a thorough analysis of the transcript diversity of the thousands of plant lncRNAs to allow further studies on how the structure and processing pathways of different types of lncRNAs influence their localization, stability, and function.

### DATA AVAILABILITY

Publicly available datasets were analyzed in this study. These data can be found here: https://www.ebi.ac.uk/ena/data/search?query=prjeb19974.

# AUTHOR CONTRIBUTIONS

JB, HN, and RZ obtained the funding support. JB and CC conceived and designed the experiments. CC, AJ, and NT collected and prepared samples. CC, AJ, NT, and CH performed the experiments. JB, CC, NT, AJ, and CH interpreted the main findings. CC and JB drafted the manuscript. All authors engaged in discussions during the project and revised and approved the final manuscript.

# FUNDING

This work was supported by funding from the Biotechnology and Biological Sciences Research Council (BBSRC) (BB/K006568/1, BB/P009751/1, BB/N022807/1 to JB; BB/K006835/1 to HN) and the Scottish Government Rural and Environment Science and Analytical Services division (RESAS) (to JB and RZ).

# ACKNOWLEDGMENTS

We thank Janet Laird (University of Glasgow) for technical assistance. We wish to apologize to all the authors whose relevant work was not cited in this article due to space limitations.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.00235/full#supplementary-material





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Calixto, Tzioutziou, James, Hornyik, Guo, Zhang, Nimmo and Brown. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Nuclear Speckle RNA Binding Proteins Remodel Alternative Splicing and the Non-coding Arabidopsis Transcriptome to Regulate a Cross-Talk Between Auxin and Immune Responses

Jérémie Bazin<sup>1</sup> \*, Natali Romero<sup>1</sup> , Richard Rigo<sup>1</sup> , Celine Charon<sup>1</sup> , Thomas Blein<sup>1</sup> , Federico Ariel1,2 and Martin Crespi<sup>1</sup> \*

<sup>1</sup> CNRS, INRA, Institute of Plant Sciences Paris-Saclay IPS2, Univ Paris Sud, Univ Evry, Univ Paris-Diderot, Sorbonne Paris-Cite, Universite Paris-Saclay, Orsay, France, <sup>2</sup> Instituto de Agrobiotecnolog*ı*a del Litoral, CONICET, Universidad Nacional del Litoral, Santa Fe, Argentina

### Edited by:

Dorothee Staiger, Bielefeld University, Germany

### Reviewed by:

Rossana Henriques, University College Cork, Ireland Heike Lange, UPR2357 Institut de Biologie Moléculaire des Plantes (IBMP), CNRS, France

### \*Correspondence:

Jérémie Bazin jeremie.bazin@free.fr Martin Crespi martin.crespi@u-psud.fr

### Specialty section:

This article was submitted to Plant Cell Biology, a section of the journal Frontiers in Plant Science

Received: 16 May 2018 Accepted: 27 July 2018 Published: 21 August 2018

### Citation:

Bazin J, Romero N, Rigo R, Charon C, Blein T, Ariel F and Crespi M (2018) Nuclear Speckle RNA Binding Proteins Remodel Alternative Splicing and the Non-coding Arabidopsis Transcriptome to Regulate a Cross-Talk Between Auxin and Immune Responses. Front. Plant Sci. 9:1209. doi: 10.3389/fpls.2018.01209

Nuclear speckle RNA binding proteins (NSRs) act as regulators of alternative splicing (AS) and auxin-regulated developmental processes such as lateral root formation in Arabidopsis thaliana. These proteins were shown to interact with specific alternatively spliced mRNA targets and with at least one structured lncRNA, named Alternative Splicing Competitor RNA. Here, we used genome-wide analysis of RNA-seq to monitor the NSR global role on multiple tiers of gene expression, including RNA processing and AS. NSRs affect AS of hundreds of genes as well as the abundance of lncRNAs, particularly in response to auxin. Among them, the FPA floral regulator displayed alternative polyadenylation and differential expression of antisense COOLAIR lncRNAs in nsra/b mutants. This may explain the early flowering phenotype observed in nsra and nsra/b mutants. GO enrichment analysis of affected lines revealed a novel link of NSRs with the immune response pathway. A RIP-seq approach on an NSRa fusion protein in mutant background identified that lncRNAs are privileged direct targets of NSRs in addition to specific AS mRNAs. The interplay of lncRNAs and AS mRNAs in NSR-containing complexes may control the crosstalk between auxin and the immune response pathway.

Keywords: RNA binding proteins, RNP complexes, alternative splicing, immune response, auxin

# INTRODUCTION

RNA binding proteins (RBPs) have been shown to affect all steps of post-transcriptional gene expression control, including alternative splicing (AS), silencing, RNA decay, and translational control (Bailey-Serres et al., 2009). The Arabidopsis thaliana genome encodes more than 200 proteins predicted to bind RNAs. The picture becomes even more complex since over 500 proteins were found to bind polyA+ RNA in a recent study attempting to define the RNA interactome using affinity capture and proteomics (Marondedze et al., 2016). However, only a small subset of RBPs has been functionally assigned in plants. The versatility of RBPs on gene expression regulation has been recently highlighted by the identification of several among them acting at multiple steps of post-transcriptional gene regulation (Lee and Kang, 2016; Oliveira et al., 2017). During mRNA maturation, the transcript acquires a complex of proteins at each exon–exon junction during pre-mRNA splicing that influences the subsequent steps of mRNA translation and decay (Maquat, 2004). Although all RBPs bind RNA, they exhibit different RNA-sequence specificities and affinities. As a result, cells are able to generate diverse ribonucleoprotein complexes (RNPs) whose composition is unique to each mRNA and these complexes are further remodeled during the life of the mRNA in order to determine its fate. One approach to determine RBP function consists in the identification of all interacting molecules (the so-called RNPome) of a specific RNP and the conditions of their association. The ribonucleoprotein immunopurification assay facilitates the identification and quantitative comparison of RNA association to specific proteins under different experimental conditions. This approach has been successfully used to elucidate the genome-wide role of a number of plant RBPs involved in pre-mRNA splicing, stress granule formation or translational control (Sorenson and Bailey-Serres, 2014; Gagliardi and Matarazzo, 2016; Foley et al., 2017; Köster and Meyer, 2018).

The nuclear speckle RNA binding proteins (NSRs) are a family of RBPs that act as regulators of AS and auxin regulated developmental processes such as lateral root formation in Arabidopsis thaliana. These proteins were shown to interact with some of their alternatively spliced mRNA targets and at least with one structured lncRNA, named Alternative Splicing Competitor RNA (ASCO) (Bardou et al., 2014). Overexpression of ASCO was shown to affect AS of a subset of mRNA regulated by NSRs, similar to nsra/b double mutants, and ASCO was also shown to compete in vitro with the binding of one AS mRNA target. This study suggested that plant lncRNAs are able to modulate AS of mRNA by hijacking RBPs, such as NSRs, involved in splicing (Romero-Barrios et al., 2018). In addition, transcriptome analysis using microarrays and specific AS analysis on a subset of mRNAs suggested a role of NSR in transcriptome remodeling in response to auxin (Bardou et al., 2014).

Here we used genome wide analysis to monitor the NSR global role on multiple tiers of gene expression, including RNA processing and AS. This allowed us to find a new role of NSR in the control of flowering time regulators as well as to suggest that NSRs control the crosstalk between auxin and the immune response pathway.

### RESULTS

### Auxin Regulation of Gene Expression Is Altered in nsra/b Double Mutant

To characterize the role of NSRs in the control of auxin regulated gene expression, we performed paired-end strand specific RNA sequencing on the nsra/nsrb (nsra/b) double mutant and wild type (Col-0) seedlings treated for 24 h with the synthetic auxin NAA (100 nM) or a mock solution (Bardou et al., 2014; Tran et al., 2016) (**Figure 1A**).

In mock treated samples, 63 and 41 genes were found to be differentially up and down-regulated between mutant and wild type seedlings (**Supplementary Table S1B**). Remarkably, in response to auxin, we identified 709 and 465 genes significantly up and down-regulated in nsra/b, compared to wild type (**Figure 1B** and **Supplementary Table S1B**). Principal component analysis (PCA) showed a dispersion of the data compatible with statistical comparisons between groups (**Supplementary Figure S1**). Multifactor analysis of differential gene expression further showed that nsra/b mutation has a major effect on auxin-regulated gene expression. Indeed, a set of 951 genes showed significant interaction between genotype and auxin regulation (**Figure 1C** and **Supplementary Table S1B**). This is in agreement with our previous findings indicating that NSRs mediate auxin regulation of gene expression (Bardou et al., 2014).
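The genotype-by-auxin interaction mentioned above can be illustrated with a small sketch. This is not the multifactor model the authors fitted; it only shows, on the log2 scale, what an interaction term measures: the difference between the NAA response of the mutant and that of the wild type. The function name, the pseudocount, and the expression values are hypothetical.

```python
from math import log2
from statistics import mean


def interaction_log2fc(wt_mock, wt_naa, mut_mock, mut_naa, pseudocount=1.0):
    """Return (mutant NAA response) - (wild-type NAA response) in log2 units.

    Each argument is a list of replicate expression values for one gene.
    A value near 0 means both genotypes respond to NAA in the same way.
    """
    def mean_log2(values):
        # The pseudocount (an assumption) avoids log2(0) for unexpressed genes.
        return mean(log2(v + pseudocount) for v in values)

    wt_response = mean_log2(wt_naa) - mean_log2(wt_mock)
    mut_response = mean_log2(mut_naa) - mean_log2(mut_mock)
    return mut_response - wt_response


# Hypothetical gene: induced fourfold by NAA in Col-0, unresponsive in nsra/b.
effect = interaction_log2fc(
    wt_mock=[7, 7, 7], wt_naa=[31, 31, 31],
    mut_mock=[7, 7, 7], mut_naa=[7, 7, 7],
)
print(f"interaction (log2): {effect:.2f}")
```

A gene would count toward an interaction set only after a proper statistical test on such a term; the sketch computes the effect size alone.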

We have previously shown that NSRs modulate auxin-induced AS of a particular subset of genes using specific qRT-PCR assays (Bardou et al., 2014). We now used our RNA-seq dataset to characterize genome-wide effects of NSRs on AS and more generally on RNA processing (**Figure 1A**). To this end, we made use of the RNAprof software, which implements a gene-level normalization procedure and can compare RNA-seq read distributions on transcriptional units to detect significant profile differences. This approach allows de novo identification of RNA processing events independently of any gene feature or annotation and independently of gene expression differences (Tran et al., 2016). RNAprof results were parsed to retain only highly significant differential RNA processing events (p.adj < 10e-4) and further crossed with gene annotation in order to classify them according to their gene features. The majority of events overlapped with intronic regions (**Figure 1D** and **Supplementary Table S1C**), which is in accordance with data showing that intron retention is the major event of AS in plants (Ner-Gaon et al., 2004). The effect of nsra/b on RNA processing and splicing is enhanced in response to NAA. In other words, the vast majority of differential events between nsra/b and wild type plants were identified in the presence of auxin.

To further support the results from RNAprof and to gain knowledge on the functional consequences of NSR-mediated AS events, we quantified mRNA transcript isoforms of the AtRTD2 database (Brown et al., 2017) using kallisto (Bray et al., 2016). Then, we searched for marked changes in isoform usage using the IsoformSwitchAnalyzeR package (Vitting-Seerup and Sandelin, 2017), which allows statistical detection, visualization, and prediction of the functional consequences of isoform switching events. As a result, we identified 118 NSR-dependent isoform switching events including 108 only detected in NAA-treated samples (**Figure 1E** and **Supplementary Table S1D**). Comparison of gene sets affected in their steady state abundance or containing differential RNA processing or isoform switching events in nsra/b highlighted the fact that most differentially spliced genes are not differentially expressed. In addition, over 35% of genes predicted to have isoform switching events were also found using RNAprof (**Figure 1E**).
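A change in isoform usage of the kind described above is usually summarized as a change in isoform fraction, that is, the share of a gene's total expression contributed by each isoform. The sketch below computes that quantity from per-isoform TPM values; it is an illustration only, not the statistical procedure of the package cited above, and the isoform names and TPM values are hypothetical.

```python
def isoform_fractions(isoform_tpm):
    """Map each isoform to its share of the gene's total TPM."""
    total = sum(isoform_tpm.values())
    if total == 0:
        # Gene not expressed: no isoform can be said to be preferred.
        return {name: 0.0 for name in isoform_tpm}
    return {name: tpm / total for name, tpm in isoform_tpm.items()}


def delta_isoform_fraction(condition_a, condition_b):
    """Isoform fraction in condition_b minus isoform fraction in condition_a."""
    frac_a = isoform_fractions(condition_a)
    frac_b = isoform_fractions(condition_b)
    names = sorted(set(frac_a) | set(frac_b))
    return {n: frac_b.get(n, 0.0) - frac_a.get(n, 0.0) for n in names}


# Hypothetical gene with two isoforms and the same total expression
# (10 TPM) in both genotypes, but a switch in the dominant isoform.
col0 = {"isoform_1": 8.0, "isoform_2": 2.0}
nsrab = {"isoform_1": 3.0, "isoform_2": 7.0}
for name, change in delta_isoform_fraction(col0, nsrab).items():
    print(f"{name}: {change:+.2f}")
```

The example gene also shows why a switch can occur without differential expression: total abundance is unchanged while the isoform shares are not.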

# NSRs Affect the Abundance of Numerous LncRNAs

The activity of NSR proteins on AS is modulated by the lncRNA ASCO and the abundance of ASCO RNA is increased in nsra/b mutant (Bardou et al., 2014). Therefore, we conducted a global analysis of lncRNA detection and expression in our RNA-seq datasets. Annotated lncRNAs (Araport11) were combined with de novo predicted transcripts and further classified based on their location in intergenic and antisense regions of coding genes (**Figure 2A**). More than 2440 lncRNAs were detected in our RNA-seq data with more than 1 TPM (**Supplementary Table S1A**) in at least three samples. In mock conditions, differential expression analysis served to identify five antisense and four intergenic lncRNAs differentially expressed between mutant and wild type seedlings, whereas 31 intergenic and 23 antisense lncRNAs were found to be differentially regulated between mutant and wild type in the presence of auxin (**Figure 2B**). Differentially expressed lncRNAs included a number of well-characterized lncRNAs such as APOLO, which has been shown to influence root gravitropism in response to auxin via its action on PINOID protein kinase expression dynamics. In addition, the expression of lncRNA ASCO, shown to interact with NSR to modulate AS of its mRNA targets, was also affected in nsra/b, suggesting a feedback regulation of NSR on ASCO lncRNA (**Figure 2B**).
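The detection criterion stated above (more than 1 TPM in at least three samples) can be written as a short filter. The sketch assumes a plain dictionary mapping transcript identifiers to per-sample TPM values; the identifiers and values are hypothetical.

```python
def detected_transcripts(tpm_table, min_tpm=1.0, min_samples=3):
    """Return IDs of transcripts above min_tpm in at least min_samples samples."""
    return [
        transcript_id
        for transcript_id, values in tpm_table.items()
        if sum(v > min_tpm for v in values) >= min_samples
    ]


# Hypothetical TPM values across four samples.
table = {
    "lncRNA_A": [1.5, 2.0, 0.2, 3.1],  # above threshold in three samples
    "lncRNA_B": [0.5, 1.2, 0.0, 0.9],  # above threshold in one sample only
    "lncRNA_C": [1.0, 1.0, 1.0, 1.0],  # never strictly above 1 TPM
}
print(detected_transcripts(table))
```

The comparison is strict here because the text says "more than 1 TPM"; a transcript at exactly 1 TPM in every sample is therefore not counted as detected.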

## NSRa Is Involved in the Control of Flowering Time Through the Modulation of the COOLAIR/FLC Module

Interestingly, we also identified the lncRNA COOLAIR as down-regulated in nsra/b, both in mock and NAA-treated samples (**Figure 2B**). COOLAIR designates a set of transcripts expressed in antisense orientation of the locus encoding the floral repressor FLC (Whittaker and Dean, 2017). Two main classes of COOLAIR lncRNAs are produced by AS and polyadenylation of antisense transcripts generated from the FLC locus. One uses a proximal splice site and a polyadenylation site located in intron 6 of FLC, whereas the distal one results from the use of distal splice and polyadenylation sites located in the FLC promoter (reviewed in Whittaker and Dean, 2017) (**Figure 3A**).

Strikingly, FLC is one of the most deregulated genes in nsra/b mutants in control and NAA-treated samples. Notably, it was shown that a number of splicing and RNA processing factors control FLC expression by modulating the ratio of COOLAIR proximal and distal variants (Liu et al., 2009; Marquardt et al., 2014; Whittaker and Dean, 2017). Therefore, we determined the abundance and the ratio of COOLAIR variants in wild type, single nsra, nsrb and the double nsra/b mutants in control and NAA-treated conditions using a dedicated strand-specific RT-qPCR assay (Marquardt et al., 2014). First, we confirmed that total COOLAIR and FLC abundance was decreased in nsra and nsra/b but not nsrb (**Figures 3B,C**). More importantly, we found that relative usage of the short (proximal) variant of COOLAIR increased by twofold in nsra and nsra/b but not in nsrb leading to an increase of the ratio of distal vs. proximal COOLAIR isoforms in the same genotypes (**Figures 3D–F**). When analyzing the relative abundance of both variants against a housekeeping gene, we determined that the decrease of total COOLAIR transcripts was associated with a specific decrease of the distal variants. In contrast, proximal variant abundance remains stable (**Supplementary Figure S2**), leading to a change in relative variant usage (**Figures 3D,E**). Interestingly, the proximal COOLAIR variant was associated with a down-regulation of FLC and an early flowering phenotype (Marquardt et al., 2014). Together, these results suggest that the modulation of COOLAIR polyadenylation and/or splicing in nsra mutants contributes to the control of FLC expression.

COOLAIR isoforms are shown including positions of primers (arrows) used to measure distal (blue arrows) and proximal (red arrows) and total (black arrows) COOLAIR variant abundance. Black rectangles and black lines denote exons and introns, respectively. (B) COOLAIR and (C) FLC abundance measured by RT-qPCR in nsra, nsrb, nsra/b and Col-0 in seedlings. (D) Proximal and (E) distal variant usage normalized to the total amount of COOLAIR. (F) Distal vs. proximal variant usage ratio. Data represent the mean of three biological replicates ± standard error. Results were analyzed by one-way analysis of variance (ANOVA) followed by Tukey's post-hoc test: groups with different letters are statistically different (p ≤ 0.05) and groups with the same letters are statistically equal (p ≤ 0.05). Significance was determined using an ANOVA coupled with a Tukey pairwise test (p-value < 0.05).

In addition, RNAprof also identified that the mRNA coding for the FPA protein (Hornyik et al., 2010) was differentially processed in nsra/b seedlings treated with NAA (**Figure 4A**). The differential RNA processing event occurred at the end of intron 1, which has been shown to contain an alternative polyadenylation site necessary for FPA negative autoregulation (Hornyik et al., 2010). RNAprof analysis hinted at a significant reduction of the short FPA variant in nsra/b mutant compared to Col-0 (**Figure 4A**). RT-qPCR analysis using isoform-specific primers (**Figure 4A**) showed that the long isoform accumulated in nsra and nsra/b but not in nsrb whereas the short isoform remained unaffected (**Figure 4B**). Hence, our data suggested that the use of the proximal polyA site is reduced in nsra and nsra/b mutants, which is predicted to lead to an increase of the full-length functional FPA. Interestingly, FPA was shown to favor proximal COOLAIR variant forms (Hornyik et al., 2010), suggesting that the effect of NSR mutation on COOLAIR variant ratio may be mediated by changes in FPA polyadenylation site usage. To address this potential mechanism, we checked whether COOLAIR or FPA are direct targets of NSRa by RNA immunoprecipitation (RIP) using transgenic lines expressing a tagged version of the NSRa protein. Although we did not find COOLAIR binding to NSR, both the long and the short FPA variant were enriched in the RIP assay, supporting the idea that NSRa directly influences the processing of FPA mRNA (**Figure 4C**).

nsra/b (blue). Significant differential events are delimited by green lines and labeled with their p-value (p). The Y-axis shows the normalized RNA-seq coverage from RNAprof. Sections between two purple lines with p-values indicated denote significant differences between nucleotide-based coverage. Orange and blue traces correspond to triplicate samples of Col-0 and nsra/b treated with a mock solution, respectively. The X-axis represents gene coordinates (boxes and lines representing exons and introns, respectively). Positions of polyadenylation sites identified in Hornyik et al. (2010) are shown on the gene model as well as the two transcript variants deriving. Positions of primer pairs used to amplify the short and long FPA variant are indicated as black and with arrows (respectively). (B) Isoform-specific RT-qPCR analysis of short and long FPA variant and their abundance ratio in nsra, nsrb, and nsra/b. Depicted data is the mean of fold change compared to Col-0 ± standard deviation of three biological replicates. Significance was determined according to a Student's t-test (∗p < 0.05; ∗∗p < 0.01). (C) RIP assays using ProNSRa::NSRa::HA (NSRa), Col-0 (w/o: without tag) plants on total cell lysates of 10-day-old seedlings treated with 10 mM NAA for 24 h. Results of RT-qPCR are expressed as mean of the percentage of the respective INPUT signal (total signal before RIP) from three independent replicates ± standard error. Genes analyzed are a housekeeping gene (At1g13320) named here REF and FPA (AT2G43410) short and long isoforms.

Given the critical role of FPA, COOLAIR, and FLC in flowering, we hypothesized that NSRa may be involved in the control of flowering time. Indeed, we observed that nsra/b mutant displays an early flowering phenotype (**Figure 5A**). We then quantified this phenotype by counting the number of rosette leaves when the flower stem emerged from the plants. Data showed that nsra and nsra/b but not nsrb display an early flowering phenotype (**Figure 5B**), which is consistent with a lower expression of FLC in nsra and nsra/b mutants only (**Figure 3C**). Altogether, our results indicate that NSRa-dependent modulation of FPA polyadenylation may impact the activity of the COOLAIR/FLC module, affecting flowering time in Arabidopsis.
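The distinction drawn in this section between variant abundance and relative variant usage can be made concrete with a few lines of arithmetic. In the sketch below, usage is the variant signal divided by the total COOLAIR signal, as in the normalization described for Figures 3D,E. All numbers are hypothetical and chosen only to mimic the reported pattern (stable proximal abundance, lower distal and total abundance).

```python
def variant_usage(proximal, distal, total):
    """Return proximal and distal usage as fractions of total COOLAIR signal."""
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {"proximal": proximal / total, "distal": distal / total}


# Hypothetical relative abundances (arbitrary units against a reference gene).
col0 = variant_usage(proximal=2.0, distal=8.0, total=10.0)
nsra = variant_usage(proximal=2.0, distal=3.0, total=5.0)

# Proximal abundance is identical (2.0) in both genotypes, yet its usage
# changes because the total, driven by the distal variant, has dropped.
fold_change = nsra["proximal"] / col0["proximal"]
print(f"proximal usage: {col0['proximal']:.2f} -> {nsra['proximal']:.2f} "
      f"({fold_change:.1f}-fold)")
```

With these illustrative values, a twofold rise in proximal usage arises entirely from the loss of distal transcripts, which is the scenario the text describes.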

### NSRs Affect Auxin-Dependent Expression of Biotic Stress Response Genes

To extend our understanding of the genome-wide roles of NSRs in the control of auxin-dependent gene expression, we searched for the putative function of differentially expressed and/or spliced gene groups using clustering and Gene Ontology (GO) enrichment analyses. Hierarchical clustering of differentially expressed genes determined two clusters of genes showing opposite expression patterns in response to NAA in nsra/b as compared to wild type plants (**Figure 6A**). GO analyses revealed that cluster 2 (**Figure 6B**), i.e., genes up-regulated by NAA in wild type plants but down-regulated by NAA in nsra/b, is significantly enriched for genes belonging to GO categories such as "response to hormone" (FDR < 1e-6) and "response to water deprivation" (FDR < 5e-9). On the other hand, cluster 3 genes (**Figure 6C**), i.e., genes down-regulated or not affected by NAA in wild type but up-regulated in the mutant, are highly significantly enriched for GO categories related to pathogen responses such as "response to biotic stimulus" (FDR < 5e-16) and "response to chitin" (FDR < 1e-26). We then confirmed the results of RNA-seq datasets (**Figure 7**) by RT-qPCR analysis of a small subset of genes belonging to clusters 2 and 3.

Given the important effect of NSRs on AS regulation, we also examined the putative function of differentially spliced genes having a switch in isoform usage. Strikingly, we identified a number of AS proteins located upstream of the immune response pathway. They include the MKP2 phosphatase (Lumbreras et al., 2010), the Toll/interleukin receptor (TIR) domain-containing protein TN1 and three members of the jasmonate co-receptor family (JAZ7, JAZ6, and JAZ2). In agreement, GO enrichment analysis of genes predicted to have significant isoforms switching events between nsra/b and Col-0 revealed a strong enrichment toward biological functions related to biotic stress responses (**Figure 6D**).

### NSRa Directly Recognizes Transcripts Involved in Biotic Stress Responses

To address the question whether these targets are directly related to NSR function and/or indirectly affected by other proteins, we aimed to identify direct targets of NSRs using a genome-wide RIP-seq approach. We focused our analysis on NSRa as it is globally more highly expressed than NSRb (Bardou et al., 2014). Transgenic lines expressing an epitope-tagged version of NSRa under its native promoter in the nsra mutant genetic background were used to avoid interference with the endogenous version of NSRa. Ten-day-old seedlings treated for 24 h with NAA were used to match the transcriptome analysis. Immunoprecipitation was performed on UV crosslinked tissue using HA antibodies and mouse IgG as negative control (**Figure 8A**). NSRa-HA was detected from the input sample as well as from the eluate of the immunoprecipitation when it was performed with an HA antibody but not when mouse IgG were used (**Figure 8B**). qRT-PCR analysis of previously identified targets and a randomly selected abundant housekeeping gene confirmed the specific enrichment of target genes in the RIP sample compared to the input (**Figure 8C**). In addition, RNA extracted from mock IP eluate did not give a detectable amount of RNA, supporting the specificity of this assay. Total RNA-seq libraries were prepared in duplicate from input, RIP and Mock samples. PCA and correlation analysis showed a dispersion of the data compatible with statistical comparisons between groups (**Supplementary Figure S3**). To detect putative NSRa targets, we used a multi-factor differential expression analysis using DESeq2 in order to identify transcripts significantly enriched in RIP as compared to the input (FDR < 0.01; log2FC > 2) that were depleted from Mock samples. After filtering out all transcripts with less than two TPM in RIP libraries, we finally identified 342 putative targets of NSRa (**Figure 9A**).

(∗∗p-value < 0.01; ∗∗∗p-value < 0.001).

and gene with significant isoforms switching events (D). Each circle represents a significant GO category but only groups with the highest significance are labeled. Related GOs have similar (x, y) coordinates.
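The selection criteria used for putative NSRa targets (FDR < 0.01, log2FC > 2 in RIP versus input, depletion from Mock samples, and at least two TPM in RIP libraries) can be expressed as a simple filter applied to a results table. The sketch below assumes the differential analysis has already been run elsewhere; the record layout, field names, and example transcripts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RipResult:
    """One transcript from a RIP-vs-input comparison (hypothetical layout)."""
    name: str
    log2fc_rip_vs_input: float
    fdr: float
    rip_tpm: float
    depleted_in_mock: bool


def putative_targets(results, max_fdr=0.01, min_log2fc=2.0, min_tpm=2.0):
    """Return names of transcripts passing all four selection criteria."""
    return [
        r.name
        for r in results
        if r.fdr < max_fdr
        and r.log2fc_rip_vs_input > min_log2fc
        and r.rip_tpm >= min_tpm          # drop transcripts below two TPM
        and r.depleted_in_mock
    ]


# Hypothetical transcripts, each failing at most one criterion.
results = [
    RipResult("transcript_1", 3.2, 0.001, 15.0, True),   # passes
    RipResult("transcript_2", 1.5, 0.001, 15.0, True),   # enrichment too low
    RipResult("transcript_3", 3.2, 0.050, 15.0, True),   # not significant
    RipResult("transcript_4", 3.2, 0.001, 0.8, True),    # too lowly expressed
    RipResult("transcript_5", 3.2, 0.001, 15.0, False),  # present in Mock IP
]
print(putative_targets(results))
```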

Comparing this list of genes with those differentially expressed in nsra/b in mock or NAA-treated seedlings, we found that 33% of putative target genes were also deregulated in nsra/b (**Figure 9B**). Further examination of putative target genes revealed that the large majority of these genes are up-regulated in nsra/b, suggesting that NSRs are negatively controlling their transcript abundance in vivo (**Figure 9D**). GO enrichment analysis revealed that putative NSRa targets (**Figure 9E**) are enriched for genes involved in biological processes associated with defense responses such as "response to chitin" (FDR < 1.76e-9), "response to wounding" (FDR < 2.6e-3) or "immune system processes" (FDR < 1.7e-3). Interestingly, NSR target genes were also enriched for the GO category "regulation of transcription, DNA-templated" (FDR < 1.6e-8). Further examination of target genes belonging to this GO category revealed that 56 transcription factors (TFs) are likely to be direct targets of NSRa (**Supplementary Table S1E**). Among them, we found the mRNA encoding the MYC2 TF, a key regulator of immune responses (Kazan and Manners, 2013), as well as nine WRKY and seven ERF TF transcripts, both classes of which have been associated with the regulation of the plant immune response (Pandey and Somssich, 2009; Huang et al., 2016). Ten putative target genes were selected for RT-qPCR validation of the RIP assay. Among them, seven showed a significant enrichment over the input samples (**Figure 9C**), further supporting the genome-wide approach of NSRa target identification. Together, these results suggest that direct recognition of a subset of defense response genes by NSRa may affect their steady state abundance during auxin response.

### LncRNAs Are Overrepresented Among NSRa Targets

It was previously demonstrated that a direct interaction between NSR and the lncRNA ASCO is able to modulate NSR function (Bardou et al., 2014). Thus, we thoroughly analyzed global lncRNA abundance in RIP-seq datasets. Interestingly, lncRNAs appeared among the most highly enriched transcripts within the putative targets of NSRa. We found that, out of the 342 putative NSRa targets, 53 were lncRNAs, including 20 intergenic and 33 antisense lncRNAs (**Figure 10A**). In fact, relative to the total number of lncRNAs detected in the input, lncRNAs were significantly enriched over mRNAs in the set of putative target transcripts (hypergeometric test: 1.9 fold, p.value < 4.06e-4) (**Figure 10B**).
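The hypergeometric test reported above asks how likely it is to draw at least 53 lncRNAs among 342 targets if targets were sampled at random from all transcripts detected in the input. A minimal sketch of that calculation follows, using exact integer arithmetic from the standard library. The target counts (53 of 342) come from the text; the background counts are not given in this section, so the values used in the example call are placeholders and the printed numbers are not expected to reproduce the published statistics.

```python
from math import comb


def hypergeom_enrichment(k, n, K, N):
    """Fold enrichment and upper-tail p-value, P(X >= k).

    k: lncRNAs among the targets      n: number of targets
    K: lncRNAs in the background      N: transcripts in the background
    """
    upper = min(n, K)
    # Sum the probability of every outcome at least as extreme as k.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    p_value = tail / comb(N, n)
    fold = (k / n) / (K / N)
    return fold, p_value


# Background sizes below are placeholders, not values from the study.
fold, p_value = hypergeom_enrichment(k=53, n=342, K=2000, N=25000)
print(f"fold enrichment: {fold:.2f}, p-value: {p_value:.2e}")
```

Exact binomial coefficients keep the sketch dependency-free; for routine use a statistics library would be the more usual choice.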

We further validated the NSR-lncRNA interaction by RIP-qPCR. We found four out of five lncRNAs enriched over the input RNA in NSRa RIP samples (**Figure 10C**). Analyses of target lncRNA expression in nsra/b revealed that, similarly to the behavior of ASCO, seven target lncRNAs are significantly upregulated in the nsra/b mutant (**Figures 10D,E**). Together, these results suggest that lncRNAs are overrepresented among targets of NSRa and that NSRs might control the accumulation of lncRNAs in vivo. Future work on the interplay between lncRNAs and mRNAs in NSR-containing complexes should shed light on their global impact on the transcriptome.
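RIP-qPCR enrichment is commonly expressed as a percentage of the input signal. The sketch below uses the conventional percent-input calculation from Ct values; it is offered as an assumption about how such values can be derived, not as the authors' exact procedure. It assumes 100% amplification efficiency, and the input fraction and Ct values are hypothetical.

```python
from math import log2


def percent_input(ct_ip, ct_input, input_fraction=0.1):
    """Percent of input recovered in the immunoprecipitate.

    ct_input is measured on a fraction of the starting material
    (input_fraction), so it is first adjusted to represent 100% input.
    Assumes a doubling of product per cycle.
    """
    adjusted_input_ct = ct_input - log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


# Hypothetical Ct values: a candidate target and a reference transcript.
target = percent_input(ct_ip=27.0, ct_input=25.0)
reference = percent_input(ct_ip=31.0, ct_input=24.0)
print(f"target: {target:.2f}% of input, reference: {reference:.3f}% of input")
```

A transcript is then considered enriched when its percent input in the tagged line clearly exceeds both the reference transcript and the untagged control.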

### DISCUSSION

In agreement with our previous study based on microarrays, a novel thorough analysis of the nsra/b transcriptome using RNA-seq has revealed an important role of these RBPs in the control of auxin-responsive genes. A previous study monitoring AS changes of a subset of 288 genes using high-resolution real-time PCR first uncovered the important roles of NSR in auxin-driven AS changes, and targeted RIP-qPCR showed that both NSR proteins were able to bind AS mRNA targets in planta (Bardou et al., 2014). Our global AS analysis further confirmed this function of NSRs on AS modulation and demonstrated the impact of these proteins at genome-wide level. However, our RIP-seq global analysis of NSR targets did not show a strong enrichment toward AS-modulated transcripts. Instead, a large fraction of NSR targets were transcriptionally upregulated in nsra/b, suggesting that NSR may play a direct role in controlling their stability or transcription. Several splicing factors have been shown to affect transcription by interacting with the transcriptional machinery and to modulate Pol II elongation rates (Kornblihtt et al., 2004). In addition, specific RBPs deposited during pre-mRNA splicing at exon–exon splicing junctions can influence mRNA decay (Lumbreras et al., 2010; Nishtala et al., 2016). Further dissection of the NSR recognition sites on mRNAs may support a role of NSRs in mRNA decay.

The combination of our RNA-seq and RIP-seq approaches revealed that lncRNAs are privileged targets of NSRa and that a significant fraction of the auxin-responsive non-coding transcriptome is deregulated in the nsra/b genetic background. This is in accordance with our previous results showing that the specific interaction of NSR with the ASCO lncRNA is able to modify the AS pattern of a subset of NSR-target genes. Our study suggests that NSRs may play a broader role in lncRNA biology. In particular, we found that a large majority of lncRNAs targeted by NSRa are upregulated in nsra/b, suggesting a new role of these proteins in the control of lncRNA transcription and/or stability. So far, very little is known about lncRNA biogenesis, especially in plants. Other RBPs have been shown to affect lncRNA abundance. For instance, several members of the cap binding complex such as CBP20, CBP80, and SERRATE have been shown to co-regulate the abundance of a large subset of lncRNAs in Arabidopsis seedlings (Liu et al., 2012). Interestingly, these three proteins, like NSRs, have also been associated with major roles in the control of AS patterns (Raczynska et al., 2010, 2014). This suggests that the splicing machinery might be used to control lncRNA abundance in the nucleus and that the interplay between lncRNAs and mRNAs may be an emerging mechanism in splicing regulation. Further genetic dissection is required to determine whether NSRs are involved in the same pathway as CBP20, CBP80, and SERRATE.

The strong deregulation of the FLC/COOLAIR module in nsra/b led us to identify a new role of NSRa in the control of flowering time. A number of forward genetic screens aiming to identify new genes controlling flowering time through modulation of FLC expression have consistently identified RNA processing and splicing factors that promote formation of the short COOLAIR isoforms, such as FCA, FPA, HLP1, GRP7 and the core spliceosome component PRP8a (Deng and Cao, 2017). Loss-of-function mutations in these factors lead to reduced usage of the COOLAIR proximal polyadenylation site and an increase in FLC transcription, which is associated with late flowering phenotypes (Deng and Cao, 2017). Interestingly, our analysis of the FLC/COOLAIR module in nsr mutants revealed an opposite role of NSRa in COOLAIR polyadenylation site usage, leading to increased use of the COOLAIR proximal polyadenylation site and reduced FLC levels associated with an early flowering phenotype.

We also identified a new role of NSRs in the regulation of auxin-mediated expression and AS of transcripts related to biotic stress response. Interestingly, it has been known for several years that natural (e.g., IAA) and synthetic (e.g., NAA) auxins can promote the virulence of P. syringae (Mutka et al., 2013). More recently, a conserved pathway of auxin biosynthesis in Pseudomonads was shown to contribute to pathogen virulence in Arabidopsis thaliana (McClerklin et al., 2018). However, little is known about the specific plant factors that modulate immune responses to endogenous or pathogen-produced auxins. Our work shows that NSRs do not affect global auxin responses but rather have an impact on the abundance of mRNAs coding for proteins involved in the plant immune response, suggesting that these RBPs may participate in the regulation of plant defense by endogenous or pathogen-produced auxins.

In higher plants, AS plays a key role in gene expression, as shown by the fact that 60–70% of intron-containing genes undergo alternative processing. Several genome-wide studies of AS have shown that this mechanism may enhance the ability of plant cells to cope with stress by modulating transcriptome plasticity. Among the genes with significant isoform switching events in the nsra/b mutant treated with auxin, we identified several genes involved in the modulation of the MAPK kinase modules, core regulators of defense responses. They included the MKP2 phosphatase, which functionally interacts with MPK3 and MPK6 to mediate disease response in Arabidopsis (Lumbreras et al., 2010), and the PTI-4 kinase, which was found in MPK6-containing complexes in vivo and was shown to function in the MPK6 signaling cascade (Forzani et al., 2011). As activation of MAPK signaling cascades regulates the expression of thousands of downstream target genes, we can speculate that a large fraction of the transcriptome changes observed in the nsra/b mutant could be a consequence of AS defects in genes involved in this early phase of the defense response pathway.

FIGURE 9 | Identification of putative NSRa targets by RIP-seq. (A) Identification of NSRa targets: comparison of mean transcript abundance (TPM) in input vs. RIP-seq libraries. Dots in red correspond to putative targets, i.e., transcripts significantly enriched in RIP as compared to input (FDR < 0.01, Log2 fold change > 2) and depleted in Mock IP. (B) Overlap between putative target genes and differentially regulated genes in nsra/b in mock (nsra/b DEG) or NAA-treated (nsra/b NAA DEG) seedlings. (C) RIP-qPCR assays using ProNSRa::NSRa::HA (NSRa) plants on total cell lysates of 10-day-old seedlings treated with 10 mM NAA for 24 h. Genes were randomly selected from the NSRa putative target list. Results of RT-qPCR are expressed as the mean of the percentage of input of three independent experiments ± standard error. (D) MA plot showing the relationship between fold change and transcript abundance for the comparison between nsra/b and Col-0 in the presence of NAA. Red dots correspond to putative NSRa targets. Plain dots correspond to differentially expressed genes. (E) REVIGO plots of GO enrichment clusters of putative target genes. Each circle represents a significant GO category but only clusters with highest significance are labeled. Related GOs have similar (x, y) coordinates.

### MATERIALS AND METHODS

### Plant Material and Treatments

All mutants were in the Columbia-0 (Col-0) background. Atnsra (SALK\_003214) and Atnsrb (Sail\_717) were from the SALK and SAIL T-DNA collections, respectively. For RIP, lines expressing pNSRa::NSRa-HA in Atnsra or pNSRa::NSRb-HA in Atnsrb were used (Bardou et al., 2014). Plants were grown on soil in long day (16 h light/8 h dark) conditions at 23°C. For RNA sequencing and RIP-seq, WT and nsra/nsrb seedlings were grown on nylon membranes (Nitex 100 µm) in plates filled with ½ MS medium for 10 days and then transferred for 24 h to ½ MS medium containing 100 nM NAA or a mock solution before whole seedlings were harvested. For flowering time analysis, plants were grown under long day conditions and the number of rosette leaves was counted on 12 plants when the flower stem was 1 cm tall.

### RNA Sequencing Analysis

Stranded mRNA sequencing libraries were prepared from three biological replicates of Col-0 and nsra/b treated with 100 nM NAA or a mock solution. One µg of total RNA from Col-0 and nsra/b seedlings was used for library preparation using the Illumina TruSeq Stranded mRNA library prep kit according to the manufacturer's instructions. Libraries were sequenced on a HiSeq2000 sequencer using 150 nt paired-end read mode. A minimum of 28 million reads were obtained for each sample, quality filtered using FastQC (Andrews, 2010) with default parameters and aligned using TopHat (Trapnell et al., 2012) with the following arguments: -g 1 -i 5 -p 6 -I 2000 --segment-mismatches 2 --segment-length 20 --library-type fr-firststrand. Reads were counted using the summarizeOverlaps function from the GenomicRanges R package (Lawrence et al., 2013) in strand-specific and Union mode. Differential gene expression analysis was done on pairwise comparisons using DESeq2 (Love et al., 2014) with FDR correction of the p-values. K-means clustering analysis was performed in R on scaled log2 fold change data and the optimal number of clusters was determined using the elbow method. Heatmaps were plotted using the heatmap.2 function of the gplots package (Warnes et al., 2009). Sequence files have been submitted to the NCBI GEO database under accessions GSE65717 and GSE116923.

intergenic (red) and antisense (blue) lncRNA which are putative targets of NSRa. The dotted line delineates a p-value of 0.05.
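The FDR correction applied to the differential expression p-values can be illustrated with a short sketch. The text does not name the procedure; the Benjamini-Hochberg method, which is the DESeq2 default, is assumed here. The function name and the example p-values are hypothetical and serve only to show the arithmetic.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * n / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (illustration only).
raw_p = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(raw_p)
# Genes passing an (assumed) FDR threshold of 0.05.
significant = [p < 0.05 for p in padj]
```

In practice the adjusted values come directly from the DESeq2 results table; the sketch only makes explicit what "FDR correction" means for a list of p-values.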

### Gene Ontology Analysis

Gene ontology enrichment analysis was done using the AgriGO server<sup>1</sup> using default parameters. Lists of GO terms were visualized using REVIGO<sup>2</sup> and plotted in R. Only GO terms with a dispensability factor over 0.5 were printed in REVIGO plots.

<sup>1</sup>http://bioinfo.cau.edu.cn/agriGO/

### AS Analysis

RNAprof (v1.2.6) was used on BAM alignment files with the following parameters: LIBTYPE = fr-unstranded, SEQTYPE = "–Pair", MIS = 1000. All possible pairwise comparisons were computed. The overlap of differential events (p-value < 1e-04) with gene annotation was determined using findOverlaps of the GenomicRanges package in R and custom in-house scripts. Only events that were completely included in a gene feature (e.g., introns, exons, 3′ UTR, and 5′ UTR) were kept for further analysis.

For isoform switching identification, transcript isoform abundance was quantified by pseudo-alignment read counts with kallisto (Bray et al., 2016) on all isoforms of the AtRTD2 database (Zhang et al., 2017). The IsoformSwitchAnalyzeR package (Vitting-Seerup and Sandelin, 2017) was then used to detect significant changes in isoform usage. Only significant switches (p.adj < 0.1) were kept for further analyses.

### RNA Immunoprecipitation and Sequencing (RIP-Seq)

NSRa protein tagged with HA was immunoprecipitated from the nsra mutant background expressing the pNSRa:NSRa-HA construct (Bardou et al., 2014). Briefly, 10-day-old seedlings treated with 100 nM NAA for 24 h were irradiated three times with UV using a UV crosslinker CL-508 (Uvitec) at 0.400 J/cm<sup>2</sup>. Plants were ground in liquid nitrogen and RNA-IP was performed as in Sorenson and Bailey-Serres (2014) with the following modifications: immunoprecipitation (IP) was performed using anti mouse HA-7 monoclonal antibody (Sigma) and the negative IP (Mock) was done using anti mouse IgG (Millipore). RNA was eluted from the beads with 50 U proteinase K (RNase grade, Invitrogen) in 2 µl of RNase inhibitor at 55°C for 1 h in wash buffer and extracted using Trizol according to the manufacturer's instructions. A 10th of the input fraction was saved for RNA and protein extraction. For western blot analysis, proteins were extracted from the beads and input fraction with 2X SDS-loading buffer for 10 min at 75°C, directly loaded on SDS-PAGE, transferred onto nitrocellulose membranes and blotted with HA-7 antibody. For RT-qPCR analysis, RNA was reverse transcribed with Maxima Reverse Transcriptase (Thermo) using random hexamer priming. cDNA from input, IP and Mock were amplified with primers listed in **Supplementary Table S2**. Results were analyzed using the percentage of input method. First, Ct values of the input sample (10% of volume) were adjusted to 100% as follows: $\mathrm{Adjusted\ Ct_{input}} = \mathrm{Raw\ Ct_{input}} - \log_2(10)$. The percentage of input was then calculated as follows: $100 \times 2^{(\mathrm{Adjusted\ Ct_{input}} - \mathrm{Ct_{IP}})}$. Results are the mean of three independent experiments. Student's t-test was performed to determine significance. For RNA-seq, input, mock and IP RNA were depleted of rRNA using the plant leaf Ribo-Zero kit (Illumina) and libraries were prepared using the Illumina TruSeq Stranded mRNA library prep kit according to the manufacturer's instructions but omitting the polyA RNA purification step, and sequenced on a NextSeq500 sequencer (Illumina) using single-end 75 bp read mode. Sequence files have been submitted to the NCBI GEO database under accession GSE116914.

<sup>2</sup>http://revigo.irb.hr/
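The percentage-of-input calculation used for the RIP-qPCR data can be written out as a minimal sketch. The function names and the Ct values are hypothetical; only the two formulas (input adjusted by log2(10) for a 10% aliquot, then 100 × 2^(adjusted input Ct − IP Ct)) and the mean ± standard error over three experiments come from the text.

```python
import math
import statistics


def percent_input(ct_input_raw, ct_ip, input_fraction=0.10):
    """Return the IP signal as a percentage of the input.

    ct_input_raw   : raw Ct measured on the saved input aliquot
    ct_ip          : Ct measured on the IP (or mock IP) sample
    input_fraction : fraction of the lysate saved as input (10% in the text)
    """
    # Adjust the input Ct to the value expected for 100% of the input:
    # a 10% aliquot corresponds to log2(10) extra cycles.
    ct_input_adjusted = ct_input_raw - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adjusted - ct_ip)


def mean_and_sem(values):
    """Mean and standard error of the mean across independent experiments."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


# Hypothetical Ct values from three independent experiments: (input, IP).
experiments = [(25.0, 28.0), (24.8, 27.9), (25.3, 28.1)]
per_experiment = [percent_input(ct_in, ct_ip) for ct_in, ct_ip in experiments]
mean_pct, sem_pct = mean_and_sem(per_experiment)
```

An IP sample with the same Ct as the 10% input aliquot therefore corresponds to 10% of input, and every additional cycle in the IP halves that value.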

### Analysis of RIP-Seq Data

Reads were mapped using STAR (Dobin et al., 2013) and TPM was calculated using RSEM (Li and Dewey, 2011). Reads were counted using the summarizeOverlaps function from the GenomicRanges R package (Lawrence et al., 2013) in strand-specific and Union mode. To identify putative NSRa targets we used pairwise comparisons with the DESeq2 package. Only genes significantly enriched in the anti-HA IP as compared with the anti-mouse IgG (mock) IP were kept for further analysis (logFC >= 1; FDR < 0.01). Putative target genes were defined as genes highly enriched in the anti-HA IP compared to their global level in the input used for the IP (logFC > 2; FDR < 0.01). To reduce noise associated with low read counts, we excluded from this list any gene with fewer than two TPM in at least one of the RIP-seq libraries.
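The three filters that define a putative NSRa target can be summarized in a short sketch. The data structure, field names, and gene identifiers below are hypothetical, and the TPM rule is read literally (a gene is dropped if any RIP-seq library has fewer than two TPM); the thresholds are those stated in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneStats:
    gene_id: str
    lfc_ip_vs_mock: float      # log2 fold change, anti-HA IP vs. IgG mock IP
    fdr_ip_vs_mock: float
    lfc_ip_vs_input: float     # log2 fold change, anti-HA IP vs. input
    fdr_ip_vs_input: float
    tpm_per_library: List[float]


def is_putative_target(g, min_tpm=2.0):
    """Apply the thresholds stated in the text to one gene."""
    enriched_over_mock = g.lfc_ip_vs_mock >= 1 and g.fdr_ip_vs_mock < 0.01
    enriched_over_input = g.lfc_ip_vs_input > 2 and g.fdr_ip_vs_input < 0.01
    # Literal reading (an assumption): drop the gene if any library is < min_tpm.
    expressed = all(tpm >= min_tpm for tpm in g.tpm_per_library)
    return enriched_over_mock and enriched_over_input and expressed


# Hypothetical genes, for illustration only.
genes = [
    GeneStats("geneA", 3.1, 0.001, 2.8, 0.002, [12.0, 9.5, 15.2]),
    GeneStats("geneB", 0.5, 0.200, 3.0, 0.001, [20.0, 18.0, 25.0]),
    GeneStats("geneC", 2.0, 0.001, 1.5, 0.001, [30.0, 28.0, 33.0]),
    GeneStats("geneD", 4.0, 0.0001, 3.5, 0.0001, [1.0, 6.0, 7.0]),
]
targets = [g.gene_id for g in genes if is_putative_target(g)]
```

Here only geneA passes: geneB is not enriched over the mock IP, geneC is not enriched enough over input, and geneD falls below the TPM cutoff in one library.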

### REFERENCES

Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data. Available at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc

### Measuring Distal and Proximal COOLAIR Variants

This was performed essentially as in Marquardt et al. (2014). Five µg of total RNA was reverse transcribed with an oligo(dT) primer. qPCR was performed with sets of primers specific to distal and proximal COOLAIR described in Marquardt et al. (2014). qPCR reactions were performed in triplicate for each sample. Average values of the triplicates were normalized to the expression of total COOLAIR quantified in the same sample.
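The normalization of each COOLAIR variant to total COOLAIR can be sketched as follows. The text does not give the formula, so this assumes the common 2^(ΔCt) approach with equal primer efficiencies; the function name and Ct values are hypothetical.

```python
from statistics import mean


def relative_to_total(ct_variant_reps, ct_total_reps):
    """Abundance of one COOLAIR variant relative to total COOLAIR.

    Technical triplicates are averaged first, then the difference in mean Ct
    is converted to a ratio (assumes ~100% amplification efficiency).
    """
    delta_ct = mean(ct_total_reps) - mean(ct_variant_reps)
    return 2.0 ** delta_ct


# Hypothetical technical triplicates measured in the same sample.
ct_total = [25.0, 25.1, 24.9]
ct_proximal = [26.0, 26.1, 25.9]
proximal_fraction = relative_to_total(ct_proximal, ct_total)
```

A variant detected one cycle later than total COOLAIR thus corresponds to about half of the total signal under these assumptions.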

### AUTHOR CONTRIBUTIONS

JB designed the study, performed the experiments, analyzed the data, and wrote the article. NR, FA, RR, and CC performed the experiments and participated in writing. TB analyzed the data. MC designed the study and wrote the paper.

### FUNDING

This work was supported by grants from the King Abdullah University of Science and Technology (KAUST) International Program OCRF-2014-CRG4, the LIA (Associated International Laboratory) of CNRS NOCOSYM, the 'Laboratoire d'Excellence (LABEX)' Saclay Plant Sciences (SPS; ANR-10-LABX-40) and the ANR grant SPLISIL, France.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.01209/full#supplementary-material

FIGURE S1 | (A) Pearson correlation matrix heatmap with dendrograms showing the relative distance between poly(A)+ RNA-seq samples. (B) PCA analysis showing the effect of auxin and genotype on the variance between samples.

FIGURE S2 | (A) Proximal and (B) distal variant relative abundance normalized to a housekeeping transcript (PP2C). Error bars correspond to ± the standard deviation of three biological replicates. Significance was determined using a Student's t-test (\*\*\*p-value < 0.001).

FIGURE S3 | (A) Pearson correlation matrix heatmap with dendrograms showing the relative distance between samples of the RIP-seq experiments. (B) PCA analysis showing the variance between samples.

TABLE S1 | Summary of RNA-seq and RIP-seq data analysis. (A) Description of spreadsheet tabs. (B) Differential gene expression analysis. (C) RNAprof analysis. (D) Expression and usage of all isoforms from genes containing at least one isoform switching event. (E) NSRa targets identified by RIP. (F) Transcription factors identified among NSRa targets.

TABLE S2 | Sequence of primers used in this study.

Ariel, F., Jegu, T., Latrasse, D., Romero-Barrios, N., Christ, A., Benhamed, M., et al. (2014). Noncoding transcription by alternative RNA polymerases dynamically regulates an auxin-driven chromatin loop. Mol. Cell 55, 383–396. doi: 10.1016/j.molcel.2014.06.011


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Bazin, Romero, Rigo, Charon, Blein, Ariel and Crespi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Regulation of Plant Microprocessor Function in Shaping microRNA Landscape

Jakub Dolata, Michał Taube, Mateusz Bajczyk, Artur Jarmolowski, Zofia Szweykowska-Kulinska\* and Dawid Bielewicz\*

Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University in Poznan, Poznan, Poland

### Edited by:

Dora Szakonyi, Instituto Gulbenkian de Ciência (IGC), Portugal

### Reviewed by:

Patricia Baldrich, Donald Danforth Plant Science Center, United States; Sascha Laubinger, Center for Plant Molecular Biology, Germany

### \*Correspondence:

Zofia Szweykowska-Kulinska, zofszwey@amu.edu.pl; Dawid Bielewicz, bieda@amu.edu.pl

### Specialty section:

This article was submitted to Plant Cell Biology, a section of the journal Frontiers in Plant Science

Received: 14 February 2018 Accepted: 16 May 2018 Published: 05 June 2018

### Citation:

Dolata J, Taube M, Bajczyk M, Jarmolowski A, Szweykowska-Kulinska Z and Bielewicz D (2018) Regulation of Plant Microprocessor Function in Shaping microRNA Landscape. Front. Plant Sci. 9:753. doi: 10.3389/fpls.2018.00753

MicroRNAs are small molecules (∼21 nucleotides long) that are key regulators of gene expression. They originate from long stem–loop RNAs as a product of cleavage by a protein complex called Microprocessor. The core components of the plant Microprocessor are the RNase type III enzyme Dicer-Like 1 (DCL1), the zinc finger protein Serrate (SE), and the double-stranded RNA binding protein Hyponastic Leaves 1 (HYL1). Microprocessor assembly and its processing of microRNA precursors have been reported to occur in discrete nuclear bodies called Dicing bodies. The accessibility of and modifications to Microprocessor components affect microRNA levels and may have dramatic consequences in plant development. Currently, numerous lines of evidence indicate that plant Microprocessor activity is tightly regulated. The cellular localization of HYL1 is dependent on a specific KETCH1 importin, and the E3 ubiquitin ligase COP1 indirectly protects HYL1 from degradation in a light-dependent manner. Furthermore, proper localization of HYL1 in Dicing bodies is regulated by MOS2. On the other hand, the Dicing body localization of DCL1 is regulated by NOT2b, which also interacts with SE in the nucleus. Post-translational modifications are substantial factors that contribute to protein functional diversity and provide a fine-tuning system for the regulation of protein activity. The phosphorylation status of HYL1 is crucial for its activity/stability and is a result of the interplay between kinases (MPK3 and SnRK2) and phosphatases (CPL1 and PP4). Additionally, MPK3 and SnRK2 are known to phosphorylate SE. Several other proteins (e.g., TGH, CDF2, SIC, and RCF3) that interact with Microprocessor have been found to influence its RNA-binding and processing activities. In this minireview, recent findings on the various modes of Microprocessor activity regulation are discussed.

Keywords: microprocessor, DCL1, SE, HYL1, miRNA biogenesis, Arabidopsis

## INTRODUCTION

Mature microRNAs are derived from long primary transcripts (pri-miRNAs) that are produced by RNA Polymerase II (RNAPII). They are capped at their 5′ ends and possess a poly(A) tail at their 3′ ends (Xie et al., 2005). Pri-miRNA levels are tightly regulated at the transcriptional (Zhang et al., 2013; Sun et al., 2015, 2018), co-transcriptional (Fang et al., 2015; Dolata et al., 2016) and post-transcriptional levels (Ben Chaabane et al., 2013; Bielewicz et al., 2013; Zhang et al., 2015; Barciszewska-Pacak et al., 2016; Knop et al., 2017; Stepien et al., 2017; Yu et al., 2017). Interestingly, in many cases, changes in the level of a given pri-miRNA are not reflected in changes in the level of the mature microRNA (Barciszewska-Pacak et al., 2015; Dolata et al., 2016). This might be a consequence of regulation at the pri-miRNA or pre-miRNA (intermediate product during microRNA biogenesis) processing/degradation step. Production of microRNAs is driven by a complex called Microprocessor, which in Arabidopsis consists of three core proteins: the RNase type III enzyme Dicer-Like 1 (DCL1), the zinc finger protein Serrate (SE), and the double-stranded RNA binding protein Hyponastic Leaves 1 (HYL1). The term Microprocessor was originally coined for a nuclear protein complex that produces pre-miRNAs in animal cells (Denli et al., 2004; Gregory et al., 2004).

### MICROPROCESSOR COMPONENTS LOCALIZATION

In plants, Microprocessor action is limited to the nucleus (Fang and Spector, 2007; Yu et al., 2017), whereas localization of its components is not restricted to one cellular compartment. According to current knowledge, the first steps of plant microRNA biogenesis occur in specialized nuclear foci called dicing bodies (D-bodies) (Fang and Spector, 2007). Together, DCL1 and HYL1 in the nucleus are found almost exclusively in D-bodies. However, SE is present in D-bodies as well as in nuclear speckles that contain serine/arginine-rich (SR) splicing factors (Ali et al., 2003). How the Microprocessor complex is assembled in D-bodies and how it is recruited to pri-miRNAs are still not clear. Nevertheless, several factors have been shown to be important for D-body formation and effective cleavage of microRNA precursors.

A growing amount of evidence indicates a direct link between RNAPII transcription and the biogenesis of small RNAs in Arabidopsis (Fang et al., 2015; Dolata et al., 2016; Liu et al., 2017). One of the overlapping elements is the Elongator complex, first described in yeast (Otero et al., 1999) and later purified from plant cells. Elongator was described as a six-component complex involved in the regulation of transcription elongation (Nelissen et al., 2010). Fang et al. (2015) have shown that disruption of the Elongator complex results in reduced RNAPII occupancy at tested MIR genes and lower levels of a few pri-miRNAs. Mutants of two Elongator subunits (elp2-2 and elp5-1) have disrupted DCL1 localization and a reduced number of D-bodies. Furthermore, all core Microprocessor components interact with Elongator complex subunits (ELP2, ELP4, and ELP5; Elongator complex Proteins 2/4/5). DCL1 associates with chromatin at MIR loci, and a functional Elongator complex is necessary for DCL1 recruitment to nascent MIR transcripts. These data suggest that the processing of at least some pri-miRNAs occurs cotranscriptionally.

More evidence for a connection between transcription and Microprocessor was presented by Wang et al. (2013), who described two Arabidopsis NOT2 (Negative on TATA-less 2) proteins (NOT2a and NOT2b) as factors that promote microRNA production. In yeasts, NOT2 was shown to bind directly to RNAPII and to promote transcription elongation (Kruk et al., 2011). Similarly, in Arabidopsis, NOT2b coprecipitates with the large subunit of RNAPII and affects transcription. Moreover, NOT2b interacts with DCL1 and SE; however, it does not interact with HYL1. Furthermore, a not2a-1 not2b-1 double mutant was shown to have an increased number of nuclear foci containing DCL1. Still, the possibility that NOT2 links MIR gene transcription and post-transcriptional processing for better coordination and efficiency cannot be excluded.

In Arabidopsis, the MOS4-Associated Complex (MAC) (Palma et al., 2007; Monaghan et al., 2009) is a counterpart of the NineTeen Complex (NTC) in yeast (Fabrizio et al., 2009) and is directly linked to transcription and microRNA biogenesis (Zhang et al., 2013, 2014). Recently, it was found that mutants of MAC subunits (mac7-1 and the mac3a mac3b double mutant) have a reduced number of HYL1-containing D-bodies (Jia et al., 2017; Li et al., 2018). The authors suggest that this is a consequence of the fact that Microprocessor complex assembly requires pri-miRNA (Wu et al., 2013). Therefore, a decreased level of pri-miRNAs in mac mutants may affect HYL1 localization.

These assumptions are based on a previous paper by Wu et al. (2013), who found that the RNA-binding protein MOS2 (Modifier of Snc1) supports D-body assembly. In a mos2-2 mutant, HYL1 localization differed from that in WT plants, as it was relatively homogeneous in the nucleus; however, HYL1 interactions with DCL1 and SE were not affected. MOS2 does not interact directly with core Microprocessor components but instead binds pri-miRNAs. Additionally, in the absence of MOS2, the association of HYL1 with pri-miRNAs is significantly reduced (Zhang et al., 2005). These data suggest that microRNA precursors may serve as scaffolds for D-body formation.

The balance between pri-miRNA and Microprocessor component assembly may be disturbed by over-accumulation of pre-mRNA splicing intermediates. Non-debranched intron lariats sequester dicing complexes and negatively affect pri-miRNA processing. An Arabidopsis mutant in the debranching enzyme (dbr1-2) shows an increased number of DCL1 and HYL1 nuclear bodies (Li et al., 2016).

The distribution of HYL1-GFP indicates that HYL1 is present in the nucleus and the cytoplasm (Han et al., 2004). HYL1 degradation in the cytoplasm is regulated by the RING-finger E3 ligase COP1 (Constitutive Photomorphogenic 1) (Cho et al., 2014). During the day, COP1 moves to the cytoplasm and indirectly protects HYL1 from degradation, most likely by inhibiting an undefined protease. During the night, COP1 remobilizes to the nucleus, allowing the protease to cleave the N-terminus of HYL1. This specific cleavage inhibits HYL1 function and causes an immediate reduction in correctly processed microRNAs (Cho et al., 2014).

A connection between light signaling and microRNA biogenesis comes from studies on Phytochrome Interacting Factor 4 (PIF4). It was shown that PIF4 promotes the destabilization of both DCL1 and HYL1 during the dark-to-red-light transition (Sun et al., 2018).

Processing of MIR gene transcripts occurs in the nucleus, and effective cytoplasm–nucleus trafficking of Microprocessor components is necessary for its proper function. Recently, KETCH1 (Karyopherin Enabling the Transport of the Cytoplasmic HYL1) was described as an HYL1-interacting importin-β protein (Zhang et al., 2017). KETCH1 null mutants are embryo-lethal, whereas the downregulation of KETCH1 using artificial microRNAs causes a reduction in HYL1 level in the nucleus, although SE localization in the nucleus is not affected. A decreased level of KETCH1 leads to disturbances in microRNA production (accumulation of several pri-miRNAs as well as pre-miRNAs and reduced levels of mature microRNAs). An amiR-ketch1 and hyl1-2 double mutant was found to resemble the hyl1-2 phenotype morphologically as well as at the microRNA level, which indicates that both proteins act in the same pathway. It is not known if KETCH1 functions are limited to HYL1 cytoplasm–nucleus transport; however, the regulation of HYL1 level and its accessibility in different tissues and under different growth conditions might play an important role in the regulation of proper levels of miRNAs that are HYL1-dependent (Szarzynska et al., 2009).

### POST-TRANSLATIONAL MODIFICATIONS (PTMs) OF MICROPROCESSOR COMPONENTS

Most proteins require PTMs for proper function. More than 40 different post-translational protein modifications have been identified (Beck-Sickinger and Mörl, 2006), and they can play important roles in protein folding, subcellular localization, catalytic activity, or stability. For the plant Microprocessor, the phosphorylation of HYL1 is the only PTM that has been found to be crucial for efficient microRNA production. The interplay between protein phosphorylation and dephosphorylation is known to enable the rapid and efficient tuning of protein function. Using a forward genetic screen, Manavella et al. (2012) found that a mutation in the CPL1 (C-Terminal Domain Phosphatase-like 1) gene causes impaired processing of microRNA precursors and aberrant strand selection during RISC loading. CPL1 encodes a phosphatase that was shown to dephosphorylate the C-terminal domain (CTD) of the RNAPII largest subunit specifically at Ser5 (Koiwa et al., 2004). In vivo, CPL1 interacts with two components of the plant Microprocessor, SE and HYL1 (Jeong et al., 2013). Two serine residues of HYL1, S42 and S159, are especially important for HYL1 function, and hyperphosphorylated HYL1 is inactive (Manavella et al., 2012). Both serine residues are located within the dsRNA binding domains of HYL1. Dephosphorylation of HYL1 by CPL1 is stimulated by a protein called RCF3 [Regulator of CBF Gene Expression 3, also known as HOS5 or SHINY1 (Jiang et al., 2013; Chen et al., 2015; Karlsson et al., 2015)]. RCF3 expression is restricted to the vegetative shoot apical meristem, young leaf primordia and newly emerging leaves, which suggests that fine-tuning of HYL1 activity via phosphorylation can be tissue specific (Karlsson et al., 2015). Moreover, the expression of RCF3 is reduced by salt, hyperosmotic stress, and ABA. This may indicate that plants modulate the phosphorylation status of HYL1 in response to environmental changes (Jiang et al., 2013).
Another protein that dephosphorylates HYL1 is PP4 (Protein Phosphatase 4, also termed PPX) (Su et al., 2017). PP4 is a highly conserved protein among eukaryotes that functions together with specific regulatory subunits [for example, SMEK1 in plants, PP4RS in mammals, and PSY2 in yeast (Gingras et al., 2005; Kataya et al., 2017; Su et al., 2017)]. In Arabidopsis thaliana, the PP4 phosphatase is encoded by two genes (PP4-1 and PP4-2), the proteins of which share 93% sequence identity and have the same expression pattern, suggesting that their biological functions might be very similar if not redundant (Pujol et al., 2000). Attempts to obtain stable A. thaliana knockdown/knockout lines for the PP4-1/2 genes have been unsuccessful; however, knockout mutants of the PP4 regulatory subunit SMEK1 (Suppressor of MEK1) are viable (Kataya et al., 2017; Su et al., 2017). In smek1 mutants, microRNA expression levels are reduced due to the accelerated degradation of hyperphosphorylated HYL1. Importantly, SMEK1 protects HYL1 from degradation in a COP1- and light-independent manner; therefore, the regulation of HYL1 activity by PP4 represents another regulatory network present in plants (Su et al., 2017).

Besides phosphatases, kinases have also been found to be important for HYL1 phospho-regulation. HYL1 is phosphorylated by MPK3 (Mitogen-Activated Protein Kinase 3) and SnRK2 (SNF1-related protein kinase subfamily 2) (Raghuram et al., 2015; Yan et al., 2017). In mpk3 mutant plants, HYL1 protein accumulates, and consequently, the levels of mature microRNAs are significantly higher than those in wild-type plants. Interestingly, the repertoires of small RNAs are affected in both mpk3 and cpl1 mutants, but they do not overlap with each other, which may suggest that CPL1 (phosphatase) and MPK3 (kinase) act separately in parallel pathways (Raghuram et al., 2015). Furthermore, upon ABA treatment, MPK3 activation depends on the presence of HYL1 in the cell, which suggests that MPK3 and HYL1 are regulated in a feedback loop (Lu et al., 2002). The SnRK2 subfamily consists of 10 members (SnRK2.1-10) among which SnRK2.2/2.3/2.6 are involved in pri-miRNA processing and are strongly activated by ABA treatment. Surprisingly, in a snrk2.2/3/6 triple mutant, the levels of HYL1 and mature microRNAs were found to be decreased (Yan et al., 2017). Yan et al. (2017) have shown that SnRK2.6 phosphorylates HYL1 in vitro and that SnRK2.2, SnRK2.3, and SnRK2.6 interact with HYL1 in plants. Many putative phosphorylation sites were found in the HYL1 protein (Supplementary Table S1); however, the precise amino acid residues that are phosphorylated in HYL1 by MPK3 and SnRK2.2/2.3/2.6 have not been identified. Thus, various HYL1 phosphorylation patterns might exert different functional effects on pri-miRNA biogenesis. Yan et al. (2017) showed that SnRK2.6 can phosphorylate the SE protein in vitro in addition to HYL1. The observation that SE can be phosphorylated in plants was reported previously (see section below) (Wang et al., 2013), but, currently, nothing is known about the effect of this modification on SE localization, stabilization, or activity. A model presenting current knowledge on the role of PTMs in Microprocessor activity regulation and localization of its components is shown in **Figure 1**.

# STRUCTURAL ASPECTS OF PTMs OF CORE MICROPROCESSOR COMPLEX PROTEINS

DCL1 is the largest protein in the Microprocessor complex. It contains several domains: a helicase domain at the N-terminus, a domain of unknown function 283, a PAZ domain, two catalytic RNase III domains and two dsRNA-binding domains at the C-terminus. The HYL1 protein contains two dsRNA-binding domains (dsRBD1 and dsRBD2) at the N-terminus and six 28-amino acid imperfect repeats at the C-terminus. The SERRATE protein possesses a core domain (amino acids 195–543) that can be divided into three regions: an N-terminal alpha helical fragment, a middle domain fragment and a C2H2 zinc finger fragment. Both the N and C termini of SERRATE are predicted to be disordered in solution (Machida et al., 2011). The structures of the core domains of the A. thaliana SERRATE and HYL1 proteins and the second dsRBD domain of DCL1 have been determined (Rasia et al., 2010; Yang et al., 2010; Machida et al., 2011; Burdisso et al., 2012, 2014). Moreover, the structure of the dsRBD1 of HYL1 was solved as a complex containing a short 10 bp RNA duplex. Both dsRBD domains of HYL1 possess an alpha-beta-beta-beta-alpha fold, which is a signature of the dsRBD domain family (Masliah et al., 2013). Thus far, no data regarding how PTMs might interfere with the structure of Microprocessor proteins have been reported. Serine 42 is located within the loop between beta strand 1 and beta strand 2 of the HYL1 dsRBD1.

FIGURE 1 | Plant microRNA biogenesis is fine-tuned via regulation of the localization of Microprocessor components and their post-translational modifications. The level of HYL1 in the cytoplasm is indirectly regulated by COP1 in a light-dependent manner; HYL1 nuclear import is mediated by the KETCH1 importin. The core components of the Microprocessor (DCL1, SE and HYL1) as well as microRNA precursors are located in the D-bodies, where they interact with each other. MOS2 is important for D-body assembly but does not interact directly with the Microprocessor. The Elongator complex and NOT2 proteins form a bridge coupling transcription and processing of microRNAs. The phosphorylation status of HYL1 is crucial for its efficient function and results from an interplay between kinases (MPK3 and SnRK2) and phosphatases (CPL1 and PP4). A direct influence of SnRK2 on SE and HYL1 activity is not known. CBC, nuclear Cap Binding Complex; dashed lines indicate that the exact localization of the interactions is unknown.

Using the published structure, we noticed that the side chain of serine 42 may interact with the minor groove of the dsRNA and may form hydrogen bonds with the N2 and N3 nitrogen atoms of guanine and the 2′ hydroxyl group of the ribose ring (**Figure 2A**). A bulky phosphate group attached to serine 42 could potentially interfere with the minor groove interactions and negatively regulate binding to RNA duplexes. Serine 159 is localized in alpha helix 2 of the HYL1 dsRBD2, and it may interact with the loop between beta strand 3 and alpha helix 2. In addition, this serine may form a hydrogen bond with the nitrogen from the peptide bond between glycine 147 and alanine 148 in the opposite loop of the same HYL1 molecule (**Figure 2B**). Yang et al. (2014) found that a G147E mutation significantly reduces dimerization of the HYL1 protein. Moreover, this substitution was present in a hyl1-3 mutant (Manavella et al., 2012). Therefore, serine 159 phosphorylation could destabilize the hydrogen bond network, leading to the mislocation of beta strand 3, the disruption of the dimerization interface and, ultimately, the inactivation of HYL1.

From high-throughput studies of the A. thaliana phosphoproteome, several phosphorylation sites in the SERRATE protein were identified and deposited in the PhosPhAt 4.0 database (Durek et al., 2010). The phosphorylated residues are mostly located within the N-terminal region, a low complexity domain rich in proline and serine residues, and within the middle fragment from the core structure domain. In addition, one phosphorylation site was found within the C-terminal fragment. A list of the phosphorylation sites found in the SERRATE and HYL1 proteins is shown in Supplementary Table S1. The N-terminal fragment of SERRATE (amino acids 1–194) is predicted to be disordered in solution. Additionally, the amino acid sequence conservation in this region is relatively low in comparison to that of the core domain (195–543). Nevertheless, the overall amino acid composition between SERRATE proteins from different species is very similar. Phosphorylation of low complexity domains has been shown to affect aggregation, local structure and protein-protein interactions (Kwon et al., 2013; Monahan et al., 2017). The N-terminal domain of SERRATE is responsible for its interaction with the two U1 small nuclear ribonucleoprotein particle (U1 snRNP) proteins, PRP40b and PRP40a, and deletion of the N-terminal domain causes more homogeneous nuclear localization in comparison to the speckle-like localization of the wild-type protein (Knop et al., 2017). Thus, the phosphorylation status of this domain in SERRATE may affect interactions with a large number of interacting partners in different processes in which SE is involved.

### NEGATIVE FEEDBACK REGULATION OF MICROPROCESSOR

To provide fine tuning of microRNA production and to maintain balance in mRNA target degradation, Microprocessor components are regulated at the post-transcriptional level by a negative feedback loop. This feedback regulation of the Microprocessor was first shown in 2003 by the Carrington group. Xie et al. (2003) showed that the DCL1 mRNA level is regulated by miRNA162. In wild-type plants, the DCL1 transcript is of relatively low abundance because functional DCL1 catalyzes miRNA production, and miRNA162 targets the DCL1 transcript for degradation. However, in mutants with impaired miRNA biogenesis (for example dcl1-7 and hen1-1), an increased level of DCL1 mRNA has been detected. Moreover, plants expressing the P1/HC-pro protein [Turnip mosaic virus (TuMV) RNA silencing suppressor], which inhibits the small RNA-guided cleavage of RNAs, had an increased DCL1 mRNA level (Xie et al., 2003). The abundance of the DCL1 transcript is also regulated by the production of miRNA838, which is encoded within the 14th intron of the DCL1 pre-mRNA. The generation of this miRNA is a consequence of DCL1 pre-mRNA cleavage into two non-functional transcripts that are 4 and 2.5 kb in length (Rajagopalan et al., 2006). Rajagopalan et al. (2006) suggested that a higher level of DCL1 protein makes processing of DCL1 primary transcripts by the Microprocessor more efficient than their recognition by the spliceosome, which results in a higher level of miR838 and a lower level of DCL1 transcript in the cell. Similar to DCL1, the SE level is determined by a negative feedback loop that involves miR863-3p. This microRNA targets the 3′ UTR of SE mRNA as well as two negative regulators of plant defense: ARLPK1 and ARLPK2. The level of miR863-3p increases after bacterial infection and silences these two negative regulators of plant defense by cleaving their mRNAs. At subsequent steps of infection, when at its highest level, miR863-3p inhibits the translation of SE mRNA, which results in lower efficiency of miRNA biogenesis and decreased miRNA levels, including those of miR863-3p (Niu et al., 2016).

### AUTHOR CONTRIBUTIONS

JD, MT, MB, AJ, ZS-K, and DB participated in preparation of draft manuscript. MT and JD prepared figures. JD, ZS-K, and DB participated in assembly and editing of the final manuscript.

### FUNDING

This work was supported by the KNOW RNA Research Centre in Poznan (Grant No. 01/KNOW2/2014) and the National Science Center projects UMO-2016/23/D/NZ1/00152 (DB), UMO-2017/25/B/NZ1/00603 (JD), UMO-2013/10/A/NZ1/00557 (AJ), UMO-2016/23/B/NZ9/00862 (ZS-K), and UMO-2014/13/N/NZ1/00049 (MB).

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.00753/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Dolata, Taube, Bajczyk, Jarmolowski, Szweykowska-Kulinska and Bielewicz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# miRVIT: A Novel miRNA Database and Its Application to Uncover Vitis Responses to Flavescence dorée Infection

Walter Chitarra<sup>1,2†</sup>, Chiara Pagliarani<sup>1†</sup>, Simona Abbà<sup>1</sup>, Paolo Boccacci<sup>1</sup>, Giancarlo Birello<sup>3</sup>, Marika Rossi<sup>1</sup>, Sabrina Palmano<sup>1</sup>, Cristina Marzachì<sup>1</sup>, Irene Perrone<sup>1</sup>\* and Giorgio Gambino<sup>1</sup>

<sup>1</sup> Institute for Sustainable Plant Protection, National Research Council of Italy, Turin, Italy, <sup>2</sup> Viticultural and Enology Research Centre, Council for Agricultural Research and Economics, Conegliano, Italy, <sup>3</sup> Research Institute on Sustainable Economic Growth, National Research Council of Italy, Turin, Italy

### Edited by:

Ana Confraria, Instituto Gulbenkian de Ciência (IGC), Portugal

### Reviewed by:

Marina Dermastia, National Institute of Biology (NIB), Slovenia Alfredo Pulvirenti, Università degli Studi di Catania, Italy

\*Correspondence: Irene Perrone, irene.perrone@ipsp.cnr.it

†These authors have contributed equally to this work.

### Specialty section:

This article was submitted to Plant Cell Biology, a section of the journal Frontiers in Plant Science

Received: 02 May 2018 Accepted: 26 June 2018 Published: 17 July 2018

### Citation:

Chitarra W, Pagliarani C, Abbà S, Boccacci P, Birello G, Rossi M, Palmano S, Marzachì C, Perrone I and Gambino G (2018) miRVIT: A Novel miRNA Database and Its Application to Uncover Vitis Responses to Flavescence dorée Infection. Front. Plant Sci. 9:1034. doi: 10.3389/fpls.2018.01034

Micro(mi)RNAs play crucial roles in plant developmental processes and in defense responses to biotic and abiotic stresses. In recent years, many works on small RNAs in grapevine (Vitis spp.) have been published, and several conserved and putative novel grapevine-specific miRNAs have been identified. In order to reorganize the large quantity of available data, we produced "miRVIT," the first database of all novel grapevine miRNA candidates characterized so far and not yet deposited in miRBase. To this aim, each miRNA accession was renamed, repositioned in the latest version of the grapevine genome, and compared with all the novel and conserved miRNAs detected in grapevine. Conserved and novel miRNAs cataloged in miRVIT were then used for analyzing Vitis vinifera plants infected by Flavescence dorée (FD), one of the most severe phytoplasma diseases affecting grapevine. The analysis of small RNAs from healthy, recovered (plants showing spontaneous and stable remission of symptoms), and FD-infected "Barbera" grapevines showed that FD altered the expression profiles of several miRNAs, including those involved in cell development and photosynthesis, jasmonate signaling, and disease resistance response. The application of miRVIT in a biological context confirmed the effectiveness of the approach followed, especially for the identification of novel miRNA candidates in grapevine. The miRVIT database is available at http://mirvit.ipsp.cnr.it.

Highlights: The application of the newly produced database of grapevine novel miRNAs to the analysis of plants infected by Flavescence dorée reveals key roles of miRNAs in photosynthesis and jasmonate signaling.

Keywords: disease resistance, jasmonate, miRNAs, phytoplasma, target genes, univocal database

# INTRODUCTION

RNA silencing or RNA interference is a pivotal mechanism of gene regulation conserved in a broad range of eukaryotic organisms, including fungi, plants, and animals (Baulcombe, 2002). Small endogenous RNAs are important effectors of RNA silencing phenomena and are classified into two major categories based on the nature of their RNA precursors: microRNAs (miRNAs) and small-interfering RNAs (Axtell, 2013). miRNAs are single-stranded RNA molecules of approximately 21 nt in length generated from endogenous MIR genes (Nozawa et al., 2012). The precursors of miRNAs are processed by DICER-LIKE 1 proteins (DCL1) to release a mature miRNA:miRNA<sup>∗</sup> duplex (Kurihara and Watanabe, 2004). The mature miRNA, loaded into a specific ARGONAUTE (AGO)-associated RNA-induced silencing complex (RISC), guides the identification and cleavage of complementary mRNA targets in a sequence-specific manner (Mallory et al., 2008). The miRNA guide strand is generally more abundant than the miRNA star strand (miRNA<sup>∗</sup>) and is responsible for the RISC-mediated target regulation. However, increasing evidence also suggests an association between miRNA<sup>∗</sup> and AGO proteins exerting relevant biological functions (Liu et al., 2017). miRNAs are involved in several developmental processes, in genome stability maintenance, and in plant adaptation to biotic and abiotic stresses, as reviewed in many studies (Sunkar et al., 2012; Staiger et al., 2013; Borges and Martienssen, 2015).

Among pathogens infecting plants, phytoplasmas, phloem-limited bacteria belonging to the class Mollicutes, cause serious yield and economic losses in many crops (Bendix and Lewis, 2018). In Vitis vinifera, Flavescence dorée (FD) and Bois noir (BN) are associated with grapevine yellows, the most important and damaging phytoplasma-induced diseases in Europe. In particular, the quarantine disease FD is caused by a phytoplasma (FDp) of the elm yellows group (16SrV-C and V-D; Angelini et al., 2001) transmitted in a persistent and propagative manner by the ampelophagous leafhopper Scaphoideus titanus Ball (Hemiptera: Cicadellidae). Typical symptomatology observed in FDp-infected grapevines generally includes downward rolling of leaves, yellowing or reddening of the leaves in white or red grape varieties, respectively, drying of inflorescences and bunches, shortening of internodes, and lack of cane lignification (Caudwell, 1990). In some cases, the spontaneous and stable remission of FD symptoms accompanied by the disappearance of FDp from the plants may occur, giving rise to "recovery" phenomena (Osler et al., 2004; Maggi et al., 2017). Molecular and physiological changes in grapevines infected by or recovered from FDp have recently been investigated (Musetti et al., 2007; Gambino et al., 2013; Margaria et al., 2013, 2014; Vitali et al., 2013; Prezelj et al., 2016); nevertheless, the biological mechanisms underlying these processes are not yet fully understood. Although the role of miRNAs in disease development appears clear for some plant-phytoplasma systems (Ehya et al., 2013; Gai et al., 2014, 2018; Minato et al., 2014; Shao et al., 2016; Snyman et al., 2017), no data are currently available for the FDp–grapevine interaction.

In the last few years, from the first report concerning the identification of novel miRNAs in grapevine (Carra et al., 2009), several works have been published on this topic, and many conserved and putative novel miRNAs were identified. Different species from the Vitis genus were analyzed, the most studied being several cultivars of Vitis vinifera (Carra et al., 2009; Pantaleo et al., 2010, 2016; Alabi et al., 2012; Belli Kullan et al., 2015; Sun et al., 2015; Paim Pinto et al., 2016; Bester et al., 2017a,b; Pagliarani et al., 2017; Snyman et al., 2017) and the "Summer Black" hybrid of V. vinifera × Vitis labrusca (Wang et al., 2011, 2014; Han et al., 2014; Zhao et al., 2017). In addition, some miRNAs derived from Vitis amurensis (Wang et al., 2012) and from a powdery mildew-resistant accession of the wild Chinese Vitis pseudoreticulata Baihe-35-1 (Han et al., 2016) were characterized as well. Substantially, all grapevine tissues in several developmental stages (Belli Kullan et al., 2015) and from plants subjected to biotic (Alabi et al., 2012; Singh et al., 2012; Pantaleo et al., 2016; Bester et al., 2017a; Snyman et al., 2017) or abiotic stresses were analyzed (Sun et al., 2015; Pantaleo et al., 2016; Pagliarani et al., 2017). Nevertheless, in these studies, there is no uniformity in the classification of the identified novel miRNAs: hundreds of putative novel miRNAs were indicated in different ways, and often the same miRNA was named differently in different works.

In this study, we classified all putative novel miRNAs discovered in grapevine so far, and not deposited in the public database miRBase, into a single database, namely, miRVIT (available at http://mirvit.ipsp.cnr.it), in order to standardize their classification by assigning a new and univocal denomination. The obtained database was then used to analyze nine small RNA libraries produced from leaf midribs collected from FDp-infected (FD), recovered (R), and healthy (H) plants of the highly susceptible V. vinifera cv. Barbera. The expression dynamics of conserved, novel miRNAs and their target transcripts associated with FDp infection and/or recovery processes were investigated.

### MATERIALS AND METHODS

# Construction of a Single Database of Novel miRNAs From Grapevine

The data related to all the novel miRNAs identified so far in grapevine were collected from 18 already published works (Carra et al., 2009; Pantaleo et al., 2010, 2016; Wang et al., 2011, 2012, 2014; Alabi et al., 2012; Singh et al., 2012; Han et al., 2014, 2016; Belli Kullan et al., 2015; Sun et al., 2015; Paim Pinto et al., 2016; Bester et al., 2017a,b; Pagliarani et al., 2017; Snyman et al., 2017; Zhao et al., 2017). Each novel miRNA was aligned using BLAST both against the known miRNAs deposited in miRBase (Kozomara and Griffiths-Jones, 2014) and against all the novel miRNAs retrieved by the different studies cited above. miRNAs already classified by other authors as isomiRs of conserved miRNAs were not considered. Two or more miRNAs were considered to be a single entity, i.e., a unique miRNA, when: (i) the miRNA sequences were the same, (ii) they were located in the same position on the genome, and (iii) they had the same precursors. Each novel miRNA was renamed using a "vvi\_miC" code followed by the indication "5p" or "3p" according to its location within the precursor sequence. The sequences of miRNA and miRNA<sup>∗</sup> (when available) were repositioned on the latest version of the grape genome 12X V1 (PN40024<sup>1</sup>). Novel miRNAs sharing the same sequences, but with different precursors located in different regions of the grape genome, were indicated with the same name followed by lowercase letters (a, b, c, etc.). Different isoforms (isomiRs) of the same novel miRNA, shifted by a few nucleotides in the same genomic region within the same precursor, were indicated as versions V1, V2, etc., of the same novel miRNA.

<sup>1</sup>http://genomes.cribi.unipd.it/grape/
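The collapsing and naming rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used to build miRVIT: the `Record` layout, the string format of `locus` and `precursor`, and the numbering order are all hypothetical, and the "5p"/"3p" arm label and isomiR versions (V1, V2, ...) are left out for brevity.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    """One novel miRNA as reported in a single publication (hypothetical layout)."""
    sequence: str    # mature miRNA sequence
    locus: str       # genomic position of the mature sequence
    precursor: str   # identifier of the predicted precursor
    source: str      # publication that reported it


def collapse_unique(records):
    """Group records meeting criteria (i)-(iii): same sequence, same genomic
    position, and same precursor. Returns {key: [sources]}."""
    groups = defaultdict(list)
    for r in records:
        # Treat DNA and RNA spellings of the same sequence as identical.
        key = (r.sequence.upper().replace("T", "U"), r.locus, r.precursor)
        groups[key].append(r.source)
    return dict(groups)


def assign_names(groups, start=1):
    """Give each distinct sequence one 'vvi_miC' number; identical sequences
    from different precursors receive lowercase suffixes (a, b, c, ...)."""
    by_sequence = defaultdict(list)
    for key in sorted(groups):
        by_sequence[key[0]].append(key)
    names = {}
    for number, (_, keys) in enumerate(sorted(by_sequence.items()), start):
        for i, key in enumerate(keys):
            suffix = chr(ord("a") + i) if len(keys) > 1 else ""
            names[key] = f"vvi_miC{number}{suffix}"
    return names
```

Keying on the full (sequence, position, precursor) triple means that two reports are merged only when all three criteria hold, while the suffix step reuses the sequence alone to link paralogous loci.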

# Database of miRNA Targets

The target transcripts of all the conserved and novel miRNAs already validated in grapevine through either degradome or 5′ RACE approaches were collected from 11 works (Carra et al., 2009; Pantaleo et al., 2010, 2016; Wang et al., 2012, 2013, 2014; Han et al., 2014; Jiu et al., 2015; Sun et al., 2015; Leng et al., 2017; Zhao et al., 2017). The annotation of the targets previously identified using the old grapevine annotation, referred to as the 8X genome, was updated to the grapevine transcriptome version 12X V1<sup>2</sup>. In addition, target transcripts of novel miRNAs were searched in the degradome library of "Pinot noir" (Pantaleo et al., 2010), in parallel using the default parameters of the web-based tool StarScan (sRNA Target Scan, Liu et al., 2015). This tool classifies sRNA-mediated cleavage events into different categories based on several parameters related to the cleavage of the target gene sequence and the miRNA; only the three most reliable classes (Z, I, and II) were considered. For each identified target, the 12X V1 annotation was integrated with Gene Ontology (GO) classifications using the Blast2GO tool<sup>3</sup>.

### Plant Material and Experimental Setup

The biological study was performed on plants of the red grape cultivar Barbera (V. vinifera) grown in a vineyard located in Cocconato (Piemonte), northwest Italy, and monitored since 2008 for phytoplasma infection (Maggi et al., 2017). FDp and BNp were detected following a phytoplasma enrichment protocol (Marzachì et al., 1999), and the viruses that commonly affect grapevine in Italy were detected by multiplex RT-PCR (Gambino, 2015). Only plants free of both BNp and the most dangerous viruses infecting grapevine were considered for the experimental trial and further divided into three groups: FDp-infected (FD), recovered (R, i.e., plants found positive for FDp infection in the past, but FDp-negative and symptomless for the last 2 years), and healthy (H). The FDp isolate present in the vineyard belongs to the 16SrV-C subgroup (Maggi et al., 2017). From each group, five plants of the same age and located on neighboring rows in the vineyard were selected, and fully expanded leaves were collected in July 2014. For diseased plants (FD), only canes showing the typical FD symptoms (Caudwell, 1990) were chosen for sampling, and leaf tissues with advanced necrotic areas were excluded. In R and H plants, leaves inserted in the central region of the shoot were randomly collected. Leaf midribs were isolated using a sterile scalpel, immediately frozen in liquid nitrogen and stored at −80 °C until RNA extraction and JA quantification. The five plants selected for each category constituted independent biological replicates; three of them were used for small RNA library construction, and all five plants were used for real-time PCR (RT-qPCR) validation assays.

# Small RNA Library Construction and Sequencing

RNA was extracted from leaf midribs using the mirPremier® microRNA Isolation Kit (Sigma-Aldrich, Inc.) following the manufacturer's instructions. Nine libraries (three biological replicates for each category H, R, and FD) of small RNAs were obtained using the TruSeq Small RNA Sample Kit (Illumina, San Diego, CA, United States) and were sequenced using the HiSeq 2000 Illumina platform by an external service. All data were processed using the UEA small RNA Workbench (Mohorianu et al., 2017) by first removing 3′-adaptor sequences and then filtering according to the following criteria: (a) low-quality reads and reads outside the accepted length range (minimum 16 nt; maximum 30 nt) were discarded; (b) low-complexity reads, i.e., sequences containing fewer than three distinct nucleotides, were discarded; (c) reads matching known transfer and ribosomal RNAs were discarded; and (d) only reads matching the V. vinifera assembly 12X<sup>1</sup> were retained. Any sequences without adaptor matches were excluded from further analysis. The miRNA predictions were performed using miRCat (Moxon et al., 2008) on the above-mentioned V. vinifera assembly. Precursor sequences were then processed using the MFOLD software (v. 2.3<sup>4</sup>; Zuker, 2003) to analyze folding of hairpin secondary structures. Detection of similarities among predicted miRNAs and annotated miRNAs was performed by running the BLAST algorithm against both miRBase and our newly produced database.
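As a rough illustration of filtering criteria (a) and (b), the following Python sketch applies the length window and the low-complexity rule to reads that are assumed to be already adaptor-trimmed. It is not the UEA small RNA Workbench implementation; the function names are hypothetical, and the quality, tRNA/rRNA, and genome-matching steps are not reproduced here.

```python
def passes_filters(read, min_len=16, max_len=30, min_distinct=3):
    """Return True if an adaptor-trimmed read passes the length filter (a)
    and the low-complexity filter (b) described in the text."""
    read = read.upper()
    if not (min_len <= len(read) <= max_len):
        return False
    # Reads built from fewer than three distinct nucleotides are discarded.
    if len(set(read)) < min_distinct:
        return False
    return True


def filter_reads(reads):
    """Keep only reads passing both filters; input order is preserved."""
    return [r for r in reads if passes_filters(r)]
```

A homopolymer or a simple dinucleotide repeat such as `ATATAT...` is rejected by the complexity rule even when its length falls inside the 16–30 nt window.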

Bowtie 1.1.2 software<sup>5</sup>, with no mismatches allowed in the alignment, was employed to establish accurate miRNA abundance profiles of the nine sequenced samples. Following alignment, the resulting miRNA counts were normalized for differences in sequencing depth to account for technical differences across samples. Statistically significant differences (P < 0.05) in miRNA expression were estimated by applying Student's t-test to the three possible pairwise comparisons (R vs. H, FD vs. R, and FD vs. H).
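The normalization and pairwise testing step can be sketched as follows. The text does not state which depth normalization was used, so reads per million (RPM) is assumed here purely for illustration; the t statistic is the classical pooled-variance Student's form, and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def reads_per_million(count, library_size):
    """Scale a raw count by sequencing depth (assumed RPM normalization)."""
    return count * 1_000_000 / library_size


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance.
    Returns (t, degrees_of_freedom). Both groups must not be constant."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Two-tailed critical value of t for P = 0.05 with 4 degrees of freedom,
# i.e. three replicates per group as in the library design described above.
T_CRIT_DF4 = 2.776
```

With three libraries per category, a comparison such as FD vs. H would be called significant at P < 0.05 when the absolute value of `t` exceeds `T_CRIT_DF4`.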

Hierarchical clustering (HCL) analysis was then applied using Pearson's correlation distance and the software MeV v. 4.9<sup>6</sup> , using as input the normalized values of the reads generated for the three categories H, R, and FD.
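For readers unfamiliar with the metric, Pearson's correlation distance between two expression profiles is 1 − r, where r is the Pearson correlation coefficient. The sketch below computes it and identifies the closest pair of profiles, which is the pair an agglomerative clustering would merge first. It is not the MeV implementation, the linkage rule used by the authors is not stated in the text, and the function names are hypothetical.

```python
from math import sqrt


def pearson_distance(x, y):
    """Pearson correlation distance, 1 - r, between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1 - cov / (sx * sy)


def closest_pair(profiles):
    """Return the two profile names with the smallest distance, i.e. the
    pair that an agglomerative clustering would merge first."""
    names = sorted(profiles)
    pairs = [(pearson_distance(profiles[a], profiles[b]), a, b)
             for i, a in enumerate(names) for b in names[i + 1:]]
    _, a, b = min(pairs)
    return a, b
```

Because the distance depends only on the shape of a profile, two miRNAs with very different abundances but the same trend across H, R, and FD are placed close together.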

### Real-Time PCR Analysis

The quantification of miRNA expression levels was carried out by RT-qPCR following the protocol by Shi and Chiang (2005), with the modifications reported in Pantaleo et al. (2016). For target quantification, total RNA was extracted using the Spectrum™ Plant Total RNA extraction kit (Sigma-Aldrich, Inc.) and RT-qPCR assays were performed as reported by Pantaleo et al. (2016). Expression levels of miRNAs and related target genes were quantified after normalization to either 5.8S rRNA and U6 or VvUBI and VvACT endogenous genes used as internal controls, respectively. Specific primer pairs used in RT-qPCR reactions are listed in Supplementary Table S1. Transcript and miRNA levels were expressed as the mean, and standard errors were calculated over five biological replicates.

<sup>2</sup>http://genomes.cribi.unipd.it/DATA/V1/

<sup>3</sup>https://www.blast2go.com/

<sup>4</sup>http://unafold.rna.albany.edu

<sup>5</sup>https://sourceforge.net/projects/bowtie-bio/files/bowtie/1.1.2/

<sup>6</sup>http://compbio.dfci.harvard.edu/compbio/tools/mev

# Quantification of Jasmonate Content in Leaf Midribs

A 400 mg aliquot of homogenized sample was freeze-dried and placed, together with 0.5 mL of methanol:water (1:1 v/v) acidified with 0.1% formic acid, in an ultrasonic bath for 1 h. Samples were centrifuged for 2 min at 4 °C and 15,000 rpm, and the supernatant was analyzed by high-performance liquid chromatography (HPLC). A reference standard of (±)-jasmonic acid (purity ≥ 95%, Sigma-Aldrich) was used for identification by comparing retention time and UV spectrum. Quantification was performed by the external calibration method.
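External calibration generally means fitting a line to the detector response of standards at known concentrations and then inverting that line for the samples. The sketch below shows this under the assumption of a linear response; the calibration range, number of standards, and any weighting used by the authors are not given in the text, and the function names are hypothetical.

```python
def fit_calibration(concentrations, peak_areas):
    """Least-squares line: peak_area = slope * concentration + intercept."""
    n = len(concentrations)
    mx = sum(concentrations) / n
    my = sum(peak_areas) / n
    sxx = sum((x - mx) ** 2 for x in concentrations)
    sxy = sum((x - mx) * (y - my) for x, y in zip(concentrations, peak_areas))
    slope = sxy / sxx
    return slope, my - slope * mx


def quantify(peak_area, slope, intercept):
    """Invert the calibration line to estimate the analyte concentration."""
    return (peak_area - intercept) / slope
```

The estimate is only meaningful for peak areas that fall within the range spanned by the standards.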

The HPLC apparatus was an Agilent 1220 Infinity LC system (Agilent®, Waldbronn, Germany) model G4290B equipped with a gradient pump, auto-sampler, and column oven set at 30 °C. A 170 Diode Array Detector (Gilson, Middleton, WI, United States) set at 280 nm was used as the detector. An XTerra RP18 analytical column (150 mm × 6 mm i.d., 5 µm, Waters) was used. The mobile phases consisted of water acidified with 0.05% formic acid (A) and acetonitrile (B), at a flow rate of 0.500 mL min<sup>−1</sup> in gradient mode: 0–20 min, from 10 to 35% B; 20–25 min, from 35 to 100% B. A volume of 20 µL was injected for each sample.

### Statistical Analyses

Significant differences among treatments were statistically analyzed by applying a one-way ANOVA test followed by the Tukey's HSD post hoc test (P ≤ 0.05). Significant differences of pairwise comparisons were assessed by Student's t-test. The SPSS statistical software package (SPSS Inc., Cary, NC, United States, v.23) was used to run statistical analyses.

### Accession Numbers

Raw sequences from the nine miRNA libraries were deposited at the NCBI Sequence Read Archive under the accession number SRP129862.

# RESULTS AND DISCUSSION

## miRVIT, a Novel Grapevine miRNA Database

Using the data reported in the 18 works published so far on miRNAs in the Vitis genus, all putative novel miRNAs (hereafter referred to as novel miRNAs) identified and not yet deposited in miRBase were collected into a single database, called "miRVIT" (available at http://mirvit.ipsp.cnr.it, Supplementary Table S2). Overall, 901 sequences referred to as novel miRNAs, renamed using a "vvi\_miC" code, were found; among these, 22 are miRNAs or miRNAs<sup>∗</sup> already deposited in miRBase and were excluded. In some cases, the same novel miRNA was reported using different classifications in more than 10 different works (Supplementary Table S2). For instance, 14 research groups identified vvi\_miC1039, 13 identified vvi\_miC132 (Supplementary Figure S1), and 11 identified vvi\_miC137. The sequences of precursors were obtained by bioinformatics predictions, and different software or different versions of the same software were employed depending on the year of publication of the previous works, in some cases using different versions of the grapevine genome. Consequently, some differences in the precursor sequences outside the region between miRNA and miRNA<sup>∗</sup> were identified for the same novel miRNA (Supplementary Figure S1).

Such a large number of putative grapevine-specific novel miRNAs does not appear realistic based on current knowledge, and it is conceivable that many false positives are present in this list, since in Arabidopsis (the most intensively studied plant species) only a few hundred miRNAs are currently known (Taylor et al., 2017; Axtell and Meyers, 2018). Starting from the criteria published for plant miRNA annotation (Meyers et al., 2008; Axtell and Meyers, 2018), we considered as novel miRNAs only the 20–22 nt long small RNAs, thus excluding small RNA sequences of all other lengths. Indeed, heterochromatic siRNAs of 23–24 nt are very common in small RNA-seq libraries, while miRNAs of 23–24 nt in length are rare and require extremely strong evidence to be classified as miRNAs (Axtell and Meyers, 2018), which is generally not available for grapevine novel miRNAs. The remaining 621 miRNAs of 20–22 nt were classified as 469 unique novel miRNAs (**Figure 1A** and Supplementary Table S3). The largest families of these novel miRNAs are vvi\_miC44-5p, with 10 members, identified by Pantaleo et al. (2010) and partially by Han et al. (2016), and vvi\_miC1038-5p, with nine members and two isomiRs identified in several works (Supplementary Table S3). It must be taken into account that about one-sixth of these novel miRNAs (110 of 621) showed either sequence similarity or some relationship with conserved miRNAs already deposited in miRBase (**Figure 1B** and Supplementary Table S3). This is the case for vvi\_miC999-3p, vvi\_miC1013, vvi\_miC1022, and vvi\_miC1076, which share the same sequences with vvi-miR3635-3p, vvi-miR171, vvi-miR390, and vvi-miR3634-3p, respectively, although they are located in different loci. Moreover, 72 novel miRNAs showed sequence similarity with several conserved miRNAs (particularly vvi-miR535, vvi-miR477, and vvi-miR3631), although they originated from different precursors at different genomic locations (Supplementary Table S3).
Twenty-four of these are isomiRs of conserved miRNAs, which may not have been correctly classified in the original works, and nine are reverse-complementary miRNAs (RC-miRNAs, generated from the antisense strands of the miRNA precursors) of conserved miRNAs or miRNAs<sup>∗</sup> (**Figure 1B** and Supplementary Table S3). In addition, other RC-miRNAs were detected in three pairs of novel miRNAs not related to conserved miRNAs: vvi\_miC265-5p/vvi\_miC266-5p, vvi\_miC343-5p/vvi\_miC344, and vvi\_miC427a/vvi\_miC427b. RC-miRNAs, originally detected in animals (Tyler et al., 2008), have rarely been described in plants (Shao et al., 2012; Belli Kullan et al., 2015), and their biological function is not yet completely defined, even though a potential role in transcriptional regulation through a DNA methylation-mediated pathway has been hypothesized (Shao et al., 2012).

Of the 621 novel miRNAs analyzed, miRNA<sup>∗</sup> sequences were identified for only 150 accessions, and 45 of these were found in at least two different works, i.e., in at least two different small RNA libraries originating from different genotypes or tissues (Axtell and Meyers, 2018). Hypothetically, only 5% (45 of 901) of the novel miRNAs previously identified in grapevine satisfied the stringent requirements and could really be considered as novel miRNA candidates (**Figure 1A** and Supplementary Table S4). However, in the following sections, all 621 novel miRNAs of 20–22 nt were analyzed, and their functional role was studied in a biological experiment to validate the incidence of these miRNA candidates in V. vinifera cv. Barbera.
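The stringent selection described in this section (length of 20–22 nt, a detected miRNA<sup>∗</sup>, and support from at least two independent works) can be expressed as a simple predicate. The sketch below is illustrative only; the dictionary keys are hypothetical and do not reflect the actual layout of the miRVIT tables.

```python
def is_length_ok(seq, min_len=20, max_len=22):
    """Length criterion applied in the text: keep 20-22 nt small RNAs."""
    return min_len <= len(seq) <= max_len


def is_strong_candidate(entry):
    """entry is a dict with hypothetical keys 'sequence', 'star' (miRNA*
    sequence or None) and 'sources' (publications reporting the miRNA)."""
    return (is_length_ok(entry["sequence"])
            and entry["star"] is not None
            and len(set(entry["sources"])) >= 2)


def percent(part, total):
    """Percentage of total represented by part, rounded to one decimal."""
    return round(100 * part / total, 1)
```

Applied to the counts given above, `percent(45, 901)` returns 5.0, in line with the 5% quoted in the text, and `percent(110, 621)` returns 17.7, roughly one-sixth.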

### Target Genes of Grapevine miRNAs

Besides novel miRNAs, the miRVIT database groups all the target genes of conserved and novel miRNAs validated so far in grapevine using either degradome sequencing or 5′ RACE. Overall, 215 targets of conserved miRNAs and 113 targets of novel miRNAs are reported in Supplementary Table S5. In addition, the database was supplemented by searching for target transcripts of novel miRNAs (Supplementary Table S3) in the degradome library of "Pinot noir" (Pantaleo et al., 2010) using the web-based tool StarScan (sRNA Target Scan, Liu et al., 2015). A similar analysis was previously validated by confirming the degradome predictions through 5′ RACE (Pantaleo et al., 2016), thus proving to be a more reliable approach than bioinformatics prediction alone. Using StarScan, 213 potential targets (seven of which were previously validated, see Supplementary Table S5) of 120 novel miRNAs were identified (Supplementary Table S6). Notably, the novel miRNAs related to conserved miRNAs (**Figure 1B**) generally targeted the same transcripts as the corresponding conserved miRNA. For example, the targets of vvi\_miC1026 and vvi\_miC1027, which have sequences similar to vvi-miR396, are a number of transcripts encoding growth-regulating factors, whereas vvi\_miC1031-5p (similar to vvi-miR403) targeted VvAGO2 and vvi\_miC1000 (similar to vvi-miR156) targeted mRNAs encoding Squamosa promoter-binding proteins (Supplementary Table S6).

### Small RNA Sequencing and Identification of miRNAs in "Barbera" Leaf Midribs

After removing adapters, redundant and low-quality sequences, reads from nine small RNA libraries of "Barbera" leaf midribs ranging from 16 to 30 nucleotides (see the metrics of the libraries in Supplementary Table S7 and Supplementary Figure S2) were aligned against the miRNA sequences deposited in miRBase and in miRVIT (Supplementary Table S3). As expected, we identified members of almost all known miRNAs: 137 of 186 (73.6%) deposited in miRBase. The normalized reads of each conserved miRNA are reported in Supplementary Table S8, showing that several members of the vvi-miR166 family and vvi-miR3634-3p were the most expressed in "Barbera" leaf midribs. Of the 621 novel miRNAs previously identified in grapevine tissues, 308 were expressed in at least one "Barbera" small RNA library, and in 67 cases both miRNA and miRNA<sup>∗</sup> were identified (Supplementary Table S9). Interestingly, the probability of detecting previously published novel miRNAs in "Barbera" was associated with specific factors, such as tissue type, stress conditions, and the genotype from which the novel miRNAs were originally isolated. In particular, 73.1% of the novel miRNAs originally isolated from either leaf or phloem tissues of grapevine (Alabi et al., 2012; Singh et al., 2012; Sun et al., 2015; Han et al., 2016; Pantaleo et al., 2016; Bester et al., 2017a,b; Snyman et al., 2017) were also detected in "Barbera" leaf midribs; in contrast, only 53.8% of the novel miRNAs previously characterized in different organs and tissues (Carra et al., 2009; Pantaleo et al., 2010; Wang et al., 2011, 2012, 2014; Han et al., 2014; Belli Kullan et al., 2015; Paim Pinto et al., 2016; Pagliarani et al., 2017; Zhao et al., 2017) were found in the sequenced samples. Furthermore, 76.3% of the novel miRNAs identified in grapevine plants subjected to virus or phytoplasma infection (Alabi et al., 2012; Pantaleo et al., 2016; Bester et al., 2017a; Snyman et al., 2017) were also retrieved in FDp-infected samples of "Barbera" (Supplementary Table S9). Gene expression is notoriously influenced by tissue and environmental conditions; consequently, the probability of identifying a miRNA in a small RNA library can be affected by these conditions as well. The genotype effect was more difficult to clarify, as the database was biased toward the higher number of works published on V. vinifera (Carra et al., 2009; Pantaleo et al., 2010, 2016; Alabi et al., 2012; Belli Kullan et al., 2015; Sun et al., 2015; Paim Pinto et al., 2016; Bester et al., 2017a,b; Pagliarani et al., 2017; Snyman et al., 2017) than on V. amurensis (Wang et al., 2012) or on V. pseudoreticulata (Han et al., 2016). While the probability of identifying in "Barbera" novel miRNAs originally published in V. amurensis (Wang et al., 2012) was evidently low, we did not observe a marked difference among novel miRNAs originally uncovered in V. vinifera, its hybrids, or V. pseudoreticulata (Supplementary Figure S3). Currently, the identification of novel miRNA loci in different Vitis spp. is biased by the use of the PN40024 (Jaillon et al., 2007) or "Pinot noir" (Velasco et al., 2007) reference genomes, which may differ genomically from the cultivars or the other Vitis species in which the miRNAs were detected (Morgante et al., 2007; Gambino et al., 2017). Consequently, the hairpin prediction and the mapping of the reads produced, for instance, from V. pseudoreticulata on the "Pinot noir" genome (Han et al., 2016) allowed the prediction of only the miRNA loci shared with V. vinifera cv. Pinot noir. The species-specific reads not mapping on the reference genome were generally discarded. This could explain the unexpectedly high percentage of miRNAs originally identified in V. pseudoreticulata and detected in "Barbera" (Supplementary Figure S3). In the near future, if new Vitis genomes become available, it would be interesting to re-analyze the small RNA libraries already published, in order to identify potential new species- or cultivar-specific miRNAs.

### Putative Novel miRNAs Identified for the First Time in "Barbera"

In addition to the novel miRNAs previously identified and annotated in miRVIT (Supplementary Table S2), other novel miRNA candidates were characterized for the first time in "Barbera" using the miRCat tool and the latest release of the V. vinifera genome. Excluding the miRNAs already present in miRBase and miRVIT, and considering only the 20–22 nt long miRNAs and those expressed in at least three different libraries with at least 10 non-normalized reads, 13 putative novel miRNAs were selected (hereafter called "Barbera"-novel miRNAs; **Table 1** and Supplementary Figure S4). In addition, four variants of novel miRNAs already included in miRVIT were identified. However, no miRNA<sup>∗</sup> sequences were found for "Barbera"-novel miRNAs, and no significant candidate target genes were retrieved starting from the degradome library of "Pinot noir" (Pantaleo et al., 2010) and using the StarScan tool.
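The selection rule for "Barbera"-novel miRNAs can be outlined with a short Python sketch. It assumes one possible reading of the expression threshold, namely at least 10 raw reads in each of at least three libraries; the function name, parameters, sequences, and read counts below are hypothetical and serve only to illustrate the filtering logic, not the miRCat workflow itself.

```python
def select_novel(raw_counts, known, min_libs=3, min_reads=10):
    """Return candidate sequences passing the illustrative selection rule.

    raw_counts: dict mapping sequence -> list of raw read counts (one per library)
    known: set of sequences already annotated (e.g., in miRBase or miRVIT)
    Assumption: a library counts as 'expressing' if it has >= min_reads raw reads.
    """
    selected = []
    for seq, counts in raw_counts.items():
        if seq in known:
            continue  # already annotated, not novel
        if not 20 <= len(seq) <= 22:
            continue  # outside the accepted length range
        expressing_libs = sum(1 for c in counts if c >= min_reads)
        if expressing_libs >= min_libs:
            selected.append(seq)
    return selected


# Hypothetical data for nine libraries.
known_sequences = {"UGACAGAAGAGAGUGAGCAC"}
raw_counts = {
    "AUGCCGUAAGCUUGACCGAUA": [12, 15, 0, 30, 2, 0, 0, 11, 0],             # 4 libraries >= 10 reads
    "GGCAUUCGAUCCGAUAGCUAAGCU": [50, 50, 50, 50, 50, 50, 50, 50, 50],     # 24 nt
    "CCGAUAGGCUUAACGGAUCGA": [10, 9, 9, 40, 0, 0, 0, 0, 0],               # only 2 libraries
    "UGACAGAAGAGAGUGAGCAC": [100, 100, 100, 100, 100, 100, 100, 100, 100],  # already known
}

print(select_novel(raw_counts, known_sequences))
```

If the threshold were instead meant as a total of 10 reads across the three libraries, only the `expressing_libs` computation would need to change.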

### Expression Profiles of miRNAs in "Barbera" Leaf Midribs

Despite the high number of miRNAs expressed in our libraries, only 25 conserved (**Table 2**) and 32 novel miRNAs (**Table 3**) showed significant differences in at least one of the three comparisons among the experimental conditions (FD vs. H, R vs. H, and FD vs. R). A possible explanation could lie in the highly variable response of the biological replicates, each consisting of a single "Barbera" plant. Nevertheless, with this approach, the significant expression differences detected in this work should be more reliable than those obtained with other sampling methodologies. A hierarchical clustering (HCL) analysis involving these 57 miRNAs was conducted to investigate the relationships of similarity among the three experimental categories. H and R formed a separate clade from FD, and the miRNAs were clustered in three groups: Cluster 1 included miRNAs with higher accumulation in FD, Cluster 2 comprised miRNAs showing a slight peak of expression in R, and Cluster 3 contained only six miRNAs with higher expression in H (**Figure 2**).
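The three pairwise comparisons can be illustrated with a minimal Python sketch of a two-sample Student's t-test on normalized read counts. It assumes three biological replicates per condition (hence 4 degrees of freedom and a two-tailed critical value of 2.776 at α = 0.05) and equal variances; the read counts are hypothetical, and the sketch is not the statistical pipeline used by the authors.

```python
from math import sqrt
from statistics import mean, variance

# Two-tailed critical t value for alpha = 0.05 and 4 degrees of freedom
# (valid only for two groups of three replicates each).
T_CRIT_DF4 = 2.776

# The three comparisons considered in the text.
COMPARISONS = [("FD", "H"), ("R", "H"), ("FD", "R")]


def student_t(a, b):
    """Pooled-variance (Student's) t statistic for two independent samples."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def significant_comparisons(groups):
    """Return the comparisons whose |t| reaches the critical value."""
    hits = []
    for first, second in COMPARISONS:
        t = student_t(groups[first], groups[second])
        if abs(t) >= T_CRIT_DF4:
            hits.append(f"{first} vs {second}")
    return hits


# Hypothetical normalized reads for one miRNA (three replicates per condition).
reads = {"FD": [120, 130, 125], "R": [60, 65, 55], "H": [58, 62, 60]}
print(significant_comparisons(reads))
```

With so few replicates, plant-to-plant variability inflates the pooled variance, which is consistent with the small number of miRNAs passing the threshold. Groups with zero variance in both conditions would need separate handling, since the statistic is then undefined.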

The expression changes of the most interesting miRNAs (**Tables 2**, **3**) and their already validated target genes (Supplementary Table S5) were further investigated by RT-qPCR. In Cluster 1, the higher accumulation of miRNAs induced by FDp infection was confirmed for vvi-miR156, vvi-miR167, vvi-miR2950-5p, and vvi\_miC137 (**Figure 3**) and for vvi-miR3632-3p, vvi-miR3623-5p, vvi-miR403, vvi\_miC360-3p, and vvi\_miC430-3p (Supplementary Figure S5). In addition, RT-qPCR assays conducted on the miRNAs belonging to Cluster 2, vvi-miR166, vvi-miR169, vvi-miR319, vvi-miR482, vvi\_miC1031-5p, vvi\_miC64 (**Figure 4**), vvi\_miC1038-3p, vvi\_miC132-3p, vvi\_miC197-5p, vvi\_miC281-5p, and vvi\_miC413-5p (Supplementary Figure S6), confirmed the expression trends observed in the sequencing results. The "Barbera"-novel miRNAs were expressed in all groups of the experimental plan, and for six of them, significant differences were observed among the treatments (Supplementary Table S10). The RT-qPCR analyses conducted on a selection of these miRNAs, vvi\_miC606-3p, vvi\_miC617-3p, vvi\_miC644-3p, and vvi\_miC648-5p, were consistent with the sequencing data, and vvi\_miC644-3p, which was the most expressed among the "Barbera"-novel miRNAs, was overexpressed in FD samples (Supplementary Figure S7). Further analyses are needed to definitively prove their biological functions. Indeed, the biological effect of miRNAs is generally mediated by their cleavage activity on specific target genes.

TABLE 2 | Normalized number of reads (mean of three biological replicates) of conserved miRNAs in leaf midribs of FDp infected (FD), recovered (R), and healthy (H) "Barbera."

The asterisk denotes significant differences in the three comparisons FD/H, R/H, and FD/R using the Student's t-test (P ≤ 0.05).

### Dissecting the Role of miRNAs in the Regulation of the FDp–Grapevine Interaction

miRNAs and target transcripts significantly modulated in the presence of FDp influenced the metabolic processes linked to photosynthesis, cellular development, jasmonate (JA) signaling, and defense responses. The reduction in photosynthetic efficiency caused by FDp infection, previously suggested by physiological experiments (Vitali et al., 2013), was likely induced by the downregulation of specific genes, such as the photosystem I reaction center subunit II (VvPSI, VIT\_05s0020g03180) and the photosystem II subunit X (VvPSBx, VIT\_04s0008g01730), encoding important proteins of the photosystem I and II reaction centers, which were targeted by vvi\_miC137-3p and vvi-miR169, respectively (**Figures 3**, **4**). These results were consistent with the observed downregulation of a chlorophyllase-encoding gene (VvCHL, VIT\_07s0151g00250), targeted by the grape-specific miRNA vvi-miR2950 and influencing chlorophyll catabolism in infected plants (**Figure 3**). Accordingly, both vvi-miR2950 and the novel vvi\_miC137-3p were highly induced upon FDp infection, showing a modulation pattern that was negatively correlated with the photosynthetic efficiency measured in diseased grapevines (Pantaleo et al., 2016).

Among the highly conserved miRNA categories involved in plant development, miR156 is a well-known regulator of Squamosa promoter-binding protein-like genes (SPL, Wu and Poethig, 2006; Ferreira e Silva et al., 2014). The SPL transcription factor family plays a key role in controlling the transition of floral phases, plant architecture, and the determination of leaf cell number and size (Chen et al., 2010). In grapevine, the VvSPL gene (VIT\_01s0010g03910) is a target of vvi-miR156 (Pantaleo et al., 2010), together with an expansin-encoding gene (VvEXPA14, VIT\_13s0067g02930, Wang et al., 2014), involved in cell wall modification under stress conditions (Dal Santo et al., 2013). Our results indicated that FD presence strongly repressed these transcripts, in contrast to the expression trend of vvi-miR156, which increased upon infection (**Figure 3**). It is thus conceivable that the FD-mediated activation of vvi-miR156, with the consequent repression of SPL genes, could underlie the developmental alterations (i.e., shortening of internodes, smaller inflorescences, etc.) that are typical symptoms associated with FDp infection in grapevine (Caudwell, 1990). These data are also supported by previous evidence showing that, besides being a master regulator of plant development, vvi-miR156 is able to dynamically respond to environmental stresses in different ways (Alabi et al., 2012; Sun et al., 2015; Pantaleo et al., 2016; Pagliarani et al., 2017). The strong upregulation observed here in FDp-infected leaves is in contrast with data reported in grapevines infected by aster yellows (AY; Snyman et al., 2017). Nevertheless, in the associations of Ziziphus jujuba/Jujube witches'-broom (Shao et al., 2016), Mexican lime (Citrus aurantifolia L.)/Witches' broom disease (Ehya et al., 2013) and mulberry (Morus multicaulis Perr.)/mulberry yellow dwarf disease (Gai et al., 2014), miR156 was always overexpressed in infected plants, as in our study, thus highlighting its potential function in directing symptom development.

TABLE 3 | Normalized number of reads (mean of three biological replicates) of novel miRNAs in leaf midribs of FDp infected (FD), recovered (R), and healthy (H) "Barbera."

The asterisk denotes significant differences in the three comparisons FD/H, R/H, and FD/R using the Student's t-test (P ≤ 0.05).

Another family of miRNAs typically tied to plant development is miR166, which regulates the broad family of class III homeodomain leucine zipper (HD-ZIPIII) transcription factors. In particular, these genes were associated with Arabidopsis embryo and meristem development, organ polarity, and vascular development (Baucher et al., 2007 and the references therein). The levels of vvi-miR166 in FDp-infected plants did not seem to account for the high transcriptional rates of Phabulosa (VvPHB, VIT\_10s0003g04670) and Revoluta (VvREV, VIT\_13s0019g04320) transcripts (**Figure 4**). A possible explanation could be linked to a direct activation of the promoters of these genes in response to FDp that may escape the miRNA-mediated regulation, as also suggested for other miRNA–target interactions in grapevine (Pantaleo et al., 2016; Pagliarani et al., 2017). In addition, it must be considered that the complex regulatory system "AGO10-miR166-HD-ZIPIII" uncovered in Arabidopsis, in which AGO10 specifically sequesters miR166 and antagonizes the silencing activity mediated by AGO1-miR166 against HD-ZIPIII (Zhou et al., 2015), may also occur in grapevine. This hypothesis is consistent with the observed overexpression of VvAGO10 (VIT\_05s0020g04190) in FDp-infected plants (inset in **Figure 4**), which could reduce the post-transcriptional activity of vvi-miR166 with a consequent increase in VvPHB and VvREV expression levels. In addition to development regulation, REV was also involved in leaf senescence in Arabidopsis through the direct transcriptional induction of a WRKY transcription factor (Xie et al., 2014). The homologous WRKY gene in grapevine, VvWRKY1, was overexpressed in FD-infected samples (Gambino et al., 2013), in agreement with a potential role as a trigger of defense responses that decreases plant susceptibility to the pathogen, as indicated for some fungi (Marchive et al., 2013). The activity of VvWRKY1 in grapevine against fungi, and hypothetically against FDp, involved the activation of distinct sets of defense-related genes. In particular, in grapevine infected by FDp, salicylic acid (SA)-mediated signaling activation and JA repression were previously suggested (Gambino et al., 2013; Prezelj et al., 2016). This type of regulation was confirmed by the present work, which shows that it could be influenced by the activity of some miRNAs. The alteration of the JA biosynthetic pathway could be promoted by tuning the TEOSINTE BRANCHED/CYCLOIDEA/PCF (VvTCP, VIT\_12s0028g02520)/miR319 interplay (**Figure 4** and Supplementary Figure S7) through the downregulation of a JA ZIM domain-containing protein (VvJAZ3, VIT\_01s0011g05560), target of both vvi-miR169 and the novel vvi\_miC197-5p (**Figure 4**), and an auxin response factor-encoding transcript (VvARF, VIT\_12s0028g01170), target of vvi-miR167 (**Figure 3**). The downregulation of VvTCP occurred almost exclusively in FDp-infected plants and in parallel with the significant reduction of both VvLOXA and 12-oxophytodienoate reductase (VvOPR3) expression (Supplementary Figure S8), two of the most important genes involved in JA biosynthesis (Wasternack and Hause, 2013). Interestingly, the VvTCP/miR319 interplay controls the biosynthesis of JA through the regulation of lipoxygenase (LOX) genes (Schommer et al., 2008). Consistently, the quantification of JA content in the same tissues displayed a progressive reduction or even absence of this metabolite in R and FD samples, respectively (Supplementary Figure S8).

FIGURE 2 | Hierarchical clustering analysis of conserved and novel miRNAs differentially modulated among FDp-infected (FD), recovered (R), and healthy (H) leaf midribs of "Barbera" (p ≤ 0.05). The clustering was generated with the TMeV software (v. 4.9) using the average normalized number of reads of three biological replicates. The results are represented as a heat map, with blue and yellow corresponding to low and high miRNA accumulation levels, respectively.

for target quantification. Lowercase and uppercase letters denote significant differences (p ≤ 0.05) among miRNAs and target expression levels, respectively, tested using Tukey's HSD test. Data are presented as mean ± standard error of five biological replicates (n = 5).

The ARF/miR167 system is involved in the regulation of auxin-responsive genes influencing both auxin signaling and the JA pathway (Gutierrez et al., 2012; Boer et al., 2014, and the references therein). Consequently, in FDp-infected grapevine, the VvARF/vvi-miR167 regulation, associated with the observed decrease in JA content, could underlie the flower alterations typical of infected plants, in agreement with other reports showing that the overexpression of miR167 in tomato induces floral developmental defects in a JA-dependent manner (Liu et al., 2014). Moreover, a clear correlation among auxin, JA, and phytoplasma was previously demonstrated in transgenic Arabidopsis plants containing the phytoplasma virulence effector tengu-su inducer (TENGU) of "Candidatus Phytoplasma asteris," onion yellows isolate (Hoshi et al., 2009). These Arabidopsis plants showed high levels of miR167 and a decreased expression of both ARF and LOX genes, associated with symptom development and a reduction in JA and auxin levels (Minato et al., 2014). Similar results were obtained using transgenic Arabidopsis expressing the AY-WB protein 11 (SAP11), an effector produced by the Aster Yellows Witches' Broom (AY-WB) phytoplasma (Sugio et al., 2011). However, repression of the JA pathway is not a generic response to phytoplasma infection, as very different molecular and metabolic responses were reported in BNp-infected grapevines. Indeed, JA biosynthesis was strongly activated in both BNp-recovered and diseased grapevines over the whole vegetative season (Paolacci et al., 2017). In parallel, the activation of the SA signaling pathway was also observed, as previously reported (Hren et al., 2009; Dermastia et al., 2015). The authors hypothesized an SA-mediated repression of the genes downstream of JA biosynthesis, with a relative inhibition of JA signaling in infected plants. Additionally, the activation of the JA-mediated pathway in recovered grapevines could support the role of this defense pathway in the maintenance of the recovery condition in formerly BNp-infected grapevines (Paolacci et al., 2017). Although this hypothesis may be interesting, it could be valid only for BNp, as FDp induces very different responses in the plant (i.e., induction of SA and repression of JA); thus, further insights into the specific FDp-grapevine pathosystem will be necessary to investigate this point in depth.

FIGURE 4 | Expression levels of vvi-miR166, vvi-miR169, vvi-miR319, vvi-miR482, vvi\_miC1031-5p, vvi\_miC64, and their respective target transcripts, VvPHB (VIT\_10s0003g04670), VvREV (VIT\_13s0019g04320), VvPSBx (VIT\_04s0008g01730), VvJAZ3 (VIT\_01s0011g05560), VvTCP (VIT\_12s0028g02520), VvMLA10 (VIT\_04s0023g02380), VvAGO2 (VIT\_10s0042g01180), VvPPO (VIT\_10s0116g00560), and VvAGO10 (VIT\_05s0020g04190, inset), in FDp-infected (FD), recovered (R), and healthy (H) leaf midribs of "Barbera." qRT-PCR signals were normalized to U6 and 5.8S rRNA for miRNA quantification, and to actin and ubiquitin transcripts for target quantification. Lowercase and uppercase letters denote significant differences (p ≤ 0.05) among miRNAs and target expression levels, respectively, tested using Tukey's HSD test. Data are presented as mean ± standard error of five biological replicates (n = 5).

In some cases, the regulation of miRNAs and targets linked to disease resistance led to a reduction of these responses in FDp-infected plants. For example, miR482 is involved in a feedback control of NBS-LRR genes in several species, lowering the energy cost of the disease resistance process by downregulating the production of these genes in the absence of pathogens (González et al., 2015). In FDp-infected grapevines, we observed an opposite regulation of this system, with a reduction of VvMLA10 (VIT\_04s0023g02380) in the presence of the pathogen (**Figure 4**). A similar downregulation was noticed for VvAGO2 (VIT\_10s0042g01180) in FD samples. In grapevine, this gene is targeted by the novel vvi\_miC1031-5p (with a sequence similar to vvi-miR403) and likely by vvi-miR403, as observed in other plants (Harvey et al., 2011), although this has not yet been confirmed in grapevine. AGO2 has an antiviral role against several viruses in different species (Harvey et al., 2011), while its relationship with phytoplasma infection is not currently clear. The downregulation of AGO2 reported here in the FD condition (**Figure 4**) could suggest that phytoplasma-infected plants may be more susceptible to virus infection than healthy ones. This hypothesis was not confirmed in BNp-infected plants, where the presence of phytoplasma did not influence virus infection (Rotter et al., 2018). However, considering the above-reported differences in molecular and metabolic responses between plants infected by BNp and FDp, we cannot exclude that FDp-infected plants are more sensitive to virus attack, thus opening the way for future studies specifically addressing this response.

The novel vvi\_miC64-3p, significantly induced in R (**Figure 4**), is the miRNA<sup>∗</sup> of vvi\_miC64-5p, which controls a Polyphenol oxidase II-encoding transcript (VvPPO, VIT\_10s0116g00560) tied to lignin biosynthesis and to resistance responses to fungi and bacteria (Harm et al., 2011). Overall, vvi\_miC64-5p was more abundant and followed a different trend of accumulation than vvi\_miC64-3p. In particular, the over-accumulation of vvi\_miC64-5p in R and FD samples occurred in parallel with the downregulation of VvPPO transcription (**Figure 4**). This downregulation may negatively influence lignification of infected tissues, thus representing another example of downregulation of a pathway involved in the response to pathogens.

Although the mechanisms of plant response to the pathogen still require further investigation, even less is known about the molecular bases driving the phenomenon of symptom recovery in grapevine infected by FD. Recovered grapevines are asymptomatic and phenotypically similar to H plants, and the expression levels of miRNAs generally showed a close relationship between the R and H categories (**Figure 2**). However, for some combinations of miRNAs/targets, this relationship was not observed. For example, VvCHL and VvPSBx, targeted by vvi-miR2950-5p and vvi-miR169, respectively, were downregulated in R, suggesting a reduced photosynthetic efficiency or an incomplete recovery of gas exchange performance in these plants, as previously reported (Vitali et al., 2013). In addition, the lower levels of JA in R grapevines, associated with the modulation of the vvi-miR319/VvTCP complex, indicate that the recovery condition might be related to repression of JA signaling pathways even 2 years after the disappearance of FDp. Taken together, these results suggest the long-term persistence of a "molecular memory" of the former phytoplasma infection; further investigations are needed to verify whether this "memory" could effectively prime the recovered plants against new FDp infection events.

# CONCLUSION

In this work, we unambiguously cataloged for the first time all novel miRNAs identified so far in grapevine by producing a single database, miRVIT, with the final aim of bringing order to an intriguing research field that has been developing and evolving continuously over recent years. In addition to conserved miRNAs, some novel miRNAs detected by several authors (vvi\_miC64, vvi\_miC137) were shown to be bona fide miRNAs, spread among different genotypes, potentially regulated by diverse environmental conditions and likely playing important biological functions that should be further investigated in the future through ad hoc functional studies. miRVIT will integrate several existing bioinformatics resources designed for transcriptome, small RNA and functional analysis in grapevine (Grimplet et al., 2009; Dereeper et al., 2011; Wong et al., 2013; Belli Kullan et al., 2015; Pulvirenti et al., 2015; Moretto et al., 2016). We also demonstrated that the application of miRVIT to the analysis of miRNAs from FDp-infected grapevines was effective for pinpointing complex interactions among miRNAs and related targets specifically linked to disease evolution. In particular, we showed that inhibition of cell development and photosynthetic processes (vvi-miR156/VvSPL, VvAGO10/vvi-miR166/VvPHB, and vvi\_miC137-3p/VvPSI) could be finely tuned by miRNAs in infected plants, together with regulation of the crosstalk between JA signaling pathways (vvi-miR319/VvTCP and vvi-miR167/VvARF) and the disease resistance response (vvi-miR482/VvMLA10, vvi\_miC1031-5p/VvAGO2, and vvi\_miC64-5p/VvPPO).

### AUTHOR CONTRIBUTIONS

GG, IP, and CM conceived the study. GG produced the miRVIT database. GB developed the database platform. SA performed the bioinformatic analyses. WC and CP performed most of the molecular analyses and elaborated the corresponding data. PB, MR, SP, and IP helped with the molecular analyses and complemented the writing. GG wrote the manuscript with the help of WC, CP, and IP. All authors read and approved the manuscript.

### FUNDING

This research was supported by the project "Inteflavi: un approccio integrato alla lotta contro la flavescenza dorata della vite," jointly funded by Fondazione Cassa di Risparmio di Cuneo, Fondazione Cassa di Risparmio di Torino, and Fondazione Cassa di Risparmio di Asti. IP gratefully acknowledges the financial support by the Italian Ministry of Education, University and Research (MIUR), FIR project RBFR13GHC5: "The epigenomic plasticity of grapevine in genotype per environment interactions."

### ACKNOWLEDGMENTS

The authors thank Marco Martini for conceiving and providing the miRVIT logo.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.01034/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Chitarra, Pagliarani, Abbà, Boccacci, Birello, Rossi, Palmano, Marzachì, Perrone and Gambino. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The MicroRNA319d/TCP10 Node Regulates the Common Bean – Rhizobia Nitrogen-Fixing Symbiosis

José Á. Martín-Rodríguez, Alfonso Leija, Damien Formey and Georgina Hernández\*

Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico

Micro-RNAs from legume plants are emerging as relevant regulators of the rhizobia nitrogen-fixing symbiosis. In this work we functionally characterized the role of the node formed by micro-RNA319 (miR319) and the TEOSINTE BRANCHED/CYCLOIDEA/PCF (TCP) transcription factor in the common bean (Phaseolus vulgaris) – Rhizobium tropici symbiosis. The miR319d, one of nine miR319 isoforms from common bean, was highly expressed in roots and nodules from inoculated plants as compared to roots from fertilized plants. The miR319d targets TCP10 (Phvul.005G067950), identified by degradome analysis, whose expression showed a negative correlation with miR319d expression. The phenotypic analysis of R. tropici-inoculated composite plants with transgenic roots/nodules overexpressing or silencing the function of miR319d demonstrated the relevant role of the miR319d/TCP10 node in the common bean rhizobia symbiosis. Increased miR319d resulted in a reduced root length/width ratio, increased rhizobial infection evidenced by more deformed root hairs and infection threads, and decreased nodule formation and nitrogenase activity per plant. In addition, these plants with lower TCP10 levels showed a decreased expression level of the jasmonic acid (JA) biosynthetic gene LOX2. The transcription of LOX2 by TCPs has been demonstrated for Arabidopsis, and in several plants LOX2 level and JA content have been associated with TCP levels. On this basis, we propose that in roots/nodules of inoculated common bean plants TCP10 could be the transcriptional regulator of LOX2 and the miR319d/TCP10 node could affect nodulation through JA signaling. However, given the complexity of nodulation, the participation of other signaling pathways in the phenotypes observed cannot be ruled out.

Keywords: microRNAs, legume–rhizobia interaction, symbiotic nitrogen fixation, nodules, common bean, Phaseolus vulgaris

### Edited by:

Ana Confraria, Instituto Gulbenkian de Ciência (IGC), Portugal

### Reviewed by:

Gary Stacey, University of Missouri, United States; Ilker Buyuk, Ankara University, Turkey; Carla Schommer, CONICET Instituto de Biología Molecular y Celular de Rosario (IBR), Argentina

> \*Correspondence: Georgina Hernández gina@ccg.unam.mx

### Specialty section:

This article was submitted to Plant Cell Biology, a section of the journal Frontiers in Plant Science

Received: 14 May 2018; Accepted: 23 July 2018; Published: 10 August 2018

### Citation:

Martín-Rodríguez JÁ, Leija A, Formey D and Hernández G (2018) The MicroRNA319d/TCP10 Node Regulates the Common Bean – Rhizobia Nitrogen-Fixing Symbiosis. Front. Plant Sci. 9:1175. doi: 10.3389/fpls.2018.01175

# INTRODUCTION

Legumes are ecologically important because of their ability to establish an efficient symbiotic association with nitrogen-fixing rhizobia, resulting in the formation of root nodules, where rhizobia fix atmospheric dinitrogen (N2) into forms that can be assimilated by the plant, in exchange for a carbon source. Symbiotic nitrogen fixation (SNF) reduces the cost of legume cultivation and is relevant for sustainable agriculture (Venkateshwaran et al., 2013). The evolution of this symbiosis was a key to success for the legume family, which comprises 18,000 described species with approximately 700 genera and represents one-third of the primary crop production in the world; however, the legume production necessary for feed and food relies on only a few cultivated species (Doyle and Luckow, 2003). Phaseolus vulgaris, known as common bean, is the principal source of non-animal protein for human consumption in the developing world (Broughton et al., 2003). Besides their caloric and protein contribution, common bean grains have high contents of fiber, complex carbohydrates and other dietary elements such as minerals, thiamine, folate, and a variety of flavonoids and secondary metabolites with medicinal properties (Blair et al., 2013).

In recent years, several studies have shown that different classes of small non-coding RNAs (sRNA) act as essential regulators of gene expression in plants. MicroRNAs (miRNA) are a major class of sRNA, 21–24 nt in length, that regulate gene expression post-transcriptionally through sequence complementarity, either via target transcript cleavage or translational inhibition. Plant miRNAs are involved in most, if not all, biological processes such as development, hormone regulation, nutrient homeostasis and interaction with pathogens and symbionts (reviewed by Jones-Rhoades and Bartel, 2004; Rogers and Chen, 2013; Li et al., 2017). Growing evidence supports the participation of miRNAs in the control of the legume-rhizobia symbiosis (Lelandais-Brière et al., 2016). Studies based on high-throughput sRNA sequencing have identified miRNA families that are expressed in nodules from different legume species (Subramanian et al., 2008; Lelandais-Brière et al., 2009; De Luis et al., 2012; Turner et al., 2012). For common bean, we identified a set of 185 mature miRNAs; 106 of these, including 50 previously unpublished sequences, were present in nodules (Formey et al., 2015). Aiming to understand the role in nodules of newly identified common bean miRNAs, we constructed weighted correlation networks of miRNAs with differential expression in the nodule library as compared to other libraries. The networks include miRNAs known to play regulatory roles in nodules, suggesting a similar role for the novel miRNAs (Formey et al., 2015). One of these weighted correlation networks included an isoform of the miR319 family, pvu-miR319d, initially described in soybean (Wong et al., 2011). In this work we analyzed the role of miR319d from common bean in symbiosis with R. tropici.

Though the majority of plant miRNAs have been identified by large-scale sequencing strategies and bioinformatics approaches based on the conservation of fold-back precursors (Jones-Rhoades and Bartel, 2004), Arabidopsis miR319a is an exception, as it was isolated through the screening of an activation-tagging T-DNA transgenic population that generated dominant gain-of-function mutations (Weigel et al., 2000). The first described plant miRNA mutant, jaw-D, overexpresses miR319a, which belongs to one of the first characterized and conserved plant miRNA families. It was demonstrated that the conserved miR319 targets are the plant-specific transcription factors (TF) TCP (for TEOSINTE BRANCHED/CYCLOIDEA/PCF) (Palatnik et al., 2003). The TCP domain encodes a DNA-binding motif that folds into a basic helix-loop-helix structure (Cubas et al., 1999). The TCP TFs participate in various important aspects of plant development, especially the control of cell division, expansion, and differentiation during leaf development, but also in other important functions such as mitochondrial biogenesis, leaf senescence and floral development (Palatnik et al., 2003; Schommer et al., 2008, 2012; Martín-Trillo and Cubas, 2010). The TCPs can be subdivided into two main branches (class I and II) according to their sequence in the TCP domain. In Arabidopsis, the TCPs comprise a family of 24 members; only five of these (TCP2, TCP3, TCP4, TCP10, and TCP24), all belonging to class II, are targets of miR319 (Palatnik et al., 2003; Schommer et al., 2008).

The networks of TFs regulated by miRNAs can interact with one another during plant development (Rubio-Somoza and Weigel, 2011). Several studies have revealed the interaction of the miR319/TCP node with miR164 and miR396 (Schommer et al., 2012). In Arabidopsis, class II TCPs directly activate the transcription of MIR396; this miRNA targets GROWTH-REGULATING FACTORS (GRF) TFs that in turn regulate cell proliferation via the control of cell cycle genes (Rodríguez et al., 2010).

The leaf morphogenesis process that is regulated by the miR319/TCP node has been linked with other processes such as jasmonic acid (JA) biosynthesis and senescence (Schommer et al., 2012). JAs are lipid-derived signaling molecules in plants that regulate diverse responses to wounding, pathogen attack, reproduction, development, metabolic regulation and abiotic stress (Devoto and Turner, 2003; Howe, 2004). The participation of JA in the legume-rhizobium symbiosis has been reported in several studies (Sun et al., 2006; Seo et al., 2006; Poustini et al., 2007; Ferguson and Mathesius, 2014). Although these studies are not yet conclusive, collectively they appear to indicate that JAs can act as either positive or negative regulators of nodulation and nitrogen fixation, depending on the legume species, the type of JA used, and when, where, and how the hormone is applied (Ferguson and Mathesius, 2014). The first dedicated step in the biosynthesis of JA is catalyzed by lipoxygenases encoded by the LOX genes. In Arabidopsis, LOX2 and three other LOX genes encode chloroplast-localized lipoxygenases that catalyze the conversion of α-linolenic acid (18:3) into (13S)-hydroperoxy-linolenic acid. LOX2 is one of the most affected genes in the transcriptome of tcp loss-of-function Arabidopsis mutants (Schommer et al., 2012). It has been demonstrated that Arabidopsis TCPs recognize specific binding sites present in the LOX2 promoter to directly regulate its transcription (Schommer et al., 2008; Danisman et al., 2012). Other JA biosynthetic genes that also respond to miR319/TCP levels include ALLENE OXIDE SYNTHASE (AOS), whose product catalyzes the conversion of 13-hydroperoxy-linolenic acid to an unstable allene oxide intermediate (Schommer et al., 2008; Zhang et al., 2016).

To our knowledge, the participation of the miR319/TCP node as a regulator of the legume – rhizobia symbiosis has not been reported. In this paper we analyzed the role of the common bean miR319d isoform and its target TCP10 in the symbiosis with Rhizobium tropici. We confirmed the high expression of miR319d in roots/nodules of SNF common bean as compared to tissues from fertilized (non-inoculated) plants. The functional analysis of composite common bean plants with modulated expression of this miRNA revealed the effect of the miR319d/TCP10 node in root development, rhizobia infection, nodulation and SNF. These effects could be related to the observed alterations in the expression of LOX2, a JA biosynthetic gene, and to the participation of JA in the regulation of different stages of symbiosis with Rhizobium.

# MATERIALS AND METHODS

### Phylogenetic Analysis

miR319 isoform sequences from Phaseolus vulgaris were obtained from the small RNA-seq analysis performed by Formey et al. (2015), where each miR319 isoform was named according to the plant species in which it was first discovered. **Supplementary Table S1** shows the equivalence between the nomenclature used in this work and that of Formey et al. (2015) for each P. vulgaris miR319 isoform sequence.

TCP protein sequences were obtained from the Phaseolus vulgaris release v2.1 in the Phytozome 12 database<sup>1</sup>. Sequence alignments were performed with the MAFFT online service v7 (Katoh et al., 2017) using the L-INS-i option. The phylogenetic trees of the miR319 isoforms and of the TCP protein sequences were constructed with the average linkage (UPGMA) method and the Neighbor-Joining method with the JTT model, respectively. Bootstrap values were obtained from 100 resamplings.

### Plasmid Construction, Plant Transformation and Generation of Composite Plants

The overexpression and silencing of miR319d function in common bean transgenic roots were carried out using the pTDTO plasmid (Aparicio-Fabre et al., 2013). This expression plasmid contains the 35S cauliflower mosaic virus (35SCaMV) promoter and the tdTomato (red fluorescent protein) gene as a visible reporter gene. The precursor of miR319d (286 bp) was PCR-amplified using cDNA from common bean nodules as template and the specific primers Fw-pre319d (5′-ATGGATCCTGATACTAGAGTACAGGGAGA-3′) and R-pre319d (5′-TCTCGAGTTGTGTGTATGTATTAATATTAATG-3′). To silence miR319 function, the "Short Tandem Target Mimicry" (STTM) method (Yan et al., 2012) was employed using the specific primers Fw-STTM319d (5′-ATGGATCCGAAGGAGCTCCCTACCTTCAGTCCAGTTGTTGTTGTTATGGTCTAATTTAAATATGGTC-3′) and R-STTM319d (5′-ACTCGAGTGGACTGAAGGTAGGGAGCTCCTTCATTCTTCTTCTTTAGACCATATTTAAATTAGACC-3′). The purified PCR products were digested with XhoI and BamHI and cloned into the pTDTO expression vector. The empty vector pTDTO, hereafter denominated EV, and the resulting OEmiR319d and STTMmiR319d plasmids were introduced by electroporation into Agrobacterium rhizogenes K599, which was then used for plant transformation as described previously (Estrada-Navarrete et al., 2007) with minor modifications (Aparicio-Fabre et al., 2013). The presence of red fluorescence

### Plant Material and Growth Conditions

The common bean (P. vulgaris L.) Mesoamerican cv BAT 93 was used in this work. Seeds were surface sterilized in 10% (v/v) commercial sodium hypochlorite for 5–10 min and finally rinsed 5–6 times in sterile distilled water. Subsequently, seeds were germinated on moist sterile paper towels at 30 °C for 2–3 days in darkness. Germinated seedlings of similar size were planted in pots with wet sterile vermiculite. After 2 days of adaptation, plants were inoculated with 1 ml of a saturated liquid culture of the Rhizobium tropici CIAT 899 strain per plant. Plants were grown in growth chambers under controlled environmental conditions (25–28 °C, 16 h photoperiod) and were watered every 3 days with N-free B&D nutrient solution (Broughton and Dilworth, 1971). For the fertilized, non-inoculated condition, full nutrient B&D solution was used to water the plants. Common bean composite plants with transgenic roots were generated as described below and grown in similar conditions to those for wild-type plants. Plants were harvested at different time points for analysis; tissues for RNA isolation were collected directly into liquid nitrogen and stored at −80 °C.

### RNA Isolation and Analysis

Total RNA was isolated from 100 mg of tissue using the mirVana™ miRNA Isolation Kit (Ambion) following the supplier's recommendations. For R. tropici-inoculated BAT 93 plants, the tissues used for RNA isolation were roots separated from nodules and detached nodules. For R. tropici-inoculated composite plants, RNA was isolated from the transgenic nodulated root system. Three samples (biological replicates) for each tissue from different plants grown under similar conditions were analyzed.

For the quantification of mature miRNA transcript accumulation levels, cDNAs were prepared using RevertAid reverse transcriptase (Fermentas) following the stem-loop method (Kramer, 2011). Stem-loop primers for reverse transcription of miRNAs were designed as reported by Chen et al. (2005). The conditions used were: denaturation at 65 °C for 5 min, then 16 °C for 30 min; 60 cycles of 30 °C for 30 s, 42 °C for 30 s and 50 °C for 1 s, followed by 70 °C for 15 min. Primers for qRT-PCR amplification are listed in **Supplementary Table S2**. The resulting cDNAs were then diluted 10-fold and used to perform the qRT-PCR experiments using SYBR Green qPCR Master Mix (Fermentas) following the manufacturer's instructions. The reaction mix was dispensed in a 96-well plate and analyzed in an Applied Biosystems 7300 real-time thermocycler (Foster City, CA, United States). The thermal cycler settings were as follows: 94 °C for 1 min, followed by 40 cycles of 94 °C for 20 s and 60 °C for 60 s. Relative transcript levels for each sample were obtained using the 'comparative C<sub>t</sub> method' and normalized with the geometric mean of three housekeeping genes (HSP, MDH, and UBQ9) (Vandesompele et al., 2002) for the mRNA transcripts and with the U6 sRNA for the miRNAs. In all of our qRT-PCR analyses a well-defined melting curve was obtained both for miRNAs and for cDNAs. A Mann-Whitney statistical test was performed to evaluate the significance of

resulting from the expression of the tdTomato reporter gene was routinely checked in the putative transgenic roots using light microscopy.

<sup>1</sup>https://phytozome.jgi.doe.gov/pz/portal.html

the differential expression using the mean values from three biological replicates for each condition, using the GraphPad Prism program.
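The normalization described in this section can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes an amplification efficiency of 2 for every primer pair, the function names are hypothetical, and the Ct values are invented for the example. Under that assumption, the arithmetic mean of the reference Ct values corresponds to the geometric mean of the reference quantities.

```python
from math import log2
from statistics import mean


def relative_expression(ct_target, ct_references):
    """Return 2^-(dCt), where dCt = Ct(target) - mean Ct(references).

    Averaging reference Ct values arithmetically is equivalent to taking the
    geometric mean of the reference quantities 2^-Ct (assumes efficiency 2).
    """
    delta_ct = ct_target - mean(ct_references)
    return 2.0 ** (-delta_ct)


def log2_fold_change(sample, control):
    """log2 ratio of two relative-expression values (the control maps to 0)."""
    return log2(sample / control)


# Hypothetical Ct values, for illustration only (not data from this study).
# Three reference genes are used for an mRNA target; a miRNA normalized to
# a single reference would pass a one-element list instead.
control = relative_expression(24.0, [20.0, 21.0, 22.0])
treated = relative_expression(22.0, [20.0, 21.0, 22.0])
print(control, treated, log2_fold_change(treated, control))
```

Expressing the result as a log2 ratio against a control sample matches the way transcript levels are reported relative to EV roots, where the control is set to 0.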

### Phenotypic Analysis

Nitrogenase activity was determined in detached nodulated roots from composite plants by the acetylene reduction assay, essentially as described by Hardy et al. (1968). Specific activity is expressed as nmol ethylene h<sup>−1</sup> per plant. The root fresh weight, area, length and width were determined in composite plants grown for 24 dpi under symbiotic conditions. The quantification of root hair deformation and induction of infection threads upon rhizobial inoculation was performed in samples from the responsive zone of roots inoculated with R. tropici CIAT 899 for 2 or 6 days. Samples were collected into PBS buffer, stained with 0.01% (w/v) Methylene Blue for 1 h and washed three times with double-distilled water. Infection events were observed in an Axioskop 2 optical microscope (Zeiss); at least five different root responsive zone samples (biological replicates) were used for analysis. Statistical analyses were performed using the Mann–Whitney null hypothesis statistical test.
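For readers unfamiliar with the Mann–Whitney test, the following minimal sketch computes the U statistic and an exact two-sided p-value by enumerating every relabeling of the pooled observations. It is an illustration only: the function names and the counts are hypothetical, and it does not reproduce the internals of any statistics package. Full enumeration is practical only for small groups; larger groups would need a normal approximation or a dedicated library.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x: number of (xi, yj) pairs with xi > yj; ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value from all ways of splitting the pooled data."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    expected_u = n * m / 2.0  # value of U expected under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - expected_u)
    hits = 0
    total = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        group_x = [pooled[i] for i in range(n + m) if i in chosen]
        group_y = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        # Count relabelings at least as extreme as the observed split.
        if abs(mann_whitney_u(group_x, group_y) - expected_u) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical infection-thread counts from five samples per line
# (illustration only, not data from this study).
ev_counts = [1, 2, 3, 4, 5]
oe_counts = [6, 7, 8, 9, 10]
print(mann_whitney_u(oe_counts, ev_counts), exact_two_sided_p(ev_counts, oe_counts))
```

With five samples per group and complete separation, only 2 of the 252 possible splits are as extreme as the observed one, so the exact p-value falls below 0.01.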

### Prediction of Transcription Factor Binding Sites (TFBS)

To predict the transcription factors that could regulate the LOX2 and LOX5 genes, we performed an analysis using Clover (pre-2010 version, Frith et al., 2004) and the plant JASPAR CORE motif library (Sandelin and Wasserman, 2004) on the 4 kb sequence upstream of the 5′ UTR end of the corresponding genes. Predicted motifs with a p-value > 0.05 were discarded.

# RESULTS

### Common Bean miR319 Isoforms and Target Genes

The Arabidopsis genome contains three loci that generate miR319 isoforms ath-miR319a to ath-miR319c, while seventeen miR319 isoforms are reported for soybean (Glycine max) (v.22)<sup>2</sup>. The high-throughput small RNA (sRNA) sequencing analysis by Formey et al. (2015) identified nine miR319 isoforms in common bean. Of these, five isoforms have been identified in soybean, a legume related to common bean, and one was identified in grape (Vitis vinifera) (v.22)<sup>2</sup>, while four are new miR319 isoforms similar to the soybean isoforms gma-miR319c, d or f. Initially their nomenclature referred to the species where each miR319 isoform was identified, while in this work we propose the pvu-miR nomenclature for the P. vulgaris miR319 isoforms (**Figure 1** and **Supplementary Table S1**). The common bean miR319 isoforms, 18–22 nucleotides long, showed sequence similarities and are grouped into two well-differentiated clades (**Figure 1**). One clade included mature miRNAs with four guanosines in the central region of their nucleotide sequence, while in the other clade only three guanosines were observed. Each clade grouped two of the novel common bean miR319 isoforms.

Analyses of sRNA-seq data from libraries generated from different plant organs have identified conserved and legume-specific miRNA families differentially expressed during nodule organogenesis in different legumes (De Luis et al., 2012; Turner et al., 2012; Formey et al., 2014). Our previous report (Formey et al., 2015) revealed three isoforms of common bean miR319 with a higher expression level in nodules as compared to roots: pvu-miR319e (35-fold), pvu-miR319q (189-fold) and pvu-miR319d (424-fold) (**Figure 1**). In addition to showing the highest nodule/root expression ratio, the pvu-miR319d isoform was included by Formey et al. (2015) in a weighted correlation network of common bean miRNAs with significantly increased expression in the nodule library as compared to other libraries. These features led us to select the common bean pvu-miR319d isoform for this study, aiming to characterize its regulatory role in the rhizobia symbiosis. pvu-miR319d, hereafter referred to as miR319d, was initially identified via high-throughput sequencing data and annotated in the soybean miRBase database by Wong et al. (2011). In common bean the gene encoding miR319d was mapped to chromosome 9 (nucleotides 8534451-8534637); it generates a 187-nucleotide pre-miRNA with a bona fide stem-loop secondary structure that gives rise to the 22-nucleotide mature miRNA, encoded close to the 3′ end of pre-miR319d (Formey et al., 2015).

The conserved targets for miR319 in different plants are transcripts that encode transcription factors (TF) of the class II subclass of the TCP family. Of the 24 Arabidopsis TCP TF genes, five contain a target site for miR319 (TCP2, TCP3, TCP4, TCP10, and TCP24) that, in every case, is located outside the TCP domain and near the 3′ part of the coding region (Schommer et al., 2012). In soybean, 14 TCP TF genes have been proposed as miR319 targets (Song et al., 2011; Goettel et al., 2014). Sequence analyses of genomic and transcriptomic data (O'Rourke et al., 2014; Schmutz et al., 2014) led us to identify 27 TCP TF genes for common bean (**Figure 2**). From the whole set (27) we have identified 4 TCP genes with putative miR319 binding sites near the 3′ part of their coding sequence (Formey et al., 2015). Interestingly, the TCP predicted targets of miR319 were organized in a single clade of the phylogenetic tree (**Figure 2**). From these, Phvul.011G136115, Phvul.005G067950 and Phvul.011G156900 were identified as targets in a degradome analysis (Formey et al., 2015). The base pairing of each predicted TCP target gene with the miR319d isoform as well as their expression level in roots and nodules (v12.1.6, Phaseolus vulgaris v2.1)<sup>3</sup> are shown in **Table 1**. Phvul.005G097200 and Phvul.011G136115 transcripts showed several mismatches and thus a high penalty score for miRNA:mRNA pairing (**Table 1**), which would not fulfill the requirements for a miR319d target according to Jones-Rhoades and Bartel (2004). Phvul.011G136115 was not expressed and Phvul.005G097200 showed similar expression in roots and nodules (**Table 1**). In contrast, Phvul.005G067950 and Phvul.011G156900 transcripts showed a similarly low penalty score, perfect base pairing in the 5′ miRNA region and a hybridization energy value of −41.05 kcal/mol (Jones-Rhoades and Bartel, 2004;

<sup>3</sup>www.phytozome.net

<sup>2</sup>www.mirbase.org


FIGURE 1 | Alignment of mature microRNA sequences from Phaseolus vulgaris miR319 family. (Left) Panel contains a tree representing the phylogenetic relationship between the miRNA sequences. (Central) Panel contains the aligned microRNA sequences. Dashes represent the mismatches of the alignment. (Right) Panel is a table containing the expression level (RPM) of each microRNA in the Root and Nodule libraries published in Formey et al. (2015).

Hammell et al., 2008). The root/nodule expression profile differed between these transcripts: Phvul.011G156900 showed a low expression level that was slightly higher in nodules, while Phvul.005G067950 showed a higher expression level in roots than in nodules (**Table 1**). Taken together, these results support the conclusion that miR319d induces the cleavage of its target Phvul.005G067950. On this basis, we selected this TCP gene, hereafter denominated TCP10, as the target of miR319d for the analysis of this node as a possible regulator in the common bean – rhizobia symbiosis. However, the targeting of Phvul.011G156900 by miR319d in other plant organs or growth conditions cannot be ruled out.
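The idea behind a pairing penalty score can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used to build Table 1: it assumes an ungapped alignment (bulges are not handled), a weight of 0.5 for each G:U pair and 1 for any other mismatch, and the sequences are toy examples rather than the miR319d or TCP sequences. The function name is hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairing_penalty(mirna, target_site):
    """Penalty for an ungapped miRNA:target duplex.

    Both sequences are given 5'->3'. The miRNA is paired with the target read
    3'->5', so the target site is reversed before comparing positions.
    Assumed weights: Watson-Crick pair 0, G:U pair 0.5, other mismatch 1.
    """
    mirna = mirna.upper().replace("T", "U")
    site = target_site.upper().replace("T", "U")
    if len(mirna) != len(site):
        raise ValueError("ungapped scoring needs sequences of equal length")
    score = 0.0
    for m, t in zip(mirna, reversed(site)):
        if (m, t) in WATSON_CRICK:
            continue
        score += 0.5 if (m, t) in WOBBLE else 1.0
    return score


# Toy sequences for illustration only.
toy_mirna = "UGGACUGAAGG"
print(pairing_penalty(toy_mirna, "CCUUCAGUCCA"))  # fully complementary site
print(pairing_penalty(toy_mirna, "CCUUCAGUCCG"))  # one G:U pair
print(pairing_penalty(toy_mirna, "ACUUCAGUCCG"))  # one G:U pair, one mismatch
```

A lower score means closer complementarity, which is why the two candidates with low penalty scores were retained as plausible targets.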

### Expression Analysis of miR319d and TCP10 During Rhizobia Symbiosis

The role of miR319 in leaf/shoot development has been well documented for Arabidopsis and other plants (Schommer et al., 2012, 2014; Koyama et al., 2017). However, although miR319 isoforms have been identified in roots from different plants, its

TABLE 1 | Common bean TCP transcripts with miR319 binding sites.


Pairing of the four putative TCP target genes identified with a miR319 binding site (Formey et al., 2015). Watson-Crick base pairing is indicated by two dots, G-U base pairing by one dot, and mismatches are left empty. Penalty scores, shown in parentheses, were calculated as described by Jones-Rhoades and Bartel (2004). Expression values (FPKM, Fragments Per Kilobase of transcript per Million mapped reads) from the Phaseolus vulgaris release v2.1 in the Phytozome 12 database are shown.

possible regulatory role in this organ has not been extensively studied. There are some reports on miR319 response to heavy metals, to ethylene or to pathogens in roots of different plants (Valdés-López et al., 2010; Chen et al., 2012; Li et al., 2012; He et al., 2014).

To assess the possible role of common bean miR319d/TCP10 in the rhizobia symbiosis, we performed a qRT-PCR expression analysis in roots separated from nodules and in detached nodules of R. tropici-inoculated common bean plants at different stages of development. The expression level of miR319d/TCP10 in roots from fertilized (non-inoculated) plants was included for comparison (**Figure 3**). The different developmental stages of roots and nodules from inoculated plants could be defined by the differential expression of an early-nodulin gene and by the level of nitrogenase activity (**Supplementary Table S3**). ENOD40 (Phvul.002G064166), which lacks a long open reading frame but encodes two small peptides and may function as a cell-cell signaling molecule for nodulation (Wang et al., 2014), showed very low expression in fertilized roots, in contrast with its increased transcript level in inoculated roots from early stages (3 dpi); this level persisted until it decreased at the senescent-nodule stage (35 dpi). ENOD40 also showed high expression in nodules, especially in immature nodules (15 dpi). Nitrogenase showed its highest activity in mature (21 dpi) nodules as compared to immature (15 dpi) and senescent (35 dpi) nodules (**Supplementary Table S3**). As shown in **Figure 3A**, fertilized roots showed a low expression level of miR319d that remained constant across the different developmental stages. In contrast, inoculated roots showed significantly increased expression of miR319d at all stages. In nodules, the miR319d expression level varied with developmental stage, being low in immature nodules, increasing (ca. 2-fold) in mature nodules and slightly decreasing in senescent nodules (**Figure 3A**).

We determined the transcript levels of the miR319d target gene TCP10 in similar tissues from fertilized and inoculated common bean plants (**Figure 3B**). The TCP10 transcript level was high in fertilized roots and remained constant across the different time points. However, the TCP10 expression level in roots/nodules from inoculated plants was significantly lower, reaching the lowest values in senescent nodules. The data obtained by qRT-PCR expression analysis validated those previously reported from RNA-seq data analysis (O'Rourke et al., 2014; Formey et al., 2015). Overall, a negative correlation was observed between the miR319d and TCP10 expression levels in fertilized as compared to inoculated plants (**Figures 1**, **3** and **Table 1**).

To complete our study, we analyzed the expression of the Phvul.011G156900 TCP gene, an alternative putative target of miR319d (**Table 1**). The expression analysis of this TCP gene, which validated previous data<sup>4</sup>, revealed low values, up to 60-fold lower than those for TCP10; the values did not vary among the different tissues and time points tested for inoculated and fertilized plants (**Supplementary Figure S1**). In addition, no negative correlation between Phvul.011G156900 and miR319d expression was observed. These data support our analysis of TCP10 as the target gene of miR319d in common bean roots/nodules.

### Effect of the Modulation of miR319d Expression on Root Development and Rhizobial SNF

To further investigate the role of miR319d/TCP10 in the common bean – rhizobia SNF, we aimed to modulate miR319d expression in common bean composite plants with transgenic roots and untransformed aerial organs, generated through A. rhizogenes-mediated genetic transformation (Estrada-Navarrete et al., 2007). This protocol, used as an alternative to stable transformation in common bean and other recalcitrant legume species, has been successfully used by our group to demonstrate miRNA functionality

<sup>4</sup>www.phytozome.net

FIGURE 3 | Expression analysis of common bean miR319d and its TCP10 target gene in roots (blue histograms) and nodules (orange histograms) of R. tropici CIAT899-inoculated plants and in roots (white histograms) of fertilized (non-inoculated) plants. Expression levels of mature miR319d (A) and its target gene TCP10 (B) were determined by qRT-PCR in inoculated roots or nodules harvested at the indicated time points, corresponding to days post inoculation for inoculated plants or days after planting for fertilized plants. Expression level refers to gene expression, based on Ct value, normalized with the expression of the housekeeping U6 for miR319d and UBC9, HSP and MDH for the TCP10 gene. Values represent means ± SD from three biological replicates and two technical replicates each. The Mann-Whitney null hypothesis statistical test is relative to data from fertilized plants from the same harvest (∗ and ∗∗ represent a p-value < 0.05 and p-value < 0.01, respectively).

(Valdés-López et al., 2008; Naya et al., 2014; Nova-Franco et al., 2015). The construct for overexpression of the miR319d precursor (OE319d) and the function-silencing construct (STTM319d) (Yan et al., 2012) were driven by the 35SCaMV promoter. Both constructs as well as the control empty vector (EV) contained the tdTomato (red fluorescent protein) reporter gene (Naya et al., 2014). The transcript levels of miR319d and TCP10 from transgenic roots of rhizobia-inoculated composite plants transformed with the OE319d and STTM319d constructs, normalized to the transcript level values from EV transgenic roots, are shown in **Figure 4A**. As expected, the OE319d composite plants showed a very high level of miR319d and a decreased level of TCP10 transcript. Conversely, the STTM319d plants showed low levels of miR319d and increased TCP10 transcript levels. Also, we determined

FIGURE 4 | miR319d and TCP10 affect root morphology of R. tropici-inoculated common bean plants. (A) Transcript levels of mature miR319d (diagonally striped histograms) and TCP10 (dotted histograms) were determined by qRT-PCR in 24 dpi inoculated roots of composite plants transformed with OE319d or STTM319d constructs. Values (log2) were normalized to the value from control or EV-transformed inoculated roots that was set to 0. Values represent means ± SD from three biological replicates and two technical replicates each. (B) Root dry weight (DW), area and length/width ratio from composite bean plants with transgenic roots overexpressing (OE319d, black histograms) or silencing miR319d (STTM319d, gray histograms) as compared with control roots (EV-transformed, white histograms). Values represent means ± SD from nodulated roots of sixteen to twenty independent composite plants each. The Mann–Whitney null hypothesis statistical test is relative to EV control data (∗ and ∗∗ represent a p-value < 0.05 and p-value < 0.01, respectively).

gray histograms: STTM319d, EV white histograms. Values represent means ± SD from roots of 10 to 20 independent composite plants each. The Mann–Whitney null hypothesis statistical test is relative to EV control data (∗ and ∗∗ represent a p-value < 0.05 and p-value < 0.01, respectively).

the transcript levels of the Phvul.011G156900 TCP gene, the alternative putative target of miR319d (**Table 1**); the values in OE319d and STTM319d transgenic roots were not significantly different from those of EV (**Supplementary Figure S1**). These data again indicate that TCP10 (Phvul.005G067950), and not Phvul.011G156900, is the target gene of miR319d in common bean roots/nodules.

We first assessed if the modulation of miR319d expression affects the root phenotype of R. tropici-inoculated common bean plants. As compared to control EV roots, the roots with low miR319d (STTM319d) showed decreased root biomass and area as well as higher length/width ratio due to longer and less dense roots (**Figure 4B** and **Supplementary Figure S2**). By contrast, the OE319d roots showed lower length/width ratio and similar root biomass and area as compared to control roots (**Figure 4B**).

To analyze whether the effect of miR319d on root development also affects rhizobial infection and SNF, we investigated the response of miR319d-modulated composite plants to R. tropici CIAT 899 infection, nodulation and SNF. Regarding rhizobial infection, we quantified root hair deformation and infection thread formation at early symbiotic stages. Notably, the number of deformed root hairs was significantly higher in 2 dpi-inoculated roots that overexpress miR319d, while the opposite effect was observed in STTM319d roots (**Figure 5A**). In agreement with this result, 6 dpi-inoculated OE319d roots showed a marked increase in the number of infection threads, and the opposite effect was observed in STTM319d inoculated roots (**Figure 5A**). In addition, earlier infection thread formation, at 2 dpi, was observed only in OE319d roots and not in the other composite plants (**Supplementary Figure S3**). At nodule maturation (24 dpi), OE319d composite plants showed lower nodule biomass and nitrogenase activity as compared to EV and STTM plants (**Figure 5B**). Nodule biomass correlated with nodule number and not with altered nodule size, because the nodule perimeter was similar in nodules from the different composite plants (OE319d = 0.43 ± 0.011 mm, EV = 0.43 ± 0.018 mm, STTM319d = 0.43 ± 0.018 mm). Overall, miR319d-overexpressing composite plants, with low TCP10 content (**Figure 4A**), showed a different pattern of effects in rhizobial infection vs. nodule formation/SNF (**Figures 5A,B**).

### Exploring Downstream TCP10 Regulation in OE319d and STTM319d SNF Plants

The TCP class II TFs, targets of miR319, participate in complex regulatory networks that coordinate and balance different events that are important for plant development and physiology. Relevant functions of TCP genes are the control of leaf and flower size and shape. The signaling pathways associated with these functions include, among others, the regulation of leaf cell proliferation by GROWTH REGULATING FACTOR (GRF) TFs, targets of miR396, and also the biosynthesis of JA, which regulates different processes such as senescence (Nicolas and Cubas, 2016).

On this basis, we first analyzed whether TCP10 activates the transcription of MIR396, which would result in miR396-induced degradation of GROWTH REGULATING FACTOR (GRF) transcripts to control cell proliferation in inoculated common bean roots. We quantified the mature miR396 transcript accumulation levels in OE319d and STTM319d nodulated roots, with diminished and increased TCP10 levels, respectively. Our data showed similar levels of miR396 in roots from the different composite plants analyzed (**Supplementary Figure S4**), thus indicating that TCP10 does not regulate miR396 transcription. We propose that the miR319d/TCP10 node is not involved in the regulation of the miR396/GRF node nor in the proliferation of cells from common bean roots/nodules. However, further work is required to define whether other common bean miR319 isoforms (**Figure 1**) that are highly expressed in leaves, such as pvu-miR319g and pvu-miR319h (Formey et al., 2015), and their TCP target genes participate in the miR396/GRF regulatory network to control leaf development.

The miR319/TCP regulation of leaf morphogenesis has been linked to JA biosynthesis (Schommer et al., 2012). It has been demonstrated that two Arabidopsis TCP TFs bind to the LOX2 gene promoter and directly regulate its transcription (Schommer et al., 2008; Danisman et al., 2012). In addition, the expression of other JA-biosynthetic and -responsive genes depends on miR319/TCP levels in several plant species (Schommer et al., 2008; Danisman et al., 2012; Hao et al., 2012; Zhang et al., 2016, 2017). On this basis, we analyzed a possible correlation between TCP10 levels and the expression level of JA-related genes in common bean roots/nodules.

In P. vulgaris, at least five LOX genes have been identified; these genes differ in their expression pattern in different plant organs. Of these, the LOX2 and LOX5 genes are expressed during nodule development (Porta and Rocha-Sosa, 2000). In this work, we analyzed the common bean LOX2 and LOX5 expression level in transgenic nodulated roots with overexpression or silencing of miR319d (**Figure 6**). Compared to control (EV) plants, the expression level of LOX2 was lower in OE319d plants, in contrast to its high level in STTM plants, both at early (2 dpi) and later (24 dpi) symbiotic stages (**Figure 6A**). These data indicate a correlation of LOX2 expression level with the level of TCP10 in transgenic nodulated roots (**Figures 4**, **6**). In addition, the expression levels of the JA-biosynthetic genes LOX5 and AOS as well as the MULTICYSTATIN (MC) JA-responsive gene (Uppalapati et al., 2005; López-Ráez et al., 2010; Martínez-Medina et al., 2017) showed a similar trend to that observed for LOX2, being highly expressed in STTM319d nodulated roots at 24 dpi (**Figures 6A,B**).

## DISCUSSION

Small RNAs differentially expressed during nodule organogenesis have been identified in different legumes such as Medicago truncatula, soybean (Glycine max), Lotus japonicus and common bean (Lelandais-Brière et al., 2009; De Luis et al., 2012; Turner et al., 2012; Formey et al., 2014, 2015, 2016). However, only a few in-depth studies that evidence the role of miRNAs in the rhizobial infection, nodulation or SNF processes have been reported (reviewed by Lelandais-Brière et al., 2016; Li et al., 2017). Our group has demonstrated the participation of common bean miR398b and miR172c in different stages of the rhizobia symbiosis (Naya et al., 2014; Nova-Franco et al., 2015). In this work, we identified common bean miR319d as an important regulator of rhizobial infection and nodulation.

The conserved miR319 family and its TCP TF targets have been extensively characterized in several plant models, but most of these studies have focused on their contribution to the aerial parts, especially leaf development (Schommer et al., 2012, 2014; Koyama et al., 2017). There are no previous studies about the participation of the miR319/TCP node in the control of the legume-rhizobia SNF symbiosis. miR319d, one of the nine isoforms identified in the common bean, was highly expressed in nodules with respect to other plant organs and was included in a weighted correlation network together with other miRNAs known as regulators of the rhizobial symbiosis (Formey et al., 2015). Here we provide evidence that TCP10 (Phvul.005G067950), previously identified through degradome analysis (Formey et al., 2015), is the target gene of miR319d. Our data on the TCP10 expression profile validated those from the P. vulgaris Gene Expression Atlas (O'Rourke et al., 2014) and from the Phytozome database<sup>5</sup>, regarding the negative correlation with miR319d expression in fertilized roots vs. inoculated roots/nodules at different stages of the symbiosis (**Figure 1**) and in roots/nodules from composite plants overexpressing or silencing the function of miR319d (**Figure 4**).

The regulatory role of miR319/TCP in leaf development has been linked with other processes such as the control of cell proliferation by GRF TFs, targets of miR396 (Nicolas and Cubas, 2016). However, our data do not support a role of TCP10 as an activator of miR396 in inoculated common bean roots (**Supplementary Figure S4**). The regulation of the miR319/TCP node has also been linked to JA signaling that controls different developmental processes such as senescence (Nicolas and Cubas, 2016). JA, along with other phytohormones like ethylene and cytokinin, is a signaling molecule involved in leaf senescence and other developmental processes. The binding of the Arabidopsis TCP TFs TCP4, the target of miR319, and the class I TCP20 to specific motifs within the promoter regions of the LOX2 JA-biosynthetic gene has been demonstrated through electrophoretic mobility shift and chromatin immunoprecipitation analyses (Schommer et al., 2008; Danisman et al., 2012). In agreement, Arabidopsis LOX2 is one of the most affected genes depending on TCP4 levels (Schommer et al., 2008, 2012). On this basis, subsequent research in different plant species (e.g., rice, cotton) has linked the TCP transcriptional regulation to JA signaling, through a direct effect on LOX2 expression that results in modulation of other JA-related genes (Hao et al., 2012; Zhang et al., 2016). In this work we showed a correlation between LOX2 and TCP10 (target of miR319d) transcript levels in transgenic roots with miR319d over-expression or function silencing (**Figure 6**). The latter indicates that, in common bean roots, LOX2 may be transcriptionally regulated by TCP10, a hypothesis also supported by the identification of TCP TF binding sites (TFBS) statistically over-represented (p-value < 0.05) within the LOX2 5′-promoter region. In addition, we observed a correlation of the transcriptomic response of other JA-related genes (LOX5, MC, AOS2) with TCP10 and LOX2 levels (**Figure 6**). Based on this correlation, we propose that the effect of the miR319d/TCP10 node on common bean root growth and nodulation, reported here, may be mediated by JA signaling (**Figure 7**).

<sup>5</sup>www.phytozome.net

Root growth inhibition was one of the first physiological effects detected for JA (Dathe et al., 1981; Staswick et al., 1992; Wasternack, 2007). Previous reports from Arabidopsis relate elevated JA levels to reduced root growth (Ellis et al., 2002; Wasternack, 2007). In this work we showed decreased biomass and area of common bean roots from miR319d-silenced plants with higher expression of TCP10 and LOX2 genes (**Figures 4**, **6** and **Supplementary Figure S2**).

Several studies have shown the participation of JA as a regulatory/signaling molecule in the rhizobia symbiosis with different legume species, including common bean (Poustini et al., 2007; Ferguson and Mathesius, 2014). For example, there are reports showing negative JA effects in early symbiotic stages of L. japonicus- and M. truncatula-rhizobia symbioses (Nakagawa and Kawaguchi, 2006; Sun et al., 2006) as well as positive JA effects in soybean nodulation (Seo et al., 2006; Kinkema and Gresshoff, 2008). Here, we showed that the R. tropici-inoculated OE319d common bean plants, with low levels of TCP10 and LOX2, exhibit a significant increase in the amount of root hair deformation and infection thread formation at early stages of the symbiosis but a decreased nodulation. These results indicate an arrested infection, after the infection thread formation stage, that prevents nodule development (**Figures 5**–**7**). We propose that the regulation of common bean nodulation by miR319d/TCP10 could be mediated by JA signaling. However, because of the complex and intricate regulation of the rhizobia symbiosis, we cannot rule out the participation of other signaling pathways in the affected nodulation of common bean plants modulated in the expression of miR319d/TCP10. This is the first report about the miR319/TCP node as a regulator of the rhizobial symbiosis; future in-depth studies would indicate the commonalities of such a regulatory network in other legume species.

### AUTHOR CONTRIBUTIONS

JM-R and DF conceived and performed the experiments, interpreted the data and contributed to the drafting of the manuscript. AL performed the experiments. GH conceived and supervised the whole project and wrote the manuscript.

### FUNDING

This work was supported in part by Dirección General de Asuntos del Personal Académico (DGAPA) – UNAM (Grants Nos. PAPIIT IN200816 and IA203218). JM-R received a postdoctoral fellowship from DGAPA-UNAM.

### ACKNOWLEDGMENTS

We are grateful to Sara I. Fuentes, Mario Ramírez, Víctor M. Bustos, Marisel Lliteras, and M. Beatriz Pérez-Morales for advice and technical assistance.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.01175/full#supplementary-material






**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Martín-Rodríguez, Leija, Formey and Hernández. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Respective Contributions of URT1 and HESO1 to the Uridylation of 5′ Fragments Produced From RISC-Cleaved mRNAs

### Hélène Zuber\*, Hélène Scheer, Anne-Caroline Joly and Dominique Gagliardi\*

Institut de Biologie Moléculaire des Plantes (IBMP), Centre National de la Recherche Scientifique (CNRS), Université de Strasbourg, Strasbourg, France

In plants, post-transcriptional gene silencing (PTGS) represses gene expression by translation inhibition and cleavage of target mRNAs. The slicing activity is provided by argonaute 1 (AGO1), and the cleavage site is determined by sequence complementarity between the target mRNA and the microRNA (miRNA) or short interfering RNA (siRNA) loaded onto AGO1, to form the core of the RNA induced silencing complex (RISC). Following cleavage, the resulting 5′ fragment is modified at its 3′ end by the untemplated addition of uridines. Uridylation is proposed to facilitate RISC recycling and the degradation of the RISC 5′-cleavage fragment. Here, we detail a 3′ RACE-seq method to analyze the 3′ ends of 5′ fragments produced from RISC-cleaved transcripts. The protocol is based on the ligation of a primer at the 3′ end of RNA, followed by cDNA synthesis and the subsequent targeted amplification by PCR to generate amplicon libraries suitable for Illumina sequencing. A detailed data processing pipeline is provided to analyze nibbling and tailing at high resolution. Using this method, we compared the tailing and nibbling patterns of RISC-cleaved MYB33 and SPL13 transcripts between wild-type plants and mutant plants depleted for the terminal uridylyltransferases (TUTases) HESO1 and URT1. Our data reveal the respective contributions of HESO1 and URT1 in the uridylation of RISC-cleaved MYB33 and SPL13 transcripts, with HESO1 being the major TUTase involved in uridylating these fragments. Because of its depth, the 3′ RACE-seq method shows at high resolution that these RISC-generated 5′ RNA fragments are nibbled by a few nucleotides close to the cleavage site in the absence of uridylation. 3′ RACE-seq is a suitable approach for a reliable comparison of uridylation and nibbling patterns between mutants, a prerequisite to the identification of all factors involved in the clearance of RISC-generated 5′ mRNA fragments.

Keywords: uridylation, TUTase, RISC, RNA silencing, Arabidopsis, RNA degradation, miRNA, Illumina

### Edited by:

Ana Confraria, Instituto Gulbenkian de Ciência (IGC), Portugal

### Reviewed by:

Tony Millar, Australian National University, Australia

Laura Arribas-Hernández, University of Copenhagen, Denmark

### \*Correspondence:

Hélène Zuber helene.zuber@ibmp-cnrs.unistra.fr

Dominique Gagliardi dominique.gagliardi@ibmp-cnrs.unistra.fr

### Specialty section:

This article was submitted to Plant Cell Biology, a section of the journal Frontiers in Plant Science

Received: 31 May 2018 Accepted: 10 September 2018 Published: 09 October 2018

### Citation:

Zuber H, Scheer H, Joly A-C and Gagliardi D (2018) Respective Contributions of URT1 and HESO1 to the Uridylation of 5′ Fragments Produced From RISC-Cleaved mRNAs. Front. Plant Sci. 9:1438. doi: 10.3389/fpls.2018.01438

# INTRODUCTION

Small RNAs are key regulators of gene expression (Borges and Martienssen, 2015; Bartel, 2018). They are classified as two main types, microRNAs (miRNAs) and short interfering RNAs (siRNAs), because of key distinctions in their respective mode of biogenesis (Martínez de Alba et al., 2013; Borges and Martienssen, 2015). miRNAs are processed from primary transcripts that fold as a hairpin with an imperfectly paired stem. By contrast, siRNAs are generated from near-perfect double stranded RNAs (dsRNAs) or fully paired dsRNAs when the complementary strand is synthesized by an RNA-dependent RNA polymerase (RDR), which uses the sense strand as template. miRNAs and siRNAs are loaded onto members of the argonaute (AGO) protein family to form the core of RNA induced silencing complexes (RISCs) (Vaucheret, 2008; Zhang et al., 2015). RISCs are then guided to their targets by sequence complementarity with the loaded small RNA. In plants, the base pairing of miRNAs with their targets is rather extensive, and mRNAs regulated by RISCs are repressed by AGO1-mediated cleavage, but also by translation repression (Chen, 2004; Brodersen et al., 2008; Yang et al., 2012; Li et al., 2013; Iwakawa and Tomari, 2015; Reis et al., 2015; Arribas-Hernández et al., 2016). Cleavage of mRNAs by RISC produces a 5′ fragment and a 3′ fragment. As detailed below, both the 5′-3′ and 3′-5′ RNA degradation pathways contribute to the elimination of these fragments.

In Arabidopsis, the cytosolic 5′-3′ exoribonuclease XRN4 participates in the degradation of RISC 3′-cleavage fragments, as indicated by their accumulation in xrn4 mutants (Souret et al., 2004). XRN4 was also proposed to be involved in the degradation of RISC 5′-cleavage fragments because the 5′ fragment resulting from the cleavage of MYB domain protein 33 (MYB33) mRNA by miR159-loaded RISC accumulates in an xrn4 mutant (Ren et al., 2014). RISC 5′-cleavage fragments are also definitely degraded by the 3′-5′ RNA degradation pathway because they accumulate in ski2, ski3, and ski8 mutants (Branscheid et al., 2015). Together, SKI2-3-8 form the Ski complex, which is the major activator of the RNA exosome in the cytosol. Therefore, the involvement of the RNA exosome in the degradation of RISC 5′-cleavage fragments is likely in Arabidopsis, although it remains to be demonstrated using appropriate mutants. This involvement of the RNA exosome would be consistent with previous findings in other organisms, such as Drosophila melanogaster (Orban and Izaurralde, 2005). In addition, two ribonucleases were recently described in Arabidopsis as taking part in the metabolism of RISC 5′-cleavage fragments: RISC-interacting clearing 3′-5′ exoribonucleases 1 and 2 (RICE-1 and -2) (Zhang et al., 2017). RICEs are homohexamers with a DnaQ-like exonuclease fold, and they interact with AGO1 and AGO10 (Zhang et al., 2017). RICEs are proposed to initiate the destabilization of RISC 5′-cleavage fragments, thereby facilitating RISC dissociation. This would grant access of the 3′ extremity of RISC 5′-cleavage fragments to the RNA exosome and, importantly, recycle RISC, which is essential to maintain functional RISC and miRNA abundance (Zhang et al., 2017).
The access of the 3′ extremity of RISC 5′-cleavage fragments to the RNA exosome may also be facilitated by components of the non-stop decay (NSD) pathway when the RISC 5′-cleavage fragment is engaged in polysomes (Szádeczky-Kardoss et al., 2018). The prime function of NSD is to eliminate mRNAs lacking stop codons. Recently, orthologs of Pelota and Hbs1, two core components of NSD, were shown to participate in the elimination of RISC 5′-cleavage fragments in Nicotiana benthamiana and A. thaliana, provided that the cleavage site is within the coding region (Szádeczky-Kardoss et al., 2018). Likely, the NSD machinery promotes the dissociation of ribosomes stalled at the extremity of a RISC 5′-cleavage fragment to promote access to the RNA exosome (Szádeczky-Kardoss et al., 2018).

Besides exoribonucleases and RNA helicases, terminal uridylyltransferases (TUTases) constitute another type of enzymatic activity involved in the clearance of RISC 5′-cleavage fragments. Indeed, a striking molecular event in this process is the untemplated addition of uridines at the 3′ extremity of RISC 5′-cleavage fragments (Shen and Goodman, 2004; Ren et al., 2014; Zhang et al., 2017). The uridylation of several such fragments was originally reported in both Arabidopsis and mice (Shen and Goodman, 2004). Since then, uridylation has emerged as a conserved post-transcriptional process that shapes the coding and non-coding transcriptomes in eukaryotes (Munoz-Tello et al., 2015; Scheer et al., 2016; De Almeida et al., 2018). In Arabidopsis, two TUTases have been characterized: HEN1 SUPPRESSOR 1 (HESO1) and URIDYLYLTRANSFERASE 1 (URT1) (Kwak and Wickens, 2007; Ren et al., 2012; Zhao et al., 2012; Sement et al., 2013). Both HESO1 and URT1 contain the core catalytic domain (CCD) that defines proteins belonging to the terminal nucleotidyltransferase family. In addition, URT1 contains a large intrinsically disordered region (IDR) in its N-terminal region, while a shorter IDR is present in the C-terminal region of HESO1 (De Almeida et al., 2018). Those IDRs may mediate the recognition of protein partners by URT1 and HESO1, or be a key to their localization in P-bodies and/or stress granules (Sement et al., 2013; Ren et al., 2014; Wang et al., 2015). HESO1 was identified as the main TUTase uridylating miRNAs and siRNAs to trigger their degradation (Ren et al., 2012; Zhao et al., 2012). In addition, HESO1 was shown to uridylate three RISC 5′-cleavage fragments (Ren et al., 2014). Those fragments are generated from MYB33, Auxin Response Factor 10 (ARF10), and Lost Meristems 1 (LOM1) mRNAs, which are targets of miR159, miR160, and miR171, respectively.
A residual uridylation of these RISC 5′-cleavage fragments is observed in heso1 mutants (Ren et al., 2014), and this secondary activity may be due to URT1, although experimental evidence supporting this hypothesis is lacking to date. URT1 is the main TUTase uridylating mRNAs in Arabidopsis, because mRNA uridylation is decreased by 70–80% in urt1-1 mutants (Sement et al., 2013; Zuber et al., 2016). URT1 can also uridylate miRNAs, mostly when HESO1, the primary TUTase involved in small RNA uridylation, and HUA ENHANCER 1 (HEN1), a methyltransferase that methylates small RNA duplexes, are absent (Yu et al., 2005; Yang et al., 2006; Ren et al., 2012; Zhao et al., 2012; Tu et al., 2015; Wang et al., 2015). miRNAs are therefore the first documented example of shared RNA substrates between HESO1 and URT1 (Tu et al., 2015; Wang et al., 2015). Yet, both overlapping and distinctive roles in miRNA uridylation were attributed to each TUTase (Tu et al., 2015; Wang et al., 2015). mRNAs and RISC-cleaved transcripts could constitute other cases of shared RNA substrates between HESO1 and URT1. These possibilities remain to be experimentally addressed.

To date, the characterization of uridylated RISC 5′-cleavage fragments has relied on the use of 3′ RACE PCR followed by cloning and subsequent analysis based on Sanger sequencing. Although this experimental strategy has proven useful, it has some inherent limitations. The first one is that this method is low-throughput. It is time-consuming and the depth is usually quite limited, often with fewer than 20–30 clones analyzed per genotype. The second major drawback is the lack of discrimination between amplicons and independent molecules. This turns out to be a real issue when analyzing low complexity samples by PCR amplification, because the majority of the final PCR products (up to 90%, as determined here during the analysis of RISC 5′-cleavage fragments) correspond to very few independent templates. Taken together, these limitations hinder the qualitative and quantitative analysis of the uridylation of RISC-cleaved transcripts. Such an analysis is crucial to reliably compare uridylation between wild-type (WT) and mutant genetic backgrounds, and this comparison is required to identify all factors involved in the metabolism of 5′ RNA fragments produced by RISC cleavage.

Here we detail a 3′ RACE-seq method that has been optimized for analyzing the uridylation of 5′ fragments from RISC-cleaved transcripts. Those molecules are usually of low abundance within the complex mixture of all cellular RNAs, and they exhibit a rather poor diversity, with a few untemplated nucleotides usually added at a precise RISC-mediated cleavage site. We illustrate the use of 3′ RACE-seq to analyze the tailing and trimming patterns of MYB33 and SPL13 RISC 5′-cleavage fragments by comparing WT plants and mutants lacking HESO1 and URT1. This analysis revealed the respective contributions of both TUTases, and that the absence of uridylation results in the accumulation of 5′-cleavage fragments nibbled by a few nucleotides close to the site cleaved by RISC.

# MATERIALS AND METHODS

### Gene IDs and Primers

The Arabidopsis Genome Initiative (AGI) locus identifiers for the genes analyzed in this study are: AT2G39740 (HESO1), AT2G45620 (URT1), AT5G06100 (MYB33), and AT5G50570 (SPL13A). Please note that AT5G50570 (SPL13A) and AT5G50670 (SPL13B) have identical coding sequences and therefore cannot be discriminated in this study. For simplicity, the name SPL13 is used hereafter. The sequences of all primers used in this study are shown in **Supplementary Table S1**.

### Plant Material

The plant material used for analyzing RISC 5′-cleavage fragments by 3′ RACE-seq corresponds to Arabidopsis plantlets of Col-0 accession grown for 24 days in vitro on Murashige and Skoog media with 0.8% agar and 12 h light (22°C)/12 h darkness (18°C) cycles. For other analyses, flowers were harvested from Arabidopsis plants of Col-0 accession grown on soil with 16 h light/8 h darkness cycles. urt1-1 (Salk\_087647C) and heso1-1 (GK-367H02-017041) T-DNA mutant lines have been previously described (Zhao et al., 2012; Sement et al., 2013). Double mutants were obtained by downregulating URT1 by co-suppression in heso1-1. For this purpose, heso1-1 plants were transformed with a construct expressing an inactive version of URT1 fused to YFP, which was fortuitously found to efficiently trigger co-suppression of the endogenous URT1 gene. The sequence encoding the inactive version of URT1 (URT1D491/3A) (Sement et al., 2013) was cloned in the pEarleyGate 104 Gateway plasmid under the control of the cauliflower mosaic virus (CaMV) 35S promoter. Analyses were performed on two biological replicates of three independent heso1-1 urt1SIL lines. As a control, we also analyzed two biological replicates of a urt1SIL line obtained by co-suppressing URT1 with a YFP-URT1 sequence cloned in the pEarleyGate 104 Gateway plasmid.

### Protein Extraction and Western Blot Analysis

Proteins were extracted from flowers of WT, urt1-1, urt1SIL and three independent heso1-1 urt1SIL lines under denaturing conditions. Proteins were resolved on an 8% SDS-PAGE gel and transferred to an Immobilon-P membrane. Immunoblots were incubated with anti-URT1 antibodies raised in rabbits against the full-length recombinant URT1. Following incubation with horseradish peroxidase-coupled secondary antibodies and Lumi-Light Western Blotting Substrate (Roche), signals were recorded using the Fusion-FX system (Vilber Lourmat).

### RNA Extraction and Northern Blot Analysis

Total RNA was extracted from 24-day-old plantlets and flowers for 3′ RACE-seq and northern blot analyses, respectively, with TRI Reagent® (Molecular Research Center) according to the manufacturer's instructions. RNA was further purified by acid phenol:chloroform:isoamyl alcohol extraction followed by ethanol precipitation. For northern blot analysis of MYB33 RISC 5′-cleavage fragments, 30 µg of total RNA from WT, urt1-1 and heso1-1 mutants were separated on a denaturing formaldehyde 1.5% (w/v) agarose gel and transferred onto a nylon membrane (Hybond™-N+, GE Healthcare Life Sciences™). Following UV cross-linking at 120 mJ/cm<sup>2</sup> twice for 30 s and incubation for 30 min in PerfectHyb Plus Hybridization buffer (Sigma), the membrane was hybridized overnight at 65°C with a probe that detects the 5′ fragment of MYB33 RISC-cleaved transcripts. The probe was prepared by PCR amplifying a 219 bp sequence upstream of the RISC cleavage site (**Supplementary Table S1**) and by random primed labeling of the PCR product using [α-<sup>32</sup>P]-dCTP and the DecaLabel DNA labeling kit (Thermo Scientific). For northern blot analysis of miR159, 10 µg of total RNA from WT, urt1-1, and heso1-1 mutants were separated on 17.5% polyacrylamide/7 M urea gels and transferred onto nylon membranes (Hybond™-NX, GE Healthcare Life Sciences™). Following UV cross-linking at 120 mJ/cm<sup>2</sup> twice for 30 s and incubation for 30 min in PerfectHyb Plus Hybridization buffer (Sigma), membranes were hybridized overnight at 50°C with a 5′ [<sup>32</sup>P]-labeled oligonucleotide to detect miR159 (**Supplementary Table S1**). The probe was labeled using [γ-<sup>32</sup>P] ATP and T4 PNK (NEB) according to the manufacturer's instructions. Radioactive signals were detected by autoradiography and quantified using a Typhoon scanner (GE Healthcare Life Sciences) and Image Gauge software.
Plant material used for biological replicates 1 and 2 was common to both northern analyses.

### 3′ RACE-Seq Protocol


A 3′ RACE-seq protocol was adapted for analyzing RISC 5′-cleavage fragments. Total RNA was extracted from 24-day-old seedlings using TRI Reagent® as described above. Twenty pmoles of a 5′-riboadenylated DNA oligonucleotide (3′-Adap, **Figure 1** and **Supplementary Table S1**) were ligated to 10 µg of total RNA using 20 U of T4 ssRNA Ligase 1 (NEB) in a final volume of 100 µl for 1 h at 37°C in 1X T4 RNA Ligase Reaction Buffer (NEB, 50 mM Tris–HCl, 10 mM MgCl2, 1 mM DTT, pH 7.5). The ligation products were purified from reagents and non-ligated adapter molecules with Nucleospin® RNA Cleanup columns (Macherey Nagel). RNA was then precipitated with ethanol, solubilized in water and quantified. cDNA synthesis was performed in two 20 µl reactions for each sample. Each 20 µl reaction contained 2 µg of purified ligated RNA, 50 pmol of the 3′-RT oligonucleotide (**Supplementary Table S1**), 10 nmol of dNTP, 0.1 µmol of DTT, 40 U of RNaseOUT (Invitrogen™), 200 U of SuperScript IV reverse transcriptase (Invitrogen™) and 1X SuperScript IV RT buffer (Invitrogen™). Reactions were incubated at 50°C for 10 min, and then at 80°C for 10 min to inactivate the reverse transcriptase. The two 20 µl reactions for each sample were pooled, and the cDNAs were extracted with phenol–chloroform, precipitated with ethanol and dissolved in 8 µl of Milli-Q water. Two nested PCR amplification rounds of 20 and 25 cycles, respectively, were then performed. PCR1 was run using cDNA synthesized from 1 µg of total RNA, i.e., 2 µl of concentrated cDNA, 10 pmol of MYB33 or SPL13 gene-specific sense primer 1, 10 pmol of RACEseq\_rev1 primer (**Supplementary Table S1**), 10 nmol of dNTP, 1 U of GoTaq® DNA Polymerase (Promega) and 1X Green GoTaq® Reaction Buffer (Promega) in a 20 µl final volume.
The conditions for PCR1 were as follows: a step at 94°C for 30 s; 20 cycles at 94°C for 20 s, 50°C for 20 s and 72°C for 30 s; a final step at 72°C for 30 s. PCR2 was performed using 1 µl of PCR1 product, 10 pmol of MYB33 or SPL13 gene-specific sense primer 2, 10 pmol of a TruSeq RNA PCR index (RPI, **Supplementary Table S1**), 10 nmol of dNTP, 1 U of GoTaq® DNA Polymerase (Promega) and 1X Green GoTaq® Reaction Buffer (Promega) in a 20 µl final volume. The conditions for PCR2 were as follows: a step at 94°C for 1 min; 25 cycles at 94°C for 30 s, 56°C for 20 s and 72°C for 30 s; a final step at 72°C for 30 s. For each sample, three to four 20 µl reactions were run and pooled. All PCR2 products were purified using 1 volume of AMPure XP beads (Agencourt). Library concentrations were determined using a Qubit fluorometer (Invitrogen™). Libraries were analyzed on a 2100 Bioanalyzer system (Agilent) to assess quality and estimate size distribution. Libraries were paired-end sequenced with MiSeq (v3 chemistry) with 41 × 111 bp cycle settings. The respective numbers of sequencing cycles for read 1 and read 2 can be adjusted according to other samples that are co-analyzed. Read 1 is used to identify the target transcript whereas read 2 is used to map 3′ extremities and analyze potential 3′ untemplated nucleotides. To compensate for the poor diversity of the amplicon libraries, 25–33% of phiX control library (Illumina) was included. Two rounds of RACE-seq experiments were performed. For the first round, four independent biological replicates were analyzed for WT and heso1-1 genotypes. For the second round, two independent biological replicates were analyzed for WT, urt1-1, heso1-1, the urt1SIL line, and each of the three heso1-1 urt1SIL lines (i.e., six heso1-1 urt1SIL samples). Plant material used for biological replicates 1 and 2 was common to both rounds.
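The protocol states reagents as absolute amounts per reaction, so it can help to convert them to final concentrations when scaling volumes. The following is a minimal sketch of that arithmetic; the function and dictionary names are illustrative only, and the dNTP figure is converted exactly as stated in the text (the text does not say whether 10 nmol refers to the total or to each nucleotide).

```python
# Convert the per-reaction amounts given in the protocol into final molarities.
# Names (molar, ligation, rt, pcr) are illustrative, not from the original scripts.

def molar(amount_mol, volume_l):
    """Concentration in mol/l for a given amount (mol) in a given volume (l)."""
    return amount_mol / volume_l

# Ligation: 20 pmol of 3' adapter in 100 µl
ligation = {"3' adapter": molar(20e-12, 100e-6)}

# cDNA synthesis: one 20 µl reaction
rt = {
    "3'-RT oligonucleotide": molar(50e-12, 20e-6),
    "dNTP (as stated)": molar(10e-9, 20e-6),
    "DTT": molar(0.1e-6, 20e-6),
}

# PCR1 / PCR2: one 20 µl reaction
pcr = {
    "each primer": molar(10e-12, 20e-6),
    "dNTP (as stated)": molar(10e-9, 20e-6),
}

for label, reaction in (("ligation", ligation), ("cDNA", rt), ("PCR", pcr)):
    for reagent, conc in reaction.items():
        # Report everything in µM for easy comparison
        print(f"{label:9s} {reagent:24s} {conc * 1e6:9.2f} µM")
```

Under these assumptions the amounts correspond to 0.2 µM adapter in the ligation, 2.5 µM RT primer, 0.5 mM dNTP and 5 mM DTT in the cDNA synthesis, and 0.5 µM of each primer in the PCRs.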

### Bioinformatic Analysis of 3′ RACE-Seq Data

Sequencing run quality met the Illumina specification, with more than 90% of bases higher than Q30. After initial data processing by the MiSeq Control Software v 2.5 (Illumina), base calls were retrieved and further analyzed by a suite of homemade scripts (**Supplementary Data Sheets S1**, **S2**) using python (v2.7), biopython (v1.63), and regex (v2.4) libraries. The data processing pipeline was adapted from Sikorska et al. (2017). Reads with low quality bases (≤ Q10) within the 15-base random sequence of read 2 or within the 20 bases downstream of the delimiter sequence were filtered out. Sequences with identical nucleotides in the 15-base random sequence were deduplicated. Next, the sequences AAGAATTCTCGTCGCCTGAA and GCCAGAGCTATGTTGTTGGT were searched for in reads 1 to identify MYB33 and SPL13 corresponding reads, respectively. One mismatch was tolerated. Matched reads 1 and their corresponding reads 2 were extracted and annotated. Reads 2 that contain the delimiter sequence were selected and subsequently trimmed from their random and delimiter sequences. In order to map the 3′ extremities of target RISC-generated 5′ fragments, the 20 nucleotide sequences downstream of the read 2 delimiter sequence were mapped to the corresponding reference sequence, which goes from the first nucleotide of the transcript that maps the forward PCR primer MYB33\_RISC\_fw2 or SPL13\_RISC\_fw2 to the last nucleotide of the miRNA binding site. To map the 3′ end position of reads 2 with untemplated tails, the sequences of the unmatched reads 2 were successively trimmed from their 3′ end, with a 1 nt trimming step, until successfully mapped to the reference sequence or until a maximum of 30 nt had been removed. For each successfully mapped read 2, untemplated nucleotides at the 3′ end were extracted and analyzed for their size and composition.
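The read identification and 3′ end mapping steps just described can be sketched as follows. This is a simplified illustration, not the scripts from the Supplementary Data Sheets: the function names are hypothetical, the fragment is assumed to be already reoriented to the sense strand with random and delimiter sequences removed, and mapping is reduced to an exact substring match of the last 20 nt.

```python
from typing import Optional, Tuple

# Sequences searched for in read 1, as given in the text.
TARGET_TAGS = {
    "MYB33": "AAGAATTCTCGTCGCCTGAA",
    "SPL13": "GCCAGAGCTATGTTGTTGGT",
}


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def identify_target(read1: str, max_mismatches: int = 1) -> Optional[str]:
    """Return the target whose tag occurs in read 1 with at most one mismatch."""
    for name, tag in TARGET_TAGS.items():
        for start in range(len(read1) - len(tag) + 1):
            window = read1[start:start + len(tag)]
            if mismatches(window, tag) <= max_mismatches:
                return name
    return None


def map_three_prime_end(fragment: str, reference: str,
                        seed: int = 20, max_trim: int = 30
                        ) -> Optional[Tuple[int, str]]:
    """Map the 3' end of a sense-oriented fragment onto the reference.

    Hypothetical helper: removes 0..max_trim nt from the 3' end, 1 nt at a
    time, until the last `seed` nt match the reference exactly. Returns the
    number of reference nt up to the mapped 3' end and the untemplated tail.
    """
    for trimmed in range(max_trim + 1):
        body = fragment[:len(fragment) - trimmed]
        if len(body) < seed:
            break
        hit = reference.find(body[-seed:])
        if hit != -1:
            return hit + seed, fragment[len(fragment) - trimmed:]
    return None
```

Note that with this kind of trimming, an added nucleotide that happens to match the next reference base is counted as templated, which is the same ambiguity the text raises for the uridine at position +1 of SPL13.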
3′ modifications longer than 1 nt were considered only if at least 50% of the tail consisted of the same base (i.e., 50% U, 50% A, 50% C, or 50% G). As explained in the Results section and as illustrated in **Figures 2**, **7** for MYB33 and SPL13, respectively, the sites cleaved by RISC were defined by using PARE-seq datasets to map the 5′-most nucleotide of the 3′ fragment. A single cleavage site was determined for MYB33, i.e., between nucleotides at position 0 and position +1, in contrast to two cleavage sites for SPL13, i.e., a major site between 0 and +1 and a minor site between +1 and +2. Because position +1 in SPL13 is a uridine, we could not determine whether this U is encoded or added post-transcriptionally, and SPL13 graphs were generated by considering only U-tails > 2. Finally, a supplemental deduplication was performed to increase stringency: sequences with 13 or more identical nucleotides in the 15-base random sequence were deduplicated. Plotting and quantitative data analysis were performed with R software (v3.3.1) and the ggplot2 R package (v2.2.1). Percentages of uridylated fragments were plotted for reads with 3′ extremities that map at the cleavage site, with U-tails being defined as tails composed of more than 50% U. Data obtained from the two rounds of RACE-seq experiments, referred to as dataset #1 and dataset #2, have been deposited in the NCBI Gene Expression Omnibus (GEO) database under the accession code GSE115470.

FIGURE 1 | Flowchart of the main steps to map the 3′ ends of RISC 5′-cleavage fragments by 3′ RACE-seq. Features of the 3′ adapter and the principle of the main steps are indicated. The experimental workflow begins by ligating the 3′ adapter to the RISC 5′-cleavage fragment. Please note that any RNA molecule with a 3′ hydroxyl end in the total RNA sample is ligated to the 3′ adapter. The target of interest is specifically amplified during PCR-1 and -2 due to the gene-specific sequences of the forward primers. The protocol is detailed in Methods and the scripts used to analyze data are given in Supplementary Data Sheets S1, S2.
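
The tail-filtering rules given in the bioinformatic analysis (tails longer than 1 nt are kept only if at least 50% of the tail is the same base; a U-tail is a tail with more than 50% U) can be sketched as below. This is an illustrative sketch, not the authors' code. The category labels are hypothetical, uridines appear as `T` because the reads are DNA, and binning U-tails into 1 U, 2 U and long classes by tail length rather than by U count is an assumption.

```python
from collections import Counter


def classify_tail(tail):
    """Classify an untemplated 3' tail (DNA alphabet, so U is written T)."""
    if not tail:
        return "no_tail"
    # Tails longer than 1 nt are kept only if one base makes up >= 50%.
    if len(tail) > 1 and max(Counter(tail).values()) / len(tail) < 0.5:
        return "discarded"
    # A U-tail is a tail composed of more than 50% U.
    if tail.count("T") / len(tail) > 0.5:
        if len(tail) == 1:
            return "1U"
        if len(tail) == 2:
            return "2U"
        return "longU"  # assumption: "long" means a U-tail longer than 2 nt
    return "other_tail"


def uridylation_summary(tails):
    """Percentage of retained reads in each U-tail class."""
    labels = [classify_tail(t) for t in tails]
    kept = [label for label in labels if label != "discarded"]
    counts = Counter(kept)
    if not kept:
        return {"1U": 0.0, "2U": 0.0, "longU": 0.0}
    return {c: 100.0 * counts[c] / len(kept) for c in ("1U", "2U", "longU")}
```

In this sketch the percentages are computed over all retained reads passed in, so the caller is expected to supply only the tails of reads whose 3′ extremity maps at the cleavage site.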

# RESULTS AND DISCUSSION

### Workflow for Mapping 3′ Ends of RISC 5′-Cleavage Fragments by 3′ RACE-Seq

The principle of 3′ RACE-seq to analyze the 3′ ends of RISC 5′-cleavage fragments is shown in **Figure 1**. Briefly, a 5′ preriboadenylated oligodeoxynucleotide adapter is ligated to the 3′ hydroxyl end of RNA molecules in a total RNA sample using T4 RNA ligase 1. The sequence of the 3′ adapter is identical to the one previously described for the TAIL-seq procedure (Chang et al., 2014). However, unlike for TAIL-seq, the 3′ adapter does not need to be biotinylated. The sequence features of the adapter are detailed in **Figure 1**. Five nucleotides at the 5′ end of the adapter form a delimiter sequence. All reads that do not contain this delimiter sequence are discarded during the analysis. This ensures that we accurately map the 3′ extremity of a transcript that has been ligated to the 3′ adapter. Untemplated nucleotides are defined as any nucleotides present between the genome-encoded sequence and the delimiter. The delimiter is followed by a random sequence of 15 bases. This random sequence is essential to remove PCR duplicates during the bioinformatic analysis. This deduplication step is crucial when using 3′ RACE-seq to analyze low-abundance RNA species with limited complexity, which is typically the case for RISC 5′-cleavage fragments. To further prevent amplicon biases due to the misincorporation of nucleotides in the random sequence during PCR amplification or due to sequencing errors in the random sequence, we enhance the stringency of the deduplication step by also collapsing reads whose 15-base random sequences differ by no more than two mismatches.

fragments. Percentages of uridylated MYB33 RISC 5′-cleavage fragments in four biological replicates for (A) WT and (B) heso1-1. Percentages of long (>2 Us), 2 U- and 1 U-tails are indicated by dark gray, light gray, and black, respectively.
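
The delimiter check and the mismatch-tolerant deduplication described in this section can be sketched as follows. This is a hedged illustration, not the published pipeline. It assumes that read 2 starts with the 15-base random sequence immediately followed by the delimiter, that duplicates are identified from the random sequence alone, and that a simple greedy pass (first read seen is kept) is acceptable. The delimiter is passed in as a parameter because its sequence is not given in the text, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def split_read2(read2, delimiter, random_len=15):
    """Split read 2 into (random_sequence, insert).

    Returns None when the delimiter is not found right after the random
    sequence, in which case the read is discarded.
    """
    random_seq = read2[:random_len]
    end = random_len + len(delimiter)
    if read2[random_len:end] != delimiter:
        return None
    return random_seq, read2[end:]


def deduplicate(records, max_mismatches=2):
    """Keep one read per group of near-identical random sequences.

    records is an iterable of (random_sequence, insert) pairs. A record is
    kept only if its random sequence differs by more than max_mismatches
    from every record kept so far (13 or more identical bases out of 15
    counts as a duplicate when max_mismatches is 2).
    """
    kept = []
    for random_seq, insert in records:
        if all(hamming(random_seq, k) > max_mismatches for k, _ in kept):
            kept.append((random_seq, insert))
    return kept
```

The pairwise comparison is quadratic in the number of retained reads, which is workable for the tens of thousands of reads per library reported here but would need indexing for much larger datasets.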

The 3′ adapter then contains 22 additional bases, which provide an anchor sequence for cDNA synthesis and subsequent PCR amplification. Importantly, the primer used for cDNA synthesis is complementary to the sequence of the 3′ adapter but stops five bases downstream of the random sequence (**Figure 1**). By using a reverse primer for PCR amplification that extends up to the random sequence, we eliminate the vast majority of cDNAs that are due to priming artifacts and specifically analyze transcripts that have the 3′ adapter ligated at their 3′ ends. This trick greatly enhanced the quality and depth of our libraries. Finally, the 3′ adapter is terminated by a dideoxy-C to prevent self-ligation (**Figure 1**).

cDNAs are then subjected to two successive rounds of PCR amplification. For the first round, the forward primer is a gene-specific primer matching the sequence of a selected mRNA and ideally located about 200–400 nucleotides upstream of the predicted RISC-mediated cleavage site. The reverse primer matches the sequence of the 3′ adapter up to the random sequence (**Figure 1**). As mentioned above, this prevents the amplification of most cDNA priming artifacts. The second round of PCR is performed with a nested forward primer and a barcoded reverse primer complementary to the anchor sequence (**Figure 1**). Typically, thirty different barcodes can be used to simultaneously analyze different genotypes or replicates. Both forward and reverse primers contain 5′ extensions corresponding to the Illumina sequences that are used for flow cell hybridization and sequencing. The number of PCR cycles must be kept as low as possible for both PCRs and ideally should not exceed 20–25 per PCR. Amplicon libraries are purified using AMPure XP beads and quantified with an Invitrogen Qubit fluorometer, and their size distribution is determined with a 2100 Bioanalyzer system (Agilent). Amplicon libraries are then sequenced using MiSeq paired-end sequencing for an average yield per run of 38 million reads: 19 million for read 1 and 19 million for read 2.

# Mapping of the RISC-Cleavage Site in MYB33 mRNAs by 3′ RACE-Seq

We selected the MYB33 mRNAs targeted by miR159 as a model substrate to set up the mapping of the 3′ ends of RISC 5′-cleavage fragments by 3′ RACE-seq. MYB33 has been chosen in several studies to investigate uridylation of RISC 5′-cleavage fragments for two main reasons (Shen and Goodman, 2004; Ren et al., 2014; Zhang et al., 2017). First, MYB33 RISC 5′-cleavage fragments are detectable by northern blot analysis, and therefore their accumulation can be compared between WT plants and relevant mutants. Second, a high proportion of MYB33 RISC 5′-cleavage fragments is uridylated in WT plants. This proportion was in fact high enough to allow detection of uridylated MYB33 RISC 5′-cleavage fragments by sequencing of a limited number of clones (Shen and Goodman, 2004; Ren et al., 2014; Zhang et al., 2017). The high level of uridylation in WT plants is useful to monitor a decrease of uridylation in mutants and thereby identify factors that are involved in the metabolism of this RISC 5′-cleavage fragment. However, there is one drawback in choosing MYB33 to study uridylation: the predicted cleavage site, which is specified by the tenth and eleventh nucleotides of miR159, is situated between two uridines (**Figure 2A**). This can lead to uncertainties as to whether some 3′ terminal uridines are genome-encoded or added post-transcriptionally by TUTases. To solve this issue, we took advantage of previous data generated using parallel analysis of RNA ends (PARE)-seq. PARE-seq is one of the sequencing methods designed to map 5′ hydroxylated end of RNAs and used to map small RNA cleavage sites (German et al., 2008). PARE-seq unambiguously identifies the position defined here as +1 as the 5′ nucleotide of the RISC-generated 3′ fragment of MYB33 (**Figure 2B**). Therefore, cleavage of MYB33 by miR159-loaded AGO1 does occur at the canonical site, which we defined here between positions 0 and +1 (**Figure 2A**). This was further experimentally validated in the present study because MYB33 RISC 5′-cleavage fragments ending at position 0 accumulate in a genetic background abolishing uridylation (detailed later in **Figure 6B**).

membrane stained with Coomassie blue is shown as loading control. Please note that the construct used to co-suppress URT1 in the heso1-1 background expresses an inactive version of URT1 fused to YFP. Uncropped images are shown in Supplementary Figure S1. (B) Percentages of uridylated MYB33 RISC 5′-cleavage fragments in two biological replicates for WT, urt1-1, the urt1SIL line, heso1-1, and the three heso1-1 urt1SIL lines (i.e., six heso1-1 urt1SIL samples). Percentages of long (>2 Us), 2 U- and 1 U-tails are indicated by dark gray, light gray, and black, respectively.

To study MYB33 RISC 5′-cleavage fragments by 3′ RACE-seq, we first analyzed the aerial part of 24-day-old plants grown in vitro corresponding to four biological replicates for WT and four biological replicates for the heso1-1 mutant. We obtained a total of 29,689 reads for WT and 34,096 reads for heso1-1 (**Supplementary Table S2** and **Supplementary Data Sheet S1**). The WT data were first used to monitor the distribution of 3′ extremities mapped in the sequence to which miR159 binds. The majority of reads (up to 85%) mapped at position 0 (**Figure 2C**). Therefore, we conclude that the 3′ extremities of RISC-cleaved MYB33 are accurately mapped by 3′ RACE-seq.

### Respective Contributions of HESO1 and URT1 in the Uridylation of MYB33 5′-Cleavage Fragments

To analyze untemplated nucleotides added after RISC-mediated cleavage of MYB33 mRNAs, the nucleotide extensions for reads that map to position 0 were analyzed first for the WT samples. Up to 98% of MYB33 RISC 5′-cleavage fragments in WT are tailed by nucleotide extensions, which are predominantly composed of uridines (**Supplementary Table S3**). This result is in agreement with previous observations (Shen and Goodman, 2004; Ren et al., 2014; Zhang et al., 2017). Most U-rich tails were longer than 2 Us in the four WT biological replicates (**Figure 3A** and **Supplementary Table S3**). We then examined the impact of HESO1 on the uridylation of MYB33 RISC 5′-cleavage fragments. A major decrease in uridylation was observed in heso1-1 as compared with WT samples (**Figure 3B**). This observation confirmed that HESO1 is the main TUTase uridylating MYB33 5′-cleavage fragments, as shown here using four independent biological replicates in the Col-0 genetic background. In addition, and as previously observed (Ren et al., 2014), the size of U-tails detected in heso1-1 was reduced as compared to WT, with mainly short U-tails (<2 Us) detected in heso1-1 (**Figure 3**).

The residual uridylation in heso1-1 indicates the involvement of an alternative TUTase. A good candidate for this activity is URT1, the second TUTase that has been identified in Arabidopsis (Sement et al., 2013). To date, the possible involvement of URT1 in the uridylation of 5′ RISC-cleaved mRNAs, including MYB33, has been proposed but not tested experimentally. Testing this hypothesis requires the production of a heso1 urt1 double mutant. To this end, we crossed the heso1-1 and urt1-1 single mutants. However, we failed to recover the expected double mutant in the F2 progeny. This failure is as yet unexplained, but we obtained lines that were originally designed to overexpress an inactive version of URT1 and that in fact co-suppress the endogenous URT1 gene in the heso1-1 background. We selected three heso1-1 lines in which the endogenous URT1 was no longer detected by western blot analysis, revealing a drastic downregulation of URT1 (**Figure 4A**). These lines, which have no particular phenotype when grown under optimal conditions, are hereafter called heso1-1 urt1SIL1, heso1-1 urt1SIL2, and heso1-1 urt1SIL3. The uridylation of MYB33 5′-cleavage fragments was down to background levels in both biological replicates for the three heso1-1 urt1SIL lines as compared with the single heso1-1 mutant (**Figure 4B**). Therefore, both HESO1 and URT1 participate in uridylating MYB33 5′-cleavage fragments, although HESO1 is clearly the main TUTase involved in uridylating these fragments.

Of note, HESO1 and URT1 might make distinct contributions to the uridylation of MYB33 5′-cleavage fragments. HESO1 can synthesize short and long U-extensions, but URT1 seems to add only one or two uridines (**Figure 4B**). Interestingly, a similar distinction was proposed for HESO1 and URT1 in uridylating small RNAs. URT1 was proposed to add a single uridine to small RNAs to favor the subsequent action of HESO1, which prefers 3′ extremities ending with uridines (Tu et al., 2015; Yu et al., 2017). A comparable scenario could exist for RISC 5′-cleavage fragments although additional investigation is required to confirm this hypothesis. In any case, and as previously observed for small RNAs, uridine addition by URT1 to RISC 5′-cleavage fragments does not seem to be a prerequisite to the action of HESO1, at least for MYB33 5′-cleavage fragments.

### Respective Contribution of HESO1 and URT1 in the Accumulation of MYB33 5′-Cleavage Fragments

To further check the predominant role of HESO1 in the metabolism of MYB33 5′-cleavage fragments, we analyzed their accumulation by northern blot analysis and phosphorimager quantification (**Figure 5A**). The accumulation of MYB33 RISC 5′-cleavage fragments in each sample was calculated relative to its full-length mRNA and each ratio was normalized to the ratio obtained for the WT control for each of the two replicates. As previously observed (Ren et al., 2014), MYB33 5′-cleavage fragments accumulated to higher levels in heso1-1 with respect to WT (**Figure 5B**), although for unknown reasons the accumulation seemed variable in both replicates. Yet, our northern analysis confirmed that uridylation by HESO1 likely destabilizes MYB33 5′-cleavage fragments. The single urt1 mutation seemed to have no major effect on this accumulation. Furthermore, MYB33 5′-cleavage fragments accumulated to similar levels in the heso1-1 urt1SIL lines as compared to the single heso1-1 mutant (**Figure 5**). In other words, there was no additive effect of the lack of URT1 and HESO1, and this observation points to HESO1 as the main TUTase controlling the accumulation of MYB33 5′-cleavage fragments. Of note, miR159 accumulated to similar levels when HESO1 was absent, ruling out a higher rate of production of MYB33 5′-cleavage fragments in heso1-1 mutants (**Figure 5C**). Altogether, the 3′ RACE-seq and northern analyses indicate that HESO1 is the main TUTase modifying MYB33 5′-cleavage fragments. Although URT1 could add short uridine extensions to MYB33 5′-cleavage fragments, it does not appear to be a limiting factor in either the uridylation or the destabilization of this fragment produced by RISC cleavage.
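
The normalization used for the northern blot quantification (fragment signal relative to full-length mRNA, then relative to the WT ratio of the same replicate) amounts to a ratio of ratios. The sketch below illustrates the arithmetic only. The function name is hypothetical and the signal values are invented for the example, not data from this study.

```python
def relative_accumulation(signals, control="WT"):
    """Normalize cleavage-fragment accumulation to the control genotype.

    signals maps genotype -> (fragment_signal, full_length_signal) for a
    single replicate. Each fragment/full-length ratio is divided by the
    ratio obtained for the control, so the control is 1.0 by construction.
    """
    ratios = {g: frag / full for g, (frag, full) in signals.items()}
    return {g: ratio / ratios[control] for g, ratio in ratios.items()}


# Invented phosphorimager values for one replicate (arbitrary units).
replicate = {"WT": (50.0, 200.0), "heso1-1": (150.0, 200.0)}
normalized = relative_accumulation(replicate)
```

Because each replicate is normalized to its own WT lane, values from different blots can be compared even when absolute signal intensities differ between membranes.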

# mRNA 5′ Fragments Are Nibbled at RISC Cleavage Site in the Absence of Uridylation

The 3′ truncation up to several hundreds of nucleotides upstream of the RISC cleavage site was previously observed for MYB33 5′-cleavage fragments in the heso1-2 mutant (Ren et al., 2014). We took advantage of the depth of the 3′ RACE-seq procedure to analyze at high resolution the 3′ extremities of MYB33 5′-cleavage fragments in the vicinity of the cleavage site. Although the vast majority of extremities in the four WT biological replicates mapped at position 0, different patterns were observed for heso1-1. The patterns were not completely identical in the four biological replicates, but they all revealed the same trend: the 3′ extremities were spread over positions from −10 to 0 (**Figure 6** and **Supplementary Figure S3**). This observation reveals that the MYB33 5′-cleavage fragments that accumulate in the absence of HESO1 are nibbled in close proximity to the cleavage site. This nibbling shortens MYB33 5′-cleavage fragments by up to 8–9 nucleotides (**Figure 6A** and **Supplementary Figure S3**). Such nibbling was not observed in the single urt1-1 mutant (**Figure 6B** and **Supplementary Figure S4**) but it was consistently observed in heso1-1 and not aggravated in heso1-1 urt1SIL mutants (**Figure 6B** and **Supplementary Figure S4**). Therefore, the nibbling is solely attributed to the absence of HESO1, but not of URT1, in the case of MYB33 5′-cleavage fragments.
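
The distribution of 3′ extremities in the −10/0 window can be tabulated from the mapped end positions with a few lines of code. The sketch below is only an illustration of that bookkeeping. It assumes that positions are already expressed relative to the cleavage site (0 being the last nucleotide of the 5′ fragment) and that percentages are computed over reads falling inside the window. The function name is hypothetical.

```python
from collections import Counter


def end_position_profile(positions, window=(-10, 0)):
    """Percentage of reads whose 3' end maps at each position of the window.

    positions is an iterable of integers relative to the cleavage site.
    Reads mapping outside the window are ignored (assumption).
    """
    lo, hi = window
    in_window = [p for p in positions if lo <= p <= hi]
    counts = Counter(in_window)
    total = len(in_window)
    return {
        p: (100.0 * counts[p] / total if total else 0.0)
        for p in range(lo, hi + 1)
    }
```

In such a profile, the share of nibbled fragments is simply 100 minus the percentage at position 0, which makes genotypes such as WT and heso1-1 directly comparable.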

We then analyzed the respective contribution of HESO1 and URT1 in uridylating the 5′ fragments produced by RISC cleavage of Squamosa promoter-binding-like protein 13 (SPL13) mRNAs that are targets of miR156 and miR157 (**Figure 7A**). PARE-seq data identify a major and a minor 5′ extremity for the 3′ fragments produced by RISC cleavage (**Figure 7B**). Therefore, it is possible that in addition to the major cleavage site denoted 0 in **Figure 7A**, a minor site exists at position +1. This minor site at +1 presumably results from the action of miR157 (**Figure 7A**; He et al., 2018). Because nucleotide +1 is a U, it is not possible to determine in the 3′ RACE-seq data whether this U is encoded or added post-transcriptionally. To eliminate this uncertainty that could affect the proportion of uridylated versus non-uridylated fragments, we considered only tails of at least two nucleotides. Of note, not considering the 1 U extensions may lead to the underestimation of the action of URT1 and/or HESO1 in adding 1 U. The overall level of uridylation of SPL13 5′-cleavage fragments was lower than for MYB33, with percentages of uridylation below 40% and an increased variability between replicates (**Figures 7C,D** and **Supplementary Table S4**). Yet, a similar pattern was observed for both targets: uridylation of RISC 5′-cleavage fragments is mostly reduced in the absence of HESO1 and close to background levels in heso1-1 urt1SIL lines (**Figures 7C,D**). Interestingly, the nibbling of RISC 5′-cleavage fragments was increased in the six replicates of heso1-1 urt1SIL although to a lesser extent than the one observed for MYB33 (**Figure 8**). This observation confirms the accumulation of RISC 5′-cleavage fragments that are nibbled close to the cleavage site in case of defective uridylation. The greater accumulation of nibbled fragments in the absence of HESO1 and URT1 suggests that in the case of SPL13 5′-cleavage fragments, the absence of uridylation per se is responsible for this accumulation.

Two non-mutually exclusive interpretations can explain the accumulation of nibbled RISC 5′-cleavage fragments in the absence of uridylation. First, uridylation of the nibbled fragments could trigger their degradation. Their fast turnover would explain why they are not detected in WT plants, whereas they would accumulate in the absence of the TUTases. Second, in the presence of HESO1 and/or URT1, the 3′ extremities may not be accessible to the activity, presumably a 3′-5′ exoribonucleolytic activity, that generates the nibbled RNA species. Such a possibility was previously put forward to explain the accumulation of truncated 5′-cleavage fragments in the heso1-2 mutant (Ren et al., 2014). Solving this question entails the identification of all ribonucleases involved in the metabolism of 5′ RISC-cleaved transcripts.

### CONCLUSION

Here, we report the respective contribution of HESO1 and URT1 in the metabolism of two 5′ RISC-cleaved mRNAs. In addition, we show the applicability of 3′ RACE-seq to map the 3′ ends of 5′ RISC-cleaved transcripts and to identify untemplated nucleotides added at these 3′ ends. The depth of 3′ RACE-seq will be useful for both qualitative and quantitative comparisons across different targets, tissues, conditions or genotypes. For instance, different RISC 5′-cleavage fragments could be investigated to identify both common and specific behaviors of these RNA fragments produced by post-transcriptional gene silencing. Also, the full machinery involved in the degradation of RISC 5′-cleavage fragments needs to be characterized. This is an ongoing process with the recent identification of RICE exoribonucleases (Zhang et al., 2017) or the recent description that components of the NSD pathway and the Ski complex, a major co-factor of the cytosolic RNA exosome, are involved in the degradation of RISC 5′-cleavage fragments (Branscheid et al., 2015; Szádeczky-Kardoss et al., 2018). Yet the direct involvement of the RNA exosome in the clearance of RISC 5′-cleavage fragments remains to be demonstrated in Arabidopsis. The impact of SUPPRESSOR OF VARICOSE (SOV), whose ortholog is called Dis3L2 in non-plant eukaryotes, on the degradation of RISC 5′-cleavage fragments could also be investigated. Dis3L2 is a 3′-5′ exoribonuclease belonging to the RNase II family and whose activity is stimulated by uridylation in fission yeast, fruit fly or human cells (De Almeida et al., 2018). Whether SOV participates in the clearance of uridylated 5′ fragments of RISC-cleaved transcripts could be reliably addressed using 3′ RACE-seq and by comparing Col-0 and Ler accessions, because a point mutation affects SOV activity in Col-0 (Zhang et al., 2010).
All these examples illustrate that a large number of samples must be analyzed with sufficient depth and replicates to draw reliable conclusions. The 3′ RACE-seq method adapted to the analysis of RISC 5′-cleavage fragments will contribute to fully characterizing the tailing and nibbling events linked to the metabolism of these fragments and to addressing the respective roles of distinct factors of the RNA degradation machinery in this process.

# AUTHOR CONTRIBUTIONS

DG and HZ conceived and designed the study, wrote the paper, and acquired funding. HZ, A-CJ, and HS performed the experiments. HZ performed the bioinformatics analysis. A-CJ and HS edited the manuscript. HZ and HS prepared the illustrations.

## FUNDING

This work was supported by the Centre National de la Recherche Scientifique (CNRS, France) and the Agence Nationale de la Recherche (ANR, France) as part of the Programme d'Investissements d'Avenir in the frame of the LabEx NetRNA (ANR-2010-LABX-36) to DG and in the frame of the IdEx Unistra to HZ.

# ACKNOWLEDGMENTS

The authors are grateful to Camille Noblet for technical help.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.01438/full#supplementary-material

FIGURE S1 related to Figure 4 | Uncropped images of the western blot analysis and the membrane stained with Coomassie blue shown in Figure 4.

FIGURE S2 related to Figure 5 | Uncropped images of the northern blot analysis and the membrane stained with methylene blue for (A) MYB33 5′ fragment analysis and for (B) miR159 analysis.

FIGURE S3 related to Figure 6 | Nibbled MYB33 RISC 5′-cleavage fragments accumulate in the absence of HESO1. Positions of 3′ extremities of MYB33 RISC 5′-cleavage fragments mapped in a −10/0 window for four biological replicates in WT and heso1-1. Graphs are shown separately for each of the four replicates.

FIGURE S4 related to Figure 6 | Nibbled MYB33 RISC 5′-cleavage fragments accumulate in the absence of HESO1. Positions of 3′ extremities of MYB33 RISC 5′-cleavage fragments mapped in a −10/0 window for two biological replicates for WT, urt1-1, the urt1SIL line, heso1-1, and the three heso1-1 urt1SIL lines. Graphs are shown separately for each of the two replicates.

FIGURE S5 related to Figure 8 | Positions of 3′ extremities of SPL13 5′-cleavage fragments mapped in a −10/0 window for four biological replicates in WT and heso1-1. Graphs are shown separately for each of the four replicates.

FIGURE S6 related to Figure 8 | Nibbled SPL13 RISC 5′-cleavage fragments accumulate in the absence of HESO1 and URT1. Positions of 3′ extremities of SPL13 RISC 5′-cleavage fragments mapped in a −10/0 window for two biological replicates for WT, urt1-1, the urt1SIL line, heso1-1, and the three heso1-1 urt1SIL lines. Graphs are shown separately for each of the two replicates.

TABLE S1 | List of primers used in this study.

TABLE S2 | Summary of the number of reads analyzed at each step of the data processing for each 3′ RACE-seq library.

TABLE S3 | Exhaustive list of extensions found by 3′ RACE-seq for MYB33 5′-cleavage fragments in WT for four biological replicates from dataset #1.

TABLE S4 | Exhaustive list of extensions found by 3′ RACE-seq for SPL13 5′-cleavage fragments in WT for four biological replicates from dataset #1.

### REFERENCES



DATA SHEET S1 | Scripts for 3′ RACE-seq data processing, related to the analysis of 5′ mRNA fragments generated by RISC cleavage of MYB33 mRNAs.

DATA SHEET S2 | Scripts for 3′ RACE-seq data processing, related to the analysis of 5′ mRNA fragments generated by RISC cleavage of SPL13 mRNAs.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Zuber, Scheer, Joly and Gagliardi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.