# ROLE OF RNA MODIFICATION IN DISEASE

EDITED BY: Emanuele Buratti and Amit Bhardwaj

PUBLISHED IN: Frontiers in Genetics

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-310-4 DOI 10.3389/978-2-88963-310-4

# ROLE OF RNA MODIFICATION IN DISEASE

Topic Editors: Emanuele Buratti, International Centre for Genetic Engineering and Biotechnology, Italy Amit Bhardwaj, Langone Medical Center, New York University, United States

Citation: Buratti, E., Bhardwaj, A., eds. (2019). Role of RNA Modification in Disease. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-310-4

# Table of Contents

*Editorial: Role of RNA Modification in Disease* Emanuele Buratti and Amit Bhardwaj

*Systematic Analysis of Gene Expression Profiles Controlled by hnRNP Q and hnRNP R, Two Closely Related Human RNA Binding Proteins Implicated in mRNA Processing Mechanisms* Sara Cappelli, Maurizio Romano and Emanuele Buratti

*RNA Biology Provides New Therapeutic Targets for Human Disease* Lorna W. Harries

*Aberrant Phase Transitions: Side Effects and Novel Therapeutic Strategies in Human Disease* Veronica Verdile, Elisa De Paola and Maria Paola Paronetto


# Editorial: Role of RNA Modification in Disease

#### *Emanuele Buratti<sup>1</sup>\* and Amit Bhardwaj<sup>2</sup>*

*<sup>1</sup> Molecular Pathology, International Centre for Genetic Engineering and Biotechnology (ICGEB), Trieste, Italy, <sup>2</sup> Kimmel Center for Biology and Medicine at the Skirball Institute, Department of Pathology, New York University School of Medicine, New York, NY, United States*

Keywords: RNA, RNA Splicing, RNA binding proteins, RNA modifications, Human disease and tRNA modifications

**Editorial on the Research Topic** 

#### **Role of RNA Modification in Disease**

When first discovered, mRNA molecules were regarded simply as the means by which the information stored in DNA is converted into the real effector molecules, the cellular proteins. This view did not last long, because it soon became clear that RNA molecules can themselves determine how that information is transferred from DNA to proteins.

#### Edited by:

William Cho, Queen Elizabeth Hospital (QEH), Hong Kong

#### Reviewed by:

Michiaki Hamada, Waseda University, Japan Isaia Barbieri, University of Cambridge, United Kingdom

#### \*Correspondence:

Emanuele Buratti buratti@icgeb.org

#### Specialty section:

This article was submitted to RNA, a section of the journal Frontiers in Genetics

Received: 25 June 2019 Accepted: 30 August 2019 Published: 23 October 2019

#### Citation:

Buratti E and Bhardwaj A (2019) Editorial: Role of RNA Modification in Disease. Front. Genet. 10:920. doi: 10.3389/fgene.2019.00920

Today, it is clear that RNA molecules play a fundamental role in two of the most complex macromolecular machineries found in cells: the spliceosome and the ribosome. Furthermore, the presence in our cells of a vast array of small and large noncoding RNAs (ncRNAs), whose function and regulation are still partially unknown, has demonstrated that RNA can actively affect regulatory networks. In particular, the developmental and tissue-specific expression of these ncRNAs can profoundly affect constitutive and alternative mRNA regulatory processes and also participate in the fine tuning of translational processes. As a result of all this complexity, the emerging view is one where RNA molecules occupy a central position in almost all cellular processes, and only their correct expression can ensure both proper functioning and survival. Unfortunately, the presence of all these very complex regulatory networks also implies that defects at the level of pre-RNA processing pathways are a major cause of human disease.

The purpose of this Research Topic has been to provide an overview of several questions related to this theme, with a special focus on new emerging aspects of disease-related RNA connections that range from studying modifications of the RNA molecule itself to the effect of proteins that regulate its processing.

Regarding the RNA molecule itself, RNA *N*<sup>6</sup>-methyladenosine (m<sup>6</sup>A) modifications have recently been suggested to play a critical role in a variety of biological processes and to be especially associated with cancer risk. Tang et al. have developed an online database (DRUM) to support the querying of disease-associated RNA m<sup>6</sup>A methylation sites, which will help unravel disease mechanisms at the epitranscriptome layer.

Another important regulatory layer of RNA processing is represented by the many RNA binding proteins that are expressed within the eukaryotic nucleus. In particular, an increasingly important role is being played by proteins with classical hnRNP binding properties, as described by Silva et al. In their work, Silva et al. investigated the importance of TDP-43 in frontotemporal dementia and report that mice conditionally expressing this protein recapitulate several core behavioral features of the Frontotemporal Dementia/Amyotrophic Lateral Sclerosis (FTD/ALS) spectrum of human pathology. This study highlights the importance of RNA binding proteins in neurodegenerative diseases. Interestingly, many of these proteins have not yet been characterized very well, despite having been known for a long time. This topic has been analyzed by Cappelli et al. with regard to the genes controlled by the two closely related proteins hnRNP Q and hnRNP R in neuronal cell lines; these proteins were previously shown to affect TDP-43 toxicity in flies and neuronal cell lines (Appocher et al., 2017).

Finally, much of the excitement that is taking place nowadays with regard to RNA metabolism is targeted at the use of small effectors to recover RNA alterations in a variety of diseases. Therefore, the two final chapters in this research topic look at RNA-based therapeutic strategies both in general terms (Harries) and in targeting a more specialized behavior of RNA binding proteins that is represented by their ability to undergo phase transitions (Verdile et al.).

# AUTHOR CONTRIBUTIONS

Both authors contributed to the writing of this editorial.

# BIBLIOGRAPHY

Appocher, C., Mohagheghi, F., Cappelli, S., Stuani, C., Romano, M., Feiguin, F., et al. (2017). Major hnRNP proteins act as general TDP-43 functional modifiers both in *Drosophila* and human neuronal cells. *Nucleic Acids Res.* 45, 8026–8045.

**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Buratti and Bhardwaj. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Systematic Analysis of Gene Expression Profiles Controlled by hnRNP Q and hnRNP R, Two Closely Related Human RNA Binding Proteins Implicated in mRNA Processing Mechanisms

#### Sara Cappelli<sup>1</sup>, Maurizio Romano<sup>2</sup> and Emanuele Buratti<sup>1</sup>\*

*<sup>1</sup> Molecular Pathology, International Centre for Genetic Engineering and Biotechnology, Trieste, Italy, <sup>2</sup> Department of Life Sciences, University of Trieste, Trieste, Italy*

#### Edited by:

*Naoyuki Kataoka, The University of Tokyo, Japan*

#### Reviewed by:

*Claudia Ghigna, Istituto di genetica molecolare (IGM), Italy Kunio Inoue, Kobe University, Japan*

#### \*Correspondence:

*Emanuele Buratti buratti@icgeb.org*

#### Specialty section:

*This article was submitted to RNA, a section of the journal Frontiers in Molecular Biosciences*

Received: *16 May 2018* Accepted: *08 August 2018* Published: *30 August 2018*

#### Citation:

*Cappelli S, Romano M and Buratti E (2018) Systematic Analysis of Gene Expression Profiles Controlled by hnRNP Q and hnRNP R, Two Closely Related Human RNA Binding Proteins Implicated in mRNA Processing Mechanisms. Front. Mol. Biosci. 5:79. doi: 10.3389/fmolb.2018.00079*

Heterogeneous nuclear ribonucleoproteins (hnRNPs) are a family of RNA-binding proteins that take part in all processes that involve mRNA maturation. As a consequence, alterations of their homeostasis may lead to many complex pathological disorders, such as neurodegeneration and cancer. For many of these proteins, however, the exact function and cellular targets are still not well known. Here, we focused our attention on two hnRNP family members, hnRNP Q and hnRNP R, that we previously found to affect TDP-43 activity both in Drosophila melanogaster and in a human neuronal cell line. Classification of these two human proteins as paralogs is supported by their high level of sequence homology and by the observation that in the fly they correspond to the same protein, namely Syp. We profiled differentially expressed genes from RNA-Seq and generated functional enrichment results after silencing hnRNP Q and hnRNP R in the neuroblastoma SH-SY5Y cell line. Interestingly, despite their high sequence similarity, these two proteins were found to affect different cellular pathways, especially genes related to neurodegeneration (such as PENK, NRG3, RAB26, and JAG1) and to the inflammatory response (such as TNF, ICAM1, ICAM5, and TNFRSF9). In conclusion, human hnRNP Q and hnRNP R may be considered potentially important regulators of neuronal homeostasis, and their disruption could impair distinct pathways in the central nervous system axis, thus confirming the importance of their conservation during evolution.

Keywords: hnRNP Q, Syncrip, hnRNP R, RNA-seq, brain, immune system

# INTRODUCTION

Regulation of RNA metabolism is an important step in the maintenance of neuronal homeostasis. RNA biosynthesis, editing, and turnover are sustained at different levels by a network of RNA-binding proteins (RBPs) that bind to the pre-mRNA molecule and combinatorially interact with each other. In 2014, a census of all the human proteins known to bind RNA or to be RNA-related identified ∼1,542 proteins (about 7.5% of all protein-coding genes) as potentially belonging to the RBP family (Gerstberger et al., 2014). This finding also reflects the great importance of these proteins during evolution, as a considerable number of orthologs of these human RBPs were also found in lower organisms, such as Archaea and Bacteria (Anantharaman et al., 2002). Furthermore, bioinformatics analyses of the Saccharomyces cerevisiae and Drosophila melanogaster genomes have highlighted that 5–8% and 2–3% of genes, respectively, are predicted to act as RBPs (Keene, 2001). In addition to this evolutionary conservation, the importance of RBPs in the regulation of RNA metabolism is also highlighted by the observation that highly complex tissues, such as the brain, express a network of specific RBPs for regulating RNA homeostasis (e.g., the Hu/ELAV family) (De Conti et al., 2016).

The most abundant members of this family are called heterogeneous nuclear ribonucleoproteins (hnRNPs) and share several structural and functional properties (Gerstberger et al., 2014). These hnRNPs are highly conserved proteins that were originally described as a group of ∼20 major factors capable of forming high molecular-weight complexes transiently bound to the nascent heterogeneous nuclear RNA (hnRNA) transcribed by RNA polymerase II (Dreyfuss et al., 1993). Later on, many other proteins involved in controlling RNA processing were seen to share hnRNP-like features and are now classified as members of this large family, which thus includes TAR-DNA Binding Protein 43 (TDP-43), CUGBP Elav-Like Family (CELF) proteins, Neuro-Oncological Ventral Antigen (NOVA) proteins, and Fused in Sarcoma (FUS) (Busch and Hertel, 2012). It is important to note that the number of RNA binding proteins capable of altering RNA processing is still growing steadily, and a recent attempt at determining the number of RBPs that can be produced by HeLa cells has uncovered several hundred putative new proteins about which we still know very little (Castello et al., 2012). For this reason, it is important to functionally characterize in a systematic manner all the major components of this family.

Structurally, all RBPs contain some common elements (Gerstberger et al., 2014). In particular, hnRNPs contain one or more RNA-binding domains (RBDs) and the majority of them also have arginine-glycine-glycine (RGG) boxes and auxiliary domains, such as acid-rich and proline-rich domains. Most importantly, many RBPs also exist as different splicing isoforms and can undergo post-translational modifications as well as nucleocytoplasmic shuttling (Han et al., 2010).

In cells, the equilibrium of hnRNP proteins is finely regulated, and alterations in their expression levels can often lead to numerous defects at the level of RNA processing. This can be particularly problematic for neurons, which are characterized by a very adaptive and dynamic architecture. As a consequence, perturbation of neuronal hnRNP levels may lead to neurodegenerative disorders, such as amyotrophic lateral sclerosis (ALS), frontotemporal lobar degeneration (FTLD), spinal muscular atrophy (SMA), and Alzheimer's disease (AD) (Neumann et al., 2006; Vance et al., 2009; Bebee et al., 2012; Berson et al., 2012). Very often, these perturbations are caused by aberrant aggregation of these proteins in the neurons of affected patients (Conlon and Manley, 2017). It is well established, for example, that aggregation of TDP-43 and FUS in the brain is a major feature of patients suffering from ALS/FTLD (Neumann et al., 2006; Kwiatkowski et al., 2009; Vance et al., 2009). Further evidence of the relationship between hnRNPs and neurodegeneration is provided by the identification of ALS/FTLD-associated mutations in other hnRNP proteins, including hnRNP A1 and A2/B1 (Kim et al., 2013), as well as by the finding of nuclear and cytoplasmic deposition of hnRNP A3 in the hippocampus of patients with C9orf72 hexanucleotide expansion mutations (Davidson et al., 2017).

Interestingly, in our previous work on this topic, we observed that major cellular hnRNP proteins can modulate the gain- and loss-of-function effects of one of the major disease players, TDP-43 (Mohagheghi et al., 2016; Appocher et al., 2017). In particular, we found that a distinct set of hnRNPs (Hrb27c, CG42458, Glo, and Syp) is capable of powerfully rescuing TDP-43 toxicity in the fly eye. From the point of view of RNA metabolism in ALS pathology, among these four proteins, Syp was particularly interesting because of its well-known connections with nervous system development. In Drosophila melanogaster, in fact, this protein was found to regulate the localization of mRNAs driving axis specification and germline formation as well as that of mRNAs involved in the organization of the neuromuscular junction (McDermott et al., 2012; McDermott et al., 2014). Intriguingly, in humans this protein has two well-conserved orthologs, hnRNP Q and hnRNP R (**Figure 1**), suggesting a progressive functional divergence of these two paralogs in mammalian cells.

The RNA binding protein hnRNP Q, also known as SYNCRIP, was first described in 1997 as a nucleocytoplasmic protein interacting with the synaptotagmin isoform II (Syt-II) C2AB domain in mouse brain lysate (Mizutani et al., 1997) and was subsequently found in association with the human survival of motor neurons (SMN) gene (Mourelatos et al., 2001). The protein hnRNP Q exists in different splicing isoforms, three of which are the most representative: hnRNP Q3, hnRNP Q2, and hnRNP Q1 (a schematic diagram of each isoform is shown in **Figure 1A**). The Q3 variant is very similar in sequence (∼83% homology) to hnRNP R (Mourelatos et al., 2001), which is also expressed in alternative splicing isoforms, although predominantly as a major isoform known as R1 (UniProt ID O43390-1).

Functionally, hnRNP Q and hnRNP R are already known to regulate different aspects of RNA maturation. More specifically, hnRNP Q was found to promote inclusion of SMN2 exon 7 (Chen et al., 2008) and to inhibit C-to-U RNA editing of the apolipoprotein B (apoB) mRNA (Blanc et al., 2001). In addition, this factor can also affect mRNA transport, as demonstrated by its colocalization with ribosomal proteins and other RBPs in neuronal mRNA granules (Bannai et al., 2004; Kanai et al., 2004). Finally, hnRNP Q1 is also able to modulate neuronal morphogenesis and neurite branching in a mouse neuroblastoma cell line by interacting with different mRNAs related to Cdc42 signaling (Chen et al., 2012).

On the other hand, hnRNP R was first described in 1998 in the serum of patients with autoimmunity symptoms (Hassfeld et al., 1998) and subsequently identified, like hnRNP Q, as a factor bound to the SMN mRNA (Rossoll et al., 2002). At present, it is known that hnRNP R is involved in the transcription and degradation process of c-fos mRNA in retinal cells (Huang et al., 2008) and in the expression of immunity factors (Meininger et al., 2016; Reches et al., 2016).

FIGURE 1 | Structure and subcellular localization of endogenous hnRNP Q and hnRNP R. (A) Schematic representation of protein domains and major isoforms of *Drosophila melanogaster* CG17838/Syncrip and human hnRNP Q/hnRNP R. AcD (acidic domain), RRM (RNA-recognition motif), NLS (nuclear localization signal), RGG (Arg-Gly-Gly)-box and Glutamine/Asparagine (Q/N)-rich domain are highlighted in colored boxes; the relative sequence position and amino acid length of each isoform are also reported. *Drosophila melanogaster* CG17838/Syncrip isoform F contains a conserved AcD, three RRMs (RRM1, RRM2, and RRM3) and two NLS. For hnRNP Q, three major isoforms are represented: hnRNP Q3 is the longest variant and contains an AcD domain, three RRMs, two NLS and an RGG-box; hnRNP Q2 lacks 36 aa (Δ302-336) between RRM2 and RRM3 compared with the longest variant hnRNP Q3, while hnRNP Q1 lacks the second NLS and the RGG-box region (Δ549-623) of hnRNP Q3 and contains a unique C-terminal domain (VKGVEAGPDLLQ). Isoform 1 of hnRNP R (hnRNP R1) contains an AcD domain, three RRMs, two NLS, an RGG-box and a Q/N-rich domain at the C-terminus. The low-expressed, neuron-specific isoform (hnRNP R2), which lacks 41 aa (Δ129-166) between the AcD domain and the first RRM, is also reported. Sequence identity and similarity were calculated using EMBOSS Needle with respect to the *Drosophila melanogaster* CG17838/Syncrip isoform F. We considered this fly isoform because of its high expression at different stages of the life cycle and in all adult tissues, including brain (McDermott et al., 2012). (B) Immunofluorescence analysis of endogenous human hnRNP Q and hnRNP R (shown in green) in SH-SY5Y cells. Nuclei were visualized using DAPI staining. Scale bars: 17 µm. (C) Nuclear and cytoplasmic fractions of endogenous human hnRNP Q and hnRNP R. α-p84 and α-tubulin were used as controls for nuclear and cytoplasmic fractions, respectively. The molecular weight of each isoform is reported. \*possible splicing variant of hnRNP Q and \*\*possible splicing variant of hnRNP R.

Interestingly, functional rescue of TDP-43 alterations was found to be conserved in the human ortholog of Hrb27c (DAZAP1) and for only one of the human orthologs of Syp (hnRNP Q), but not for the second one (hnRNP R) (Appocher et al., 2017). Based on these results, we decided to focus our attention on these two members of the hnRNP family, whose functions are still not completely clear. In order to better characterize hnRNP Q and hnRNP R from a neuronal point of view, we have now assessed the cellular localization of these hnRNPs in SH-SY5Y cells and investigated changes in the whole transcriptome after their knockdown, looking for gene pathways specifically regulated by these two factors.

# MATERIALS AND METHODS

## Cell Culture and Gene Knockdown

Human neuroblastoma SH-SY5Y cells (ATCC Microbiology, Manassas, VA) were cultured as described previously (Appocher et al., 2017). To achieve optimal knockdown efficiency, three rounds of silencing were performed on day 1, 2 using Hyperfectamine (Qiagen Inc, Gaithersburg, MD, USA), according to the manufacturer's instructions. The siRNA sense sequences used in this study were as follows: luciferase (siLUC), 5′-uaaggcuaugaagagauac-3′; hnRNP Q (sihnRNP Q), 5′-agacagugaucucucucau-3′; and hnRNP R (sihnRNP R), 5′-cauuugggaucuacgucuu-3′.

The mouse motor neuron NSC-34 cell line was cultured in Dulbecco's modified Eagle's medium (DMEM)–Glutamax-I (Gibco-BRL, Life Technologies Inc., Frederick, MD, USA) supplemented with 5% fetal bovine serum (FBS) (Sigma-Aldrich, St Louis, MO, USA) and 1% Antibiotic-Antimycotic-stabilized suspension (Sigma-Aldrich, St Louis, MO, USA) in a 37°C incubator with a humidified atmosphere of 5% CO<sub>2</sub>. Cultures were used at passages 5–15.

For differentiation, NSC-34 cells were seeded to reach 70% confluence the day after, and the proliferation medium was exchanged 24 h later for fresh differentiation medium containing 1:1 DMEM/F-12 Ham (Sigma-Aldrich, St Louis, MO, USA), 1% FBS (Sigma-Aldrich, St Louis, MO, USA), 1% modified Eagle's medium nonessential amino acids (NEAA) (Sigma-Aldrich, St Louis, MO, USA), 1% Antibiotic-Antimycotic-stabilized suspension (Sigma-Aldrich, St Louis, MO, USA) and 1 µM all-trans retinoic acid (RA). Differentiation medium was changed every 2 days and cells were allowed to differentiate for up to 4–7 days.

NSC-34 cells maintained on proliferation medium (DMEM, 5% FBS, 1% Antibiotic-Antimycotic suspension) represented the undifferentiated control group. Images of undifferentiated (control) and differentiated NSC-34 cells were acquired by light microscopy using a Leica DMIL LED microscope equipped with a 20X objective, a Leica DFC450 C camera (Leica Microsystems, Cambridge, UK) and LAS v.4.4.0 software (Leica Application Suite). The average length of neurites in the differentiation medium was compared to that in the proliferation medium and quantified using Fiji NeuronJ (Meijering et al., 2004). Neurite length was analyzed by imaging a minimum of 5 cells per field. The mean of neurite length ± standard error is reported. Statistical significance was calculated using t-test (indicated as \*\*\* for P ≤ 0.001).
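As an illustration of the summary statistics described above, the following minimal Python sketch computes a mean ± standard error and a t statistic for two groups of neurite lengths. The measurements are hypothetical placeholders, and because the text does not state which t-test variant was used, the unequal-variance (Welch) form shown here is an assumption. Obtaining a P value from the statistic would additionally require the t distribution, for example from a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    m = mean(values)
    sem = stdev(values) / sqrt(len(values))  # sample SD divided by sqrt(n)
    return m, sem


def welch_t(group_a, group_b):
    """Welch's t statistic (assumed variant; the text only says 't-test')."""
    mean_a, sem_a = mean_sem(group_a)
    mean_b, sem_b = mean_sem(group_b)
    return (mean_a - mean_b) / sqrt(sem_a ** 2 + sem_b ** 2)


# Hypothetical neurite lengths (arbitrary units), not data from the study.
undifferentiated = [10.0, 12.0, 14.0]
differentiated = [40.0, 44.0, 48.0]

for label, group in (("undifferentiated", undifferentiated),
                     ("differentiated", differentiated)):
    m, sem = mean_sem(group)
    print(f"{label}: {m:.2f} ± {sem:.2f}")

print(f"t = {welch_t(differentiated, undifferentiated):.2f}")
```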

## RT-qPCR Analysis

Cells were harvested 48 h after the last siRNA transfection and were processed for RT-qPCR analysis. RNA extraction was performed using EuroGOLD TriFast (Euroclone, Milan, Italy), according to the manufacturer's instructions. One microgram of total RNA was used for the reverse transcription, carried out at 37°C using random primers (Sigma-Aldrich, St Louis, MO, USA) and Moloney murine leukemia virus (M-MLV) Reverse Transcriptase (Gibco-BRL, Life Technologies Inc., Frederick, MD, USA). The resulting cDNA was diluted 1:10 and used for quantitative PCR (qPCR). The primer sequences for the target genes were the following: hnRNP Q forward 5′-actgttgaatgggctgatcc-3′, reverse 5′-cctccaagtctttgccattc-3′; hnRNP R forward 5′-gcaaggtgcaagagtccaca-3′, reverse 5′-cacgccagagtacacactgtc-3′; TNF forward 5′-cctctctctaatcagccctctg-3′, reverse 5′-gaggacctgggagtagatgag-3′; ICAM1 forward 5′-ggccggccagcttatacac-3′, reverse 5′-tagacacttgagctcgggca-3′; PENK forward 5′-gtgcagctaccgcctagtg-3′, reverse 5′-tgcaggtttcccaaattttc-3′; TNFRSF9 forward 5′-ttggatggaaagtctgtgcttg-3′, reverse 5′-aggagatgatctgcggagagt-3′; KLF4 forward 5′-gcggcaaaacctacacaaag-3′, reverse 5′-ccccgtgtgtttacggtagt-3′; KLHL4 forward 5′-ttggagatgatggctgatga-3′, reverse 5′-aagagtttgctctgcgtggt-3′; NRG3 forward 5′-tattcaaaggtggaaaggcatcc-3′, reverse 5′-tgaaggcattcctatggagca-3′; RAB26 forward 5′-tcatctccaccgtaggcatt-3′, reverse 5′-ccggtagtaggcatgggtaa-3′; ARHGA36 forward 5′-ttgaactgacagccacgatg-3′, reverse 5′-gccagactatccacagacac-3′; CT55 forward 5′-atgttgtgactggcaacgtg-3′, reverse 5′-agcaccataaagatggcgag-3′; CARTPT forward 5′-ccgagccctggacatctact-3′, reverse 5′-atgggaacacgtttactcttgag-3′; FOSB forward 5′-accctctgccgagtctcaat-3′, reverse 5′-gaaggaaccgggcatttc-3′; JAG1 forward 5′-atcgtgctgcctttcagttt-3′, reverse 5′-gatcatgcccgagtgagaa-3′; ICAM5 forward 5′-ggctcttcggcctctcag-3′, reverse 5′-gcagttggtgctgcaattc-3′; DUOXA1 forward 5′-ccaagccaaccttcccgat-3′, reverse 5′-cccgatgaataagctggtcac-3′; HMOX1 forward 5′-gccagcaacaaagtgcaag-3′, reverse 5′-gagtgtaaggacccatcgga-3′; KCNAB1 forward 5′-gcaaatcgaccggacagtaac-3′, reverse 5′-gccatgccttggtttatcacat-3′; ACP5 forward 5′-ctacccactgcctggtcaag-3′, reverse 5′-cacgccattctcatcttgc-3′; SDCBP2 forward 5′-ccactacgtgtgtgaggtgg-3′, reverse 5′-tgctcgtagatcacactggg-3′; EFEMP1 forward 5′-cgagcaaagtgaacacaacg-3′, reverse 5′-gatatccaggagggcactga-3′. The housekeeping genes hypoxanthine phosphoribosyltransferase 1 (HPRT1) and RNA polymerase II subunit A (POLR2A) were used to normalize the results. The sequences of these primers are the following: HPRT1 forward 5′-tgacactggcaaaacaatgca-3′, reverse 5′-ggtccttttcaccagcaagct-3′; POLR2A (RPII) forward 5′-gcccacgtccaatgacat-3′, reverse 5′-gtgcggctgcttccataa-3′.

Quantitative PCR was performed in the presence of iQ SYBR Green Supermix (Bio-Rad, Hercules, CA, USA), using the following conditions for all the target genes except KLF4 and KLHL4: 95°C for 3 min, 95°C for 10 s, 60°C for 30 s, 95°C for 10 s, 65°C for 1 s. For the KLF4 and KLHL4 genes, the qPCR conditions were the following: 95°C for 3 min, 95°C for 10 s, 65°C for 30 s, 95°C for 10 s, 65°C for 1 s. The relative gene expression levels were determined using the 2<sup>−ΔΔCT</sup> method (Schmittgen and Livak, 2008). The mean of relative expression levels ± standard error of three independent experiments is reported. Statistical significance was calculated using t-test (indicated as \* for P ≤ 0.05, \*\* for P ≤ 0.01, and \*\*\* for P ≤ 0.001).
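The comparative CT calculation cited above (Schmittgen and Livak, 2008) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis script: the CT values are hypothetical, and combining the two reference genes by averaging their CT values is an assumption, since the text does not say how HPRT1 and POLR2A were combined.

```python
from statistics import mean


def relative_expression(ct_target_treated, ct_refs_treated,
                        ct_target_control, ct_refs_control):
    """Fold change of a target gene by the comparative CT (2^-ddCT) method.

    ct_refs_* are lists of CT values for the reference genes in the same
    sample; they are averaged here (an assumption, see the text above).
    """
    # dCT: target CT normalized to the reference genes within each sample
    d_ct_treated = ct_target_treated - mean(ct_refs_treated)
    d_ct_control = ct_target_control - mean(ct_refs_control)
    # ddCT: treated sample relative to the control sample
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical CT values: target gene in knockdown vs. siLUC control cells,
# each with two reference-gene CT values (e.g. HPRT1 and POLR2A).
fold_change = relative_expression(
    ct_target_treated=26.0, ct_refs_treated=[20.0, 22.0],
    ct_target_control=24.0, ct_refs_control=[20.0, 22.0],
)
print(fold_change)  # 0.25, i.e. a four-fold reduction relative to control
```

A value below 1 indicates reduced expression relative to the control and a value above 1 indicates increased expression. In practice the calculation would be repeated for each independent experiment before computing the mean ± standard error.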

## Immunofluorescence Analysis

SH-SY5Y cells (3 × 10<sup>5</sup>) and NSC-34 cells (3.5 × 10<sup>5</sup>) were plated in 6-well plates containing coverslips. For SH-SY5Y cells treated with siRNA against hnRNPs and for NSC-34 cells, we plated the corresponding number of cells in 6-well plates containing coverslips coated with poly-L-lysine solution at a final concentration of 0.01% (w/v) in H<sub>2</sub>O (Sigma-Aldrich, St Louis, MO, USA). After 24 h, cells were washed three times with PBS, fixed in 3.2% paraformaldehyde in PBS for 1 h at room temperature and permeabilized with 0.3% Triton in PBS for 5 min on ice. Cells were then blocked with 2% BSA/PBS for 20 min at room temperature and immunolabeled with 1:200 rabbit polyclonal anti-hnRNP Q antibody (Sigma-Aldrich, St Louis, MO, USA) or 1:200 rabbit polyclonal anti-hnRNP R antibody (Abcam, Cambridge, UK) in 2% BSA/PBS overnight at 4°C. The next day, cells were washed three times with PBS, incubated with 1:500 anti-rabbit Alexa-Fluor 488 (Invitrogen, Carlsbad, CA, USA) for 1 h at room temperature and coverslipped with Vectashield-DAPI mounting medium (Vector Laboratories, Burlingame, CA, USA). Each slide was analyzed at the microscopy facility of the University of Trieste, using a Nikon Eclipse C1si confocal microscope system mounted on a Nikon TE-2000U inverted microscope with a 60X objective.

## Nuclear and Cytoplasmic Extraction and Western Blot Analysis

SH-SY5Y cells were seeded in p100 dishes to reach 90% confluence on the day of nuclear and cytoplasmic extraction. Cells from two dishes were pooled together and the resulting pellets were treated using NE-PER Nuclear and Cytoplasmic Extraction Reagents (Thermo Fisher, Waltham, MA, USA) as described in the manufacturer's instructions. The presence or absence of hnRNP Q and hnRNP R in the nuclear and cytoplasmic fractions was then evaluated by Western blot analysis. Protein extract (15 µg) for each sample was loaded on a 10% SDS-PAGE gel. The gel was then electroblotted onto an Immobilon-P PVDF Membrane (Merck Millipore, Burlington, MA, USA), according to standard protocols, and blocked with 4% BSA (Sigma-Aldrich, St Louis, MO, USA) prepared in 1× PBS with 0.1% Tween-20 (Sigma-Aldrich, St Louis, MO, USA). Membranes were incubated with 1:1000 rabbit polyclonal anti-hnRNP Q antibody (Sigma-Aldrich, St Louis, MO, USA) or 1:1000 rabbit polyclonal anti-hnRNP R antibody (Abcam, Cambridge, UK) and subsequently with 1:2000 HRP-conjugated secondary antibody (Dako, Glostrup, Denmark). Proteins were detected with Luminata Classico Western HRP substrate (Merck Millipore, Burlington, MA, USA) and the images were acquired using the Alliance 9.7 Western Blot Imaging System (UVItec Limited, Cambridge, UK). An in-house 1:1000 mouse polyclonal anti-tubulin antibody and a 1:1000 mouse monoclonal anti-p84 antibody (Abcam, Cambridge, UK) were used as cytoplasmic and nuclear controls, respectively (Ayala et al., 2008).

## RNAseq and Analysis of Differentially Expressed Genes (DEGs)

Total RNA was extracted from luciferase (control), hnRNP Q and hnRNP R depleted SH-SY5Y cells, as described previously. RNA sequencing was performed by Eurofins (www.eurofins.com) using an Illumina HiSeq 2500 instrument. Data processing was carried out with the following software: HiSeq Control Software v2.0.12.0, RTA v1.17.21.3 and bcl2fastq-1.8.4. Alignment to the human reference sequence was performed with BWA-MEM (version 0.7.12-r1039, http://bio-bwa.sourceforge.net/) and the raw read counts were generated using featureCounts (http://bioinf.wehi.edu.au/featureCounts/). Only reads with unique mapping positions and a mapping quality score of at least 10 were considered for read counting. Raw read counts were converted to counts per million (CPM) values after trimmed mean of M-values (TMM) normalization (edgeR package, http://bioconductor.org/packages/release/bioc/html/edgeR.html; Robinson and Oshlack, 2010). Features were retained only if they had a CPM value of more than one in at least three samples; this filter removed 47,622 of the 64,769 features. Differential expression analysis was performed on the remaining 17,147 genes using the edgeR package. The GOseq R package (Young et al., 2010) was also used for Gene Ontology (GO) and KEGG pathway analysis. Categories significantly enriched (p-value < 0.05) were considered.
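As a rough illustration of the expression filter described above, the following Python sketch computes CPM values and keeps a feature only if its CPM exceeds one in at least three samples. The gene names, counts and library sizes are hypothetical, and the sketch divides by raw library sizes, whereas edgeR additionally scales library sizes by TMM normalization factors; it is an illustrative simplification, not the code used in this study.

```python
def counts_per_million(counts, library_sizes):
    """Convert one feature's raw counts to CPM, one value per sample."""
    return [1e6 * c / n for c, n in zip(counts, library_sizes)]


def keep_feature(counts, library_sizes, min_cpm=1.0, min_samples=3):
    """Return True if CPM exceeds min_cpm in at least min_samples samples."""
    cpm = counts_per_million(counts, library_sizes)
    return sum(value > min_cpm for value in cpm) >= min_samples


# Hypothetical library sizes (total counted reads) for four samples.
library_sizes = [20_000_000, 25_000_000, 18_000_000, 22_000_000]

# Hypothetical raw counts for two features.
raw_counts = {
    "gene_a": [45, 60, 40, 3],  # CPM > 1 in three samples: kept
    "gene_b": [5, 80, 2, 1],    # CPM > 1 in one sample only: removed
}

kept = [name for name, counts in raw_counts.items()
        if keep_feature(counts, library_sizes)]

# Bookkeeping with the numbers reported in the text:
# features remaining after the filter.
remaining = 64_769 - 47_622
```

The subtraction at the end simply recovers the number of features that pass the filter from the two totals given in the text.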

## Accession Numbers

The data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus (Edgar et al., 2002) and are accessible through GEO Series accession number GSE114165. [The following secure token has been created to allow review of record GSE114165 while it remains in private status: kjubwgcqjjyljct].

# RESULTS

# Different Subcellular Localization of Human hnRNP Q and hnRNP R in the Neuroblastoma SH-SY5Y Cell Line

In our previous study, we used Drosophila melanogaster as a model organism to study the effects of hnRNP depletion in a model of TDP-43/TBPH gain- and loss-of-function, and we identified CG17838/Syncrip (Syp) (**Figure 1A**) as a potentially very powerful modulator of TDP-43 effects (Appocher et al., 2017). Interestingly, of the two human Syp orthologs (hnRNP Q and hnRNP R), only hnRNP Q was shown to be able to rescue the mis-splicing effects caused by TDP-43 silencing.

To better characterize the roles played by human hnRNP Q and hnRNP R in the neuronal-like cell line SH-SY5Y, we first investigated their subcellular localization by carrying out immunofluorescence (IF) staining for the endogenous proteins. In these cells, hnRNP Q showed punctate localization both in nucleus and cytoplasm whilst the hnRNP R IF signal was predominantly nuclear (**Figure 1B**). In particular, the presence of the cytoplasmic variant of hnRNP Q in granule-like structures further supports previous results showing the involvement of hnRNP Q in mRNA trafficking (Bannai et al., 2004; Chen et al., 2012).

Western blot analysis of nuclear and cytoplasmic fractions was also carried out to confirm these results and check for isoform production.

Regarding hnRNP Q, three major splicing isoforms of this protein have been reported so far (Mourelatos et al., 2001). These isoforms are characterized by the presence of two NLS in the hnRNP Q3 and hnRNP Q2 variants (with a molecular weight of 65 kDa and 70 kDa, respectively) and one NLS in hnRNP Q1 (with a molecular weight of 62 kDa) (**Figure 1A**). Three immunoreactive bands (∼58, ∼66, and ∼75 kDa) were detected by Western blot analysis, differentially distributed between nucleus and cytosol (**Figure 1C**). The molecular weight of ∼75 kDa is consistent with that of the hnRNP Q2/hnRNP Q3 isoforms and the molecular weight of ∼66 kDa with that of the hnRNP Q1 isoform. The apparent molecular weight of the lower band (∼58 kDa) cannot be associated with any known hnRNP Q isoform and could correspond to a further variant that remains to be characterized.

The same analysis was repeated for hnRNP R, confirming its presence predominantly in the nuclear fraction (**Figure 1C**). The antibody used for staining the membrane (ab30930) detected three bands: ∼71, ∼75, and ∼80 kDa. According to the literature, the major hnRNP R isoform, also known as R1 (NP\_005817.1, NM\_005826.4 [O43390-1]), has a molecular weight of ∼80 kDa whilst a second characterized variant, namely R2 (NP\_001284549.1, NM\_001297620.1 [O43390-3]), has a molecular weight of ∼75 kDa (Hassfeld et al., 1998; Huang et al., 2005). On the basis of these data, we concluded that our immunoreactive bands of ∼75 kDa and ∼80 kDa were R2 and R1, respectively. On the other hand, the band of ∼71 kDa could be another splicing variant of hnRNP R that still needs to be identified.

Subsequently, we tested whether hnRNP Q or hnRNP R might change their cellular localization after neuronal differentiation. To this aim, considering the extremely high conservation of these two proteins in mouse (more than 99% identity and similarity), immunofluorescence experiments were carried out after inducing differentiation of the murine motoneuron-like NSC-34 cell line, chosen for its ability to differentiate into more neuron-like cells (**Figure 2A**). The staining showed that hnRNP Q and hnRNP R did not change their subcellular distribution after differentiation (**Figure 2B**).

Overall, the different localization of hnRNP Q and hnRNP R suggests that their nuclear-cytoplasmic distribution is differentially regulated and this could be reflected in a differential control of cellular pathways.

# Knockdown of hnRNP Q and hnRNP R Affects the Expression of Genes Related to Brain Functions, Neurodegeneration and Inflammatory Response

Following this immunolocalization analysis, we decided to analyze the whole transcriptome of SH-SY5Y cells silenced for these hnRNPs in order to identify the genes whose expression is commonly or differentially regulated by these proteins.

First, we checked the downregulation of hnRNP Q and hnRNP R using qPCR (**Figures 3A**, **4A**) and then investigated whether silencing of hnRNP Q affected the gene expression levels and cellular localization of hnRNP R and vice versa (**Figure 5**). We observed a significant reduction in the mRNA levels of hnRNP Q and hnRNP R after their silencing. Moreover, we observed no significant differences in the mRNA levels or in the endogenous localization of hnRNP Q after sihnRNP R and of hnRNP R after sihnRNP Q.

Then, we carried out an RNA-seq analysis of three independent knockdowns for each hnRNP, and the putative differentially expressed genes (DEGs) were identified by comparing the data from hnRNP Q or hnRNP R silencing with siLUC control samples. To identify up- and downregulated genes, the cut-off values used were the fold change (FC) value (upregulation cut-off: FC > 1.3; downregulation cut-off: FC < 0.7) and a p-value < 0.05. Following silencing of hnRNP Q, a total of 2,819 genes (out of the 17,147 analyzed genes) were found to be differentially expressed. These included 1,380 (49%) upregulated and 1,439 (51%) downregulated genes (**Figure 3A**). On the other hand, following silencing of hnRNP R, 1,517 genes (out of the 17,147 analyzed genes) were differentially expressed, 957 (63%) upregulated and 560 (37%) downregulated (**Figure 4A**).
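The cut-offs above can be sketched as a small classification rule. The following Python example is only an illustration under stated assumptions: the gene names, fold changes and p-values are hypothetical, and the function is not the code used in this study. It also recomputes the reported percentages from the DEG counts given in the text.

```python
def classify_gene(fold_change, p_value,
                  up_cutoff=1.3, down_cutoff=0.7, alpha=0.05):
    """Label a gene as 'up', 'down' or 'unchanged' relative to the control."""
    if p_value >= alpha:
        return "unchanged"
    if fold_change > up_cutoff:
        return "up"
    if fold_change < down_cutoff:
        return "down"
    return "unchanged"


def percentage(part, total):
    """Share of `part` in `total`, rounded to the nearest whole percent."""
    return round(100 * part / total)


# Hypothetical (fold change, p-value) pairs for three genes.
examples = {
    "gene_a": (2.10, 0.001),
    "gene_b": (0.55, 0.020),
    "gene_c": (1.60, 0.200),
}
calls = {gene: classify_gene(fc, p) for gene, (fc, p) in examples.items()}

# Percentages recomputed from the DEG counts reported in the text.
q_up, q_down = percentage(1380, 2819), percentage(1439, 2819)
r_up, r_down = percentage(957, 1517), percentage(560, 1517)
```

Note that a gene with a large fold change but a p-value at or above 0.05 (such as the hypothetical `gene_c`) is not counted as a DEG under this rule.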

In order to validate these RNA-seq data, we monitored by qPCR the expression (after hnRNP Q or hnRNP R silencing) of 10 genes selected among the top 100 differentially expressed genes (**Figures 3B**, **4B**). We selected these genes considering their potential involvement in neuron development/functions as well as neuroinflammation.

Regarding cells silenced for hnRNP Q, tumor necrosis factor (TNF), intercellular adhesion molecule 1 (ICAM1), proenkephalin (PENK), tumor necrosis factor receptor superfamily, member 9 (TNFRSF9), Kruppel-like factor 4 (gut) (KLF4), kelch-like family member 4 (KLHL4), and neuregulin 3 (NRG3) were found to be upregulated, while RAB26, member RAS oncogene family (RAB26), Rho GTPase activating protein 36 (ARHGAP36) and cancer/testis antigen 55 (CT55) were found to be downregulated (**Figure 3C**). On the other hand, concerning cells silenced for hnRNP R, CART prepropeptide (CARTPT), FBJ murine osteosarcoma viral oncogene homolog B (FOSB), jagged 1 (JAG1), intercellular adhesion molecule 5, telencephalin (ICAM5), dual oxidase maturation factor 1 (DUOXA1) and heme oxygenase (decycling) 1 (HMOX1) were found to be upregulated, while potassium voltage-gated channel, shaker-related subfamily beta (KCNAB1), acid phosphatase 5, tartrate resistant (ACP5), syndecan binding protein (syntenin) 2 (SDCBP2), and EGF containing fibulin-like extracellular matrix protein 1 (EFEMP1) were found to be downregulated (**Figure 4C**). In conclusion, the results of our qPCR validation are consistent with those obtained with the RNA-seq analysis.

Volcano plots were also used to obtain a general overview of the results obtained in both sihnRNP Q and sihnRNP R treated cells (**Figure 6**). Differentially expressed genes are highlighted in red (downregulated) and in green (upregulated) based on the p-value and FC variation with respect to the control treated cells (siLUC). In this diagram, we also report the position of the DEGs validated in **Figures 3C**, **4C** using RT-qPCR.
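For readers unfamiliar with volcano plots, the sketch below shows how each gene is conventionally placed on such a plot: the x-axis is the log2 fold change and the y-axis is the negative log10 of the p-value. This is a minimal Python illustration under assumptions: the axis transformation is the usual convention rather than something stated in the text, the grey colour for unchanged genes is our own choice, and the example values are hypothetical. The red/green colouring and the FC and p-value cut-offs follow the description in the text.

```python
import math


def volcano_point(fold_change, p_value):
    """Return (log2 fold change, -log10 p-value) for one gene."""
    return math.log2(fold_change), -math.log10(p_value)


def volcano_colour(fold_change, p_value, up=1.3, down=0.7, alpha=0.05):
    """Green for upregulated, red for downregulated, grey otherwise."""
    if p_value < alpha and fold_change > up:
        return "green"
    if p_value < alpha and fold_change < down:
        return "red"
    return "grey"  # assumption: colour for genes that are not DEGs


# Hypothetical (fold change, p-value) pairs.
genes = {"gene_a": (2.0, 0.01), "gene_b": (0.5, 0.001), "gene_c": (1.1, 0.6)}

points = {name: volcano_point(fc, p) for name, (fc, p) in genes.items()}
colours = {name: volcano_colour(fc, p) for name, (fc, p) in genes.items()}
```

On such a plot, strongly and significantly changed genes appear in the upper left (downregulated) and upper right (upregulated) corners.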

# Gene Ontology (GO) Enrichment and KEGG Pathway Analysis Reveal Different and Common Features Regulated by hnRNP Q and hnRNP R

We next carried out enrichment analysis to find which GO terms are over-represented in the genes regulated by hnRNP Q and hnRNP R, in order to highlight differences and similarities in specialization between these two orthologs. To this aim, we took advantage of the GOseq R Bioconductor package (Young et al., 2010) and considered for the final analysis only GO terms of the "biological process" (BP), "molecular function" (MF), and "cellular component" (CC) categories reaching the significance threshold of p-value < 0.05. For both hnRNP Q and hnRNP R, the top 25 GO terms of the three major categories were selected and sorted by their presence or absence for each hnRNP. This approach led us to define categories specific for hnRNP Q or hnRNP R and categories common to both proteins (**Figure 7**).
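The idea of over-representation can be illustrated with a plain hypergeometric test, sketched below in Python. This is a deliberate simplification and not the GOseq procedure itself: GOseq additionally corrects for gene length bias in RNA-seq data, which this sketch ignores. The term size and the number of DEGs annotated to the term are hypothetical; only the totals of 17,147 analyzed genes and 2,819 hnRNP Q DEGs come from the text.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_in_term, n_degs, n_degs_in_term):
    """Upper-tail probability P(X >= n_degs_in_term).

    X is the number of term-annotated genes found when n_degs genes are
    drawn without replacement from n_genes, of which n_in_term carry the term.
    """
    upper = min(n_in_term, n_degs)
    total = comb(n_genes, n_degs)
    tail = sum(
        comb(n_in_term, k) * comb(n_genes - n_in_term, n_degs - k)
        for k in range(n_degs_in_term, upper + 1)
    )
    return tail / total


# Small worked case: 10 genes, 4 in the term, 5 DEGs, all 4 term genes are DEGs.
small_p = hypergeom_enrichment_p(10, 4, 5, 4)  # equals 6 / 252

# Hypothetical GO term with 200 annotated genes, 60 of them among the DEGs.
term_p = hypergeom_enrichment_p(17_147, 200, 2_819, 60)
enriched = term_p < 0.05
```

With 2,819 DEGs among 17,147 genes, about 33 of 200 annotated genes would be expected among the DEGs by chance, so an observed count of 60 yields a small tail probability.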

Regarding hnRNP Q, out of 2,819 DEGs used as input for GO analysis, we identified a total of 1,152 terms with significant gene enrichment. The top enriched GO categories were "membrane" (p-value = 3.17E-08), "bounding membrane of organelle" (p-value = 7.29E-07), "cell morphogenesis involved in differentiation" (p-value = 1.10E-06), "cell development" (p-value = 1.42E-06) and "cell adhesion" (p-value = 3.17E-06). On the other hand, for hnRNP R, out of 1,517 DEGs used as input for GO analysis, we identified a total of 955 terms with significant gene enrichment. The top enriched GO categories were "system development" (p-value = 2.34E-08), "tissue development" (p-value = 4.21E-08), "multicellular organism development" (p-value = 3.39E-07), "cell differentiation" (p-value = 4.28E-07) and "signal transduction" (p-value = 1.13E-06). Notably, when we looked at GO terms differentially enriched in hnRNP Q and hnRNP R DEGs, we found that "intrinsic component of membrane," "plasma membrane," "cell periphery," "integral component of membrane," and "membrane part" were particularly enriched in hnRNP Q depleted cells, while "signal receptor activity," "transmembrane receptor activity," "transmembrane signaling receptor activity," "receptor activity," and "molecular transducer activity" were more enriched in hnRNP R depleted cells. Furthermore, we also carried out KEGG pathway analysis using the GOseq R package. We found 29 and 16 terms with significant gene enrichment (p-value < 0.05) for hnRNP Q and hnRNP R, respectively (**Figure 8**). In particular, we noticed that most of the pathways identified by KEGG pathway analysis for both proteins were related to inflammation. Indeed, "toll-like receptor signaling pathway" (p-value = 0.002), "ECM-receptor interaction" (p-value = 0.002), "adipocytokine signaling pathway" (p-value = 0.003), "toxoplasmosis" (p-value = 0.004) and "rheumatoid arthritis" (p-value = 0.007) were particularly enriched in DEGs obtained by hnRNP Q silencing, whereas "cell adhesion molecules (CAMs)" (p-value = 4.58E-05), "ECM-receptor interaction" (p-value = 0.001), "cytokine-cytokine receptor interaction" (p-value = 0.01), "T cell receptor signaling pathway" (p-value = 0.02) and "malaria" (p-value = 0.02) were particularly enriched in DEGs obtained by hnRNP R silencing.

In conclusion, this analysis shows that hnRNP Q and hnRNP R have presumably acquired different functional specialization during evolution. Our results suggest that hnRNP Q plays a role in the assembly of plasma membrane lipid layers and organelles as well as in the regulation of events associated with cell-cell and cell-extracellular matrix contacts. On the contrary, hnRNP R seems to be implicated mostly in processes associated with differentiation and development of cells/tissues, as well as cell signaling.

# DISCUSSION

The elucidation of the molecular mechanisms underlying RNA regulation in both physiological and pathological processes is hampered by the great complexity of RBP networks, which can arise through the establishment of highly specific or loosely specific interactions (Liachko et al., 2010; Cohen et al., 2015) and through post-translational modifications (Dassi, 2017).

To fill this gap, we focused our attention on two prominent but less studied members of the hnRNP family, hnRNP Q and hnRNP R.

Previous studies have shown that hnRNP Q has multiple functions in mRNA metabolism, ranging from pre-mRNA splicing to mRNA editing, stability control, transport, and translation (Blanc et al., 2001; Bannai et al., 2004; Chen et al., 2008; Weidensdorfer et al., 2009; Kim et al., 2010). On the other hand, hnRNP R, an hnRNP highly related to hnRNP Q, seems to be implicated in the processing and localization of β-actin mRNA by binding its 3′ UTR in motor axons (Rossoll et al., 2003). In addition, it has been suggested that hnRNP Q and hnRNP R cooperate in regulating cytoplasmic mRNA trafficking (Mourelatos et al., 2001; Rossoll et al., 2002).

More recently, it has been confirmed that hnRNP R interacts with the 3′ UTR of mRNAs (Briese et al., 2018) and that, along with its main interactor, the noncoding RNA 7SK, it seems to coregulate the axonal transcriptome of motoneurons (Briese et al., 2018).

Therefore, despite recent progress in the understanding of hnRNP Q and hnRNP R functions, little is known about their roles in the regulation of gene expression and about the potential targets of their actions. Looking at the transcriptome of SH-SY5Y cells silenced with siRNA against hnRNP Q and hnRNP R, we found distinctive and common features associated with the DEGs of both proteins. Moreover, in the top 100 DEGs of both proteins we identified an important subset of genes that correlate with neurodegeneration and inflammation cellular pathways.

In general, regarding hnRNP Q, our study suggests that this factor predominantly regulates the expression of genes potentially impacting the immune response and inflammation (**Figure 8A**). In fact, the two immune-related KEGG pathways "Rheumatoid arthritis" and "Toxoplasmosis" were found to be enriched in hnRNP Q DEGs. Indeed, infection with Toxoplasma gondii is associated with neuronal impairment and inflammation in mice and humans (Carruthers and Suzuki, 2007) and the inhibition of TNF signaling in patients suffering from rheumatoid arthritis is protective against Alzheimer's disease (Steeland et al., 2018). Finally, it is worth noting that several lines of evidence support a role for Toll-like receptors (TLRs) in the pathogenesis of neurodegenerative diseases, such as ALS (Casula et al., 2011), Alzheimer's disease (Reed-Geaghan et al., 2009) and multiple sclerosis (Prinz et al., 2006; Marta et al., 2008).

On the other hand, regarding hnRNP R, the KEGG pathway analysis suggests that this protein predominantly influences the expression of genes related to brain functions and inflammation (**Figure 8B**). In fact, two neuronal-related KEGG pathways ("Axon guidance" and "Neuroactive-ligand receptor interaction") and several immune/inflammation-related KEGG pathways ("T cell receptor signaling," "Cytokine-cytokine receptor interaction," "Cell adhesion molecules," "ECM-receptor interaction," and "Arachidonic acid metabolism") were found to be enriched in hnRNP R DEGs.

Then, taking a closer look at the regulated genes, regarding hnRNP Q, we found KLF4 (Qin and Zhang, 2012), NRG3 (Zhou et al., 2018), and PENK (Ernst et al., 2010) up-regulated following hnRNP Q silencing. In particular, NRG3 and PENK encode two neuronal proteins associated with synapse plasticity and neuronal disorders, respectively. The silencing of hnRNP Q was also able to down-regulate both RAB26 and RAB33B, which have been shown to bind to ATG16L1 in the fly neuromuscular junctions, thus suggesting the implication of Syncrip in the recycling of synaptic vesicle proteins through the autophagy pathway (Binotti et al., 2015). This observation is particularly intriguing because of the role played by TDP-43 and its fly ortholog TBPH in neuromuscular junction formation. In fact, a previously generated TBPH-null allele Drosophila ALS model showed specific alterations of neuromuscular junctions (Feiguin et al., 2009; Godena et al., 2011; Langellotti et al., 2016; Romano et al., 2018) and the hTDP43A315T transgenic mouse model of ALS presented a strong reduction of synaptic vesicles in the NMJs (Magrané et al., 2013).

Finally, neuronal expression has been reported for both KLHL4 and ARHGAP36, although their neuronal function or possible connection with diseases is still not fully elucidated (Braybrook et al., 2001; Rack et al., 2014).

Regarding hnRNP R, we observed up-regulation of CARTPT, FOSB, JAG1, DUOXA1 and HMOX1 and down-regulation of KCNAB1, SDCBP2, and EFEMP1. It is interesting to note that CARTPT encodes a prepropeptide acting as a neurotransmitter in association with GABA (γ-aminobutyric acid) (Smith et al., 1999) and substance P (Hubert and Kuhar, 2005). Furthermore, the maturation factor of the NADPH oxidase Dual oxidase 1 (DUOXA1) was found to promote neuronal-like differentiation of p19 embryonal carcinoma cells following p53 expression (Ostrakhovitch and Semenikhin, 2011). Interestingly, JAG1 and, more generally, the Notch signaling pathway are important for spatial memory and their expression is altered in the hippocampus of people suffering from Alzheimer's disease (Marathe et al., 2017). In addition, HMOX1 expression is also up-regulated in neurons and astrocytes derived from the hippocampus, cerebral cortex, and subcortical white matter of Alzheimer's patients (Schipper et al., 1995).

On the other hand, due to the increasing importance of the inflammatory response in the pathogenesis of neurodegenerative disease (Wyss-Coray and Mucke, 2002; Block and Hong, 2005; Glass et al., 2010), it was particularly interesting to note that cells depleted of hnRNP Q and hnRNP R showed a prominent disruption of this pathway. In particular, the inflammatory proteins TNF, TNFRSF9, and ICAM1 were upregulated by the silencing of hnRNP Q, similarly to what was observed with silencing of TDP-43 and DAZAP1 (Appocher et al., 2017). TNF is a pro-inflammatory cytokine that is expressed in the central nervous system and its soluble form can promote neuronal inflammation, occurring in neurodegenerative conditions such as ALS, multiple sclerosis, Alzheimer's and Parkinson's diseases (McCoy and Tansey, 2008). TNFRSF9, also known as CD137, is a member of the tumor necrosis factor receptor family that was demonstrated to promote oligodendrocyte apoptosis when bound to its ligand, through the release of reactive oxygen species. Finally, ICAM1 was found to be overexpressed in age-dependent neurodegeneration and localized in amyloid plaques of Alzheimer's patients (Miguel-Hidalgo et al., 2007).

By contrast, in sihnRNP R treated cells, two other immune-related proteins (namely, ICAM5 and ACP5) were found to be differentially regulated, while ICAM1, TNF, and TNFRSF9 were not significantly altered. In particular, ICAM5 (telencephalin) has been described to mediate neuroprotective effects by inhibiting the pro-inflammatory cascade of ICAM1 (Tian et al., 2008). Regarding ACP5, expression of this gene was detected in the brain and spinal cord of rat, and its connection with inflammation lies in the abnormal macrophage response to bacteria in mice lacking this enzyme (Bune et al., 2001).

In conclusion, regarding the possible molecular mechanisms by which depletion of hnRNP Q and hnRNP R influences the gene expression profiles of SH-SY5Y cells, we are tempted to speculate that hnRNP R and hnRNP Q might regulate the abundance of transcripts by affecting mRNA stability through their interaction with the 3′ UTR. This hypothesis is supported by their known ability to bind the 3′ UTRs of mRNAs. However, other mechanisms (such as alternative splicing associated with NMD or alternative splicing regulation of transcription factors) may be involved and cannot be excluded at the present stage. Nonetheless, our study sheds light on the distinctive functions of hnRNP Q and hnRNP R in human neuronal cells and, in general, provides insights into the involvement of the hnRNP family in controlling neuronal and inflammatory pathways, strengthening the hypothesis that differential expression of these RBPs could play an essential role in modulating the onset and progression of neurodegenerative disorders.

## AUTHOR CONTRIBUTIONS

SC and MR performed the experiments. EB and MR designed the study. SC, MR, and EB analyzed the results and wrote the manuscript.

## REFERENCES


#### ACKNOWLEDGMENTS

International Scientific Co-operation Agreement between Italy and Israel (SCREENCELLS4ALS, Ministero Affari Esteri, MAE, Italy). Images in this paper were generated in the Optical Microscopy Center of the University of Trieste at the Life Sciences Department, funded as detailed at http://www.units.it/confocal. We thank David Elliott and Katherine James for helpful suggestions on RNA seq analysis.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Cappelli, Romano and Buratti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# RNA Biology Provides New Therapeutic Targets for Human Disease

#### Lorna W. Harries\*

RNA-Mediated Mechanisms of Disease, College of Medicine and Health, The Institute of Biomedical and Clinical Science, Medical School, University of Exeter, Exeter, United Kingdom

RNA is the messenger molecule that conveys information from the genome and allows the production of biomolecules required for life in a responsive and regulated way. Most genes are able to produce multiple mRNA products in response to different internal or external environmental signals, in different tissues and organs, and at specific times in development or later life. This fine tuning of gene expression is dependent on the coordinated effects of a large and intricate set of regulatory machinery, which together orchestrate the genomic output at each locus and ensure that each gene is expressed at the right amount, at the right time and in the correct location. This complexity of control, and the requirement for both sequence elements and the entities that bind them, results in multiple points at which errors may occur. Errors of RNA biology are common and are found in association with both rare, single-gene disorders and more common, chronic diseases. Fortunately, complexity also brings opportunity. The existence of many regulatory steps also offers multiple levels of potential therapeutic intervention which can be exploited. In this review, I will outline the specific points at which coding RNAs may be regulated, indicate potential means of intervention at each stage, and outline with examples some of the progress that has been made in this area. Finally, I will outline some of the remaining challenges with the delivery of RNA-based therapeutics but indicate why there are reasons for optimism.

#### Edited by:

Subbaya Subramanian, University of Minnesota Twin Cities, United States

#### Reviewed by:

Tomas J. Ekström, Karolinska Institute (KI), Sweden Jernej Ule, University College London, United Kingdom

#### \*Correspondence:

Lorna W. Harries l.w.harries@exeter.ac.uk

#### Specialty section:

This article was submitted to RNA, a section of the journal Frontiers in Genetics

Received: 18 January 2019 Accepted: 26 February 2019 Published: 08 March 2019

#### Citation:

Harries LW (2019) RNA Biology Provides New Therapeutic Targets for Human Disease. Front. Genet. 10:205. doi: 10.3389/fgene.2019.00205

Keywords: mRNA processing, RNA editing, RNA export, RNA therapeutics, ncRNA, splicing, RNA epitranscriptomics, therapeutics

# INTRODUCTION

The fundamental importance of RNA not only as a messenger molecule, but as a regulator of genes in its own right is increasingly being recognized. The production of mature messenger RNA (mRNA) is dependent on a plethora of processing and regulatory steps involving a complicated repertoire of sequence elements, RNA binding proteins and other regulatory RNA species. Given the complexity of the regulatory machinery, defects in non-coding regions of genes and regulatory genomic regions are common in genetic disease, being present in up to 50% of cases (Yang et al., 2013; Beaulieu et al., 2014) and are also the most common site of genetic variation conferring susceptibility to common, complex disease (Manolio et al., 2008). There is, however, a silver lining. The complexity that causes errors in gene expression or mRNA processing to be such a common occurrence, also provides multiple and differential points of potential therapeutic intervention. Over the past decade, there have been a number of examples, where the specifics of RNA regulatory machinery have been harnessed to produce novel therapeutics that are now in phase III clinical trials [e.g., Patisiran for Familial amyloid polyneuropathy (Rizk and Tuzmen, 2017), Custirsen for prostate cancer (Edwards et al., 2017) and AGS-003 for renal cell carcinoma (Figlin, 2015)]. This review aims to explore the potential for intervention in mRNA processing or posttranscriptional regulation with selected examples for future therapeutic benefit.

# THE LIFECYCLE OF A CODING RNA

The processes involved in the production of a mature mRNA, and its subsequent fate, are multifaceted and complicated (**Figure 1**). The life of an RNA molecule starts upon transcription, which is controlled by tissue-specific promoters and enhancers. The immature primary RNA transcript [heterogeneous nuclear RNA (hnRNA) or pre-mRNA] then undergoes a series of modifications that involve the addition of the 5′ cap structure, removal of the intronic sequences by constitutive or alternative splicing and 3′ end processing events that include the addition of the poly-A tail (Chen et al., 2017; Sperling, 2017; Zhang and Tjian, 2018). These processes are not a linear pipeline and occur co-transcriptionally (Beyer and Osheim, 1988; Bentley, 2002). Newly processed RNA may also undergo RNA editing, which is mostly A to G or A to I substitution in humans (Chen, 2013). RNAs may also undergo epitranscriptomic decoration, whereby different RNA modifications such as methylation of adenosine residues (m6A) may be added. Such modifications are deposited, removed and recognized by a series of RNA writers, erasers and readers, respectively (Helm and Motorin, 2017). Mature mRNAs are then exported from the nucleus to the cytoplasm. This is an active and regulated process, and one of the primary safeguards against the translation of aberrant mRNAs (Williams et al., 2018). The spatial and temporal expression of newly exported RNAs can also be controlled at the level of specific localization within the cell. This can be passive, or an active process involving transport on cytoskeletal tracks (Suter, 2018). Gene expression can also be controlled at the level of translation. This can occur by virtue of selective degradation of specific RNAs by mRNA surveillance pathways such as nonsense-mediated decay, no-go decay and non-stop decay (Harigaya and Parker, 2010; Klauer and van Hoof, 2012; Lejeune, 2017), or it can be by regulation of the rate of translation itself (Gorgoni et al., 2014). The half-life of any given mRNA is then determined by a number of RNA decay pathways, most of which involve successive decapping and deadenylation of RNA molecules, which then renders them susceptible to exonucleases (Wahle and Winkler, 2013; Borbolis and Syntichaki, 2015). Finally, the fate of the RNA may also be influenced by the action of both short and long non-coding RNAs and RNA binding proteins, which can result in degradation or translational blocking (Fukao et al., 2015; Iadevaia and Gerber, 2015; Fukao and Fujiwara, 2017).

## POTENTIAL POINTS OF THERAPEUTIC INTERVENTION

Knowledge of the processes by which mature mRNAs are expressed, processed and regulated opens up the possibility of targeting the molecule with specific interventions for future therapeutic benefit.

# Therapeutic Modulation of Transcription

Therapeutic modulation of gene activity can be achieved through several mechanisms, which include triplex-forming oligonucleotides (TPOs), synthetic polyamides (SPs) and artificial transcription factors (ATFs) (Uil et al., 2003). These approaches work by altering the expression level of a gene, rather than restoring its sequence per se. TPOs and SPs work by binding the major and minor groove, respectively, of the genomic DNA in specific regions of the gene, with the consequence of modulating gene activity at the level of transcription. This can be achieved by using steric hindrance to block transcription elongation for down-regulation of gene activity or, conversely, by blocking access to naturally occurring repressor molecules to bring about gene activation. ATFs are custom molecules designed with DNA binding domains specific to the gene in question, coupled to a trans-regulatory domain to produce the desired activity. Although there have been some promising in vitro studies, such as reactivation of the EPB41L3 gene, usually silenced by methylation, to promote tumor suppression in breast, ovarian, and cervical cell lines (Huisman et al., 2015), these approaches have not yet reached prominence in the clinic.

#### Therapeutic Modification of Splicing

RNA splicing is controlled by a complex interplay between ribonucleoprotein complexes and sequence elements in the pre-mRNA. The splicing process consists of two phosphodiester transfer reactions; the first being an interaction between the 5′ splice site and the branch site, and the second comprising cleavage at the 3′ splice site, and joining of the released exons. This occurs due to the action of a family of small nuclear ribonucleoproteins (snRNPs) named U1, U2, U4, U5, and U6, which together with a battery of approximately 80 other ancillary proteins form the core spliceosome and orchestrate the splicing process (Will and Luhrmann, 2011). The spliceosome is a dynamic machine that undergoes structural remodeling and conformational change to bring about the excision of introns and the joining of exons (Makarov et al., 2002). This machinery is necessary but sometimes not sufficient for splice site usage to occur; 98% of the genome produces multiple RNA transcripts in a process termed alternative splicing (Pan et al., 2008). The precise nature of transcripts produced under different circumstances is under tight spatial and temporal regulation. This is facilitated by the combinatorial control of a series of splice site activator and inhibitor proteins that together determine whether or not a given splicing event occurs in a given circumstance. Serine/arginine-rich splicing factors (SRSFs) usually (but not exclusively) promote splice site usage, whereas heterogeneous nuclear ribonucleoproteins (hnRNPs) usually (but not exclusively) promote splice site silencing, as well as having roles in nuclear export and other aspects of RNA metabolism (Smith and Valcarcel, 2000; Cartegni et al., 2002). Splicing defects can arise from single base pair changes to the core and regulatory sequence elements, but can also arise from insertion or deletion events and frameshifts, or from activation of cryptic splice sites by other sequence changes. Similarly, changes occurring in exon and intron splicing enhancer and silencer elements can elicit dysregulation of splicing patterns of specific genes (Blencowe, 2000). Dysregulation of the splicing regulatory machinery by cellular stress has been reported in more complex phenotypes such as cellular senescence (Holly et al., 2013; Latorre et al., 2017) and altered global alternative splicing profiles are a key characteristic of many complex diseases such as dementia, cancer and type 2 diabetes (Tollervey et al., 2011; Berson et al., 2012; Cnop et al., 2014; Love et al., 2015; Lu et al., 2015). The complexity of splicing regulation offers several points of potential intervention.

#### Moderation of the Core Spliceosome

The global dysregulation of splicing patterns that occurs in complex disease may be addressed by targeting the core spliceosome. Several compounds of bacterial origin affect the function of the SF3B component of the U2 snRNP, and are showing promise as anti-cancer agents by causing stalling of the cell cycle at the G1/S or G2/M checkpoints (Nakajima et al., 1996). Although these approaches show promise, to date most remain some distance from the clinic.

#### Moderation of Splicing Regulation

It may be possible to globally restore splicing patterns by targeting the splicing regulatory proteins themselves. This could be done at the level of mRNA expression, or at the level of activation or cellular localization. Splicing factor expression has recently been described to be negatively regulated at the mRNA level in senescent primary human dermal fibroblasts by the constitutive activation of the ERK and AKT pathways. Targeted inhibition of either ERK or AKT, as well as gene knockdown of their effector genes FOXO1 and ETV6, was associated with restoration of splicing factor expression and rescue from cellular senescence (Latorre et al., 2018). Similarly, splicing factor activity and localization are controlled at the protein level by the action of a series of kinases and phosphatases including SRPK1, SRPK2, CLK1–CLK4, DYRK1-2, PIM1-2, and PRP4. The action of these regulators ensures the correct localization of splicing factors for action at the correct time and in the correct place. Several small molecule inhibitors of SRPK1 or SRPK2 are currently in development and show promise as anti-cancer agents for prostate malignancy in humans (Mavrou et al., 2015; Bates et al., 2017). Similarly, CLK protein kinase inhibitors have been demonstrated to suppress cell growth in human mammary tumor cell lines (Araki et al., 2015).

#### Moderation of Splice Site Choice

If monogenic disease is due to dysregulated splicing, in some cases it may be possible to correct or reverse the defect by restoration of correct splicing patterns. There are several means of accomplishing this, including antisense oligonucleotides (AONs), or steric hindrance agents such as morpholino oligonucleotides or similar to occlude specific splicing regulatory sequences. The potential of this approach is best exemplified by novel treatments for spinal muscular atrophy (SMA) and Duchenne muscular dystrophy (DMD), for which therapies based on manipulation of splicing have been developed and are now licensed for clinical use. SMA is a progressive neuromuscular disorder caused by mutations in the Survival Motor Neuron (SMN1) gene (Lefebvre et al., 1995; Lorson et al., 1999). These are often deletion events. The human genome contains a second SMN gene, SMN2, which, due to a single C-to-T transition at codon 280 that disrupts a splicing enhancer site, produces an unstable SMN transcript lacking exon 7 (SMNΔ7). This transcript is present at only 10% of SMN1 levels (Lorson et al., 1999) but has potential to compensate for mutation-related reduced activity of SMN1. This has formed the basis for a novel therapeutic strategy whereby an AON (Nusinersen) has been designed to influence splicing patterns of SMN2. Nusinersen targets the intronic splicing silencer N1 (ISS-N1) motif in SMN2, promotes the inclusion of exon 7 and increases levels of compensatory SMN2. Several clinical trials have now been undertaken (Parente and Corti, 2018) and Nusinersen, also known as Spinraza, has now been approved by both US and EU regulatory authorities for clinical use.

Similar strategies have also been employed for Duchenne muscular dystrophy, an X-linked neuromuscular disorder that affects 1 in 5000 newborn boys (Mendell et al., 2012), and is primarily caused by deletions, frameshift or nonsense mutations in the dystrophin (DMD) gene (Monaco et al., 1988). The majority of these mutations yield mRNAs containing premature termination codons, which trigger nonsense-mediated decay and degradation of affected DMD transcripts. Several strategies involving AONs targeted to specific splice sites have now been employed to bring about exon skipping to remove the offending exon(s) and lead to the production of a truncated, but still partially functional DMD protein (Aartsma-Rus, 2010; Niks and Aartsma-Rus, 2017). Similar approaches have been employed to modify the effects of duplication mutations in cell lines (Wein et al., 2017). Most AONs under assessment as DMD therapeutics are chemically modified 2′-O-methyl-phosphorothioate oligonucleotides (2OMePS) or phosphorodiamidate morpholino oligomers (PMOs), which can be administered systemically (Goemans et al., 2011). One of these, eteplirsen, a PMO which brings about skipping of exon 51, a hotspot for DMD mutations, has demonstrated promising results in a number of clinical trials and been designated 'reasonably likely to predict a clinical benefit' by the FDA (Goemans et al., 2011). Other approaches have employed 'readthrough' agents such as ataluren that allow bypass of the premature termination codon and are now in Phase III clinical trials (Namgoong and Bertoni, 2016).
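The logic of exon skipping can be illustrated with simple reading-frame arithmetic: a deletion that removes a number of nucleotides not divisible by three shifts the frame, and skipping an additional exon can bring the total removed length back to a multiple of three. The following is a minimal sketch under simplifying assumptions: every exon is treated as fully coding, and the exon numbers and lengths in `toy_exons` are hypothetical values chosen only to illustrate the arithmetic, not measurements from the DMD gene.

```python
from typing import Dict, Iterable


def transcript_in_frame(exon_lengths: Dict[int, int], removed: Iterable[int]) -> bool:
    """Return True if removing the given exons preserves the reading frame.

    The frame is preserved when the total number of removed nucleotides
    is a multiple of three (simplification: all exons fully coding).
    """
    removed_nt = sum(exon_lengths[exon] for exon in set(removed))
    return removed_nt % 3 == 0


# Hypothetical exon lengths in nucleotides (illustrative only).
toy_exons = {1: 120, 2: 100, 3: 140, 4: 90}

deletion = [2]   # a 100-nt genomic deletion shifts the frame
skipped = [3]    # an AON masks exon 3 so that it is skipped as well

print(transcript_in_frame(toy_exons, deletion))            # frame disrupted
print(transcript_in_frame(toy_exons, deletion + skipped))  # frame restored
```

In this toy case the deletion alone removes 100 nt (out of frame), whereas deletion plus skipping removes 240 nt (in frame), which corresponds to the truncated but partially functional protein described above.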

# Therapeutic Moderation of Polyadenylation

Polyadenylation is an essential step in mRNA processing, with a pivotal role in maintenance of RNA stability and management of RNA turnover. Many genes contain more than one polyadenylation site and display alternative polyadenylation, producing mRNA transcripts with novel 3′ untranslated regions. These may be differentially targeted by non-coding RNAs such as miRNAs or by RNA binding proteins, or have differential translation efficiency (Elkon et al., 2013). Control of polyadenylation is mediated by a number of sequence elements such as the polyadenylation site itself, but also a series of upstream (U and UGUA rich) and downstream (U and GU rich) elements (Tian and Graber, 2012). These sequence elements bind the polyadenylation machinery, which includes the cleavage and polyadenylation specificity factors, the cleavage stimulation factors and the polyadenylate polymerase itself (Shi et al., 2009). Differential choice of polyadenylation site is linked to the proliferation and differentiation capacity of the cells; transcripts in highly proliferative cells tend to have shorter 3′ UTRs (Sandberg et al., 2008). Differential use of polyadenylation sites may also have impacts on mRNA stability, mRNA export and localization, translation rates and protein localization (Tian and Graber, 2012). Patterns of alternative polyadenylation are also regulated by differential binding of RNA binding proteins; CSTF2 and CFIm subunits of the main polyadenylation machinery have been shown to have effects on relative expression of alternatively polyadenylated isoforms (Zheng and Tian, 2014). Other RNA binding proteins such as HNRNPs H and I (Katz et al., 2010), as well as CPEB1 (Bava et al., 2013), have also been associated with alternative isoform choice. RBPs such as these may in the future form the basis of therapies that influence the 3′ end processing of alternatively polyadenylated transcripts.

#### Therapeutic Modification of RNA Editing

RNA editing is a mechanism of generating further transcriptomic diversity and can impact the final sequence or structure of both encoded proteins and non-coding RNAs (ncRNAs) (Ganem and Lamm, 2017; Yablonovitch et al., 2017). RNA editing is an extremely common event, occurring in many dynamically regulated mRNA transcripts, and can comprise a variety of modifications, the most common of which is adenosine to inosine (A to I), which is eventually read as guanosine (Peng et al., 2012). RNA editing is especially prevalent in short interspersed nuclear elements (SINEs) such as Alu, and also in transcripts in the brain (Jepson and Reenan, 2008; Osenberg et al., 2010). RNA editing events have been implicated in control of mRNA splicing and miRNA regulation (Farajollahi and Maas, 2010; Nishikura, 2010). RNA editing events are primarily mediated by a family of adenosine deaminases acting on RNA (ADARs), of which there are three major members: ADAR1, ADAR2, and ADAR3. The three ADARs have common functional domains, but differential structural features and some degree of site specificity (Nishikura, 2016). ADAR expression is itself regulated by transcription factors such as CREB and activated by kinases such as JNK1 (Peng et al., 2006; Yang et al., 2012). Dysfunction of ADAR1 is associated with diseases such as Aicardi-Goutières syndrome (Rice et al., 2012), with psychiatric disorders due to attenuated 5-HT2CR levels (Eran et al., 2013), and also with cancer (Ganem et al., 2017), whereas ADAR2 is linked to circadian rhythm and epilepsy (Gallo et al., 2017).
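The consequence of an A-to-I edit being read as guanosine can be sketched in a few lines. The example below is an illustrative assumption rather than a model of any specific editing site: the function names are hypothetical, the codon table is deliberately partial, and inosine is written as `I` purely as a notational convenience.

```python
# Partial codon table, sufficient for the toy example only.
CODON_TABLE = {"CAG": "Gln", "CGG": "Arg"}


def edit_a_to_i(rna: str, position: int) -> str:
    """Deaminate the adenosine at 0-based `position`, writing inosine as 'I'."""
    if rna[position] != "A":
        raise ValueError("A-to-I editing acts on adenosine only")
    return rna[:position] + "I" + rna[position + 1:]


def as_read_by_cell(rna: str) -> str:
    """Inosine pairs like guanosine, so it is interpreted as 'G'."""
    return rna.replace("I", "G")


codon = "CAG"                          # encodes glutamine
edited = edit_a_to_i(codon, 1)         # 'CIG'
decoded = as_read_by_cell(edited)      # 'CGG'

print(codon, CODON_TABLE[codon])       # unedited codon and amino acid
print(decoded, CODON_TABLE[decoded])   # edited codon as decoded
```

A single edited adenosine is therefore enough to recode an amino acid, which is one way in which editing can alter the final sequence of an encoded protein.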

Although less advanced than therapies targeting splicing defects, strategies to target ADARs to influence RNA editing are beginning to be evaluated for future clinical benefit. ADAR1 has been demonstrated to target let-7, a miRNA involved in many processes including control of cell cycle (Roush and Slack, 2008). Over-expression of ADAR1 and subsequent downregulation of let-7 has been shown to drive the self-renewal of leukemic stem cells in human blood, an observation that can be reversed by inhibition of ADAR1-mediated RNA editing (Zipeto et al., 2016). Very recently, techniques for directing ADARs to specific points of intervention have been developed. One such system, named RESTORE, uses a plasmid-borne guide RNA coupled to an ADAR recruiting domain to deliver ADAR2 directly to the region of interest. This approach has been used to successfully edit phosphotyrosine residues in STAT1 with resultant changes to the activity of this signaling protein (Merkle et al., 2019). There are also newly emerging techniques based on modified Cas technologies utilizing catalytically inactive Cas13-ADAR2 fusion proteins to bring about RNA editing (Cox et al., 2017). These early observations suggest that in the future, targeting ADARs or other regulators of RNA editing may prove promising points of traction for neurodevelopmental disorders and for cancer.

# Modification of RNA Based Epitranscriptomics

Epigenetic modification of DNA is well known, but it is now becoming increasingly evident that RNA is also subject to analogous, epitranscriptomic, modification. RNA is subject to decoration with over 130 different modifications. Most of these map to very abundant RNAs such as rRNAs and tRNAs, but a subset are seen in mRNA, circRNA and lncRNA (Schaefer et al., 2017). The most common marks are N(6)-methyl-adenosine (m6A), 5-methylcytosine (m5C), 5-hydroxymethylcytosine (hm5C) and N1-methyladenosine (m1A), which have been shown to be widely present throughout the transcriptome by high throughput sequencing (Jung and Goldman, 2018). m6A is enriched in the last exon of genes and also occurs preferentially at 5′ untranslated regions (UTRs) (Ke et al., 2015; Meyer et al., 2015), whereas m1A is enriched in promoters and 5′ UTRs (Dominissini et al., 2016). m5C marks are often located at both 5′ and 3′ UTRs (Squires et al., 2012). RNA modifications can influence gene expression by a number of mechanisms, including influencing RNA structure, recruiting other regulatory proteins (e.g., splicing factors, RNA binding proteins involved in control of stability) or moderation of translation (Nachtergaele, 2017). RNA epitranscriptomic marks are added and removed by a series of writers (METTL3, METTL14, WTAP, KIAA1429, RBM15/15B, and METTL16) and erasers (FTO and ALKBH5) (Tong et al., 2018). Disruption of m6A disrupts RNA metabolism; m6A depleted transcripts have been reported to be unstable (Tang et al., 2018). Accordingly, mutations in the writer or eraser machinery have been associated with cancers such as hepatocellular carcinoma and acute myeloid leukemia (AML) (Vu et al., 2017; Chen et al., 2018), and with memory, fertility and metabolic phenotypes (Fischer et al., 2009; Zheng et al., 2013; Nainar et al., 2016). The epitranscriptomic writers and erasers are therefore promising future therapeutic targets. At present, the work in this area is mainly in cell and animal models. Silencing the METTL14 'writer' led to restoration of differentiation of myeloid cells in AML and inhibited AML cell survival and proliferation (Weng et al., 2018). Similar strategies targeting ALKBH5 have shown promise as anti-tumor agents in glioblastoma stem cells (Schonberg et al., 2015). Studies have suggested that small molecule inhibitors of FTO may have potential utility as anticonvulsants in mouse models of epilepsy in vivo, by suppression of 2-oxoglutarate (2OG) through altering m6A levels (Zheng et al., 2014).

## Modulation of RNA Export

The activity of genes is also dependent on the correct positioning of mRNAs within the cell. Once processed, RNAs are usually exported through the nuclear membrane into the cytoplasm ready to be translated. This is not a passive process; it is orchestrated by a portfolio of RNA export proteins which escort the RNA molecule through the nuclear pore. Messenger RNAs are primarily transported by Nxf1 and Xpo1, whereas miRNAs are exported by Xpot and Xpo5. The transcription Export complex 1 (Trex1) facilitates binding of Nxf1 to the processed mRNA, and together with a collection of other proteins such as karyopherins or importins causes the processed mRNA to associate with and transit through the nuclear pore (Viphakone et al., 2012). The nuclear pore itself is composed of a collection of nucleoporins, and comprises a multi-subunit structure consisting of a nuclear ring, a central transport channel and a basket-like structure (Kabachinski and Schwartz, 2015). Small molecules can diffuse across this barrier, but larger ones such as mRNA cannot. Some of the specificity of transport is achieved by the interaction of the nuclear transport machinery with specific signal sequences in the mRNA itself (Lee et al., 2006; Hutten and Kehlenbach, 2007), whereas other mRNAs rely upon adaptor proteins (Huang et al., 2017). The expression and localization of nuclear transporters is altered in certain cancers (Zhou et al., 2013; Talati and Sweet, 2018), has been linked with some neurodegenerative disorders (Grima et al., 2017), and is an important component of inflammatory and apoptotic responses (Aggarwal and Agrawal, 2014; Kopeina et al., 2018). Individual components of the nuclear export machinery are currently under investigation as therapeutic targets. One of the most promising agents, Selinexor, targets exportin 1 (Xpo1), is currently in pre-clinical trials and has shown efficacy against acute myeloid leukemia and multiple myeloma (Kashyap et al., 2016; Mahipal and Malafa, 2016).

# Therapeutic Modulation of Non-coding RNA Regulators of Gene Expression

The repertoire of genes expressed by any given cell in any given circumstances is influenced by non-coding RNA (ncRNA) regulators of gene expression. These ncRNA genes do not encode proteins, but rather encode RNAs that contribute to the regulation of other RNAs. They are classified into two broad classes: short ncRNAs such as microRNAs (miRNAs), and longer ncRNAs such as long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs).

#### Modulation of Small Non-coding RNAs


MicroRNAs (miRNAs) and siRNAs are short non-coding RNAs 20–25 bp in size. They interact with components of the RNA-induced silencing complex (RISC) to bring about translational blocking or RNA degradation. Each miRNA interacts with specific binding sites in the 3′ UTR of its target genes, which are 6–8 nt in length and are commonly found in the genome; each miRNA is thus capable of targeting hundreds of mRNA target genes simultaneously (Carthew and Sontheimer, 2009). Several classes of miRNAs have been associated with disease; these include the 17/92 cluster, the miR-24 cluster or miR-3676, all of which are associated with chronic lymphocytic leukemia (CLL) (Van Roosbroeck and Calin, 2016). Other examples include miR-21, miR-10b, miR-155, and Let-7a, which are associated with breast cancer (Khalighfard et al., 2018), and miR-192, miR-200c and miR-17, which are associated with colon cancer (Ast et al., 2018). Similarly, miR-33a and miR-33 have been associated with metabolic disease and atherosclerosis (Marquart et al., 2010; Rayner et al., 2011) and miR-155 has links with inflammatory diseases (Dorsett et al., 2008). The use of miRNAs as antitumor therapeutics is currently receiving much interest. Specific miRNAs can target the tumor suppressor machinery, and are commonly referred to as onco-miRs, or they may target the controls of cell cycle and act as tumor suppressors in their own right. The small size and relative stability of miRNAs and siRNAs, together with the observation that they are readily taken up in endosomes and microvesicles (Rani et al., 2017), renders them excellent candidates for therapeutic modulation or use as biomarkers of disease. This can take the form of antagomiRs that can target and silence endogenous miRNAs, or chemically modified miRNA mimics that can increase regulation of their specific targets (Khvorova and Watts, 2017). To date, 20 clinical trials have been undertaken that exploit miRNA biology (Chakraborty et al., 2017); the first of these agents, miravirsen, is targeted to miR-122 and is in phase II clinical trials for Hepatitis C (Lindow and Kauppinen, 2012). In August 2018, the first siRNA-related therapy, patisiran, was approved by the FDA for the treatment of peripheral nerve disease by targeting an abnormal form of the transthyretin (TTR) gene.
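Because the binding sites are only 6–8 nt long, candidate sites can be located by a simple string search, which also makes clear why one miRNA can match many transcripts. The sketch below assumes the widely used convention that the seed corresponds to nucleotides 2 onward of the miRNA and that the target site is its reverse complement; the function names and both sequences are hypothetical and serve only to illustrate the search, not to represent any real miRNA or gene.

```python
from typing import List

# Watson-Crick complement for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str, length: int = 7) -> str:
    """Return the mRNA sequence (5'->3') complementary to the miRNA seed.

    The seed is taken as `length` nucleotides starting at miRNA position 2.
    """
    seed = mirna[1:1 + length]
    return seed.translate(COMPLEMENT)[::-1]


def find_sites(utr: str, mirna: str, length: int = 7) -> List[int]:
    """Return 0-based start positions of seed-matched sites in a 3' UTR."""
    site = seed_site(mirna, length)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


# Hypothetical sequences, for illustration only.
toy_mirna = "UACGGAUCCAUGCAAGUCGAUC"
toy_utr = "AAGAUCCGUUUCCAGGAUCCGUAA"

print(seed_site(toy_mirna))            # site sequence searched for in the UTR
print(find_sites(toy_utr, toy_mirna))  # positions of the two matching sites
```

Real target prediction additionally weighs site context, conservation and pairing outside the seed, none of which is modeled here.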

#### Modulation of LncRNAs

Long non-coding RNAs comprise a heterogeneous class of non-coding RNAs, which are longer than 200 bp in length. They do not encode proteins, and originate from almost all genomic regions. They can originate from the locus that they regulate, usually from the antisense strand, and regulate their target in cis (natural antisense transcripts, NATs), or they can map to entirely different genomic regions from their targets (introns, pseudogenes, and non-coding DNA) and cause regulation in trans. LncRNAs can also be associated with promoters, enhancers or other regulatory regions and do not have a homogeneous mode of action. They can activate or repress their targets and can work by a number of mechanisms. They are commonly involved in genomic imprinting; one of the first lncRNAs discovered, XIST, coordinates X chromosome inactivation (Brown et al., 1991). Other lncRNAs can act as guides. This class of lncRNA includes ANRIL, which directs the polycomb repressive complex to the site of action in the case of the CDKN2A and CDKN2B genes (Kotake et al., 2011), and the lncRNA HOTAIR, which has roles in colorectal cancer (Kogo et al., 2011). They can also act as scaffolds, directing the assembly of specific protein or RNA complexes to their sites of action. For example, one function of NEAT1, a multifunctional lncRNA with several roles in tumorigenesis (Ghafouri-Fard and Taheri, 2018), is to bring together the microRNA biogenesis machinery to enhance pri-miRNA processing (Jiang et al., 2017). Another example is the lncRNA LINP1, which regulates the repair of DNA double strand breaks in breast cancer by acting as a scaffold for the Ku80 and DNA-dependent protein kinase proteins (Zhang Y. et al., 2016). They can also repress expression by acting as decoys, coregulators and inhibitors of RNA polymerase II. For example, the lncRNA PANDA acts by sequestering its transcription factor target NF-YA away from its site of action (Hung et al., 2011). They also have roles as regulators of subcellular compartmentalization; the lncRNA MALAT is responsible for localizing splicing factors to the nuclear splicing speckles where they can be stored and regulated by phosphorylation (Bernard et al., 2010).

In accordance with their pivotal role in regulating gene expression, lncRNAs have been reported to be associated with several diseases such as cancer (Huarte, 2015; Parasramka et al., 2016; Peng et al., 2016), diabetes (Akerman et al., 2017; He et al., 2017; Leti and DiStefano, 2017), neurodegenerative disease (Riva et al., 2016) and cardiovascular disease (Hou et al., 2016; Haemmig et al., 2017; Gangwar et al., 2018). LncRNAs may represent promising therapeutic targets because they are responsive to small molecule therapeutics; a recent study documented 5916 lncRNAs that responded to 1262 small molecule drugs (Yang et al., 2017). Although progress toward the clinic has been slow, perhaps because of the diverse modes of action of lncRNAs, there are some promising candidates. Several lncRNAs have been reported to be dysregulated in osteoarthritis (OA), including HOTAIR, RP11-445H22.4, GAS5, PMS2L2, H19, and CTD-2574D22.4 (Xing et al., 2014). Although at the present time the majority of studies have not progressed beyond cell or animal models, several potential future therapeutic candidates have emerged; the lncRNA PCGEM1 was demonstrated to inhibit synoviocyte apoptosis in OA by moderation of its target miR-770 (Kang et al., 2016). Similarly, many lncRNAs have been identified as potential therapeutic targets in cardiovascular disease or cancer, including GAS5, LIPCAR, SENCR, ANRIL, SMILR, and MALAT (Gomes et al., 2017). Antisense oligonucleotide and siRNA approaches to therapeutically manipulate MALAT levels are in development in human cancer cells and in animal models (Arun et al., 2016). Targeting lncRNAs is more difficult than targeting miRNAs, because of their larger size and the heterogeneity of their mode of action, which may explain why their evaluation is not as advanced as that of miRNAs. Nevertheless, they have significant potential as future therapeutic targets.

#### Modulation of CircRNAs


Circular RNAs (circRNAs) are a relatively newly discovered class of non-coding RNA regulators found in multiple species (Haque and Harries, 2017). They are formed from 'backsplicing' events of linear genes, and comprise circular molecules, which are therefore relatively immune to exonucleases (Cocquerelle et al., 1993; Schwanhausser et al., 2011; Jeck et al., 2013; Lan et al., 2016; Lasda and Parker, 2016). Like lncRNAs, circRNAs have been reported to influence gene expression by a variety of mechanisms including action as miRNA sponges or mRNA traps, as well as comprising modifiers of transcription, translation, or splicing (Haque and Harries, 2017). Circular RNAs have been suggested to have roles in many cellular processes, including embryonic development (Xia et al., 2016), metabolism (Xu et al., 2015), regulation of cell cycle (Zheng et al., 2016) and regulation of cellular stress (Burd et al., 2010). In accordance with this observation, dysregulated circRNA expression has been associated with multiple human diseases such as cancer (Yao et al., 2017), neurological disease (Khoutorsky et al., 2013), osteoarthritis (Liu et al., 2016), cardiovascular disease (Taibi et al., 2014; Wang et al., 2016), type 2 diabetes (Gu et al., 2017), pre-eclampsia (Zhang Y.G. et al., 2016) and impaired immune responses (Ng et al., 2016). Although the study of circRNAs is in its infancy compared with other ncRNAs, they too have potential as future therapeutic targets.

# REMAINING BARRIERS AND FUTURE PROSPECTS

This is an exciting time for RNA-based therapeutics, with several notable examples making it as far as licensing for clinical use. Over the next decade, it is likely that there will be a large expansion in the breadth and scope of human disorders that can be treated using these and similar approaches. The most developed at the present time are interventions targeted at specific splice events and those involving small RNAs, but future work may harness the potential of targeting other parts of the RNA regulatory milieu (**Figure 2**).

FIGURE 2 | [...] amount or nature of expressed RNA. Blue lines in the transcript refer to introns and untranslated regions, whilst exons are indicated by red lines. The 5′ cap is indicated by a blue circle. Small yellow circles indicate epitranscriptomic decoration, whilst pale blue lines within the exons refer to RNA editing events. The nuclear envelope is indicated by a large dashed line. RNA binding proteins modifying stability are given by blue triangles, and miRNAs by green lines. The translating ribosome is indicated by beige circles. Nascent polypeptide is given by green interlocked circles. Each potential point of intervention is given by a red arrow. Degraded RNA is indicated by a gray dashed line.

Several barriers do, however, remain to the wide implementation of these opportunities; these relate mainly to delivery, specificity and duration of treatment. Firstly, delivery of specific molecules to their site of action may be challenging. For some applications, such as skin, which may be treated topically, or lung, which may be treated via inhalation, therapeutic delivery of interventions may be easier. Delivery to internal organs such as brain, liver or pancreas will require different and systemic approaches. One reason why AONs, readthrough agents and small RNAs have been at the forefront of this emerging field is that their small size and relative stability mean that they can be more easily introduced into cells. This may not be true of entities such as lncRNAs or large circRNAs, which may be large molecules with potentially challenging secondary or tertiary structure. Small molecules can readily be introduced into cells using lipid-mediated transfer agents, or endogenous structures such as endosomes or microvesicles, which could be harnessed to deliver cargoes. Secondly, there are questions of specificity. One feature of the therapies that are currently in the clinic is their specificity to their sites of action. Gene expression and the regulation thereof is highly tissue specific, and genes may often be required to be expressed only at a specified time, or in response to specific circumstances. It may not be advantageous to produce changes in all tissues or at all times, and effects must of course be limited to their intended targets. Specificity of effect can be achieved by choosing targets that are only present at their sites of action, or by modifying delivery so that cargoes are only delivered to their intended place of action. For example, strategies are now emerging which allow selective delivery of senolytic cargoes to senescent cells only, using galactosaccharide nanoparticles, which harness the observation that senescent cells harbor large quantities of lysosomal β-galactosidase (Munoz-Espin et al., 2018). Similarly, strategies could be developed that introduce therapeutic oligonucleotides under the control of gene regulatory elements expressed only in the intended target tissues. Lastly, one needs to consider the potential need for repeated treatments. The approaches discussed here differ from emerging "gene editing" technologies such as CRISPR, in that they are not transmitted to future generations, and may require repeated treatments. This can be considered both a caveat and an advantage. The need for repeated treatments may be burdensome for patients, but in reality, the vast majority of currently available treatments for human disorders fall into this category. Conversely, the need to deliver repeated doses introduces a degree of flexibility, and allows treatments to be quickly discontinued or changed if adverse effects occur. We are at a time of huge advances in our understanding of how our genome is curated and regulated and how our genes are expressed.

The multifactorial control of gene expression, and the complexity of this process, offer multiple points of potential intervention for therapeutic benefit. Over the coming decades, there is likely to be a huge increase in the number of therapies for human diseases that target not the genes themselves, but the expression and regulation of those genes. We are at the dawning of the era of genomic medicine, and the future looks bright.

#### AUTHOR CONTRIBUTIONS

LH planned and wrote the manuscript.

### FUNDING

The Harries lab is funded by the Dunhill Medical Trust (grant number R386/1114).






**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Harries. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Aberrant Phase Transitions: Side Effects and Novel Therapeutic Strategies in Human Disease

*Veronica Verdile<sup>1,2</sup>, Elisa De Paola<sup>1,2</sup> and Maria Paola Paronetto<sup>1,2</sup>\**

*<sup>1</sup>University of Rome "Foro Italico", Rome, Italy, <sup>2</sup>Laboratory of Cellular and Molecular Neurobiology, Fondazione Santa Lucia, Rome, Italy*

Phase separation is a physiological process occurring spontaneously when single-phase molecular complexes separate in two phases, a concentrated phase and a more diluted one. Eukaryotic cells employ phase transition strategies to promote the formation of intracellular territories not delimited by membranes with increased local RNA concentration, such as nucleolus, paraspeckles, P granules, Cajal bodies, P-bodies, and stress granules. These organelles contain both proteins and coding and non-coding RNAs and play important roles in different steps of the regulation of gene expression and in cellular signaling. Recently, it has been shown that most human RNA-binding proteins (RBPs) contain at least one low-complexity domain, called prion-like domain (PrLD), because proteins harboring them display aggregation properties like prion proteins. PrLDs support RBP function and contribute to liquid–liquid phase transitions that drive ribonucleoprotein granule assembly, but also render RBPs prone to misfolding by promoting the formation of pathological aggregates that lead to toxicity in specific cell types. Protein–protein and protein-RNA interactions within the separated phase can enhance the transition of RBPs into solid aberrant aggregates, thus causing diseases. In this review, we highlight the role of phase transition in human disease such as amyotrophic lateral sclerosis (ALS), frontotemporal dementia (FTD), and in cancer. Moreover, we discuss novel therapeutic strategies focused to control phase transitions by preventing the conversion into aberrant aggregates. In this regard, the stimulation of chaperone machinery to disassemble membrane-less organelles, the induction of pathways that could inhibit aberrant phase separation, and the development of antisense oligonucleotides (ASOs) to knockdown RNAs could be evaluated as novel therapeutic strategies for the treatment of those human diseases characterized by aberrant phase transition aggregates.

Keywords: RNA-binding proteins, phase separation, RNA therapeutics, neurodegenerative disease, low-complexity domain

#### *Edited by:*

*Emanuele Buratti, International Centre for Genetic Engineering and Biotechnology, Italy*

#### *Reviewed by:*

*Maurizio Romano, University of Trieste, Italy Serena Carra, University of Modena and Reggio Emilia, Italy*

*\*Correspondence: Maria Paola Paronetto mariapaola.paronetto@uniroma4.it*

#### *Specialty section:*

*This article was submitted to RNA, a section of the journal Frontiers in Genetics*

*Received: 06 November 2018 Accepted: 18 February 2019 Published: 22 March 2019*

#### *Citation:*

*Verdile V, De Paola E and Paronetto MP (2019) Aberrant Phase Transitions: Side Effects and Novel Therapeutic Strategies in Human Disease. Front. Genet. 10:173. doi: 10.3389/fgene.2019.00173*

# INTRODUCTION

Eukaryotic cells are characterized by morphologically distinct compartments that play multiple roles in biological processes. The complementary use of light- and electron-microscopic imaging techniques has made it possible to delineate eukaryotic subdomains, highlighting the presence of membrane-less organelles (MLOs), including paraspeckles, nuclear speckles, Cajal bodies, stress granules (SGs), and processing bodies (P-bodies), in addition to the classical membrane-enclosed organelles (such as nuclei, mitochondria, endoplasmic reticulum, and Golgi apparatus) (Matera, 1999). These MLOs form in a similar way and share analogous assembly characteristics, but they differ in composition and sub-cellular localization. Indeed, MLOs form sub-compartments both in the nucleus and in the cytosol (**Figure 1**), and contain the nucleic acids and proteins necessary to accomplish their function, thus providing spatiotemporal control of biological activities (Shin and Brangwynne, 2017). For this reason, these compartments must remain separated from the surrounding cytoplasm or nucleoplasm (Banani et al., 2017). The multiple components that concentrate within these subdomains render them a suitable interface for various cellular processes, such as transcription, RNA processing, mRNA transport, RNP assembly, ribosome biogenesis, translational repression, mRNA degradation, and intracellular signaling (Banani et al., 2017).

Considerable effort has been devoted to understanding the process of MLO formation and how phase separation promotes their assembly (Hyman et al., 2014). The relevance of these MLOs is demonstrated by the fact that changes in their organization are associated with disease phenotypes (Aguzzi and Altmeyer, 2016). Growing evidence suggests that these organelles are involved in the pathogenesis of neurodegenerative diseases, such as amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) (Murakami et al., 2015; Rhoads et al., 2018), as well as in cancer (Aguzzi and Altmeyer, 2016; Bouchard et al., 2018).

This review explores individual MLOs, with emphasis on how they contribute to biological functions and how their dysregulation promotes the development of human disease. In this regard, we describe possible therapeutic strategies to monitor the correct formation of these compartments, and therapeutic approaches to selectively destroy aberrant MLOs.

# PHASE SEPARATION: MECHANISM, COMPARTMENTS, AND BIOLOGICAL FUNCTIONS

Phase separation is a physiological process by which macromolecules separate into a dense phase and a dilute phase within cells, thus allowing the formation of distinct chemical environments (Shin and Brangwynne, 2017). A high energy contribution is required to promote interactions between macromolecules, rather than between macromolecules and solvent, and to reach chemical equilibrium between the newly formed compartments (Johansson et al., 1998), with a corresponding reduction in entropy (**Figure 1A**). The result is a more concentrated phase, in which proteins and DNA/RNA are enriched ~10–100-fold (Li et al., 2012), and a less concentrated phase (Banani et al., 2017). In the phase separation process, macromolecule solubility is decreased (Alberti, 2017). Liquid–liquid phase separation (LLPS) occurs spontaneously in eukaryotic cells when a critical concentration or temperature threshold is exceeded, thus forming sub-compartments in a reversible fashion (Hyman et al., 2014; Shin and Brangwynne, 2017). The assembled MLOs must remain separated from the surrounding environment, both in the nucleus and in the cytoplasm. These organelles in turn support the transport of molecules into and out of them, thus allowing chemical reactions to take place inside them (Hyman et al., 2014).
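The idea of a critical threshold for demixing can be illustrated with the Flory-Huggins lattice model, a textbook description of polymer solutions that is not part of this review. The sketch below assumes a polymer of length `n` lattice units in a solvent of unit size and an effective interaction parameter χ; both are illustrative quantities, not measured values for any MLO. It evaluates the standard spinodal condition and the critical point of that model.

```python
import math


def spinodal_chi(phi: float, n: int) -> float:
    """Chi above which a mixture with polymer volume fraction phi is unstable.

    Flory-Huggins free energy per site (in units of kT), assumed here:
        f = (phi / n) * ln(phi) + (1 - phi) * ln(1 - phi) + chi * phi * (1 - phi)
    The spinodal is where the second derivative of f with respect to phi is zero.
    """
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between 0 and 1")
    return 0.5 * (1.0 / (n * phi) + 1.0 / (1.0 - phi))


def critical_point(n: int) -> tuple[float, float]:
    """Critical volume fraction and critical chi for a polymer of length n."""
    root_n = math.sqrt(n)
    phi_c = 1.0 / (1.0 + root_n)
    chi_c = (1.0 + root_n) ** 2 / (2.0 * n)
    return phi_c, chi_c


if __name__ == "__main__":
    # Longer (more multivalent) chains demix at lower concentration and lower chi.
    for n in (1, 10, 100, 1000):
        phi_c, chi_c = critical_point(n)
        print(f"n={n:5d}  phi_c={phi_c:.4f}  chi_c={chi_c:.4f}")
```

In this simplified picture, increasing chain length lowers both the critical concentration and the interaction strength needed for demixing, which is qualitatively in line with the role of multivalent macromolecules in LLPS.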

To obtain liquid–liquid demixing, several conditions are required, such as electrostatic and hydrophobic interactions. These are provided by low-complexity sequences and intrinsically disordered protein regions (IDRs) (Nott et al., 2015), which are involved in protein–protein (Lee et al., 2015) and protein–RNA interactions (Allain et al., 2000). RNA can itself initiate phase separation *via* protein–RNA (Molliex et al., 2015) and RNA–RNA interactions (Jain and Vale, 2017; Van Treeck et al., 2018), and serves as a molecular seed that triggers liquid demixing (Molliex et al., 2015).

Eukaryotic cells harbor MLOs both in the nucleus and in the cytoplasm. The first MLO identified was the **P granule** of *Caenorhabditis elegans* embryos, involved in cytoplasmic polar partitioning (Strome and Wood, 1983). P granules have been defined as liquid-like compartments. Indeed, they display a spherical morphology due to surface tension. They can fuse with other P granules and be deformed by flows, after which they rapidly rearrange (Brangwynne et al., 2009). P granules contain mRNAs and RNA helicases and play a key role in the post-transcriptional regulation of mRNA in germ cells (Voronina et al., 2011).

In the cytoplasm, eukaryotic cells can form processing bodies (P-bodies) and stress granules (**Figure 1B**). **P-bodies** are cytoplasmic ribonucleoprotein (RNP) granules (Franks and Lykke-Andersen, 2008). Fluorescence microscopy has shown that proteins and RNAs shuttle between the cytoplasm and P-bodies, and that P-bodies fuse with one another, showing liquid-like properties (Kedersha et al., 2005). These RNP granules play a role in post-transcriptional regulation by controlling mRNA translation and degradation (Parker and Sheth, 2007). Indeed, their assembly depends on the loading of mRNAs into polysomes. When mRNAs are associated with ribosomes, P-bodies decrease in abundance and size (Sheth and Parker, 2003; Teixeira et al., 2005); whereas when mRNAs dissociate from ribosomes as a result of translation inhibition, P-bodies increase in size (Teixeira et al., 2005; Koritzinsky et al., 2006). Moreover, mRNA decay can also occur in the absence of P-bodies (Eulalio et al., 2007), suggesting that P-bodies act by sequestering mRNAs rather than degrading them (Parker and Sheth, 2007).

In addition to P-bodies, the cytoplasm of eukaryotic cells harbors other types of RNA granules, such as **stress granules (SGs)** (Collier and Schlesinger, 1986; Arrigo et al., 1988). Fluorescence microscopy (Jain et al., 2016) and electron micrographs of electron-dense regions (Souquere et al., 2009) revealed that SGs have a highly concentrated *core* made up of proteins and mRNA, and a surrounding structure that is less dense and more dynamic (Protter and Parker, 2016). Phase separation is extremely sensitive to changes in chemical conditions, thus playing an important role in stress adaptation. In fact, stress is a transient phenomenon and SGs are transient structures that rapidly disassemble upon removal of the stress condition. After removal of the adverse condition, SGs disassemble, protein synthesis resumes following the stress-induced translation inhibition, and RBPs can either return to the nucleus or remain in the cytoplasm to carry out their functions (Gilks et al., 2004; Panas et al., 2016).

The formation of these bodies occurs when stress conditions block translation initiation, through phosphorylation of eIF2α or inactivation of eIF4A (Mokas et al., 2009). In general, this block induces a sudden increase in non-polysomal mRNAs, which are no longer masked by translating ribosomes and are therefore free to interact with RBPs such as TIA-1, TIA-R, and G3BP (Kedersha et al., 1999; Tourrière et al., 2003). These proteins interact with other proteins through specific domains, thus promoting SG formation (Protter and Parker, 2016). Phase transition can be promoted and strongly affected by post-translational modifications of SG-associated proteins, such as methylation, phosphorylation, and glycosylation, that alter protein–protein interactions (Tourrière et al., 2003; Ohn et al., 2008; Nott et al., 2015).

In general, SGs help cells respond to adverse conditions, such as oxidative stress, heat shock, and DNA damage (Kedersha and Anderson, 2007; White and Lloyd, 2012). Upon DNA damage, Moutaoufik and colleagues observed that UV-induced SGs were smaller and less numerous than the SGs induced by other stressors, such as arsenite or heat treatment (Moutaoufik et al., 2014). The cellular response to UVC irradiation involves several steps that allow cells to identify the damage and to repair the DNA. These processes include the surveillance of genome integrity, the recognition of damaged DNA, and the activation of the DNA repair program, including cell signaling events and cell cycle arrest, thus allowing cells to repair the DNA before resuming proliferation (Polo and Jackson, 2011). Indeed, it has been shown that upon UV-induced DNA damage, SG assembly occurs in a cell cycle-dependent fashion (Pothof et al., 2009), with cyclin A-positive S phase cells and γH2AX-positive cells being negative for SG-specific staining (Pothof et al., 2009). These studies demonstrate that SG formation occurs only in a time window at the G2-M transition or after exit from mitosis. In fact, as cells prepare for cell division, most MLOs disassemble and then start to reassemble during the late stages of cytokinesis (Andrade et al., 1993; Hernandez-Verdun et al., 2002; Fox et al., 2005; Aizer et al., 2013). In particular, the kinase activity of DYRK3 plays an essential role during mitosis in preventing the formation of aberrant LLPS condensates composed of nuclear and cytoplasmic proteins and RNA, by keeping the condensation threshold of its substrates high (Rai et al., 2018).

During stress, fluctuations in cytosolic pH can promote widespread condensate formation and a core group of nucleating RBPs is sufficient to initiate formation of the stress granule. For instance, in budding yeast, cells can enter into a quiescent state upon removal of nutrients that causes a shift in the cytosolic pH from 7.4 down to ~6.0. This increase in proton concentration triggers a phase transition; as soon as conditions improve, yeast fluidize the cytoplasm by using proton pumps and neutralizing the pH, thus restoring normal conditions (Munder et al., 2016).
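To put the pH shift described above in quantitative terms, the following minimal sketch converts the two pH values quoted in the text into proton concentrations using the definition pH = −log10[H+]. The function names are illustrative; only the pH values 7.4 and 6.0 come from the text.

```python
def proton_concentration(ph: float) -> float:
    """Proton concentration in mol/L from the definition pH = -log10([H+])."""
    return 10.0 ** (-ph)


def fold_change(ph_start: float, ph_end: float) -> float:
    """Fold change in proton concentration when pH moves from ph_start to ph_end."""
    return proton_concentration(ph_end) / proton_concentration(ph_start)


if __name__ == "__main__":
    # pH values quoted in the text for nutrient-starved budding yeast.
    print(f"{fold_change(7.4, 6.0):.1f}-fold")  # prints "25.1-fold"
```

A drop of 1.4 pH units therefore corresponds to a roughly 25-fold increase in proton concentration, which gives a sense of the magnitude of the chemical change that accompanies the transition.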

Collectively, phase separation offers a suitable architecture to regulate and compartmentalize biochemical processes inside cells.

Several MLOs form within the cell nucleus (**Figure 1B**). A typical example of an MLO with liquid-like properties is the **nucleolus**, which forms around ribosomal DNA loci in the cell nucleus (Shaw and Jordan, 1995; Brangwynne et al., 2011). Nucleoli are RNA-protein compartments that play a key role in ribosome biogenesis (Andersen et al., 2002). Electron microscopy has shown that this process occurs in three distinct sub-regions of the nucleolus, formed as a result of LLPS (Boisvert et al., 2007). The transcription of rDNA starts in the fibrillar center sub-region, which is enriched in RNA polymerase I (RNAPI). This process then continues in the dense fibrillar component sub-region, where processing and modification of pre-rRNA transcripts also occur. The assembly of the ribosome is completed in the granular component, which is enriched in proteins (Boisvert et al., 2007). This organization into sub-compartments resembling a multi-layer structure has also been found in other liquid-like MLOs in the cell nucleus, including paraspeckles (Feric et al., 2016).

**Paraspeckles** are nuclear bodies involved in the control of gene expression and DNA repair, and are characterized by the presence of RBPs, including the *Drosophila* behavior/human splicing (DBHS) family of splicing proteins (the paraspeckle protein 1 PSPC1, RBM14, and NONO), FUS and TDP-43 proteins (Fox et al., 2002). It has been shown that the formation of paraspeckles is driven by RNA, in particular by the long non-coding RNA (lncRNA) *NEAT1* (Souquere et al., 2010). Knockdown of *NEAT1* leads to the disintegration of paraspeckles (Clemson et al., 2009). Indeed, paraspeckles form their *core* around the central part of *NEAT1*, whereas the surrounding structures (the shell and the patch) form around its 5′ and 3′ ends (Souquere et al., 2010). Interestingly, it has been shown that FUS localizes in the *core*, whereas TDP-43 concentrates in the shell (Hennig et al., 2015). This different localization reflects the distinct roles played by the two RBPs in paraspeckle formation. In *Fus−/−* mice, *Neat1* accumulated at its transcription sites but did not form the *core*-shell structure; moreover, it was found diffused throughout the nucleoplasm (West et al., 2016). Interestingly, in *Fus−/−* mice, the *core* group proteins SFPQ, NONO, and PSPC1 accumulated at the *Neat1* transcription sites, indicating that FUS protein was not essential for their association with *Neat1*. In contrast, the patch proteins BRG1 and RBM14 were not enriched at *Neat1* transcription sites, indicating an essential role for FUS in stabilizing the interaction of these proteins with nascent *Neat1* transcripts (West et al., 2016). On the other hand, TDP-43 downregulation induced the accumulation of *NEAT1* transcripts and the formation of paraspeckles (Shelkovnikova et al., 2018). This accumulation was probably due to the protective role played by paraspeckles against the defective miRNA pathway caused by TDP-43 depletion (Shelkovnikova et al., 2018).

Electron microscopy has allowed the identification of other MLOs in the nucleus, such as the **Cajal body** (Gall et al., 1999). Cajal bodies show a coiled structure and form on active snRNA loci (Frey and Matera, 2001). They share the same properties as other MLOs (Handwerger et al., 2003; Kato et al., 2012). Their assembly is initiated by small nuclear RNAs (snRNAs), including small nucleolar RNAs (snoRNAs) and small Cajal body-specific RNAs (scaRNAs), that interact with RBPs, which in turn recruit other proteins (Machyna et al., 2013). An essential aggregation factor of these structures is the protein coilin, which through its multi-modular domains gathers RBPs and RNAs, leading to the formation of Cajal bodies (Machyna et al., 2014). These bodies maintain structural integrity during interphase (Carmo-Fonseca et al., 1993), and are implicated in small nuclear ribonucleoprotein (snRNP) biogenesis, spliceosome formation, telomere maturation, and maintenance (Machyna et al., 2013).

In addition to MLOs, a phase separation model has recently been proposed to explain basic mechanisms of transcriptional regulation, such as the activity of **super-enhancers** (Hnisz et al., 2017; Sabari et al., 2018). Super-enhancers are clusters of transcriptional enhancers assembled by the simultaneous binding of master transcription factors, transcriptional co-activators, RNAPII, and RNA, that drive the expression of genes involved in defining cell identity (Whyte et al., 2013). Many molecules bound at enhancer regions can undergo reversible chemical modifications (e.g., acetylation, phosphorylation, methylation) at multiple sites. Upon such modifications, these molecules change their interactome, thus altering the overall charge and the affinities of the interacting molecules and producing a high-density assembly of biomolecules at active sites (Hnisz et al., 2017; Sabari et al., 2018). In this way, super-enhancers are susceptible to perturbation and their activity is fine-tuned by internal and external cues. During tumor pathogenesis, chromosomal translocation or overexpression of oncogenic transcription factors favors the formation of super-enhancers at oncogene loci (Hanahan and Weinberg, 2011), thus driving aberrant gene expression programs. Phase separation can also be promoted by proteins involved in the proteasome-degradation pathway (Li et al., 2014). In the case of the tumor suppressor SPOP (speckle-type POZ protein), which is involved in ubiquitination and proteasomal degradation of substrates (Li et al., 2014), target proteins drive the SPOP-mediated phase separation process, and when cancer-associated mutations of the *SPOP* gene occur, substrate binding and phase separation are disrupted (Bouchard et al., 2018). SPOP localizes in various nuclear bodies including speckles and DNA-damage *loci* (Nagai et al., 1997; Marzahn et al., 2016). Cancer mutations in SPOP negatively regulate the LLPS process between SPOP and substrates and prevent their ubiquitination, leading to upregulation of these proteins and impaired proteostasis (Bouchard et al., 2018).

Thus, phase transitions in MLOs and at super-enhancers allow the accomplishment of gene expression programs both in healthy and diseased cellular states.

## PHASE SEPARATION PROMOTED BY LOW-COMPLEXITY RNA-BINDING PROTEINS

A well-defined protein structure is essential to accomplish protein functions within the cell. However, many protein segments lack a well-defined structure while remaining functional. These segments are referred to as intrinsically disordered regions (IDRs), and proteins harboring them are named intrinsically disordered proteins (IDPs) (Li et al., 2012). Unlike globular proteins, IDPs use only a subset of the 20 amino acids, with a low content of hydrophobic amino acids. To achieve phase separation, the exact amino acid sequence of IDPs is not important, whereas the overall composition and charge pattern are highly relevant. Initially considered passive segments linking structured domains, IDRs actively participate in different cellular functions, and their activity is fine-tuned by post-translational modifications (Iakoucheva et al., 2004; Collins et al., 2008).

In order to achieve separation from the nucleoplasm and cytoplasm, MLOs contain IDPs harboring low-sequence-complexity domains (LCDs) (Gilks et al., 2004; Han et al., 2012; Kato et al., 2012; Toretsky and Wright, 2014). These domains are also present in yeast prion proteins (Alberti et al., 2009), from which the term "prion-like" is derived. Prions are infectious protein conformers capable of self-replication (Shorter and Lindquist, 2005). Prion-like domains (PrLDs) are a type of LCD with a tendency to self-assemble and form aggregates. Their ability to form amyloid depends on the PrLD being rich in glycine and uncharged polar amino acids, which reduce the solubility of the protein (Alberti et al., 2009). Deletion of the prion domain precludes the formation of the prion conformer (Masison et al., 1997), while the addition of this region to a given protein is sufficient to confer prion behavior (Li and Lindquist, 2000). Remarkably, mutations in PrLD-containing proteins cause devastating protein-misfolding diseases, characterized by the formation of solid aggregates (Kim et al., 2013; Li et al., 2013; Ramaswami et al., 2013).
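The notion of "low complexity" and of a composition biased toward glycine and uncharged polar residues can be made concrete with a simple calculation. The sketch below is only an illustration and not the method used by published PrLD predictors: it computes the Shannon entropy of the amino-acid composition and the fraction of residues in an assumed prion-like class (G, S, T, N, Q, Y). The residue grouping and both example sequences are hypothetical.

```python
import math
from collections import Counter

# Glycine plus uncharged polar residues. This grouping is an assumption made
# for illustration; it is not taken from any specific PrLD prediction tool.
PRION_LIKE_RESIDUES = frozenset("GSTNQY")


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the amino-acid composition of seq."""
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def prion_like_fraction(seq: str) -> float:
    """Fraction of residues that belong to PRION_LIKE_RESIDUES."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return sum(residue in PRION_LIKE_RESIDUES for residue in seq) / len(seq)


if __name__ == "__main__":
    # Hypothetical toy sequences, not real protein segments.
    low_complexity = "GYGQSSYSSYGQSQ" * 3
    diverse = "ACDEFGHIKLMNPQRSTVWY"  # each of the 20 amino acids once
    for name, seq in (("low-complexity", low_complexity), ("diverse", diverse)):
        print(f"{name:15s} entropy={shannon_entropy(seq):.2f} bits  "
              f"prion-like fraction={prion_like_fraction(seq):.2f}")
```

A sequence using all 20 amino acids equally reaches the maximum entropy of log2(20), about 4.32 bits, whereas a compositionally biased segment scores lower. Both measures depend only on composition, not on residue order, in keeping with the point that the exact sequence of an IDP matters less than its overall composition.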

Proteins involved in RNA processing display high phase separation propensities (Vernon et al., 2018). Indeed, as mentioned above, MLOs contain several RBPs harboring PrLDs. Remarkably, of the 240 human proteins harboring predicted PrLDs, 72 (30%) are involved in RNA metabolism (March et al., 2016). These proteins have recently emerged in the pathology and genetics of human neurodegenerative diseases (King et al., 2012).

ATXN1 and ATXN2 were the first RBPs with a putative PrLD to be linked to the pathogenesis of neurodegenerative diseases, causing spinocerebellar ataxia type 1 and type 2, respectively (Banfi et al., 1994; Lorenzetti et al., 1997). These neurodegenerative disorders are characterized by an expansion of a trinucleotide CAG repeat within the coding region of the *SCA1* and *SCA2* genes (Banfi et al., 1994; Lorenzetti et al., 1997). In physiological conditions, ATXN1 is located both in the nucleus and in the cytoplasm (Servadio et al., 1995), and is able to shuttle between these two compartments. However, ATXN1 dynamics are altered by the expansion; in fact, while mutated ATXN1 is still able to enter the nucleus, its ability to be transported back into the cytoplasm is dramatically reduced (Irwin et al., 2005). In contrast, ATXN2 is mainly localized in the cytoplasm, associated with translating polysomes or in stress granules and P-bodies, where it is involved in the regulation of translation, mRNA storage, or degradation (Orr, 2012).

Almost 10 years later, the transactive response (TAR) DNA-binding protein 43 kDa (TDP-43) was associated with a neurodegenerative disease (Arai et al., 2006; Neumann et al., 2006). TDP-43 is an RBP containing a PrLD (amino acids 277–414, **Figure 2**). It is localized in the cell nucleus but shuttles to the cytoplasm, playing roles in transcriptional and post-transcriptional RNA processing (Buratti and Baralle, 2008, 2010). TDP-43 misfolding has been connected to the pathology of ALS and frontotemporal lobar degeneration with ubiquitin-positive inclusions (FTLD-U) (Neumann et al., 2006; Chen-Plotkin et al., 2010; Da Cruz and Cleveland, 2011). In these disorders, TDP-43 displays a cytoplasmic localization and forms aggregates; moreover, it is depleted from the nucleus of diseased neurons (Chen-Plotkin et al., 2010; Da Cruz and Cleveland, 2011).

The PrLD of FUS spans amino acids 1–238 (**Figure 2**). Like TDP-43, FUS is mainly localized in the nucleus, but shuttles to accomplish functions in transcriptional and post-transcriptional regulation, RNA processing, and miRNA biogenesis (Bertolotti et al., 1996; Zinszner et al., 1997; Paronetto, 2013; Svetoni et al., 2016). Mutations in FUS cause familial ALS (Kwiatkowski et al., 2009; Vance et al., 2009; Da Cruz and Cleveland, 2011) and FTLD-U (Neumann et al., 2009; Mackenzie et al., 2010; Da Cruz and Cleveland, 2011; Drepper and Sendtner, 2011; Svetoni et al., 2016). In these pathologies, FUS is localized in cytoplasmic aggregates of the degenerating neurons (Mackenzie et al., 2010). FUS belongs to the FET family of RBPs, which is composed of FUS/TLS, EWS, and TAF15 and is involved in multiple steps of RNA metabolism (Svetoni et al., 2016). FET proteins share a common domain architecture (Paronetto, 2013). Upon inhibition of transcription (Zinszner et al., 1997) or DNA damage (Paronetto et al., 2011), they translocate into the nucleoli, forming dense nuclease-resistant aggregates. In Ewing sarcoma, the replacement of the RNA-binding domains of FET proteins with an ETS transcription factor due to chromosomal translocations alters their nucleic acid-binding affinities and activities, thus causing activation of a transcriptional program leading to malignant transformation (Paronetto, 2013). As mentioned, in neurodegenerative diseases, point mutations in the genes encoding FET proteins affect their localization and aggregation propensity, strongly supporting the hypothesis that phase transition contributes to the development of pathological conditions (Svetoni et al., 2016).

In 2011, mutations in the gene encoding TAF15 were identified in ALS and FTLD-U patients (Couthouis et al., 2011; Neumann et al., 2011; Ticozzi et al., 2011). Interestingly, both TAF15 and EWS harbor a prominent N-terminal PrLD (amino acids 1–149 in TAF15; amino acids 1–280 in EWS; **Figure 2**) enriched in glutamine residues, which might enhance the formation of toxic oligomeric structures (Halfmann et al., 2011).

hnRNPA1 and hnRNPA2 are prototypical hnRNPs formed by two folded RNA recognition motifs (RRMs) in the N-terminal part of the protein and a PrLD in the C-terminal part (amino acids 185–341 in hnRNPA2; amino acids 186–320 in hnRNPA1; **Figure 2**) (Kim et al., 2013), which is involved in the interaction with TDP-43 (Buratti et al., 2005). Missense mutations in the PrLD of hnRNPA1 and hnRNPA2 have been identified in ALS patients (Kim et al., 2013). hnRNPA2 and hnRNPA1 are prone to fibrillization, which is enhanced by disease-causing mutations (Kim et al., 2013). Notably, hnRNPA2B1 and hnRNPA1 mutations have been identified in families presenting multisystem proteinopathy (MSP) (Benatar et al., 2013; Le Ber et al., 2014), a rare complex phenotype which involves perturbation of SG dynamics and autophagic protein degradation, affecting muscle, brain, and bone (Benatar et al., 2013). The MSP phenotype encompasses different disorders, such as frontotemporal lobar degeneration (FTLD), Paget disease of bone (PDB), inclusion body myopathy (IBM), and ALS (Benatar et al., 2013).

Aromatic residues play important roles in IDR interactions; they mediate short-range, aromatic interactions and promote LLPS, whereas hydrophilic residues control the solubility of IDRs and counteract LLPS (Kato et al., 2012; Xiang et al., 2015). For instance, tyrosine mutations block recruitment of hnRNPA2 and FUS IDRs into phase-separated liquids as well as into RNA granules (Kato et al., 2012; Xiang et al., 2015). The numerous tyrosine residues in FUS contribute to LLPS, and their recruitment into LLPS is controlled by phosphorylation (Lin et al., 2017). In fact, phosphorylation enables rapid transitions within the IDRs and controls the assembly/disassembly of the RNP granules (Lin et al., 2017).

Electrostatically driven phase separation can also be promoted by the interaction of arginine/glycine-rich domains with RNA. The RGG/RG repeats, often present in RBPs, usually occur in LCDs. Compared to sequences from ordered proteins, these IDRs typically exhibit high levels of a subset of specific amino acids that promote phase separation on their own. In particular, the polyvalent interactions between arginines and RNA drive the phase separation process (Romero et al., 2001). RGG-containing regions mediate RNA binding (Kiledjian and Dreyfuss, 1992) and can be methylated by PRMTs (Bedford and Richard, 2005; Bedford and Clarke, 2009). Methylation is a post-translational modification that negatively influences the capability of RBPs to bind RNA. For instance, methylation of the RGG/RG motif of FMRP reduces its affinity for RNA (Stetler et al., 2006) and its recruitment on polysomes (Blackwell et al., 2010). Moreover, methylarginines within the RG motifs of the RBP Sam68 negatively influence its affinity for the SH3 domains (Bedford et al., 2000). Arginine methylation also decreases the interaction of FUS with transportin, thus affecting its nuclear import (Dammer et al., 2012). In this regard, methylation of the RGG/RG motifs affects the localization of FUS proteins harboring ALS-linked mutations (Tradewell et al., 2012). Notably, mutations in the RGG/RG domains of FUS have been identified in familial cases of ALS (Kwiatkowski et al., 2009; Hoell et al., 2011).

The presence of RGG/RG motifs within a given protein can be regulated by alternative splicing choices (Blackwell and Ceman, 2011). For instance, isoform 12 of *FMR1* excludes exons 12 and 14, thus leading to a truncated FMRP isoform lacking the RGG domain encoded by exon 15. This FMRP isoform displays reduced localization to dendritic RNA granules (Mazroui et al., 2003; Blackwell and Ceman, 2011).
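As a simple illustration of how the RGG/RG repeats discussed above can be located in a protein sequence, the following sketch scans a sequence with a regular expression. It is a minimal pattern search, not a validated motif-annotation method; the function name and the example sequence are hypothetical.

```python
import re

# "RGG" is listed before "RG" so that a tri-peptide repeat is reported once
# as RGG rather than as the shorter RG it contains.
RGG_OR_RG = re.compile(r"RGG|RG")


def find_rg_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start position, motif) for each non-overlapping RGG/RG match."""
    return [(m.start() + 1, m.group()) for m in RGG_OR_RG.finditer(seq.upper())]


if __name__ == "__main__":
    toy_sequence = "MARGGSRGAARGGFRG"  # hypothetical sequence, not a real protein
    for position, motif in find_rg_motifs(toy_sequence):
        print(position, motif)
```

Comparing the motif positions found in two isoform sequences would show, for example, whether a splicing event removes the RGG-containing region, as described for the truncated FMRP isoform.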

Chromosomal translocations between the *EWSR1* gene and genes encoding ETS transcription factors can cause aggressive pediatric tumors designated as Ewing sarcomas (Araya et al., 2005; Paronetto, 2013). The translocations result in chimeric proteins harboring the N-terminal activation domain of EWS, which includes the PrLD, fused to the DNA-binding domain of an ETS transcription factor, but lacking the C-terminal domain of EWS containing the RNA-binding regions (the RGG motifs and the RRM motif; **Figure 2**). Notably, the RGG/RG motifs of EWS display an inhibitory activity toward the activation domain, thus decreasing its oncogenic potential (Li and Lee, 2000). Interestingly, the translocated PrLD deriving from EWS promotes the aggregation of EWS-FLI1 in foci (Boulay et al., 2017). The presence of aromatic residues affects the aggregation propensity of EWS-FLI1. In fact, replacement of the 37 tyrosine residues with serines removes its ability to form aggregates, and this tyrosine-replaced variant is unable to assemble active enhancers (Boulay et al., 2017). Since RNA helicases are extensively involved in phase separation dynamics (Nott et al., 2015), it is possible that the interaction between EWS-FLI1 and the DNA–RNA helicase DHX9 plays an essential role in the formation of these aggregates; thus, blocking this interaction could be a strategy to limit EWS-FLI1 oncogenic potential (Fidaleo et al., 2016). Since the ability of the EWS-FLI1 PrLD to phase separate is closely linked to its oncogenic activity, preventing or reverting phase separation properties could have therapeutic utility in Ewing sarcoma.

FIGURE 2 | Phase Separation by RBPs. (A) Schematic representation of protein domains of RBPs involved in FTD, ALS, and cancer. ATXN1 and ATXN2 contain two predicted prion-like domains (ATXN1: aa197–256; ATXN2: aa1131–1223; March et al., 2016). The FET proteins FUS, EWS, and TAF15 combine two types of low-complexity domain (LCD): prion-like domains (PrLDs) and RGG-rich domains. The two types of LCDs cooperate to drive the dynamic phase separation. FET proteins are frequently translocated in human cancers, with the resulting fusion proteins (e.g. EWS-FLI1) lacking either significant parts of the RGG-rich LCD or the RNA recognition motif (RRM) but containing the DNA-binding domain (DBD) of an ETS transcription factor (e.g. FLI1). HNRNPA1 and HNRNPA2 combine two RRM motifs with a PrLD, together with an RG domain. TDP43 contains two RRM motifs and a PrLD. Protein domains are indicated in different colors (see legend). AXH = Ataxin-1 and HMG-box protein domain; Lsm = Like RNA splicing domain Sm1 and Sm2; LsmAD = Like-Sm-associated domain; PAM2 = poly (A)-binding protein interacting motif. The location of each protein domain and the association with human pathologies are indicated in (B). SCA1 = spinocerebellar ataxia type 1; SCA2 = spinocerebellar ataxia type 2; ALS = Amyotrophic Lateral Sclerosis; FTD = Frontotemporal Dementia.

# THERAPEUTIC APPROACHES

Phase separation plays a crucial role in neurodegenerative disorders. Several proteins involved in neurodegenerative diseases are components of MLOs, and dysregulation in the formation or maintenance of these compartments leads to pathological aggregates (Li et al., 2013; Ramaswami et al., 2013). Therefore, the development of novel therapeutic strategies to control cellular phase transition could be instrumental for the treatment of those human diseases characterized by aberrant aggregates (**Figure 3**).

In addition to RBPs, several ALS mutations have been identified in genes encoding members of the protein quality control (PQC) system, including chaperones and components of the ubiquitin/proteasome or autophagolysosomal systems (Robberecht and Philips, 2013; Capponi et al., 2016; Alberti et al., 2017). These systems play a crucial role in the control of protein aggregation. Chaperones recognize SGs containing misfolded aggregated proteins (Mateju et al., 2017). When a specific chaperone mechanism is compromised, misfolded proteins and defective ribosomal products accumulate in SGs, thus altering SG dynamics and causing defects in SG disassembly (Ganassi et al., 2016). Ganassi and collaborators identified the HSPB8-BAG3-HSP70 chaperone complex as a key regulator of SG surveillance. The incidence of aberrant SGs containing defective ribosomal products is very low in normal conditions, suggesting that the PQC is highly efficient in preventing aberrant SG formation (Ganassi et al., 2016). Along the same lines, Mateju and collaborators demonstrated that SGs containing ALS-associated SOD1 aggregates recruit an increased number of chaperones, including HSP27 and HSP70, suggesting that these are specifically recruited to avoid aberrant SGs (Mateju et al., 2017). Treatment with a chemical inhibitor of HSP70 increases the number of SGs containing misfolded proteins, suggesting that HSP70 hampers the accumulation of misfolded proteins and facilitates a rapid disassembly of SGs in the recovery phase (Mateju et al., 2017). Thus, surveillance of SGs by chaperones is critical for the maintenance of their normal composition and dynamics, and the stimulation of the chaperone machinery could be a useful strategy to disassemble MLOs. In this context, the development of potentiated chaperones could also be a suitable approach to optimize therapeutic efficacy against neurodegenerative diseases (Shorter, 2008; Jackrel et al., 2014; Yasuda et al., 2017). In this regard, Jackrel and collaborators were able to potentiate HSP104 variants from yeast (Jackrel et al., 2014; Jackrel and Shorter, 2014). In particular, the enhanced chaperone they developed was able to revert TDP-43 and FUS aggregation, thus suppressing their toxicity and eliminating protein aggregates in yeast (Jackrel et al., 2014; Jackrel and Shorter, 2014). Moreover, Yasuda and collaborators demonstrated for the first time that engineered HSP104 variants are able to dissolve cytoplasmic ALS-linked FUS aggregates in mammalian cells (Yasuda et al., 2017).

Another approach to stimulate the chaperone machinery is to develop drugs able to upregulate the expression levels of heat shock proteins (HSPs). Arimoclomol (BRX-345) is a hydroxylamine derivative that facilitates the formation of chaperone molecules by enhancing the expression of heat shock genes (Vigh et al., 1997; Kieran et al., 2004). Arimoclomol treatment in *SOD1G93A* mice was shown to upregulate HSP70 and HSP90 expression, leading to a significant delay in the progression of the disease (Kieran et al., 2004). Remarkably, a phase II/III randomized, double-blind, placebo-controlled clinical trial is currently underway in familial SOD1-ALS patients (NCT00706147). Furthermore, arimoclomol treatment has been shown to reduce pathological markers *in vitro* and to ameliorate pathological and functional deficits *in vivo* in sporadic inclusion body myositis (sIBM), a severe myopathy characterized by protein dys-homeostasis (Ahmed et al., 2016).

Since disruption of the ubiquitin-proteasome system and of autophagy are central events in ALS, current research is focusing on the development of drugs able to upregulate the signaling pathways involved in PQC (Barmada et al., 2014). For instance, new compounds stimulating autophagy are able to improve TDP-43 clearance and localization, thus mitigating neurodegeneration (Barmada et al., 2014). Stimulation of autophagy also enhances survival of human induced pluripotent stem cell (iPSC)-derived neurons and astrocytes from patients with familial ALS (Barmada et al., 2014). Furthermore, great effort has recently been devoted to improving HSPB8 function and to promoting the autophagy-mediated removal of misfolded mutant SOD1 and TDP-43 fragments from ALS motor neurons. In this regard, colchicine treatment enhances the expression of HSPB8 and of several autophagy players, while blocking TDP-43 accumulation in neurons (Rusmini et al., 2017). Based on these premises, a phase II randomized, double-blind, placebo-controlled, multicenter clinical trial has been activated to test the efficacy of colchicine in ALS (NCT03693781).

Recently, Guo and collaborators showed the relevance of nuclear-import receptors (NIRs) in the disaggregation of disease-linked RBPs with a nuclear localization signal (NLS) (Guo et al., 2018). The binding of Karyopherin-β2, also named transportin-1, to the PY-NLSs is sufficient to reverse FUS, TAF15, EWS, hnRNPA1, and hnRNPA2 fibrillization, while Importin-α, in complex with Karyopherin-β1, reverses TDP-43 fibrillization (Guo et al., 2018). Karyopherin-β2 prevents the aberrant accumulation of RBPs containing PY-NLSs in SGs and re-establishes proper RBP nuclear localization and function, thus rescuing the degeneration caused by mutated FUS and hnRNPA2 (Guo et al., 2018). Thus, NIRs might contribute to the setup of novel therapeutic strategies to restore RBP homeostasis and moderate neurodegeneration.

Recent evidence highlights the role of specific kinases in the regulation of the dissolution of SGs and other MLOs. For instance, during recovery from stressful conditions, the kinase activity of DYRK3 is required for SG dissolution and restoration of mTORC1 activity (Wippich et al., 2013). DYRK3 binds MLO proteins and phosphorylates their LCDs (Wippich et al., 2013), thus affecting the electrostatic properties of these domains and the condensation threshold of the proteins harboring them (Rai et al., 2018). Importantly, DYRK3 has been shown to act as a dissolvase of liquid-unmixed compartments (Rai et al., 2018). In fact, upon overexpression, recombinant DYRK3 was able to dissolve MLOs both in the nucleus and in the cytoplasm in a kinase-activity-dependent fashion (Rai et al., 2018). Similar to DYRK3, Casein kinase 2 (CK2) was recently found to cause SG disassembly *via* phosphorylation of the SG nucleating protein G3BP1 (Reineke et al., 2017). Thus, the identification of kinases, such as DYRK3 and CK2, that are able to modulate MLO dissolution, and/or of drugs that regulate their functions, may represent another interesting therapeutic approach (**Figure 3**).

Finally, a suitable approach to downregulate key regulators involved in aberrant phase transitions is the use of antisense oligonucleotides (ASOs). ASOs can be used to target pathological proteins in different mouse models (Schoch and Miller, 2017). In the case of essential proteins, an ASO strategy could be engineered to target non-essential partners involved in the regulation of phase transition (Boeynaems et al., 2018). This is the case of TDP-43 and ataxin-2. Ataxin-2 is an RBP with multiple roles in RNA metabolism, such as regulation of SG assembly (Elden et al., 2010; Kaehler et al., 2012). Reduction of ataxin-2 by ASOs affects SG dynamics and decreases recruitment of TDP-43 to SGs (Becker et al., 2017). A single administration of ASOs targeting ataxin-2 into the central nervous system is sufficient to increase the lifespan and improve motor function of TDP-43 transgenic mice (Becker et al., 2017). Since alterations in TDP-43 have been found in 97% of ALS cases and about 50% of FTD cases (Ling et al., 2013), the reduction of ataxin-2 could be used as a therapeutic strategy for ALS and FTD treatment (Wils et al., 2010).

To conclude, the stimulation of chaperone machinery, the induction of pathways triggering PQC, and the development of ASOs could be exploited to set up novel therapeutic approaches for the treatment of those human diseases characterized by aberrant phase transition aggregates.

#### CONCLUDING REMARKS

Membrane-less subcellular organization plays a pivotal role in cellular homeostasis. LCDs, including PrLDs and RGG domains, behave as general scaffolds in assisting MLO activity and mediating the dynamics of RNP granules. Persistence of RNP granules caused by failure of granule removal, mutated PrLD-containing RBPs, or granule-associated misfolded proteins can lead to pathological protein aggregates, which contribute, at least in part, to the pathogenesis of neurodegenerative diseases. At the same time, chromosomal translocation can promote the formation of aberrant chimeric proteins formed by PrLDs fused to transcription factors; these translocated PrLDs promote phase separation and activate transcriptional programs driving transformation (Boulay et al., 2017). On the other hand, disruption of membrane-less organization by mutations in the tumor suppressor SPOP can cause solid tumors (Bouchard et al., 2018).

Given the relevance to human health, determining how LCD proteins organize cellular compartments could be instrumental to expand our understanding of compartment formation, thus providing significant insight into neurodegenerative pathologies and cancer. Recent studies document how pernicious misfolding can be reversed by protein disaggregases (Shorter, 2008; Torrente and Shorter, 2013; Jackrel and Shorter, 2015), opening the path to novel promising therapeutic applications both in cancer treatment and in the cure of neurodegenerative disease. Yet, the exact MLO dynamics have not been completely unraveled, in either physiological or pathological conditions.

#### AUTHOR CONTRIBUTIONS

VV, EDP and MPP analyzed the literature and wrote the manuscript.

#### FUNDING

This work was supported by grants from the Associazione Italiana Ricerca sul Cancro (AIRC) [IG17278 to MP], and from Ministry of Health "Ricerca Corrente" and "5x1000 Anno 2017" to Fondazione Santa Lucia.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Verdile, De Paola and Paronetto. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# DRUM: Inference of Disease-Associated m6A RNA Methylation Sites From a Multi-Layer Heterogeneous Network

Yujiao Tang<sup>1,2</sup>, Kunqi Chen<sup>1,3</sup>, Xiangyu Wu<sup>1,3</sup>, Zhen Wei<sup>1,3</sup>, Song-Yao Zhang<sup>4</sup>, Bowen Song<sup>1,3</sup>, Shao-Wu Zhang<sup>4</sup>, Yufei Huang<sup>5,6</sup> and Jia Meng<sup>1,3</sup>\*

*<sup>1</sup> Department of Biological Sciences, Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, China, <sup>2</sup> Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom, <sup>3</sup> Institute of & Chronic Disease, University of Liverpool, Liverpool, United Kingdom, <sup>4</sup> Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China, <sup>5</sup> Department of Epidemiology and Biostatistics, University of Texas Health San Antonio, San Antonio, TX, United States, <sup>6</sup> Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, United States*

#### Edited by:

*Emanuele Buratti, International Centre for Genetic Engineering and Biotechnology, Italy*

#### Reviewed by:

*Zhixiang Zuo, Sun Yat-sen University, China Jernej Ule, University College London, United Kingdom*

> \*Correspondence: *Jia Meng jia.meng@xjtlu.edu.cn*

#### Specialty section:

*This article was submitted to RNA, a section of the journal Frontiers in Genetics*

Received: *15 November 2018* Accepted: *11 March 2019* Published: *03 April 2019*

#### Citation:

*Tang Y, Chen K, Wu X, Wei Z, Zhang S-Y, Song B, Zhang S-W, Huang Y and Meng J (2019) DRUM: Inference of Disease-Associated m6A RNA Methylation Sites From a Multi-Layer Heterogeneous Network. Front. Genet. 10:266. doi: 10.3389/fgene.2019.00266*

Recent studies have revealed that the RNA *N*<sup>6</sup>-methyladenosine (m6A) modification plays a critical role in a variety of biological processes and is associated with multiple diseases, including cancers. To date, transcriptome-wide m6A RNA methylation sites have been identified by high-throughput sequencing techniques combined with computational methods, and the information is publicly available in a few bioinformatics databases; however, the associations between individual m6A sites and various diseases are still largely unknown. There are as yet no computational approaches for investigating potential associations between individual m6A sites and diseases, which represents a major challenge in epitranscriptome analysis. Thus, to infer disease-related m6A sites, we implemented a novel multi-layer heterogeneous network-based approach, which incorporates the associations among diseases, genes and m6A RNA methylation sites from gene expression, RNA methylation and disease similarity data with the Random Walk with Restart (RWR) algorithm. To evaluate the performance of the proposed approach, a ten-fold cross-validation was performed, in which our approach achieved a reasonably good performance (overall AUC: 0.827, average AUC: 0.867), higher than a hypergeometric test-based approach (overall AUC: 0.7333 and average AUC: 0.723) and a random predictor (overall AUC: 0.550 and average AUC: 0.486). Additionally, we show that a number of predicted cancer-associated m6A sites are supported by the existing literature, suggesting that the proposed approach can effectively uncover the underlying epitranscriptome circuits of disease mechanisms. An online database DRUM, which stands for disease-associated ribonucleic acid methylation, was built to support the query of disease-associated RNA m6A methylation sites, and is freely available at: www.xjtlu.edu.cn/biologicalsciences/drum.

Keywords: disease, RWR, random walk with restart, m6A modification, Co-expression, network analysis

# INTRODUCTION

Epigenetic regulation, such as RNA methylation, DNA methylation and post-translational modification (PTM), participates in a variety of important cellular processes, including embryonic development, maintenance of chromosome stability and X-chromosome inactivation (Wu and Zhang, 2014). Over the past decade, DNA methylation has been considered to play a key role in gene expression regulation, moderating various biological functions. It has been found that dysregulated DNA methylation is associated with various diseases. For example, epigenetic defects such as global genomic hypo-methylation or locus-specific hyper-methylation are among the cancer hallmarks (Gopalakrishnan et al., 2008). To date, a number of works have sought to unveil the functional relevance of epigenetic modifications to various diseases. DiseaseMeth (Xiong et al., 2016) contains aberrant DNA methylation in 679,602 disease-gene associations collected from 32,701 samples; MethyCancer (He et al., 2007) and MethHC (Huang et al., 2014) support the query of cancer- and disease-related DNA methylation profiles. ActiveDriverDB (Huang et al., 2014), CaspNeuroD (Kumar and Cieplak, 2016), dbPTM (Huang et al., 2019) and PTMSNP (Kim Y. et al., 2015) investigated human disease mutations that potentially function through post-translational modifications. Recently, Xu and Wang investigated disease-associated protein phosphorylation sites from a multi-layer heterogeneous network using the random walk algorithm (Xu and Wang, 2016). These studies greatly advanced our understanding of the role epigenetic modifications play in disease pathology. However, the study of biochemical modifications was dominated by DNA methylation and post-translational protein modifications until recently, when RNA methylation emerged as an important layer of gene expression regulation.

RNA modifications were first identified more than 40 years ago (Wei et al., 1976), and more than 100 different types have since been discovered in cells as epigenetic marks recognized by other regulators to modulate genetic information (Cantara et al., 2011; Boccaletto et al., 2017), among which *N*<sup>6</sup>-methyladenosine is the most abundant in mRNA (Fu et al., 2014; Meyer and Jaffrey, 2014). A series of studies reveal that RNA methylation plays a crucial role in the regulation of the circadian clock (Fustin et al., 2013), RNA stability (Wang et al., 2014), cell differentiation (Geula et al., 2015), translation efficiency (Wang et al., 2015), as well as DNA damage response (Xiang et al., 2017) and cortical neurogenesis (Yoon et al., 2017). It has been shown that RNA methylation may be central in disease pathology, especially in various cancers, including breast cancer (Cai et al., 2018), myeloid leukemia (Barbieri et al., 2017; Kwok et al., 2017; Li Z. et al., 2017; Vu et al., 2017), liver cancer (Chen M. et al., 2017), carcinoma (Li et al., 2017a), glioma (Visvanathan et al., 2017; Zhang et al., 2017), etc. (Hsu et al., 2017; Stojković and Fujimori, 2017; Wang S. et al., 2017). Recent studies revealed the impacts of m6A modification on specific diseases. For example, N6-methyladenosine (m6A) modification of mRNA plays a role in regulating the self-renewal and tumorigenesis of glioblastoma stem cells (GSCs). Studies report that knockdown of the RNA methyltransferase complex components METTL3 or METTL14 can dramatically decrease the abundance of m6A methylation and alter the mRNA expression of genes (e.g., ADAM19, EPHA3, KLF4), thereby promoting human GSC growth (Cui et al., 2017). Meanwhile, the up-regulation of the RNA m6A demethylase ALKBH5 can also induce the proliferation of GSCs (Zhang et al., 2017). Inhibition of ALKBH5 by shRNA showed that FOXM1, a cell cycle regulator, is a downstream target of m6A modification. Importantly, the hypo-methylation of the target mRNA promotes the binding of the RNA binding protein HuR, resulting in increased FOXM1 expression and the development of glioma (Zhang et al., 2017). Additionally, the RNA m6A demethylase FTO is found to act as an oncogene in acute myeloid leukemia (AML) (Li Z. et al., 2017). Studies show that reduced m6A levels in some mRNA transcripts, such as ASB2 and RARA, can enhance leukemic oncogene-mediated cell transformation and leukemogenesis, and inhibit AML cell differentiation (Li Z. et al., 2017). Furthermore, Zhang et al. found that hypoxia stimulates breast cancer cells to upregulate expression of the m6A demethylase ALKBH5, which is mediated by hypoxia-inducible factor (HIF). Consequently, the mRNA of the pluripotency factor NANOG is demethylated, and hypomethylation increases the stability of the mRNA so as to cause high expression of NANOG, further inducing the maintenance and metastasis of tumor stem cells (Zhang et al., 2016a).

Despite the growing interest in m6A RNA modification and its potential regulatory role in various diseases, the study of m6A methylation in the context of diseases has been restricted. The experimental approaches are mostly limited to the study of m6A mediator genes, i.e., the RNA methyltransferase (writer), demethylase (eraser) and RNA binding protein (reader). For instance, the RNA m6A demethylase FTO is also found to play an important role in neurogenesis, as well as in learning and memory; hence, m6A modification is regarded as related to Alzheimer's disease (Li L. et al., 2017). Another study reports that the RNA m6A demethylase ALKBH5 may be related to major depressive disorder in the Chinese Han population (Du et al., 2015). These studies are often less detailed in genomic resolution and cannot unveil the disease relevance of a specific RNA methylation site. Compared with experimental investigation of the regulatory functions of m6A sites, bioinformatics offers a feasible way to identify putative disease associations of m6A sites and is therefore urgently needed at present. To date, the computational approaches for studying the association between m6A methylation and diseases have been limited to disease-associated mutations that may potentially disrupt or form an m6A-containing motif, and may thus be regulated through the epitranscriptome layer. Works of this category include m6AVar (Zheng et al., 2018), which contains a number of functional variants involved in m6A modification, and m6ASNP (Jiang et al., 2018; Mo et al., 2018; Zhang et al., 2019), which is a tool for annotating genetic variants from the perspective of their impact on m6A modification. Although they have generated fruitful results (Mo et al., 2018a,b), SNP-based approaches are limited to existing GWAS analysis results and cannot predict previously unknown associations between m6A sites and diseases. Other disease association studies of the epitranscriptome focus on a specific mediator gene of the epitranscriptome, and can therefore cover the disease association of the epitranscriptome for only a limited number of diseases (Zhang et al., 2016b, 2019), but not yet for an arbitrary disease.

**Abbreviations:** m6A, *N*<sup>6</sup>-methyladenosine; MeRIP-Seq, methylated RNA immunoprecipitation sequencing; IP, immunoprecipitation; DRUM, disease-associated ribonucleic acid methylation; ROC, receiver operating characteristics; AUC, area under the ROC curve; Pcc, Pearson correlation coefficient.

The accumulation of epitranscriptome high-throughput sequencing data has provided numerous possibilities for epitranscriptome analysis. Nowadays, the most widely used approach for profiling transcriptome-wide RNA methylation is methylated RNA immunoprecipitation sequencing (m6A-seq or MeRIP-seq) (Wan et al., 2015), and the technique has been used in various studies to profile condition-specific RNA methylation (Liu H. et al., 2018; Xuan et al., 2018). The m6A RNA methylation sites have been more accurately identified in human, mouse and other species with machine learning approaches. It is possible, and sorely needed, to develop computational approaches for understanding the disease relevance of individual RNA methylation sites by taking advantage of the large amount of epitranscriptome data accumulated from existing studies (Chen X. et al., 2017; Chen et al., 2019). Random walk on a multi-layer network has been used previously to uncover the important role of RNA molecules in a pathologic context, including disease-related long non-coding RNAs (lncRNA) (Zhou et al., 2015) and miRNAs (Mendell and Olson, 2012). In the field of epitranscriptome analysis, the random walk with restart (RWR) algorithm has been implemented to study the functional protein-protein network driven by RNA methylation enzymes through the regulation of the epitranscriptome layer (Zhang et al., 2016b).

In this work, we extracted, for the first time, disease-associated m6A sites through a multi-layer heterogeneous network using the random walk with restart (RWR) algorithm, and provided a more specific regulatory circuit that functions at the epitranscriptome layer. Specifically, a novel multi-layer heterogeneous network was constructed from gene expression and RNA methylation data. The nodes of the network correspond to the diseases, the genes and the m6A RNA methylation sites. The network contains both cross-layer associations, i.e., gene-m6A site association and disease-gene association, and within-layer associations, i.e., gene-gene association, m6A site-m6A site association and disease-disease association. Because the known gene-disease network and gene-m6A site network link the m6A site and disease layers together, the potential relationships between m6A sites and diseases can be inferred (Tong et al., 2008). The within-layer association networks (e.g., disease-disease association) can further enhance the confidence of interactions.

To evaluate the performance of the proposed approach, a 10-fold cross-validation was implemented. Our RWR-based predictor achieved a reliable prediction performance, with an area under the receiver operating characteristic curve (AUC) of 0.83, compared with an alternative hypergeometric test-based approach (AUC: 0.73) and a random predictor (AUC: 0.48). A website DRUM, which stands for **d**isease-related **r**ibon**u**cleic acid **m**ethylation, was built to support the query of the RNA methylation sites most probably related to 705 diseases. The DRUM website is freely available at: www.xjtlu.edu.cn/biologicalsciences/drum.

#### MATERIALS AND METHODS

To infer disease-associated RNA methylation sites, a multi-layer heterogeneous network was constructed, which consists of three types of nodes, i.e., the diseases, genes and m6A sites, and five types of associations, i.e., gene-gene association, gene-disease association, gene-m6A site association, disease-disease association, and m6A site-m6A site association (see **Figure 1**). The network was constructed by integrating the RNA methylation profiles, the RNA expression profiles and gene-disease associations, as detailed below.

# RNA Methylation Data

The locus information of 477,452 m6A RNA methylation sites in human was extracted from RMBase V2 (Xuan et al., 2018), which collected the m6A RNA methylation sites reported by multiple techniques including m6A-seq, miCLIP, m6A-CLIP, and PA-m6A-seq (Li et al., 2017b). In the site filtering stage, 182,358 sites, which are supported by more than 10 experiments, are kept. To further select the most robust m6A methylation signal, we selected the methylation sites with average methylation level within the 70 percentile. Additionally, the m6A sites with the variance of methylation level ranked in the top 80 percentiles were retained, which represent the most actively regulated set of m6A sites, whose functional relevance may be more reliably inferred. In the end, 28278 RNA methylation sites were retained for further analysis.

Although base-resolution m6A profiling techniques exist, they either cannot be used for methylation level quantification (e.g., miCLIP and m6A-CLIP), or the limited number of available samples is insufficient to reliably infer the associations (e.g., PA-m6A-seq). Instead of using data generated from base-resolution techniques, the RNA methylation level of each m6A site was estimated from MeRIP-seq data, which profiled the m6A epitranscriptome under 38 different experimental conditions (see **Table 1**). The raw data were downloaded from GEO and aligned to the human reference genome hg19 with HISAT2 (Kim D. et al., 2015). The reads associated with each RNA methylation site were counted in the R environment, and the methylation status was quantified using the M-value, which is essentially the log2 fold change of reads in the IP sample compared to the input control sample of MeRIP-seq data, as shown in (1):

$$\text{M-value} = \log_2\left(\frac{\text{RPKM}_{IP} + 0.1}{\text{RPKM}_{Input} + 0.1}\right) \tag{1}$$

where $\text{RPKM}_{IP}$ and $\text{RPKM}_{Input}$ represent the read abundance of a specific m6A site (101 bp flanked region) in the IP and input control samples of MeRIP-seq data, respectively. The read abundance was measured in terms of Reads Per Kilobase of transcript per Million mapped reads (RPKM). When multiple biological replicates from the same experimental condition were available, they were merged during the data processing stage. Quantile normalization was then performed to remove potential batch effects.
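The quantification in (1) and the subsequent quantile normalization can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' pipeline (which ran in R); the function names and the sites-by-conditions matrix layout are assumptions made for the example, and ties in the normalization are broken by sort order rather than averaged.

```python
import numpy as np

def m_value(rpkm_ip, rpkm_input, pseudocount=0.1):
    """Equation (1): log2 ratio of IP to input RPKM, with a 0.1 pseudocount."""
    rpkm_ip = np.asarray(rpkm_ip, dtype=float)
    rpkm_input = np.asarray(rpkm_input, dtype=float)
    return np.log2((rpkm_ip + pseudocount) / (rpkm_input + pseudocount))

def quantile_normalize(x):
    """Force every column (condition) of a sites-by-conditions matrix
    onto the same empirical distribution (the mean of the sorted columns)."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)
    ranks = np.argsort(order, axis=0)            # rank of each entry within its column
    mean_sorted = np.sort(x, axis=0).mean(axis=1)
    return mean_sorted[ranks]
```

After normalization every condition has identical sorted values, which is the property that makes profiles from different experiments comparable.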

# Gene Expression Data

The gene expression profiles under the same 38 experimental conditions (matched with the RNA methylation data) were extracted from the input control samples of the MeRIP-seq data, which measure the expression level of genes. As in the processing of RNA methylation data, the gene expression levels were measured in RPKM, multiple biological replicates were merged, and quantile normalization was performed to reduce batch effects.

# Disease-Gene Association

The human gene-disease associations used in our analysis were directly collected from OUGene, which collects the over- and under-expressed genes under a specific disease condition (Pan and Shen, 2016). A total of 41,269 associations between 705 human diseases and 1080 genes from OUGene were integrated into our multi-layer heterogeneous network.

# Disease-Disease Similarities

Since similar diseases are often associated with similar gene sets, the association between diseases was also considered (Xu and Wang, 2016). The disease-disease similarity network was constructed based on MeSH (medical subject headings vocabulary) terms (Lowe and Barnett, 1994), and diseases that share a significant number of MeSH terms are considered more closely associated. Specifically, the similarity $V_{ij}$ of two diseases is given by the number of shared MeSH terms penalized by the total number of terms in their disease titles, as shown in (2):

$$V_{ij} = \frac{\left| d_i \cap d_j \right|}{\left| d_i \cup d_j \right|}, \tag{2}$$

where $d_i$ and $d_j$ stand for all the MeSH terms of diseases $i$ and $j$, respectively, and $|\ast|$ denotes the total number of terms. Please note that the OUGene database does not contain the MeSH term information. The MeSH terms associated with the various diseases were extracted from the semantically integrated database of disease SIDD (Liang et al., 2013). No additional cutoff threshold was applied; all pair-wise associations between diseases were kept for the analysis.
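Equation (2) is the Jaccard index of two term sets. A minimal sketch in Python, assuming each disease is represented by a collection of MeSH term strings; the function name is hypothetical, and returning 0 for two empty term sets is a choice made here, not something stated in the text.

```python
def mesh_similarity(terms_i, terms_j):
    """Equation (2): shared terms divided by the total number of distinct terms."""
    a, b = set(terms_i), set(terms_j)
    union = a | b
    if not union:
        return 0.0  # assumption: two diseases without any terms are treated as unrelated
    return len(a & b) / len(union)

# Placeholder term labels, not real MeSH annotations.
similarity = mesh_similarity({"term_a", "term_b"}, {"term_a", "term_c"})
```

In this toy case the two diseases share one of three distinct terms, so the similarity is one third.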

# Association Between m6A Sites

The association between m6A RNA methylation sites was inferred from RNA methylation profiles. We speculate that the functions of two m6A sites are related if their methylation profiles are highly correlated across different experimental conditions.


*The MeRIP-seq data used in the analysis profiled the epitranscriptome under 38 different experimental conditions. \*SYSY and NEB are anti-m6A antibodies made by two different companies.*

Fisher's asymptotic test was used to calculate the P-values of the Pearson correlation coefficient (Pcc) for each pair of m6A sites, and Bonferroni multiple-test correction was then used to adjust the P-values. Only m6A site pairs with an adjusted P < 0.05 and a corresponding Pcc value ranked in the top or bottom 10 percentile were considered associated in our network (Liao et al., 2011). Positive and negative correlations were not distinguished in the association network because the regulatory impact of m6A RNA methylation is complex: it may either enhance or decrease the transcriptional expression level of different genes, making it difficult to distinguish the functional consequences of positive or negative correlation at the epitranscriptome layer.
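The filtering described above can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: "Fisher's asymptotic test" is taken here to mean the Fisher z-transformation with a normal approximation, the Bonferroni factor is the number of tested pairs, and the function name and the items-by-conditions layout are hypothetical.

```python
import numpy as np
from math import erfc, sqrt

def associated_pairs(profiles, alpha=0.05, tail=10.0):
    """Return a symmetric boolean adjacency matrix for an items-by-conditions array."""
    x = np.asarray(profiles, dtype=float)
    n_items, n_cond = x.shape
    r = np.corrcoef(x)                                  # pairwise Pearson correlation
    iu = np.triu_indices(n_items, k=1)                  # each unordered pair once
    r_pairs = np.clip(r[iu], -0.999999, 0.999999)       # keep arctanh finite
    z = np.arctanh(r_pairs) * sqrt(n_cond - 3)          # Fisher z statistic
    p = np.array([erfc(abs(v) / sqrt(2.0)) for v in z]) # two-sided normal P-value
    p_adj = np.minimum(p * p.size, 1.0)                 # Bonferroni adjustment
    lo, hi = np.percentile(r_pairs, [tail, 100.0 - tail])
    keep = (p_adj < alpha) & ((r_pairs <= lo) | (r_pairs >= hi))
    adj = np.zeros((n_items, n_items), dtype=bool)
    adj[iu[0][keep], iu[1][keep]] = True
    return adj | adj.T                                  # sign of the correlation is discarded
```

Because both tails are kept and the result is a plain boolean matrix, strongly positive and strongly negative correlations end up as the same kind of edge, matching the choice described in the text.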

# Gene-Gene Association

We constructed the gene-gene association network from RNA expression data. Genes that exhibit strong positive or negative correlation are considered functionally related in our multi-layer heterogeneous network. The construction followed the same procedure used to build the associations between m6A RNA methylation sites.

# Association Between m6A Sites and Genes

Similar to the gene-gene and m6A site-m6A site associations, the association between m6A sites and genes was constructed from the correlation of their methylation and expression levels. If the methylation level of an m6A site and the expression level of a gene are highly correlated across different experimental conditions, we assume that the two are functionally related. The construction of the gene-m6A site network follows the same procedure as that of the m6A site-m6A site network.

## The Multi-Layered Heterogeneous Network

As shown in **Figure 1**, the multi-layer heterogeneous network incorporates three types of nodes and five types of associations, from which it is possible to infer disease-associated m6A RNA methylation sites. We use $D = \{d_1, d_2, \cdots, d_N\}$, $S = \{s_1, s_2, \cdots, s_M\}$ and $G = \{g_1, g_2, \cdots, g_T\}$ to represent the three types of nodes within the network: the diseases, the m6A sites and the genes, respectively, where $N$, $M$ and $T$ denote the total numbers of diseases, m6A sites and genes. The associations within the disease, the gene and the site layers can then be represented by $DD = \{d_{ij} : i, j = 1, 2, \cdots, N\}$, $GG = \{g_{ij} : i, j = 1, 2, \cdots, T\}$ and $SS = \{s_{ij} : i, j = 1, 2, \cdots, M\}$, respectively, while the other two types of connections, between different types of nodes, are represented by $DG = \{dg_{ij} : i = 1, 2, \cdots, N; j = 1, 2, \cdots, T\}$ and $SG = \{sg_{ij} : i = 1, 2, \cdots, M; j = 1, 2, \cdots, T\}$. Please note that the missing information of m6A site-disease association is substituted by $DS = \{ds_{ij} : i = 1, 2, \cdots, M; j = 1, 2, \cdots, N\}$, which is a null network used to complete the adjacency matrix of the multi-layer heterogeneous network.

# Construct the Adjacency Matrix of the Overall Network

In the RWR algorithm, the multi-layer heterogeneous network is represented by the matrix $W$. It is a column-normalized adjacency matrix comprising nine sub-matrices, which reflect the diverse relationships among the nodes (i.e., diseases, genes, and m6A sites). Among them, $M_{DS}$, $M_{SG}$, and $M_{DG}$ stand for the transition probabilities between different types of nodes, and their transposes are denoted by $M_{SD}$, $M_{GS}$, and $M_{GD}$, respectively, while $M_{DD}$, $M_{SS}$ and $M_{GG}$ represent the transition probabilities among nodes of the same type. $M_{GS}$, $M_{GD}$, $M_{DD}$, $M_{SS}$, and $M_{GG}$ were estimated as described above, while $M_{SD}$ is set to **0**, as it is unknown. Because different weights are used in the various types of networks, the adjacency matrix was further normalized as

$$W = \begin{bmatrix} \frac{1}{2} \times M_{DD} & \frac{1}{3} \times M_{GD} & \mathbf{0} \\ \frac{1}{2} \times M_{DG} & \frac{1}{3} \times M_{GG} & \frac{1}{2} \times M_{SG} \\ \mathbf{0} & \frac{1}{3} \times M_{GS} & \frac{1}{2} \times M_{SS} \end{bmatrix} \tag{3}$$

where all five sub-networks are assigned equal weight, although their relative importance could be further optimized (Xu and Wang, 2016).
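The assembly of W in Equation (3) can be illustrated with a minimal NumPy sketch. The function names, the input orientation (rows and columns of each adjacency matrix), and the toy network are assumptions made for illustration only; they are not taken from the original implementation. Each block M_XY is read here as the column-normalized transitions from X-type nodes to Y-type nodes.

```python
import numpy as np

def column_normalize(a):
    """Scale each column to sum to 1; all-zero columns stay zero."""
    a = np.asarray(a, dtype=float)
    sums = a.sum(axis=0, keepdims=True)
    return np.divide(a, sums, out=np.zeros_like(a), where=sums > 0)

def build_w(dd, dg, gg, sg, ss):
    """Assemble W of Equation (3).

    Hypothetical inputs (weighted adjacency matrices):
      dd: N x N disease-disease, dg: N x T disease-gene,
      gg: T x T gene-gene, sg: M x T site-gene, ss: M x M site-site.
    Block M_XY holds transitions from X-type nodes (columns) to
    Y-type nodes (rows).
    """
    dd, dg, gg, sg, ss = (np.asarray(a, dtype=float) for a in (dd, dg, gg, sg, ss))
    n, m = dd.shape[0], ss.shape[0]
    m_dd, m_gg, m_ss = column_normalize(dd), column_normalize(gg), column_normalize(ss)
    m_gd, m_dg = column_normalize(dg), column_normalize(dg.T)
    m_gs, m_sg = column_normalize(sg), column_normalize(sg.T)
    return np.block([
        [m_dd / 2, m_gd / 3, np.zeros((n, m))],   # disease rows
        [m_dg / 2, m_gg / 3, m_sg / 2],           # gene rows
        [np.zeros((m, n)), m_gs / 3, m_ss / 2],   # m6A site rows
    ])

# Toy network: 2 diseases, 3 genes, 2 m6A sites.
dd = [[0, 1], [1, 0]]
dg = [[1, 0, 1], [0, 1, 1]]
gg = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
sg = [[1, 1, 0], [0, 1, 1]]
ss = [[0, 1], [1, 0]]
w = build_w(dd, dg, gg, sg, ss)
print(w.sum(axis=0))  # columns sum to 1 when no sub-block column is empty
```

The factors 1/2 and 1/3 keep every column of W summing to 1: disease and site columns each draw on two non-zero blocks, whereas gene columns draw on three.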

#### Random Walk With Restart (RWR) Algorithm

The random walk with restart (RWR) algorithm, an iterative network propagation method, was used to infer disease-associated RNA methylation sites on our multi-layer heterogeneous network. In RWR, a random walker starts from a specific node and iteratively moves to its neighboring nodes. The flow of random walkers along an edge is proportional to the edge weight, and at each step a fixed proportion of the walkers is returned to the initial position. Compared with the conventional random walk, RWR allows the random walkers to return, which prevents all of them from accumulating at a single node. When applied to multi-layer heterogeneous networks, another notable strength of RWR is that it does not restrict the random walker to nodes of the same type; the walker can move among all three layers of the network via the five types of edges. When the termination condition is satisfied, every reachable position has a steady-state probability, and the nodes are ranked according to the probability that the random walker reaches them. Here, we let P<sub>s</sub> denote the vector of probabilities that the random walker is at each position after the s-th iteration, which is calculated as follows:

$$P_{s+1} = (1-r) \times W \times P_{s} + r \times P_0 \tag{4}$$

$$P_{s+1} - P_{s} \le 10^{-10} \tag{5}$$

where r is the restart probability, i.e., the proportion of random walkers returned to the seed at each step, and is set to 0.75 arbitrarily. P<sub>0</sub> is the initial probability vector of the seed node, and W is the matrix of transition probabilities between nodes of the different types (see Equation 3). The iteration stops when the difference in probabilities between the (s + 1)-th iteration and the preceding iteration falls below a predefined threshold of 10<sup>−10</sup>. The disease node d<sub>i</sub> is taken as the seed node with an initial probability of 1, while the remaining disease nodes are assigned an initial probability of 0. After running the RWR algorithm, we rank the m6A sites according to the steady-state probability that a random walker starting from d<sub>i</sub> reaches each m6A site node.

The overall RWR algorithm is summarized in the following (**Figure 2**).
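The iteration in Equations (4) and (5) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the original implementation: the function name, the `max_iter` guard, and the use of the L1 norm for the stopping criterion are assumptions, since the text does not state which norm is applied to the difference.

```python
import numpy as np

def rwr(w, p0, r=0.75, tol=1e-10, max_iter=10_000):
    """Iterate Equation (4) until the update falls below tol (Equation 5).

    The L1 norm of the difference is an assumption; the text does not
    state which norm is used.
    """
    w = np.asarray(w, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - r) * (w @ p) + r * p0
        if np.abs(p_next - p).sum() <= tol:
            return p_next
        p = p_next
    return p

# Toy column-normalized network with three nodes; node 0 is the seed.
w = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
p0 = np.array([1.0, 0.0, 0.0])
p = rwr(w, p0)
ranking = np.argsort(-p)  # nodes ordered by steady-state probability
```

For a column-normalized W the iteration converges to the fixed point r (I − (1 − r) W)<sup>−1</sup> P<sub>0</sub>, which offers a simple check on small examples. In the full network, only the entries of the steady-state vector that correspond to m6A site nodes would be ranked.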

## Evaluate the Statistical Significance of Prediction by Random Permutation

In general, the nodes of interest are those with the highest probabilities in the RWR result, as they are regarded as highly accessible from the initial node and thus indicate an association. To evaluate the statistical significance of the predictions, a randomization-based estimation (Jia and Zhao, 2014) was implemented. Specifically, we generated 100 random networks by rewiring edges within the multi-layer heterogeneous network while maintaining its original topological characteristics (Liao et al., 2011). Each randomization step chose two arbitrary edges (e.g., a-b and c-d) and exchanged their endpoints (e.g., to a-d and c-b), provided that the new links did not already exist in the network. Then, for each of the 100 random networks, the RWR algorithm was applied to rank all the m6A sites according to their probabilities of association with the disease. These probabilities serve as a null distribution, i.e., the probabilities observed when there is no true association between a disease and an m6A site, against which the statistical significance of a prediction from the real network can be assessed (Jia and Zhao, 2014).
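The edge-exchange randomization described above can be sketched as follows. The function names, the retry limit, and the pseudo-count in the empirical p-value are assumptions made for illustration; the source states only that 100 random networks were generated and used as a null model.

```python
import random

def swap_edges(edges, n_swaps, seed=0):
    """Degree-preserving randomization by exchanging edge endpoints.

    Picks two edges a-b and c-d and replaces them with a-d and c-b
    unless a new link already exists or a self-loop would result.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:  # retry limit is an assumption
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: swap would create a self-loop or no change
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # new link already exists in the network
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

def empirical_p(observed, null_scores):
    """Share of null scores at least as large as the observed score.

    The +1 pseudo-count is an assumption, not taken from the source.
    """
    hits = sum(s >= observed for s in null_scores)
    return (1 + hits) / (1 + len(null_scores))

ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
rewired = swap_edges(ring, n_swaps=10, seed=1)
```

If the edges of one association type are stored with a consistent orientation (for example, always disease first and gene second), exchanging endpoints within that type also preserves the node types at both ends of every edge.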

# DETERMINE THE DIRECTION OF THE PREDICTED ASSOCIATION

Given that an m6A site is predicted to be associated with a disease, we would like to know whether to expect hyper- or hypomethylation of this site under the disease condition. Conceivably, if the methylation level of this site is positively correlated with genes that are overexpressed under the disease condition, or anti-correlated with genes that are underexpressed under the disease condition, the site is likely to be hypermethylated under the disease condition, and vice versa. The median of the correlations between this site and all the disease-associated genes was used to infer the direction of the association, and is provided on our website.
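One possible reading of this rule is sketched below. The sign convention (+1 for genes overexpressed and −1 for genes underexpressed in disease), the function name, and the toy profiles are assumptions made for illustration; the source states only that the median correlation with the disease-associated genes is used.

```python
import numpy as np

def infer_direction(site_profile, gene_profiles, gene_signs):
    """Infer hyper- or hypomethylation from signed Pearson correlations.

    gene_signs: +1 if the gene is overexpressed in disease, -1 if
    underexpressed (hypothetical encoding, not from the source).
    """
    corrs = np.array([np.corrcoef(site_profile, g)[0, 1] for g in gene_profiles])
    median = float(np.median(corrs * np.asarray(gene_signs, dtype=float)))
    if median > 0:
        return "hyper", median
    if median < 0:
        return "hypo", median
    return "undetermined", median

# Toy data: one site and three disease-associated genes over four conditions.
site = [1.0, 2.0, 3.0, 4.0]
genes = [[2.0, 4.0, 6.0, 8.0], [1.0, 3.0, 2.0, 5.0], [8.0, 6.0, 4.0, 2.0]]
label, median = infer_direction(site, genes, gene_signs=[1, 1, -1])
```

In this toy case the site is positively correlated with the two overexpressed genes and anti-correlated with the underexpressed gene, so all signed correlations are positive and the site is called hypermethylated.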

# An Alternative Approach for Performance Comparison

To evaluate the performance of this approach, we also considered a naïve hypergeometric test-based approach, which assesses the association between a disease and an m6A site by checking whether they are simultaneously linked to a significant number of genes in the constructed multi-layer heterogeneous network (see **Figure 3**). The statistical significance (P-value) of the association can be assessed with a hypergeometric test, with

$$p\left(Y \ge y\right) = 1 - \sum_{i=0}^{y-1} \frac{C_{x}^{i}\, C_{m-x}^{n-i}}{C_m^n} \tag{6}$$

where m denotes the total number of genes in the analysis, n denotes the number of genes linked to a specific disease in the gene-disease association sub-network, x denotes the number of genes linked to a specific m6A site in the gene-m6A site sub-network, and y denotes the number of genes associated with both the disease and the m6A site. With the P-values, it is then possible to predict the disease-associated RNA methylation sites at a given significance level. Please note that this alternative approach takes advantage of only two of the five types of associations: the gene-m6A site associations and the disease-gene associations.
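As a worked illustration, the upper-tail hypergeometric probability can be computed directly from binomial coefficients using the symbols defined above. The function name and the toy numbers are assumptions made for illustration.

```python
from math import comb

def hypergeom_p(m, n, x, y):
    """P(Y >= y) for the overlap between two gene sets.

    m: total genes, n: genes linked to the disease,
    x: genes linked to the m6A site, y: genes linked to both.
    """
    total = comb(m, n)
    cdf = sum(comb(x, i) * comb(m - x, n - i) for i in range(y)) / total
    return 1.0 - cdf

# Toy case: 10 genes, 3 linked to the disease, 4 linked to the site, 2 shared.
p_value = hypergeom_p(10, 3, 4, 2)
```

For these toy numbers the two terms of the sum are 20/120 and 60/120, so P(Y ≥ 2) = 1 − 80/120 = 1/3.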

#### RESULTS

# Constructed Multi-Layer Heterogeneous Network

Using the aforementioned approaches, a multi-layer heterogeneous network was constructed to incorporate three types of nodes (m6A site, gene, and disease) and five types of associations. The numbers of nodes and edges in each layer of the network are summarized in **Table 2**.


#### Performance Evaluation

We employed 10-fold cross-validation to evaluate the performance of the proposed RWR algorithm. During each iteration, 10% of the disease-gene associations were deleted from the original multi-layer heterogeneous network and reserved as the testing data, while the remaining 90% of the associations were used as the training dataset.

The proposed approach was also compared to a random predictor, which is constructed by random permutation of the multi-layer heterogeneous network, and an alternative hypergeometric test-based approach.

To compare the performance of the different methods, the receiver operating characteristic (ROC) curve was used to illustrate the true positive rate (TPR) vs. the false positive rate (FPR) at different stringency cut-offs, and the performance of each method was measured by the area under the ROC curve (AUC).
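For reference, the AUC can be computed without tracing the ROC curve explicitly, as the probability that a randomly chosen positive is scored higher than a randomly chosen negative, with ties counted as one half. The sketch below uses this rank-based formulation; the function name and the toy scores are assumptions made for illustration and are unrelated to the reported results.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for held-out true associations, 0 for all other candidates.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy example: two positives and two negatives.
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

In the toy example three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75; a predictor that scores at random is expected to reach 0.5.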

As shown in **Figure 4**, the RWR method achieved an AUC of 0.827, outperforming the hypergeometric test-based approach (AUC: 0.733) and the random predictor (AUC: 0.550), the latter being close to the theoretical random performance of 0.5 (**Figure 4A**). We also calculated the AUC for each individual disease. As shown in **Figure 4B**, the RWR algorithm achieved superior performance on most of the diseases (average/median AUC: 0.867/0.913) compared with the other two methods: the hypergeometric test-based approach (average/median AUC: 0.723/0.772) and the random predictor (average/median AUC: 0.486/0.479). This suggests that the multi-layer network model coupled with the RWR algorithm can effectively predict disease-m6A site associations and potentially unveil disease circuits regulated at the epitranscriptome layer.

The prediction results are relatively reliable on the following diseases (**Table 3**), and they may be more relevant to epitranscriptome regulation.

FIGURE 4 | (A) The RWR method (AUC: 0.827) compared with the hypergeometric test-based approach (AUC: 0.733) and the random predictor (AUC: 0.550); (B) The RWR algorithm achieved superior performance on most of the diseases (average and median AUC: 0.867 and 0.913), compared with the other two methods: the hypergeometric test-based approach (average and median AUC: 0.723 and 0.772) and the random predictor (average and median AUC: 0.486 and 0.479).

# Case Study: Cancer-Related m6A Sites

We further examined the prediction performance on several common diseases. For top 100 predictions, the proposed approach achieved reasonable performance in all 5 diseases tested (**Table 4**). As shown in **Figure 5**, the cancer-related m6A site prediction achieved relatively steady performance. Indeed, recent studies suggest that m6A RNA methylation plays a crucial role in the pathologies of breast cancer, myeloid leukemia, liver cancer, carcinoma, glioma, etc. (Hsu et al., 2017; Stojković and Fujimori, 2017; Wang S. et al., 2017). Additionally, the better performance on cancer may be partially because the samples used are mostly related to cancer and tumors (see **Table 1**); with cancer samples, cancer-specific functions are more easily inferred from the available data. However, the samples were collected without bias from all published studies, so the collection simply reflects that most existing m6A-seq studies are either based on cancer cell lines or related to cancer. This suggests that, with the data accumulated from existing studies, inferring cancer-associated m6A sites may be more feasible than inferring sites associated with other diseases. We therefore used cancer-related m6A sites as a case study, checking whether our predictions are supported by the existing literature. Interestingly, many of our predicted associations are supported (see **Table 5**).

TABLE 4 | Number of hits for top 50 predictions of a disease.

*The p-values are calculated from binomial test.*

FIGURE 5 | Prediction accuracy of five common m6A site-associated diseases. The figure shows the prediction accuracy of disease-associated m6A sites for five common diseases: cancer (AUC: 0.832), diabetes (AUC: 0.717), hypertension (AUC: 0.812), obesity (AUC: 0.828), and tumors (AUC: 0.825). Among them, the prediction of cancer-related m6A sites achieved relatively stable performance.

Additionally, there are cases in which a dysregulated RNA methylation status is observed but does not lead to differential expression at the RNA level. Such associations may still be predicted by the proposed approach. DRUM works directly with RNA methylation data and can thus detect associations that are observable at the epitranscriptome layer only (see **Table 6**).

TABLE 5 | Cancer-associated m6A sites supported by literature.

TABLE 6 | Epitranscriptome layer association with diseases.

*The RNA transcripts of these genes are differentially methylated under the disease condition, but are not differentially expressed (at transcriptional level) according to the OUGene database (Pan and Shen, 2016). Their associations to tumors and cancers were correctly predicted by the proposed approach.*

To gain more insight, the m6A-seq data from a Non-Small Cell Lung Cancer (NSCLC) cell line (A549) and the normal control cell line (H1299) were obtained (Lin et al., 2016). Differential RNA methylation analysis and differential expression analysis were performed using the exomePeak R/Bioconductor package and the Cuffdiff software, respectively, with their default settings. The results were then compared with the predictions from the proposed approach. In the end, 9 sites predicted to be associated with NSCLC were validated (please see **Supplementary Materials** for more details), including one site located on the gene ING5 that shows no differential expression (log2 fold change = 0.07, FDR = 0.999) but significant differential methylation (log2 fold change = 0.762, FDR = 0.027) (see **Figure S3**).

Altogether, our case studies indicate that the proposed method is effective in uncovering putative disease-m6A site associations, especially for cancer-related m6A sites. The approach we developed may be useful for unveiling the molecular pathologies regulated at the epitranscriptome layer and may provide a new perspective on effective therapeutic strategies for cancer and other diseases.

#### DRUM: Database for Disease-Associated RNA Methylation

To facilitate exploration and direct query of our predicted results by the RNA epigenetics research community, we developed an online database, DRUM, which stands for **d**isease-associated **r**ibon**u**cleic acid **m**ethylation. The website hosts the top 100 m6A sites predicted to be associated with 705 diseases at a significance level of 0.1, and supports queries by disease or by the host gene of an m6A site (see **Figure 6**). Additionally, the prediction results can be downloaded in batch for large-scale automated analysis such as result comparison. The DRUM website is freely available at: www.xjtlu.edu.cn/biologicalsciences/drum.

# CONCLUSIONS

Investigation of N<sup>6</sup>-methyladenosine (m6A) RNA modification over the past four decades, especially since 2012, has uncovered its critical biological functions in various cellular processes. It has been clearly shown that RNA modifications directly or indirectly contribute to disease development and play a critical role in many diseases, such as cancers (Deng et al., 2017; Wang S. et al., 2017) and virus infections (Gokhale and Horner, 2017). Approaches are sorely needed to cover the epitranscriptome perspective of disease pathology and to unveil the regulatory circuits of diseases that are regulated at the epitranscriptome layer.

FIGURE 6 | DRUM Database. DRUM is a public online database for disease-associated m6A sites. It integrates the m6A sites predicted to be associated with 705 diseases. The statistical significance of the prediction was assessed by random permutation. Users can access the data via the name of the disease or the host gene of the m6A site. It also supports the download of the entire prediction results for automated large-scale analysis.

We presented here a multi-layer heterogeneous network model coupled with the RWR algorithm, which effectively incorporates five types of association among the diseases, genes, and m6A sites, to unveil the disease associations of individual m6A RNA methylation sites. To evaluate the performance of the proposed approach, a 10-fold cross-validation was performed. Our approach achieved superior performance (overall AUC: 0.827, average AUC: 0.867) compared with the hypergeometric test-based approach (overall AUC: 0.733, average AUC: 0.723) and the random predictor (overall AUC: 0.550, average AUC: 0.486). In addition, a number of cancer-related RNA methylation sites were validated against existing studies. Finally, an online database, DRUM, was constructed to enable the query of the top m6A sites related to 705 different diseases.

It is worth noting that, as indicated in equation (1), the calculation of RNA methylation profiles partially relies on the expression data, which inevitably induces dependency between them. Ideally, independent datasets would be used to profile RNA methylation and expression, respectively. Additionally, the detectability of methylated molecules depends on the abundance of transcripts, i.e., if the expression level of a specific transcript is low, it is not possible to accurately determine the methylation level (M-value) of the m6A sites on it. The current formulation of the methylation level, as shown in equation (1), penalizes very lowly expressed transcripts and reports an M-value close to 0, which may introduce additional bias into the methylation profiles (as shown in **Figure 7A**).

Nevertheless, despite the bias and dependency that may be introduced into the data, we did not observe a linear correlation (or anti-correlation) between the methylation level of an m6A site and the expression level of its host gene: the methylation level of an m6A site is not more linearly correlated (or anti-correlated) with the expression level of its host gene than with that of a random gene (see **Figure 7B**). As suggested by a previous study, epitranscriptome regulation changes only the percentage of methylated molecules, while transcriptional regulation changes only the abundance (Meng et al., 2014). Although they slightly affect each other, the two regulatory mechanisms are observed to be largely independent and simultaneously regulate the transcriptome and epitranscriptome, which is consistent with our observation. As little linear correlation is observed between RNA methylation and gene expression profiles, and the association network was built based on Pearson correlation, which relies on linear correlation (see section **Materials and Methods**), the predicted patterns associated with m6A sites are not likely to be dominated by gene expression profiles.

framework aims to predict disease-associated m6A sites. It is possible that multiple sites located on the same transcript are associated with different diseases. Compared to general disease-gene association prediction, the proposed framework provides a more specific circuit of disease mechanism that functions at the epitranscriptome layer.

FIGURE 7 | (A) Distribution of RNA methylation level (M-value). The estimated methylation levels are not strictly centered around 0, suggesting that the formulation of the M-value, which penalizes lowly expressed transcripts as suggested by equation (1), may introduce bias into the estimated methylation profiles on very lowly expressed transcripts. (B) Little linear correlation is observed between gene expression and RNA methylation profiles. The red line indicates the self-gene Pearson correlation coefficients, which are the correlations between the methylation level of a site and the expression level of its host gene. The gray lines indicate the Pearson correlation between the methylation level of a site and the expression level of a random gene under the 38 experimental conditions, when the methylation data and expression data are strictly separated and thus independent of each other. A total of 1,000 gray lines were obtained from 1,000 random permutations and serve as a null model of the Pearson correlation distribution. The methylation level of an m6A site is not more linearly correlated (or anti-correlated) with the expression level of its host gene than with that of a random gene.

It is also worth noting that, by starting from the methylation profiles of individual m6A sites, our work focused specifically on the disease circuits that are potentially regulated at the epitranscriptome layer, at the resolution of individual m6A sites (see **Figure 8**). This work is substantially different from general disease-gene association prediction, where the gene and disease may interact at any layer of gene expression regulation, such as the transcriptional or post-transcriptional layer (Chen and Yan, 2013). The work is also quite different from existing works (Zhang et al., 2016b, 2019) that aimed to predict diseases that may be significantly regulated at the epitranscriptome layer, because those studies unveiled only the potential association between diseases and m6A RNA methylation and did not identify which specific m6A sites are involved in the regulation process. Compared to existing works, our computational framework provides a finer resolution for studying disease mechanisms that function at the epitranscriptome layer.

The presented computational scheme can be easily extended in the future by incorporating additional data sources, such as disease-related functional variants involved in m6A modification (Jiang et al., 2018; Zheng et al., 2018), the regulatory specificity of the RNA methyltransferases and demethylases (Liu H. et al., 2018), or the associations between m6A sites and other biomolecules (Xuan et al., 2018), so as to further improve prediction accuracy. Additionally, the method can be conveniently applied to other RNA modification types such as m1A (Dominissini et al., 2016) and pseudouridine (Cabili et al., 2011), as well as to other species such as mouse and yeast, when a sufficient amount of data is available.

#### DATA AVAILABILITY

Publicly available datasets were analyzed in this study. These data can be found here: http://www.csbio.sjtu.edu.cn/bioinf/OUGene/.

## AUTHOR CONTRIBUTIONS

JM and YT conceived the idea and initialized the project. ZW, YT, and BS processed the raw data. YT and XW constructed the network. YT implemented prediction, the performance evaluation and the case studies. KC built the website. YT drafted the manuscript. All authors read, critically revised, and approved the final manuscript.

## FUNDING

This work has been supported by National Natural Science Foundation of China [31671373]; Jiangsu University Natural Science Program [16KJB180027]; XJTLU Key Program Special Fund [KSF-T-01]; Jiangsu Six Talent Peak Program [XYDXX-118].

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00266/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Tang, Chen, Wu, Wei, Zhang, Song, Zhang, Huang and Meng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Suppression of Conditional TDP-43 Transgene Expression Differentially Affects Early Cognitive and Social Phenotypes in TDP-43 Mice

*Pablo R. Silva, Gabriela V. Nieva and Lionel M. Igaz\**

*IFIBIO Bernardo Houssay, Grupo de Neurociencia de Sistemas, Facultad de Medicina, Universidad de Buenos Aires-CONICET, Buenos Aires, Argentina*

Dysregulation of TAR DNA-binding protein 43 (TDP-43) is a hallmark feature of frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS), two fatal neurodegenerative diseases. TDP-43 is a ubiquitously expressed RNA-binding protein with many physiological functions, playing a role in multiple aspects of RNA metabolism. We developed transgenic mice conditionally overexpressing human wild-type TDP-43 protein (hTDP-43-WT) in forebrain neurons, a model that recapitulates several key features of FTD. After post-weaning transgene (TG) induction for 1 month, these mice display an early behavioral phenotype, including impaired cognitive and social function with no substantial motor abnormalities. In order to expand the analysis of this model, we took advantage of the temporal and regional control of TG expression possible in these mice. We behaviorally evaluated mice at two different times: after 2 weeks of post-weaning TG induction (0.5 month group) and after subsequent TG suppression for 2 weeks following that time point [1 month (sup) group]. We found no cognitive abnormalities after 0.5 month of hTDP-43 expression, evaluated with a spatial working memory task (Y-maze test). Suppression of TG expression with doxycycline (Dox) at this time point prevented the development of cognitive deficits previously observed at 1 month post-induction, as revealed by the performance of the 1 month (sup) group. On the other hand, sociability deficits (assessed through the social interaction test) appeared very rapidly after Dox removal (0.5 month) and TG suppression was not sufficient to reverse this phenotype, indicating differential vulnerability to hTDP-43 expression and suppression. Animals evaluated at the early time point (0.5 month) post-induction do not display a motor phenotype, in agreement with the results obtained after 1 month of TG expression.
Moreover, all motor tests (open field, accelerated rotarod, limb clasping, hanging wire grip) showed identical responses in both control and bigenic animals in the suppressed group, demonstrating that this protocol and treatment do not cause non-specific effects in motor behavior, which could potentially mask the phenotypes in other domains. Our results show that TDP-43-WT mice have a phenotype that qualifies them as a useful model of FTD and provide valuable information for susceptibility windows in therapeutic strategies for TDP-43 proteinopathies.

Keywords: TDP-43, frontotemporal dementia, amyotrophic lateral sclerosis, transgenic mice, behavior, animal model, proteinopathy

#### *Edited by:*

*Emanuele Buratti, International Centre for Genetic Engineering and Biotechnology, Italy*

#### *Reviewed by:*

*Mauricio Fernando Budini, Universidad de Chile, Chile Adam Keith Walker, University of Queensland, Australia*

> *\*Correspondence: Lionel M. Igaz lmuller00@yahoo.com*

#### *Specialty section:*

*This article was submitted to RNA, a section of the journal Frontiers in Genetics*

*Received: 22 October 2018 Accepted: 08 April 2019 Published: 24 April 2019*

#### *Citation:*

*Silva PR, Nieva GV and Igaz LM (2019) Suppression of Conditional TDP-43 Transgene Expression Differentially Affects Early Cognitive and Social Phenotypes in TDP-43 Mice. Front. Genet. 10:369. doi: 10.3389/fgene.2019.00369*

# INTRODUCTION

Neurodegenerative diseases are incurable and debilitating conditions that arise as a consequence of the progressive degeneration and/or death of nerve cells. This heterogeneous group of disorders is characterized by behavioral changes that differ according to the disease entity. Among these, most forms of amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) show clinicopathological overlap. These diseases may represent a single pathological entity with diverse clinical manifestations (Geser et al., 2010; Burrell et al., 2016), included within the heterogeneous group of "TDP-43 proteinopathies." TDP-43 pathology is frequently associated with other disorders, including Alzheimer's disease, dementia with Lewy bodies, hippocampal sclerosis, and chronic traumatic encephalopathy, among others (Kovacs, 2016). The umbrella term "TDP-43 proteinopathies" was coined shortly after the discovery that most forms of ALS and around 45% of FTD cases have TDP-43-positive neuronal and glial inclusions as a major pathological hallmark (Arai et al., 2006; Neumann et al., 2006; Cairns et al., 2007). Roughly another 45% of FTD cases are characterized by tau pathology and 5–10% by FUS accumulation, while a small percentage of ALS cases are associated with abnormalities in SOD1, FUS, or other proteins (Nguyen et al., 2018).

TDP-43 is a highly conserved and widely expressed RNA-binding protein (RBP) that normally resides predominantly in the nucleus of all cells. It has been described to be involved in different cellular processes, most conspicuously RNA metabolism, including RNA translation, splicing, and transport (Ratti and Buratti, 2016). Dysregulation of RNA metabolism can occur at multiple levels of RNA processing including transcription, splicing, mRNA transport, stability, and translation (Coyne et al., 2017). This, in turn, will have numerous implications for the generation of biochemical, pathological, and behavioral phenotypes. Although several animal models of FTD/ALS disease have been developed in the past few years, an important caveat is that none exactly mimic the pathophysiology and the phenotype of human FTD/ALS (Alrafiah, 2018). However, studying the behavioral impact of modulating FTD/ALS-related RBPs shows that they recapitulate different clinical presentations in patients, representing an array of behavioral domains that include motor, cognitive, and social symptoms (Philips and Rothstein, 2015; Nolan et al., 2016; Ahmed et al., 2017; Tan et al., 2017; Ittner et al., 2018).

Given that multiple (although not all) forms of both sporadic and familial FTD/ALS cases of different genetic origin (i.e., mutations in C9orf72, progranulin, TARDBP, etc.) converge in a common pathological presentation that involves TDP-43 dysregulation (Seelaar et al., 2011; Renton et al., 2014), there is a growing need in the field for understanding the pathological and behavioral consequences of these abnormalities.

Using a mouse model of TDP-43 proteinopathies that conditionally overexpresses the wild-type human TDP-43 protein (hTDP-43-WT) in forebrain neurons and reproduces neuropathological changes of the FTD/ALS spectrum (Igaz et al., 2011), we have recently shown that they display early behavioral phenotypes in the cognitive and social domains (Alfieri et al., 2016). Interestingly, these animals also exhibit progressive motor abnormalities after prolonged transgene expression (Alfieri et al., 2016). These behavioral features provide an interesting correlate to human disease, starting with a more "pure" FTD-like phenotype and later evolving into a FTD with motor neuron disease presentation (expressing some ALS-like features). A major advantage of using inducible transgenic systems is the possibility to prevent or reverse specific phenotypes after transgene suppression. This, in turn, may provide information on the differential susceptibility of the diverse phenotypical manifestations of TDP-43 manipulation, with implications for understanding pathological onset and progression in human disease and defining time windows with therapeutic relevance.

In this work, we aimed to investigate how short-term transgene suppression affects the early behavioral phenotypes displayed by conditional TDP-43-WT mice (Alfieri et al., 2016). In particular, we studied (1) if the cognitive and social phenotypes previously described were present after a shorter period (0.5 month) of transgene expression, and (2) the effect of the suppression on both affected domains and in motor behaviors, which were preserved after 1 month of transgene expression. Assessment of the different behavioral domains both before and after transgene suppression indicates that sociability is rapidly impaired while cognitive and motor performance is preserved after 0.5 month of hTDP-43 expression. Moreover, suppression of transgene expression prevented the development of cognitive deficits and had no effect on motor behavior. Remarkably, social behavior remained compromised after transgene suppression, indicating differential vulnerability to TDP-43 manipulation in the neuronal circuits underlying diverse behaviors.

These results emphasize the need to comprehensively evaluate the behavioral phenotypes in multiple disease models and suggest that the timing of treatment could widely influence the outcome of the different clinical manifestations of TDP-43 proteinopathies.

# MATERIALS AND METHODS

#### Animals

This study was carried out in accordance with the recommendations of the National Animal Care and Use Committee of the University of Buenos Aires (CICUAL). The protocol was approved by the CICUAL. Mice were kept under a 12-h light/dark cycle, with controlled temperature (23 ± 2°C) and humidity (40–60%), and had *ad libitum* access to food and water. To produce hTDP-43 transgenic lines, as described previously (Igaz et al., 2011), pronuclei of fertilized eggs from C57BL/6J × C3HeJ F1 matings were injected with a vector containing hTDP-43-WT cDNA. Monogenic tetO-TDP-WT12 mice were bred to Camk2a-tTA mice (Mayford et al., 1996; Jackson Laboratory), generating non-transgenic, tTA monogenic, and single tetO-TDP-43 transgenic mice (non-TDP-43 expressing control mice) and bigenic mice expressing hTDP-43-WT12 (hereinafter referred to as tTA/WT12).

Mice were treated with 0.2 mg/ml Dox (Doxycycline Hyclate, sc-204734A, Santa Cruz Biotechnology) in drinking water to avoid prenatal and postnatal developmental effects of transgene expression. hTDP-43 expression was induced by switching mice to regular drinking water (without Dox) at weaning (postnatal day 28). Mice were analyzed at different time points (**Figure 1A**).

Mice were screened for the presence of the transgene using genomic DNA isolated from ear biopsies. PCR amplification was done with the following primers: TDP-forward (TTGGTAATAGCAGAGGGGGTGGAG), MoPrP-reverse (TCCCCCAGCCTAGACCACGAGAAT), Camk2a-tTA-forward (CGCTGTGGGGCATTTTACTTTAG), and Camk2a-tTA-reverse (CATGTCCAGATCGAAATCGTC) as previously described (Igaz et al., 2011; Alfieri et al., 2014). To homogenize genetic background and minimize variability, the TDP-43-WT12 transgenic line used in these experiments was established by crossbreeding with C57BL/6J mice for >10 generations. For both non-transgenic and transgene-expressing groups, animals of either sex were included in all experimental groups.

#### Transgene Suppression Protocol

For suppression experiments, mice were treated again with 0.2 mg/ml Dox in drinking water at 0.5 month after weaning to suppress transgene expression for 2 weeks, as indicated in **Figure 1A**. Animals were analyzed before the transgene suppression at 2 weeks after weaning (0.5 month mice) and after transgene suppression, at 1 month after weaning [1 month (sup) mice]. These mice were also compared with animals in which transgene expression was maintained until 1 month (1 month mice) after weaning.

FIGURE 1 | … weeks. The behavioral analysis (motor, cognitive, and social) was performed at 0.5 month post-weaning (0.5 month mice) and at 1 month post-weaning [1 month (sup) mice]. The results on these mice were compared with mice in which transgene expression was preserved until 1 month post-weaning (1 month mice). (B–E) Expression of human TAR DNA-binding protein 43 (TDP-43) in tTA/WT12 mice. (B) Schematic diagrams (adapted from Paxinos and Franklin, 2008) showing different hTDP-43-expressing brain areas (indicated by the labeled boxes) such as somatosensory cortex (SSC), hippocampal cornu ammonis 1 (CA1), and dentate gyrus (DG). (C–E) Double immunofluorescence of total TDP-43 (human + mouse TDP-43) and human TDP-43 in representative coronal brain sections of control, tTA/WT12 1 month and tTA/WT12 1 month (sup) mice. High power micrographs of boxed areas in B are shown: SSC (C), hippocampal CA1 region (D), and DG (E). Scale bar: 50 μm.

#### Behavioral Studies

All behavioral tasks were performed during the light phase (lights on at 7 a.m.; lights off at 7 p.m.) with the exception of the Y-maze spontaneous alternation, which was conducted during the initial dark phase (7:00 p.m. to 9:00 p.m.) to maximize exploratory behavior and consistently obtain a high number of arm visits. Animals were allowed to habituate in the experimental room (with attenuated light and sound) for at least 1 h before the tests. All tests were recorded through a video camera mounted above the experimental room (unless noted) and mouse position was analyzed by automatic video tracking software (ANY-maze, Stoelting Co.). All mazes and objects used in behavioral analysis were cleaned with 10% ethanol between test sessions and sanitized with 70% ethanol at the end of the day.

In agreement with what we previously demonstrated in experiments with TDP-43-∆NLS and TDP-43-WT transgenic mice (Alfieri et al., 2014, 2016), all non-bigenic offspring (non-transgenic and both single transgenic mice) exhibited similar behavioral responses. Thus, for all subsequent behavioral tests and other experimental analyses, we grouped these genotypes under the control group to compare against bigenic (tTA/WT12) mice.

#### Y-Maze Spontaneous Alternation Test

The horizontal Y-shaped maze consisted of three identical arms of transparent Plexiglas (43 cm × 4 cm × 12.5 cm) placed at 120° angles to each other (Belforte et al., 2010; Alfieri et al., 2014). The test was performed in a room with visual cues and controlled illumination (30 lux), as previously described (Alfieri et al., 2014, 2016). Animals were placed at the end of one arm facing the center and allowed to freely explore the maze for 8 min without training, reward, or punishment. All activities were recorded with a computer-linked video camera mounted above the maze. Mouse position was detected by automatic video tracking (ANY-maze, Stoelting). An alternation was defined as consecutive entrances into each of the three arms without repetition. The percentage of spontaneous alternation was calculated as the number of alternations divided by the possible alternations [(# alternations)/(total arm entries − 2) × 100]. Total entries were scored as an index of ambulatory activity in the Y maze and mice with scores below 12 were excluded from this test. All mice were tested between 7:00 p.m. and 9:00 p.m. (dark phase) to maximize exploratory behavior (Belforte et al., 2010).
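The alternation score defined above can be illustrated with a minimal Python sketch. It assumes the arm entries have already been extracted from the tracking data as an ordered list of arm labels; the labels `A`, `B`, `C`, the function name, and the example sequence are hypothetical and are not taken from the original analysis pipeline.

```python
from typing import Optional, Sequence

MIN_ARM_ENTRIES = 12  # mice with fewer total entries were excluded (from the text)


def spontaneous_alternation_pct(entries: Sequence[str]) -> Optional[float]:
    """Return the % spontaneous alternation, or None if the mouse is excluded."""
    total = len(entries)
    if total < MIN_ARM_ENTRIES:
        return None
    # An alternation is three consecutive entries into three different arms.
    alternations = sum(
        1 for i in range(total - 2) if len(set(entries[i:i + 3])) == 3
    )
    # Possible alternations = total arm entries - 2.
    return 100 * alternations / (total - 2)


# Hypothetical arm-entry sequence, for illustration only.
example_entries = list("ABCABCBACBAC")
print(spontaneous_alternation_pct(example_entries))  # 9 of 10 triplets -> 90.0
```

Counting overlapping triplets is what makes "total arm entries − 2" the number of possible alternations.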

#### Social Interaction Test

The test apparatus comprised a black Plexiglas rectangular box (40.6 cm × 15 cm × 23 cm) divided into three interconnected chambers. The floor of the apparatus was covered with clean bedding. The task was performed as previously described (Alfieri et al., 2014, 2016). For the habituation phase, two identical cylinders of transparent Plexiglas (7 cm diameter, 14 cm tall) with multiple small holes (0.5 cm diameter) to allow olfactory, visual, and auditory interaction, were placed in each one of the end chambers. Then, the test mouse was placed in the central chamber for 5 min and allowed to explore the entire social interaction apparatus. During the test phase, a black object (non-social stimulus) was placed into one cylinder in the "non-social" chamber and a stimulus mouse (a 21–26-day-old C57BL/6J male mouse) was placed into the cylinder in the "social" chamber. The test mouse was placed again in the central chamber and allowed to freely explore the apparatus for 10 min. Sniffing time for the social and non-social stimulus was manually scored. Mouse position and time spent in each chamber were analyzed by automatic video tracking (ANY-maze, Stoelting).

#### Open Field Test

To analyze general locomotion and exploratory behavior in a novel environment, we performed the open field test as previously described (Alfieri et al., 2014, 2016). The open field apparatus consisted of a transparent Plexiglas (40 cm × 40 cm × 40 cm) arena with a white floor virtually divided into two zones: periphery and center (comprising 50% of the total area centered). The test mouse was able to explore the novel environment for 20 min. Total distance and center distance traveled by the animal were analyzed. Time bin analysis (every 5 min) was also used. Room illumination was kept at 50 lux. Mouse position was determined by automatic video tracking (ANY-maze, Stoelting).
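As a rough illustration of the open field measures described above, the following Python sketch derives total distance, center distance, and 5-min time bins from a list of tracked positions. Several details are assumptions made only for this example and are not stated in the text: the center zone is taken to be a square concentric with the arena, a movement step counts as center travel only when both of its endpoints lie in that zone, and "relative center distance" is taken as center distance divided by total distance. The function names and the sample track are hypothetical; the actual measurements were obtained with ANY-maze.

```python
import math
from typing import Dict, Sequence, Tuple

ARENA_SIDE_CM = 40.0        # 40 cm x 40 cm floor (from the text)
CENTER_AREA_FRACTION = 0.5  # center zone = 50% of the floor (from the text)

# Assumption: the center zone is a square concentric with the arena.
CENTER_SIDE_CM = ARENA_SIDE_CM * math.sqrt(CENTER_AREA_FRACTION)
MARGIN_CM = (ARENA_SIDE_CM - CENTER_SIDE_CM) / 2


def in_center(x: float, y: float) -> bool:
    """True if (x, y), in cm from one arena corner, lies in the center zone."""
    lo, hi = MARGIN_CM, ARENA_SIDE_CM - MARGIN_CM
    return lo <= x <= hi and lo <= y <= hi


def open_field_summary(
    track: Sequence[Tuple[float, float, float]],  # (time_s, x_cm, y_cm) samples
    bin_s: float = 300.0,                         # 5-min time bins
    session_s: float = 1200.0,                    # 20-min session
) -> Dict[str, object]:
    n_bins = int(session_s // bin_s)
    per_bin = [0.0] * n_bins
    center = 0.0
    for (t0, x0, y0), (_, x1, y1) in zip(track, track[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        # Each step is assigned to the time bin in which it starts.
        per_bin[min(int(t0 // bin_s), n_bins - 1)] += step
        # Assumption: center travel requires both endpoints inside the zone.
        if in_center(x0, y0) and in_center(x1, y1):
            center += step
    total = sum(per_bin)
    return {
        "total_cm": total,
        "center_cm": center,
        "relative_center": center / total if total else 0.0,
        "per_bin_cm": per_bin,
    }


# Hypothetical four-sample track, for illustration only.
sample_track = [(0, 20, 20), (10, 23, 24), (310, 23, 24), (320, 23, 39)]
print(open_field_summary(sample_track))
```

Under the concentric-square assumption, the center zone has a side of about 28.3 cm, leaving a peripheral band about 5.9 cm wide.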

#### Accelerated Rotarod

For the assessment of motor coordination and balance, we used a rotarod apparatus (Ugo Basile, model 7600). The test was performed as previously described (Alfieri et al., 2016). Briefly, the accelerating rotarod test was set at 4–40 rpm over 300 s, and four trials per test were performed, with a 2-min interval between trials. The latency to fall off from the rotarod was automatically quantified. Mice that rotated passively were removed from the apparatus and scored as fallen.

#### Clasping Phenotype

The presence of clasping was evaluated as previously described (Igaz et al., 2011; Alfieri et al., 2014). hTDP-43-WT12 transgenic mice and age-matched control mice were suspended by the tail 30 cm over an open cage for 30 s. A positive clasping posture was noted for mice that clasped their limbs within 5 s of suspension while maintaining the clasping posture until lowered to the cage.

#### Hanging Wire Grip Test

Grip strength was studied using a standard wire cage and performed as previously described (Alfieri et al., 2014, 2016). Briefly, the mouse was placed on the top of the lid, then the lid was shaken lightly to cause the mouse to grip the wires and next turned upside down. The upside-down lid was held at a height of 20 cm, and the latency to fall off the wire lid was registered. A 60-s cutoff time was used.

#### Brain Tissue Collection

All animals were deeply anesthetized with intraperitoneal administration of 5% chloral hydrate (1 ml/30 g). Next, mice were perfused transcardially with ice-cold PBS (0.1 M, pH 7.4) supplemented with 10 U/ml heparin. The brains were immediately extracted and fixed overnight by immersion in 4% paraformaldehyde (PFA), and then cryoprotected in 10 and 30% sucrose in PBS for immunofluorescence analyses.

#### Immunofluorescence

Fixed frozen hemispheres were cryosectioned (50 μm) on a sliding freezing microtome (SM 2010R; Leica). The brain slices were stored at −20°C in cryoprotecting solution (50% glycerol, 50% PBS). Double immunofluorescence was performed as follows: the coronal free-floating sections were washed 2 × 5 min with PBS, permeabilized with 1% Triton X-100 in PBS for 1 h, and blocked for 1 h with 0.3% Triton X-100 and 5% goat serum in PBS. The primary antibodies (diluted in 0.3% Triton X-100 and 3% goat serum in PBS) were incubated overnight at 4°C with the indicated dilutions: polyclonal rabbit anti-TDP-43 (as described previously in Igaz et al., 2008) 1:30,000 and monoclonal mouse anti-hTDP-43 (60019-2; Proteintech) 1:10,000. After washing 2 × 5 min with PBS, secondary antibodies conjugated with Alexa Fluor 488 (Invitrogen) and rhodamine (Jackson Laboratories) diluted in 0.3% Triton X-100 and 5% goat serum in PBS were incubated for 4 h at room temperature. Nuclear counterstaining was performed with Hoechst 33342 (2 μg/ml; Sigma). Sections were mounted using 30% glycerol in PBS on gelatin-coated slides. The images were obtained by a Zeiss Axio Imager 2 microscope equipped with APOTOME.2 structured illumination, using a Hamamatsu Orca Flash 4.0 camera.

#### Statistical Analysis

Statistical tests were performed as described in the text and figure legends for each dataset. Student's *t* test was used when comparing only two groups on one behavioral measure. Repeated measures (RM) two-way analysis of variance (ANOVA) followed by Newman-Keuls multiple comparison *post hoc* test was used when comparing three or more groups in the social interaction experiments. RM-ANOVA followed by Bonferroni's multiple comparisons *post hoc* test was used for accelerated rotarod. Fisher's exact test was performed for clasping analysis. When non-parametric tests were required (hanging wire grip test), Mann-Whitney *U* test was used. Statistical analysis of behavioral tests was performed using PRISM 6 (GraphPad Software) or Statistica 7 (StatSoft). Data are presented as mean values ± SEM. A *p* < 0.05 was considered statistically significant.
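For readers less familiar with the notation used in the Results, such as *t*(21), the sketch below shows how a two-group Student's *t* statistic, its degrees of freedom (n₁ + n₂ − 2), and the SEM can be computed with the Python standard library. It assumes the classical equal-variance (pooled) form of the test and uses made-up numbers. It is not the authors' analysis, which was run in PRISM or Statistica, and it stops short of the *p* value, which requires the *t* distribution from a statistics package.

```python
import math
from statistics import mean, stdev, variance
from typing import Sequence, Tuple


def sem(values: Sequence[float]) -> float:
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student's t (pooled variance); returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se_diff, df


# Hypothetical scores for two groups, for illustration only.
control = [1.0, 2.0, 3.0]
bigenic = [2.0, 3.0, 4.0]
t_value, df = student_t(control, bigenic)
print(f"t({df}) = {t_value:.4f}; control mean ± SEM = {mean(control):.2f} ± {sem(control):.2f}")
```

With this convention, a reported *t*(21) corresponds to two groups totaling 23 animals.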

# RESULTS

In a recent work, we performed a detailed characterization of the behavioral changes occurring in hTDP-43-WT mice (Alfieri et al., 2016). Using an induction protocol that avoids developmental and early postnatal deficits (Igaz et al., 2011), we showed that post-weaning expression of hTDP-43 in forebrain neurons for 1 month led to social and cognitive phenotypes in the absence of clear motor abnormalities. The main goals of this work were to understand whether these social and cognitive changes occur earlier following hTDP-43 expression and to define if the suppression of transgene expression might prevent or reverse the behavioral deficits.

tTA/WT12 bigenic mice or control littermates were induced at weaning (postnatal day 28) by removing Dox from their water supply (**Figure 1A**). Paralleling our studies using mice that express a cytoplasmic form of TDP-43 (hTDP-43-∆NLS) under the same promoter system (Alfieri et al., 2014), we defined a protocol that induces transgene expression for 2 weeks, and then, we suppressed hTDP-43 expression by treating mice for two additional weeks with Dox. We analyzed these animals at two time points, thus defining two experimental groups: mice after 2 weeks of Dox removal (0.5 month) and mice with subsequent transgene suppression due to treatment with Dox for an additional 2 weeks, termed 1 month (sup) (**Figure 1A**). In this way, we can study both phenotype onset and development at very early time points and the effect of transgene suppression on behavioral performance. This analysis, combined with our data from tTA/WT12 mice induced for 1 month (**Figure 1A**; Alfieri et al., 2016), allows us to study and interpret the susceptibility of different behavioral domains to transgene suppression.

In order to assess proper regulation of transgene expression within our experimental timeline, we performed double immunofluorescence studies (**Figures 1C**–**E**). We used a polyclonal antibody (referred to as total TDP-43) that reacts to both endogenous (mouse) and transgenic (human) forms of TDP-43, and a monoclonal antibody that only recognizes the human isoform of the TDP-43 protein (in this case, human TDP-43-WT), termed human TDP-43 (hTDP-43). Representative micrographs from forebrain regions (see diagram in **Figure 1B**), including the somatosensory cortex (SSC, **Figure 1C**), hippocampal cornu ammonis 1 (CA1) region (**Figure 1D**), and dentate gyrus (DG, **Figure 1E**), show that bigenic tTA/WT12 mice correctly express transgenic human TDP-43 in the nucleus after 1 month of Dox removal (1 month group). On the contrary, bigenic mice from the 1 month (sup) group are almost completely devoid of hTDP-43 immunoreactivity. As expected, mice from the control group only show signal for the total TDP-43 antibody but no hTDP-43 staining (**Figures 1C**–**E**). Interestingly, total TDP-43 staining in the 1 month (sup) animals (but not in the control or 1 month groups) showed that a subset of cortical and hippocampal cells display altered nuclear morphology and/or nuclear TDP-43 distribution, suggesting that the mechanisms for coping with hTDP-43 overexpression and recovery might differ. These results demonstrate both robust nuclear expression of the transgene after Dox removal and proper suppression of hTDP-43 expression following reintroduction of Dox treatment.

#### Suppression of TDP-43-WT Expression Prevents Installment of Early Cognitive Deficits

We have recently demonstrated that post-weaning overexpression of hTDP-43-WT during 1 month leads to cognitive deficits in our inducible tTA/WT12 mouse model (Alfieri et al., 2016). These phenotypes include alterations not only in object recognition memory but also in spatial working memory, as assessed by the object recognition test and the Y-maze spontaneous alternation test, respectively. These cognitive tasks rely on cortical (perirhinal, prefrontal) and hippocampal functional integrity (Lalonde, 2002; Warburton and Brown, 2010), and these areas widely express the transgene (**Figure 1**; Igaz et al., 2011). Since typical clinical features of FTD patients include alterations of prefrontal-dependent executive functions (Seelaar et al., 2011), we used the Y-maze task to monitor the impact of short-term hTDP-43 overexpression in this type of behavior (**Figure 2A**). The Y-maze test relies on the animal's preference to explore a new arm of the maze rather than returning to a previously visited arm (Lalonde, 2002). When tTA/WT12 mice were tested after 0.5 month of induction, control mice showed spontaneous alternation, avoiding previously visited arms, and bigenic mice were indistinguishable from control mice, indicating normal working memory in both groups (*t*(21) = 0.8096, *p* = 0.4273; **Figure 2B**, left panel). Importantly, locomotion (estimated by the number of arm entries) was similar between groups (*t*(21) = 0.2947, *p* = 0.7711; **Figure 2B**, right panel). These data, together with our results showing that tTA/WT12 mice displayed altered performance in the Y-maze test 1 month post-induction (Alfieri et al., 2016), indicate that cognitive deficits (specifically, spatial working memory) begin to occur in the 0.5–1 month time window of transgene expression in this mouse model.

To evaluate whether short-term (2 weeks) transgene suppression could prevent the onset of this deficit, we reintroduced Dox in the drinking water of these animals after the 0.5 month time point and assessed Y-maze performance 1 month post-weaning, in a group termed 1 month (sup) (see timeline in **Figure 1A**). Student's *t* test analysis established that bigenic animals from the 1 month (sup) group did not show significant differences compared with control mice (*t*(20) = 1.658, *p* = 0.1128; **Figure 2C**, left panel). Again, the total number of arm entries was similar between groups (*t*(20) = 0.3583, *p* = 0.7238; **Figure 2C**, right panel). In summary, these data demonstrate that short-term suppression of transgene expression can prevent cognitive abnormalities in young TDP-43-WT mice.

#### TDP-43-WT Mice Develop Sociability Abnormalities Very Rapidly After hTDP-43 Induction and They Persist After Transgene Suppression

Within the spectrum of clinical presentations of TDP-43 proteinopathies, deficits in social behavior are a conspicuous feature of FTD patients (Desmarais et al., 2017). Moreover, we and others have shown that altered sociability is a feature of different animal models of FTD, including those based on manipulations of TDP-43, tau, fused in sarcoma (FUS), progranulin, and CHMP2B (Ghoshal et al., 2012; Filiano et al., 2013; Alfieri et al., 2014, 2016; Koss et al., 2016; Vernay et al., 2016; Shiihashi et al., 2017). Specifically, we demonstrated that inducible transgenic mice expressing either the cytoplasmic (TDP-43-∆NLS) or the nuclear (TDP-43-WT) form of human TDP-43 show altered early social phenotypes, 1 month post-induction (Alfieri et al., 2014, 2016). At this time point, bigenic tTA/WT12 mice display decreased performance in the three-chamber social interaction test (**Figure 3A**) relative to control animals (Alfieri et al., 2016), although to a lesser degree than TDP-43-∆NLS mice (Alfieri et al., 2014).

To determine whether this phenotype was detectable right before starting the suppression protocol, we studied sociability in tTA/WT12 mice 0.5 month after hTDP-43 induction. As expected, control mice spent more time interacting with the demonstrator mouse (S, social stimulus) than with the inanimate object (NS, non-social stimulus). Bigenic mice showed a deficit in social interaction evidenced by a significant decrease in time spent in direct social exploration of a conspecific demonstrator (sniffing time, RM-ANOVA, interaction *F*(1,24) = 6.32, *p* = 0.019; **Figure 3B**). This reduced social activity results from a significant reduction in the mean sniffing time per contact (RM-ANOVA, interaction *F*(1,24) = 4.60, *p* = 0.042; **Figure 3C**) with no significant differences in the number of social contacts among social conditions (RM-ANOVA, S-NS preference *F*(1,24) = 46.98, *p* < 0.0001; **Figure 3D**). Decreased social interaction cannot be attributed to an altered motor function (non-significant differences in the total number of chamber entrances, RM-ANOVA, S-NS preference *F*(1,24) = 0.57, *p* = 0.46, genotype *F*(1,24) = 0.54, *p* = 0.47, interaction *F*(1,24) = 0.51, *p* = 0.48, **Figure 3E**, and total traveled distance, 16.95 ± 2.57 vs. 14.80 ± 1.77 m for control and bigenic mice, respectively; *t*(24) = 0.7137, *p* = 0.4823, Student's *t* test) or to general deficits in novelty exploration (both groups interact similarly with the novel object, compare non-social control vs. non-social bigenic groups in **Figures 3B**–**D**).

FIGURE 4 | … shown. The heat map was constructed based on body position. (B) Total distance traveled. (C) Total distance traveled in time segments of 5 min. (D) Relative center distance. No significant differences were found between controls and bigenic animals in locomotion or exploration. (E) Motor coordination and balance were not affected after suppression of hTDP-43-WT expression. Accelerated rotarod performance (4–40 rpm/5 min): four trials per test were performed during the test day with a 2-min interval between trials. Latency to fall off the apparatus was recorded. (F,G) No signs of spasticity or motor strength deficits are detected in suppressed hTDP-43-WT mice. (F) The absence of clasping phenotype in both 0.5 month and 1 month (sup) mice. Percentage of animals positive for abnormal clasping and number of animals positive/total tested are shown. (G) Hanging wire grip test. Grip strength was assessed using a standard wire cage turned upside down. The latency to fall off the wire lid was quantified; a 60-s cutoff time was used. No significant differences were found between control and bigenic animals [*p* > 0.05 Student's *t* test in (B,D), Mann-Whitney *U* test in (G), repeated-measures ANOVA in (C,E), Fisher's exact test in (F)]. *n* = 12, 15 for control and bigenic 0.5 month mice, respectively; *n* = 12, 14 for control and bigenic 1 month (sup) mice, respectively. Data represent mean ± SEM.

Next, we assessed the effect of turning off hTDP-43 expression on social behavior. Notably, transgene suppression in the 1 month (sup) group did not cause any improvement in social interaction in bigenic mice relative to controls (RM-ANOVA, interaction *F*(1,21) = 6.41, *p* = 0.019; **Figure 3F**). Moreover, the social phenotype showed a tendency to worsen after transgene suppression, since both the mean contact duration (RM-ANOVA, S-NS preference *F*(1,21) = 27.76, *p* < 0.0001; genotype *F*(1,21) = 5.60, *p* = 0.027; **Figure 3G**) and the total contact number (RM-ANOVA, interaction *F*(1,21) = 4.75, *p* = 0.041; **Figure 3H**) now showed significantly lower values in bigenic mice relative to controls. The number of entrances was equivalent in both groups in both social and non-social sides (RM-ANOVA, S-NS preference *F*(1,21) = 2.76, *p* = 0.11; genotype *F*(1,21) = 0.21, *p* = 0.64; interaction *F*(1,21) = 0.02, *p* = 0.88; **Figure 3I**), as was the interaction with the novel object (non-social control vs. bigenic comparison in **Figures 3F,H**) and the total traveled distance (16.57 ± 1.64 vs. 11.75 ± 1.74 m for control and bigenic mice, respectively; *t*(21) = 1.967, *p* = 0.063, Student's *t* test). These results show that TDP-43-WT mice very rapidly develop alterations in social behavior, and these cannot be ameliorated by short-term suppression of transgene expression. Moreover, they support the idea that diverse behavioral domains present differential susceptibility to hTDP-43 overexpression and subsequent renormalization of TDP-43 levels.

#### Preserved Motor Behavior in TDP-43-WT Mice After Short-Term Transgene Overexpression and Upon Dox Treatment

Although TDP-43 proteinopathies include the FTD/ALS spectrum of disorders, only a subset of FTD cases present with motor abnormalities. In the case of tTA/WT12 mice, we described a progressive motor phenotype that slowly develops with increased time of hTDP-43 expression (Alfieri et al., 2016). In this mouse model, motor deficits are virtually absent at 1 month post-induction but gradually emerge and are clearly present after 12 months of transgene expression.

We assessed general motor function and exploratory activity in the 1 month (sup) group using the open field test (**Figures 4A**–**D**). In agreement with the data from 1 month induced mice (Alfieri et al., 2016), bigenic tTA/WT12 mice that underwent the suppression protocol traveled the same distance as control animals (*t*(24) = 0.6150, *p* = 0.5444; **Figures 4A,B**). A more detailed analysis of this parameter, dividing the session into 5-min time bins, indicated that there was no averaging effect and both groups performed similarly in each of the time segments (**Figure 4C**). Moreover, relative center distance in this task was also indistinguishable from controls (*t*(24) = 0.7907, *p* = 0.4369; **Figure 4D**). Other parameters showed no differences (Student's *t* test) between bigenic and control groups, including average speed (4.07 ± 0.33 vs. 3.83 ± 0.21 cm/s for control and TDP-43-WT12 mice, respectively; *t*(24) = 0.6294, *p* = 0.5351), maximum speed (60.2 ± 8.9 vs. 54.7 ± 4.2 cm/s for control and TDP-43-WT12 mice, respectively; *t*(24) = 0.5888, *p* = 0.5615), and percentage time in center (18.83 ± 1.67 vs. 16.81 ± 1.68% for control and TDP-43-WT12 mice, respectively; *t*(24) = 0.8465, *p* = 0.4057).

Next, we evaluated motor coordination and balance in TDP-43-WT animals using the accelerated rotarod test. Both bigenic and control mice behaved similarly in the 1 month (sup) paradigm (repeated-measures ANOVA, *F*(1,24) = 0.9200, *p* = 0.3470 for group; *F*(3,72) = 44.09, *p* < 0.0001 for trial; *F*(3,72) = 2.342, *p* = 0.0804 for interaction; **Figure 4E**). We also performed an analysis of limb clasping reflex phenotype. Both control and bigenic animals extended their limbs normally when being suspended by their tails, after 0.5 month of induction and subsequently after transgene suppression (**Figure 4F**). In addition, TDP-43-WT mice displayed indistinguishable latencies to fall in the hanging wire test in the 0.5 month induction and 1 month (sup) groups, indicating intact grip strength (**Figure 4G**).

These data indicate that: (1) motor abnormalities are not present after 0.5 month of transgene expression, consistent with our results after 1 month of hTDP-43 overexpression (Alfieri et al., 2016) and (2) the suppression protocol had no non-specific effects on motor behavior, since locomotor/exploratory behavior, motor coordination, limb clasping, and grip strength were indistinguishable from controls.

In summary, we present evidence that defines the early time course of behavioral impairments and establishes the susceptibility of different behavioral domains to transgene suppression in tTA/WT12 mice (**Figure 5**).


FIGURE 5 | Summary of the behavioral phenotypes observed in tTA/WT12 mice after 0.5 or 1 month (from Alfieri et al., 2016) of hTDP-43 expression and in the 1 month (sup) group. NS: non-significant differences relative to control mice.

# DISCUSSION

Animal models are invaluable tools to understand the pathological basis of neurodegenerative diseases such as FTD and ALS. In this study, we explored the behavioral consequences of suppressing hTDP-43 expression in our inducible tTA/WT12 mice.

The identification of disease-related molecules, the discovery of pathogenic pathways, and the targeting of pathogenic proteins are all critical steps to establish therapeutic strategies for incurable diseases, including neurodegenerative conditions. A major breakthrough was the discovery of TDP-43 as a pathological hallmark of ALS and FTD (Arai et al., 2006; Neumann et al., 2006). Mutations in more than 25 genes have been shown to cause ALS and FTD, and it is noteworthy that most of these mutant proteins participate in two intracellular machineries: the RNA and protein quality control systems (Ito et al., 2017).

TDP-43 belongs to a large family of RNA-binding proteins (RBPs) that have been shown to have multiple links with disease pathogenesis. Of specific relevance for neurodegenerative diseases, a prominent theory in the field states that neurons are particularly vulnerable to disruption of RBP dosage and dynamics (Conlon and Manley, 2017; Cookson, 2017). In particular, recent progress investigating the genetics of the FTD/ALS disease spectrum has identified disease-related mutations in at least seven RBPs. These include TDP-43, FUS, the heterogeneous nuclear ribonucleoproteins (hnRNPs) hnRNPA1 and hnRNPA2B1, T cell intracytoplasmic antigen (TIA1), TATA box-binding protein-associated factor 15 (TAF15), and Ewing sarcoma breakpoint region 1 (EWSR1) (reviewed in Ito et al., 2017). This avalanche of information resulted in the development and characterization of multiple animal models of FTD/ALS based on the altered expression of these proteins (Lutz, 2018). These include TDP-43 models in both invertebrate (*C. elegans*, *Drosophila*) and vertebrate (mouse, rat) organisms (reviewed in Picher-Martel et al., 2016). In addition to traditional transgenic and knockout technology, approaches such as application of CRISPR/Cas9 and viral transgenesis provide exciting alternative avenues to explore.

Inducible animal models based on the tet-tTA system have been used to assess the neuropathological and behavioral impact of expression of different neurodegeneration-associated proteins, including TDP-43, tau, APP, α-synuclein, SCA1, SCA3, and huntingtin (Yamamoto et al., 2000; Zu et al., 2004; Jankowsky et al., 2005; Nuber et al., 2008; Boy et al., 2009; Sydow et al., 2011; Walker et al., 2015). However, only a few studies provided evidence for selective behavioral vulnerability to transgene suppression in neurodegenerative disease models.

Transgenic mice expressing TDP-43 carrying a pathogenic A315T mutation in CNS neurons display early motor and anxiety-like phenotypes that are reversible on Dox treatment, while memory impairments persist after transgene suppression (Ke et al., 2015). Our previous work in a mouse model expressing a cytoplasmic form of TDP-43 (TDP-43-∆NLS) demonstrated a differential response to transgene suppression, reversing the early motor and cognitive defects but having no effect on the sociability abnormalities developed by these mice (Alfieri et al., 2014). In light of these recent examples, the evaluation of differential behavioral susceptibility to TDP-43-related dysregulation arises as a relevant parameter for understanding the progression of neurodegenerative disease manifestations, particularly in TDP-43 proteinopathies.

Our results show that cognitive performance, as evaluated in the Y-maze test, remains unaltered after 2 weeks of wild-type hTDP-43 expression (0.5 month group). By contrast, we previously reported that expression of TDP-43-∆NLS using the same promoter system and induction protocol leads to a clear-cut decrease in Y-maze alternation, reaching levels indistinguishable from chance and thus indicating severe deficits in spatial working memory (Alfieri et al., 2014). While those animals recover their cognitive capacity when suppressed in the 1 month (sup) group, we interpret the normal levels of tTA/WT12 mice Y-maze performance in 1 month (sup) mice as evidence that we are preventing the development of the working memory phenotype described in hTDP-43-WT mice with continuous expression for 1 month (Alfieri et al., 2016). However, we want to stress here that we are not comparing the degree of cognitive dysfunction between the different expression time points (0.5 and 1 month) and the 1 month (sup) groups, but qualitatively assessing if a phenotype is present. The different onset time for cognitive deficits in our two inducible models highlights the fact that TDP-43-WT mice have a milder behavioral phenotype than TDP-43-∆NLS. This is also substantiated by the analysis of social and motor behavior.

In terms of social phenotype, our conditional hTDP-43-WT animal model recapitulates deficits that constitute a core feature of the clinical FTD/ALS spectrum, particularly in several subtypes of FTD (Shinagawa et al., 2006; Sado et al., 2009; Christidi et al., 2018). Decreased sociability is a recurrent feature of FTD patients, and the three-chamber social interaction test used here demonstrates that TDP-43-WT mice develop social deficits very rapidly, after only 2 weeks of transgene expression (0.5 month group). Contrary to what happened with cognitive performance, sociability cannot be rescued or preserved in these mice after short-term transgene suppression. Interestingly, TDP-43-∆NLS showed the same dynamics of social deficits, although the abnormalities were more profound in those mice (Alfieri et al., 2014). Altogether, these data indicate that the neuronal circuits underlying these two different behavioral domains are differentially affected in TDP-43-WT mice, which, when considered in the context of a similar result from TDP-43-∆NLS animals, suggest an exquisite vulnerability for TDP-43-elicited changes in social function.

Although motor abnormalities develop after at least 6 months of transgene expression in TDP-43-WT mice (Alfieri et al., 2016), we sought to establish that (1) motor deficits at 0.5 month of expression cannot explain the phenotypes in other behavioral domains evaluated (in addition to the internal controls within each test) and (2) re-introduction of Dox treatment did not alter motor performance, thus eliminating the possibility that the treatment necessary for transgene suppression could be interfering in the proper assessment or interpretation of behavioral tests performed in the 1 month (sup) group. This is an important point, since it has been reported that Dox treatment may have non-specific effects on certain behaviors, although the strain more resistant to these effects was C57BL/6J (Han et al., 2012) and our TDP-43 inducible mouse models are backcrossed to C57BL/6J for >10 generations to homogenize genetic background.

A limitation (but also an advantage) of this model is that the driver transgenic line (CamKII promoter) yields forebrain-enriched neuronal hTDP-43-WT expression, so part of the phenotype may be restricted by the sparing of other regions (including spinal cord). However, other rodent models with pan-neuronal TDP-43 expression are available (summarized in Picher-Martel et al., 2016), and comparing our results with those can provide important clues on regional involvement.

Our results showing differential behavioral susceptibility to transgene suppression in our inducible TDP-43-WT mice raise further questions about the mechanisms underlying these differences. Additional research is warranted to explore selective regional neurodegeneration and neuroinflammation, as well as the potential role of non-neuronal cells (e.g., astrocytes and microglia) as contributors to this phenotype.

In summary, we show here that TDP-43-WT mice not only recapitulate several core behavioral features of the FTD/ALS spectrum of human pathology, but that these behavioral domains also display different time courses of onset and different sensitivity to transgene suppression. This information is particularly relevant for understanding both the time windows of efficacy for potential treatments and the selectivity/sensitivity to TDP-43 dysregulation of the neural circuits underlying the clinical phenotypes displayed by FTD/ALS patients. We also consider our inducible model especially relevant for exploring the etiology of TDP-43-related FTD, given its predominant social/cognitive phenotype with early sparing of motor symptoms (which only appear after long-term expression of the transgene; Alfieri et al., 2016). The information provided by this and future studies using this animal model might shed light on the pathological mechanisms of TDP-43 proteinopathies and help devise potential therapeutic avenues for these devastating conditions.

#### REFERENCES


#### ETHICS STATEMENT

This study only involved animal subjects and no human subjects. The information regarding animal subject involvement is included in the manuscript (in the Materials and Methods section) as required.

#### AUTHOR CONTRIBUTIONS

PS and LI planned the design of the experiments and wrote the article. PS and GN carried out the experiments. PS and LI analyzed the data. All authors edited the manuscript. LI conceived and supervised all aspects of the project.

#### FUNDING

This work was supported by research grants to LI from the Agencia Nacional de Promoción Científica y Tecnológica (ANPCyT) (PICT 2011-1727 and PICT 2015-0975), the International Brain Research Organization (IBRO Return Home Fellowship), CONICET (PIP 0186), the Fundación Florencio Fiorini, Fundación Alberto Roemmers, and the University of Buenos Aires (UBACyT). LI is a member of Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET). PS and GN were supported by doctoral fellowships from CONICET.

#### ACKNOWLEDGMENTS

We would like to thank Dr. Virginia M.-Y. Lee and Dr. John Q. Trojanowski (University of Pennsylvania) for the kind gift of TDP-43-WT mice (these mice were developed through support by NIH grants AG032953 and AG-17586). The authors also thank P. Bekinschtein, J. Belforte, S. Charif, F. Kazanetz, J. Medina, F. Morici, and N. Weisstaub for helpful discussion of the manuscript. We also thank Analía López Díaz and Graciela Ortega for technical help and Jesica Unger and Verónica Risso for husbandry support.

in frontotemporal lobar degeneration and amyotrophic lateral sclerosis. *Biochem. Biophys. Res. Commun.* 351, 602–611. doi: 10.1016/j.bbrc.2006.10.093


frontotemporal lobar degeneration: consensus of the consortium for Frontotemporal lobar degeneration. *Acta Neuropathol.* 114, 5–22. doi: 10.1007/s00401-007-0237-2


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Silva, Nieva and Igaz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*
