The Ever-Evolving Concept of the Gene: The Use of RNA/Protein Experimental Techniques to Understand Genome Functions

The completion of the human genome sequence together with advances in sequencing technologies have shifted the paradigm of the genome, as composed of discrete and hereditable coding entities, and have shown the abundance of functional noncoding DNA. This part of the genome, previously dismissed as “junk” DNA, increases proportionally with organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant nonprotein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both transcriptional and post-transcriptional levels. Although knowledge of this field is still limited, the ability of lncRNAs to localize in different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments which have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years.

FIGURE 1 | Major steps in the evolution of the concept of gene.
The existence of genetic traits inherited as discrete entities was first intuited by Darwin (1859) and Mendel (1866) in the mid-nineteenth century. However, the term "gene" was first used by the plant physiologist and geneticist Wilhelm Johannsen in his book "Elemente der exakten Erblichkeitslehre." He refers to "conditions, foundations and determiners which are present in unique, separate and thereby independent ways by which many characteristics of the organisms are specified (. . . ) precisely what we wish to call genes" (Johannsen, 1909), thus also ascribing to genes the responsibility for phenotypes. The term was inspired by "pangene," which was used by Hugo de Vries for entities involved in Pangenesis (Heimans, 1962). This abstract concept was given concrete substance in 1910 by Thomas Hunt Morgan, who showed that genes are subcellular particles residing on specific structures, the chromosomes, which are transmitted through cellular replication (Morgan et al., 1915). Later, in 1941, George Wells Beadle and Edward Lawrie Tatum demonstrated that mutations in genes caused errors in specific steps of metabolic pathways. This made it possible for the first time to link the concept of gene to the synthesis of enzymes, yielding the "one gene-one enzyme" paradigm, which was later rephrased into "one gene-one polypeptide" (Beadle and Tatum, 1941). Subsequently, first Oswald Avery and then Alfred D. Hershey and Martha Chase made the association between DNA and genetic material through their studies on bacteria and bacteriophages, respectively (Avery et al., 1944; Hershey, 1955). However, the "genetic-biochemical" conception of the gene had its turning point in 1952, when Rosalind Franklin and Raymond Gosling provided an extremely clear X-ray diffraction image of DNA helices. In 1953, James D. Watson and Francis Crick finally uncovered the molecular structure of DNA (Watson and Crick, 1953) and inferred that "the specific pairing (. . . ) immediately suggests a possible copying mechanism for the genetic material." Overall, these discoveries led to the realization that the genetic material is made up of a chain of polynucleotides, called DNA, and to the establishment of the Central Dogma, which states that polypeptides are translated from RNA, which is in turn transcribed from DNA. Together with the Watson-Crick double helix model, the relation between DNA and polypeptide synthesis provided a mechanistic model of the gene and of gene activity and inaugurated the molecular biology era.
In eukaryotes, the discovery of the diverse modalities of RNA maturation changed the understanding of the colinear relationship between DNA, RNA, and polypeptides. The existence of alternative splicing, alternative maturation of the 5' and 3' ends, and RNA editing allows a single gene to produce multiple proteins by means of a single act of transcription. The fact that only a minority of the transcribed genes in higher eukaryotes encode proteins suggests that the genome has a potential which extends beyond the discrete coding loci. The rest of the genome (at least 60%) is transcribed independently of its coding capabilities (Berretta and Morillon, 2009; Djebali et al., 2012; Figure 2). These discoveries fuelled debate about the relationship between the number of protein-coding genes and the complexity of the biology of higher organisms (Bickel and Morris, 2006; Mercer and Mattick, 2013). While the number of coding loci remains virtually fixed throughout evolution, the content of noncoding DNA increases. This implies new and as yet unexplored functions for this part of the genome, including the transcription of noncoding RNAs (Rinn and Chang, 2012; Batista and Chang, 2013). This finally changed the concept of the gene from "coding" into "transcriptional unit."
FIGURE 2 | Human and Mouse transcriptomes. The tables show the total number of mouse (top) and human (bottom) genes (brown) and transcripts (blue), together with their sub-partition (as percentages, %) in different classes of coding and noncoding RNAs. Data were obtained from the current Mouse (version M15) and Human (version 27) GENCODE datasets (Harrow et al., 2006, 2012; Mudge and Harrow, 2015).
The concept of the gene is evolving so fast that we are not comfortable giving a more fixed definition to the term other than "a strategy used by evolution to allow the survival of life," for those readers who seek a timeless explanation.
Zooming Out: Gene Regulation and the "Omic" Era
If the definition of gene is complex, then the issue "how does a gene work?" is even more arduous to resolve. In their studies on the lactose (Lac) operon of E. coli, François Jacob and Jacques Monod provided a first paradigmatic view of genetic (transcriptional) regulation (Jacob and Monod, 1978). The operon discovery marked a crucial point in science, demonstrating that genes do not work as isolated entities. Indeed, although oversimplified, it represents an elegant interpretation of how genes can be regulated in a coordinated fashion in response to environmental conditions. In the last two decades, many efforts have been made to get a more comprehensive answer to the question, "how do genes work together?" A partial answer came when the International Human Genome Sequencing Consortium, in the framework of the Human Genome Project (HGP), provided a more accurate quantification of the number of genes by publishing the complete human genome sequence (Craig Venter et al., 2001; Lander et al., 2001). The quantification of gene loci in the haploid genome was based on the presence of predicted and known Open Reading Frames (ORF). The regions not predicted to be a gene were defined as "junk" DNA (Ohno, 1972; Niu and Jiang, 2013). In the context of the HGP, a large proportion of "junk" DNA emerged unexpectedly as positively selected by evolution and conserved among the human, dog, mouse and other vertebrate genomes (Mouse Genome Sequencing Consortium et al., 2002; Lindblad-Toh et al., 2005). Moreover, a paucity in the number of coding genes emerged. Researchers were surprised to find a figure as low as ∼22,300 loci, especially when compared to the 60,000 genes of the single-cell organism T. vaginalis (Craig Venter et al., 2001; Lander et al., 2001). It had seemed obvious that humans would have more protein-coding genes than plants. However, this was not the case, as the number of protein-coding genes of A. thaliana is approximately the same as that of humans.
These observations suggested that there is more to the genome than protein-coding genes. They anticipated the outcomes of the two main genome analysis projects of the last 15 years, namely FANTOM (Carninci et al., 2005) and ENCODE (Kawai et al., 2001; ENCODE Project Consortium et al., 2007). The aim of the two consortia was to identify and characterize all the functional elements of the mammalian genome and the entire transcriptional landscape. They used multiple sequencing-based approaches such as high-throughput cDNA sequencing, Serial Analysis of Gene Expression (SAGE), Cap Analysis Gene Expression (CAGE), and Paired End Tags (PET), together with high-resolution tiling arrays and Chromatin immunoprecipitation (ChIP) sequencing (Shiraki et al., 2003; Morozova and Marra, 2008). This unprecedented quantity of continuously updated data revealed that the human genome was not a large container of short transcribed sequences interspersed in a genomic desert, but was rather pervasively (at least 70%) transcribed in a lattice of transcripts (Figure 2). In addition to the Transcriptional Active Regions (TARs), which show well-defined physical edges, and to alternative transcriptional start sites (TSS), several other genomic regions defined as "regulatory elements" were also found to be transcribed. In fact, genes appeared to extend into the "intergenic space," giving rise to a plethora of transcripts whose number was five times that of total genes. The transcripts that were not associated with polysomes, and were thus candidates for a protein-independent function, constituted the class of noncoding RNAs (ncRNAs).
The fact that mammalian genomes are pervasively transcribed is now well accepted. The "junk" DNA interpretation has fallen out of favor, since pervasive transcription and also developmental control of ncRNA expression, together with high promoter conservation, have provided consistent evidence of global functionality (Berretta and Morillon, 2009;Djebali et al., 2012). Indeed, functional studies in knockout mice have provided compelling evidence for the requirement and sufficiency of particular noncoding transcripts for organ development and function (Ripoche et al., 1997;Moseley et al., 2006;Anguera et al., 2011;Nakagawa et al., 2011;Zhang et al., 2012;Sauvageau et al., 2013).

Potentiality and Actuality: Genome and Epigenome
With the exception of lymphocytes, all nucleated human cells contain the same genome. Nevertheless, they show very different morphological and functional characteristics depending on cell type, developmental stage, sex, and age. Eric Lander defined the genome as "a landscape (. . . ) a whole geography of distributions (. . . ) a storybook that's been edited for a couple billion years. And you could take it to bed like A Thousand and One Arabian Nights, and read a different story in the genome every night." Thus, each cell of the same individual tells a different story, even if the storybook is the same. This observation can draw on the Aristotelian theory of potentiality and actuality. Potentiality represents the genotype, which contains all the information needed to develop the function, while actuality is the motion of the genotype into a phenotype, that is, the composite of an organism's observable features. The driver of the switch from potentiality to actuality, however, was unknown. In the late nineteenth century, the question of how a fertilized egg can give rise to a complex organism with cells of varied phenotypes was the object of long debates between two main schools of embryologists. The "pre-formationists" thought that each cell contains preformed elements that enlarge during development, while the "epigenesists" thought that chemical reactions among soluble components execute the developmental plan (Felsenfeld, 2014). Indeed, it was the study of developmental processes that made evident both the clear divergence of phenotypes among differentiating cells and tissues and the fact that these phenotypes are clonally inherited by the dividing cells.
Historically, the term "epigenetics" was introduced by C. Waddington in the 1940s to describe "the interactions of genes with their environment, which bring the phenotype into being" (Waddington, 1940). The physical importance of gene position along the chromosomes was first demonstrated in D. melanogaster by Muller in 1930 (Muller, 1930) and defined as "Position-Effect Variegation, PEV." More generally, these studies addressed the functional differences between two different physical states of the DNA: (i) the heterochromatin, which corresponds to regions of the genome that contain low gene density and is transcriptionally inactive and (ii) the euchromatin, which corresponds to regions with a high density of genes and is transcriptionally active (Hsu, 1962). Both euchromatin and heterochromatin are associated with specific DNA methylation and histone modification patterns, leading to the existence of an "epigenetic code" which determines specific chromatin states and, by consequence, gene expression (Felsenfeld, 2014). These modifications involve histone-tail chemical modifications, DNA methylation, histone variants and ATP-remodeling complexes, which are crucial for the establishment of the epigenetic landscape and for the appropriate progression of cell differentiation (Yuan, 2012).
Recent studies have demonstrated that environmental and lifestyle factors may influence epigenetic mechanisms (i.e., DNA methylation, histone acetylation and chromatin plasticity) (Weaver et al., 2004; Feil and Fraga, 2012) and that the acquired modifications can also be transmitted through non-DNA-sequence-based (transgenerational) heritability (Grossniklaus et al., 2013; Szyf, 2013; Dias and Ressler, 2014; Bohacek and Mansuy, 2015; Miska and Ferguson-Smith, 2016). This phenomenon represents a powerful means of change as it allows for the modification of phenotypes without genotype changes. Based on these discoveries, we can now assume two different levels of epigenetic regulation. The first ensures the mitotic inheritance of differentiated cellular states during development (Felsenfeld, 2014), while the second ensures transgenerational inheritance. The latter occurs through meiosis and acts as an additional evolutionary driving force together with natural selection and genetic drift (Grossniklaus et al., 2013; Miska and Ferguson-Smith, 2016).

NONCODING RNAS: NEW PLAYERS IN OLD PROCESSES
From the "Protein Centric" View to Non-Coding RNAs as Functional Molecules
Understanding of the functional importance of RNA started with the discovery of messenger RNA (mRNA) (Brenner et al., 1961; Jacob and Monod, 1961), ribosomal RNA (rRNA) (Scherrer and Darnell, 1962; Scherrer et al., 1963) and transfer RNA (tRNA) (Hoagland et al., 1958). Other classes of relatively small ncRNAs were later identified and characterized, such as the small nuclear RNA (snRNA) (Wassarman and Steitz, 1992), the small nucleolar RNA (snoRNA) (Bachellerie et al., 2002), the piwi-interacting RNA (piRNA) (Cox et al., 1998), the microRNA (miRNA) (Lee et al., 1993) and the small interfering RNA (siRNA) (Fire et al., 1998; Hamilton and Baulcombe, 1999). The versatility of RNA functions was further emphasized in the 1980s, when Thomas Robert Cech discovered ribozymes and "established that RNA, like a protein, can act as a catalyst in living cells" (Kruger et al., 1982). The significance of this work was recognized with the 1989 Nobel Prize in Chemistry.
In the early 1990s, the discovery of H19 (Brannan et al., 1990) and Xist (Brockdorff et al., 1992; Brown et al., 1992) uncovered the existence of functional long noncoding RNAs (lncRNAs) involved in epigenetic regulation. The abundance of this class of noncoding transcripts was revealed by the advent of deep-sequencing approaches. Many common features have been observed between lncRNAs and mRNAs. For instance, lncRNA loci display analogous genetic marks at their regulatory or transcribed regions and are bound by RNA polymerase II (Pol II). In addition, similarly to mRNAs, lncRNAs contain introns and present a 7-methylguanosine cap at their 5′ end and a poly(A) chain at their 3′ end. Despite these similarities, lncRNAs were initially considered to be a by-product of transcriptional noise resulting from low RNA polymerase fidelity, transcription initiation leakiness (Struhl, 2007) or incidental transcription at enhancer regions (De Santa et al., 2010). This was probably due to their low levels of expression and sequence conservation (Mercer et al., 2011; Nitsche and Stadler, 2017), together with the lack of loss-of-function studies.
LncRNAs display a dynamic pattern of expression in differentiation and development, the specific binding of transcription factors on their promoters, and the presence of peculiar chromatin signatures, such as DNase1 hypersensitivity sites and H3K9ac, H3K4me3, and H3K36me3 histone modifications in cells where they are transcribed (Kawai et al., 2001;Guttman et al., 2009;Rinn and Chang, 2012;Kung et al., 2013). Moreover they show high tissue specificity (Gloss and Dinger, 2016) and their misregulation has been associated with several pathological states (Shi et al., 2013;Tang et al., 2013;Ounzain and Pedrazzini, 2015;Uchida and Dimmeler, 2015;Ballarino et al., 2016).

Mechanistic Examples of Long Non-Coding RNAs
In recent years, most lncRNA experimental biology has been observational. While high-throughput sequencing approaches have provided comprehensive catalogs of lncRNA genes and transcripts, a bottleneck exists between the large datasets and poor validation methods. As a result, to date, only a limited number of lncRNAs have been functionally characterized (Johnsson et al., 2014).
High-sensitivity interactome techniques developed in recent years have made it possible to map RNA/RNA, RNA/protein, and RNA/DNA interactions (Kashi et al., 2016). Thanks also to recently developed computational tools (Huarte et al., 2010; Agostini et al., 2013; Cheng et al., 2015; Suresh et al., 2015; Ribeiro et al., 2017), several examples of lncRNA mode of action have started to emerge (Figure 3). LncRNAs can be detected in the nucleus, cytoplasm, or both. Cytoplasmic lncRNAs usually act at the post-transcriptional level by regulating the stability and/or the translation of target mRNAs. For example, both BACE1-AS (Faghihi et al., 2008) and TINCR (Kretz et al., 2013) have been shown to increase the stability of their target RNAs, while the group of cytoplasmic 1/2-sbsRNAs acts in the opposite way by favoring STAU1-mediated decay via Alu elements (Gong and Maquat, 2011; Wang et al., 2013). An alternative mode of action is the regulation of mRNA translation by means of complementary base pairing. Examples include the regulation mediated by the Uchl1as1 (Carrieri et al., 2012) and lincRNA-p21 (Yoon et al., 2012) lncRNAs. Cytoplasmic lncRNAs can also act as competing endogenous RNAs (ceRNAs) (Cesana et al., 2011) and function as "miRNA sponges" to protect the target mRNAs from repression. Recently, an additional example of ceRNA was found in the newly identified class of circular RNAs (circRNAs). This is the case of the circular ciRS-7 transcript, which contains more than 70 selectively conserved miR-7 target sites (Hansen et al., 2013; Memczak et al., 2013; Piwecka et al., 2017). Despite their common ribosome occupancy (Guttman et al., 2013), lncRNAs are defined as transcripts that are not translated. However, two recent examples of lncRNAs that encode short, functional micropeptides have been described (Anderson et al., 2015; Nelson et al., 2016), suggesting a coding-based mechanism of action for these ncRNAs.
Although several archetypes of cytoplasmic species have been described, most lncRNAs are predominantly found to be enriched in the nucleus and in particular associated with chromatin (Derrien et al., 2012). This observation supports the idea that many lncRNAs are engaged in the epigenetic and transcriptional regulation of gene expression (Morlando et al., 2014). Also in the case of nuclear lncRNAs, some common modes of action have emerged and been classified by the scientific community (Kung et al., 2013; Fatica and Bozzoni, 2014; Kashi et al., 2016). They may work either in cis, when they act in the vicinity of their transcriptional locus, or in trans, when they act at a distance in the regulation of intra- or inter-chromosomal loci (Rinn and Chang, 2012). Recognition of the target regions by lncRNAs can occur through different mechanisms such as bridging proteins, RNA-RNA or RNA-DNA hybrids, including triple helix or R-loop formation (Engreitz et al., 2016a). The ability of lncRNAs to act as scaffold molecules allows them to interact simultaneously with several molecular components and chromatin remodeling complexes (Tsai et al., 2010). As guide molecules, they have the ability to recruit functional protein complexes to conduct them directly to specific target loci (Lanz et al., 1999; Rinn et al., 2007; Kino et al., 2010; Di Ruscio et al., 2013). LncRNAs can also influence their targets indirectly. For instance, as decoy molecules they can change the availability of transcription factors and, as a consequence, reduce DNA binding capacity (Wang et al., 2008; Kino et al., 2010; Guttman and Rinn, 2012; Rinn and Chang, 2012; Batista and Chang, 2013). The participation of lncRNAs in the formation of nuclear domains has long been the subject of speculation (Nickerson et al., 1989; He et al., 1990). Today several lncRNAs have been shown to act as chromosomal architects for the spatial coordination of gene expression (Korostowski et al., 2012; Hacisuleyman et al., 2014; Zhang H. et al., 2014; Engreitz et al., 2016b). Examples include Xist and FIRRE. Xist promotes X inactivation by acting on the three-dimensional organization of the X chromosome (Splinter et al., 2011; Engreitz et al., 2013; Giorgetti et al., 2016). Another example is the lncRNA FIRRE, which controls murine adipogenesis by promoting the formation of inter-chromosomal domains among functionally related genes (Hacisuleyman et al., 2014).
FIGURE 3 | (. . . ) (Hacisuleyman et al., 2014). In the cytoplasm, lncRNAs can (E) regulate mRNA translation and stability (Gong and Maquat, 2011; Kretz et al., 2013; Wang et al., 2013), (F) act as sponges for other transcripts or proteins (Gong et al., 2015), (G) serve as micropeptide templates (Anderson et al., 2015; Nelson et al., 2016).
A widely debated mechanism of nuclear lncRNA action concerns their ability to function in the recruitment of chromatin modifying complexes for the regulation of chromatin states. A paradigmatic example is represented by HOTAIR (Rinn et al., 2007), which has been demonstrated to act as a scaffolding molecule, capable of interacting simultaneously with the PRC2 silencing complex at its 5'-end and with the LSD1/CoREST/REST complex at its 3'-end (Tsai et al., 2010). Besides HOTAIR (Rinn et al., 2007), Xist (Zhao et al., 2008), Bvht (Klattenhoff et al., 2013), Kcnq1ot1 (Pandey et al., 2008), FENDRR (Grote et al., 2013), CARMEN, and Chaer (Wang et al., 2016) are other examples of lncRNAs for which an interaction with PRC2 has been proposed.
A more systematic search has revealed that a vast number of transcripts interact with PRC2 and that siRNA mediated depletion of certain lncRNAs associated with PRC2 leads to changes in gene expression (Khalil et al., 2009;Davidovich et al., 2013;Kaneko et al., 2013;Beltran et al., 2016). However the functional interaction between PRC2 and lncRNAs is still under debate. In particular two recently published papers (Amândio et al., 2016;Blanco and Guttman, 2017;Portoso et al., 2017) have demonstrated that the binding with the PRC2 complex is dispensable for HOTAIR function. This evidence has raised a number of concerns regarding the specificity and the functional relevance of this interaction (Blanco and Guttman, 2017). A second example is represented by Xist, which is known to coordinate X chromosome inactivation (XCI) by silencing transcription through the recruitment of several chromatin modifying complexes, including PRC1 and PRC2 (Wutz et al., 2002;Schoeftner et al., 2006;Zhao et al., 2008;Wutz, 2011;Almeida et al., 2017;Pintacuda et al., 2017). However, doubt was recently cast on this paradigm of epigenetic regulation following a number of studies showing that ablation of different PRC2 components has no impact on Xist-mediated transcriptional silencing (Kalantry and Magnuson, 2006;Schoeftner et al., 2006). On the other hand, the deletion of the A-repeat region of Xist, which is required for Xist-mediated transcriptional silencing (Wutz et al., 2002;Zhao et al., 2008), does not preclude PRC2 recruitment to the X chromosome (Plath et al., 2003;da Rocha et al., 2014). 
Moreover, several observations also argue against a direct interaction between Xist RNA and PRC2 proteins (Cerase et al., 2014). The evidence supporting broad lncRNA binding to PRC2 has also recently been revisited in light of findings from Guttman's lab, which show that the purification of the Xist lncRNA through UV-crosslinking-based strategies failed to identify its direct interaction with PRC2 (McHugh et al., 2015). Overall, this evidence has brought into question the requirement of PRC2 for lncRNA functioning. It suggests that there may be as yet unknown mechanisms of gene regulation in vivo, possibly acting without the need for close proximity between lncRNAs and PRC2. Finally, in line with the high functional heterogeneity of lncRNAs, a completely novel mechanism of regulation has been proposed by Olson's lab, which holds that the act of transcribing the Hand2-associated Uph lncRNA, and not the transcript itself, is the only functional requirement for gene regulation and heart development.
The aforementioned examples represent only the tip of a regulatory RNA iceberg in which lncRNAs together with their interactors are involved. What clearly emerges from these studies is the vagueness of some of the mechanisms proposed, which highlights the need to develop more reliable techniques to better define the bona fide interactions that occur in living cells. Although the molecular understanding of these noncoding mechanisms of action has improved considerably in recent years, significant efforts are still required.

NEW TECHNOLOGIES FOR THE STUDY OF LNCRNAS/PROTEIN INTERACTIONS
Because of the existence of a wide spectrum of transcripts, RNA-protein interactions represent a conspicuous part of the interactome and fully understanding them is among the most ambitious goals of RNA researchers. Many variables have been shown to control the interactions between RNA and proteins. These include the binding affinity of the protein for the RNA substrate, the concentration of the respective binders, and the competition with other RNAs and/or proteins (Jankowsky and Harris, 2015). Moreover, as specific sets of RNAs and proteins can be localized to defined areas, also subcellular localization and compartmentalization can influence the occurrence of interactions. All together, these variables establish the homeostatic equilibrium on which the biological complexity is established.
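As a rough illustration of how affinity, concentration and competition combine, the sketch below uses the textbook single-site equilibrium model with a competitive binder. It is an assumption made here for illustration and is not a model taken from the studies cited above: the target RNA is taken to be present in trace amounts, the protein and the competitor are assumed to bind the same site, and the function name, parameter names and units are hypothetical.

```python
def fraction_bound(protein, kd, competitor=0.0, competitor_kd=1.0):
    """Fraction of a target RNA occupied by a protein at equilibrium.

    Illustrative single-site model (assumption): the RNA is in trace
    amounts, so free and total protein concentrations are equal, and a
    competitor binds the same RNA site with its own dissociation constant.
    All concentrations and constants must share one unit (e.g. nM).
    """
    if kd <= 0 or competitor_kd <= 0:
        raise ValueError("dissociation constants must be positive")
    if protein < 0 or competitor < 0:
        raise ValueError("concentrations cannot be negative")
    # Competition raises the apparent Kd of the protein for the RNA.
    apparent_kd = kd * (1.0 + competitor / competitor_kd)
    return protein / (apparent_kd + protein)


# Protein concentration equal to its Kd, no competitor: half occupancy.
print(fraction_bound(10.0, 10.0))
# Same protein, plus a competitor present at its own Kd: lower occupancy.
print(fraction_bound(10.0, 10.0, competitor=10.0, competitor_kd=10.0))
```

In this simplified picture, subcellular compartmentalization would enter only through the local concentrations supplied to the function; the model does not attempt to capture it explicitly.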
Long non-coding RNAs often exert their functions by binding one or more proteins. Hence, the identification of lncRNA-proteome contacts is a crucial step toward understanding the functional mechanisms in which these noncoding molecules act. Several lncRNAs have been shown to form complexes with proteins in both the nuclear and the cytoplasmic compartments. Thus, it would be reasonable to assume that other RNA-protein interactions might form similar complexes that affect biological functions. Despite their biological importance, RNA-protein interactions are much less well characterized than DNA-protein interactions.
Several computational algorithms have been developed to predict the interaction probability of a particular RNA-protein pair by taking advantage of the structural information obtained experimentally and mostly available in the Protein Data Bank (PDB) (Berman et al., 2000). These algorithms have been applied to create computational tools and web servers to predict RNA/protein interaction partners. A detailed description of these tools is beyond the scope of this review, but readers who seek more insight will find it useful to read the manuscript from the Dobbs lab (Muppirala et al., 2013) and book chapters (Chang, 2012; Meller and Porollo, 2012). More recently, an innovative large-scale pipeline has been developed by Ribeiro and colleagues to identify candidate lncRNAs acting as scaffolding molecules for protein complexes (Ribeiro et al., 2017). By using a catRAPID-based omics algorithm (Agostini et al., 2013), these authors predicted 847 lncRNAs (∼5% of the lncRNA transcriptome) capable of scaffolding half of the known protein complexes and network modules (Ribeiro et al., 2017). This suggests, for the first time, that the lncRNA-mediated scaffolding of protein complexes and modules is a common mechanism in human cells.
Despite the huge potential of these computational methods, experimental validation is essential to verify the occurrence of the predicted interactions in vivo. These experimental approaches can be classified into two main categories, namely RNA-centric or protein-centric methods (Figures 4, 5). The first approach relies on the ability to isolate and purify a specific RNA to subsequently check for the interacting proteins (Lingner and Cech, 1996; Hogg and Collins, 2007; Rinn et al., 2007; Klattenhoff et al., 2013; Yang et al., 2014; McHugh et al., 2015; Wang et al., 2016). In contrast, the protein-centric techniques are based on the immunoprecipitation of a specific protein followed by the analysis of co-precipitated RNAs (Mili, 2004; Wang et al., 2009; Haecker and Renne, 2014). A challenging point that needs discussion is the ability of such methods to discriminate between interactions occurring in vivo (in the cells) and in vitro (Jankowsky and Harris, 2015). For this reason, many crosslinking-based methods have recently been developed. These can be used as an alternative to the native approaches to reduce the possibility that nonphysiological background interactions occur in vitro during the experimental procedures (Kalantry and Magnuson, 2006; Xue et al., 2016; Portoso et al., 2017).
FIGURE 4 | RNA-centric purification methods for the analysis of the RNA-protein interactions. A schematic of exogenous (Left) vs. endogenous (Right) RNA pull-down methods. In the exogenous RNA pull-down the transcript of interest is tagged (e.g., by biotinylation) by in vitro transcription. The co-purified proteins are collected and analyzed by Western Blot or Mass-spectrometry analyses. Endogenous RNA pull-down uses native or cross-linked conditions. The experimental procedure is conceptually the same except that, in the case of the cross-linked approach, a set of 90-nucleotide (90 nt) long biotinylated probes is used. See text for further details.

RNA-Centric Purification Methods
There are two main categories of RNA-centric purification methods (Figure 4): (i) the exogenous RNA pull-down, which is based on in vitro RNA affinity capture methods and (ii) the endogenous RNA pull-down, which is based on the purification of the endogenous transcript under native or ultraviolet (UV) cross-linking conditions. In the first method (Figure 4, left), the candidate RNA is transcribed in vitro as a fusion to an aptamer and then incubated with nuclear, cytoplasmic or total protein extracts. The newly formed RNA-protein complexes are purified on a solid support or resins containing proteins or different organic molecules depending on the tagging strategy used (e.g., streptavidin, MS2 viral coat protein) (Bardwell and Wickens, 1990; Slobodin and Gerst, 2010). Finally, nonspecific binders are removed from the solid support by stringent washes and the RNA-protein complexes are eluted by boiling in sodium dodecyl sulfate (SDS)-containing buffers (Srisawat and Engelke, 2001; Carey et al., 2002; Huarte et al., 2010; Lee et al., 2013; Leppek and Stoecklin, 2014; Liu et al., 2017). This method makes the study of transcripts with low expression possible because it favors protein yield. However, the use of a synthetic RNA to capture proteins, and the fact that the binding between RNAs and proteins occurs in vitro, can lead to artificial interactions. In addition, RNA folding has been shown to be regulated and strongly influenced by the cellular environment and by the interaction with specific chaperone proteins (Schroeder et al., 2002). As the function of RNA is deeply influenced by its structure, and since folding in vitro may differ significantly from folding in vivo (Schroeder et al., 2002; Fallmann et al., 2017; Leamy et al., 2017), RNA misfolding occurring in vitro can compromise the reliability of the results. For all these reasons, data need to be carefully checked and the use of alternative methods is strongly recommended.
The native endogenous RNA pull-down (Figure 4, right) is based on the purification of the endogenous RNA transcript by using a set of antisense biotinylated probes. This set of probes is incubated with the cellular extract in specific buffer conditions that allow their base-pairing with the RNA target. The mixture is then incubated with streptavidin-coated magnetic beads, which specifically bind the complex formed by the biotinylated oligos, the target RNA and its associated proteins, allowing its precipitation (Zielinski et al., 2006; Tsai et al., 2011; Legnini et al., 2014; McHugh et al., 2015; Ribeiro et al., 2017). The beads are finally washed and the precipitated RNA-protein complexes are analyzed. Compared with the previous method, the physiological RNA-protein interactions occurring in vivo are better preserved. However, in this case too, the occurrence of background interactions cannot be excluded (Portoso et al., 2017). Furthermore, if the levels of the target RNA in the cell are low, its purification can be more challenging. Finally, as RNA is not a linear molecule and some of its regions may be engaged in protein binding, a wide range of probes covering the entire sequence must be tested in order to find free regions that are accessible to the probes.
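The probe-tiling idea described above can be illustrated with a short script. The following is a minimal Python sketch, not a published probe-design tool: it generates antisense DNA oligos tiled along a target RNA sequence. The function names, the probe length, the step size and the toy sequence are all assumptions chosen for illustration and are not taken from the cited protocols; real probe design would also have to consider melting temperature, off-target complementarity and RNA accessibility.

```python
# Hypothetical helpers: tile antisense DNA oligos along a target RNA.
# Probe length and spacing are illustrative assumptions, not protocol values.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def antisense_dna(rna_segment: str) -> str:
    """Return the DNA reverse complement (written 5'->3') of an RNA segment."""
    return "".join(COMPLEMENT[base] for base in reversed(rna_segment.upper()))


def tile_probes(rna: str, probe_len: int = 20, step: int = 10):
    """Yield (start, end, probe) for windows tiled along the target RNA.

    Coordinates are 0-based, end-exclusive, relative to the RNA sequence.
    """
    for start in range(0, len(rna) - probe_len + 1, step):
        segment = rna[start:start + probe_len]
        yield start, start + probe_len, antisense_dna(segment)


# Made-up target sequence, used only to show the output format.
toy_rna = "AUGGCUAGCUAGGAUCCGAUCGUAGCUAGCUAGGCAUCGAUCG"
probes = list(tile_probes(toy_rna, probe_len=20, step=10))
for start, end, probe in probes:
    print(f"{start:>3}-{end:<3} {probe}")
```

Overlapping windows (a step smaller than the probe length) are used here so that every region of the transcript is covered by more than one candidate probe, which reflects the need to test many probes to find accessible regions.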
The inability to discriminate between specific and contaminating interactions constitutes the weakest point of the above-mentioned methods. In order to identify only the physical contacts that occur in vivo, the endogenous RNA pull-down can be improved by the addition of an ultraviolet crosslinking step followed by washes under denaturing conditions (up to 8 M urea) (Baltz et al., 2012; McHugh et al., 2015). This approach makes it possible to discard the contaminating RNA-binding proteins that can be abundant in native purifications, because short-wavelength UV light (usually 254 nm) induces the formation of covalent bonds only between RNA and its interacting proteins (Baltz et al., 2012).
The choice of a negative control is an important step in ensuring the robustness of the results. The use of an antisense RNA is considered the best negative control for the exogenous RNA pull-down (Rinn et al., 2007; Klattenhoff et al., 2013). In the endogenous RNA pull-down, the negative control consists of a set of probes that do not target any endogenously expressed RNA sequence (Legnini et al., 2014; Ribeiro et al., 2017). In the case of the cross-linked endogenous RNA pull-down, the ideal negative control would be a non-crosslinked sample and/or a specific RNA whose interactors are known (Baltz et al., 2012; McHugh et al., 2015).

Protein-Centric Purification Methods
Protein-centric methods represent a complementary approach and consist of the immunoprecipitation of the protein of interest with specific antibodies followed by the analysis of co-precipitated RNAs (Figure 5). In recent years two main protein-centric methods have been developed (Wang et al., 2009; Haecker and Renne, 2014), namely the native RNA ImmunoPrecipitation (nRIP) (Figure 5, left) and the crosslinking immunoprecipitation (CLIP) (Figure 5, right) approaches. The first approach is usually performed in native conditions and enables the purification of those RNAs which are stably associated, in their natural condition, with the protein complexes. The analysis of co-precipitated RNAs is then performed by quantitative real-time PCR (qPCR) or by RNA-seq. Examples of lncRNAs whose interactors were identified by nRIP include FENDRR (Grote et al., 2013), HOTTIP, Mhrt (Han et al., 2014) and CARMEN. In spite of its wide use, nRIP has a number of limitations, and contaminating contacts can always form upon cell lysis (Mili, 2004; Schoeftner et al., 2006; Engreitz et al., 2016b; Blanco and Guttman, 2017). As for the RNA-centric methods, false positive contacts can be significantly reduced by the use of short-wavelength (254 nm) UV irradiation, which is the principle of the (UV) Cross-Linked ImmunoPrecipitation (CLIP) approach. It differs from nRIP in that, once covalently bound to its RNA targets, the protein of interest is immunoprecipitated under stringent conditions after a partial RNase digestion. The protein-RNA complexes are indirectly labeled by γATP incorporation, denatured in SDS-containing buffers, size-selected by electrophoresis and transferred to nitrocellulose membranes. Proteins are then digested with proteinase K and the recovered RNA is ligated to an adapter, reverse transcribed and analyzed by qRT-PCR (Ule et al., 2003).
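For the qPCR readout of an nRIP experiment mentioned above, enrichment is often expressed relative to the input and to a control immunoprecipitation. The source does not specify a calculation, so the following Python sketch shows one common convention (the "percent input" method) under stated assumptions: roughly 100% amplification efficiency, a known fraction of lysate saved as input, and a control IP (for example with a nonspecific IgG). The function names and all Ct values are hypothetical.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent of the input RNA recovered in an IP.

    Assumes ~100% PCR efficiency (one Ct = a twofold difference) and that
    `input_fraction` of the lysate was set aside as input (e.g. 0.10 for 10%).
    """
    # Shift the input Ct to the value expected for 100% of the lysate.
    ct_input_adjusted = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adjusted - ct_ip)


def fold_over_control(ct_ip: float, ct_control: float,
                      ct_input: float, input_fraction: float) -> float:
    """Enrichment of the specific IP over a control IP, both normalized to input."""
    specific = percent_input(ct_ip, ct_input, input_fraction)
    background = percent_input(ct_control, ct_input, input_fraction)
    return specific / background


# Made-up Ct values for a candidate lncRNA; not experimental data.
recovered = percent_input(ct_ip=25.0, ct_input=24.0, input_fraction=0.10)
enrichment = fold_over_control(ct_ip=25.0, ct_control=28.0,
                               ct_input=24.0, input_fraction=0.10)
print(f"percent input: {recovered:.2f}%, fold over control IP: {enrichment:.1f}")
```

Normalizing both the specific and the control IP to the same input keeps the comparison independent of how abundant the transcript is in the lysate, which matters when comparing lncRNAs with very different expression levels.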
Since 2003, when the first CLIP approach was used on brain tissues (Ule et al., 2003), many variants of CLIP-based approaches have been developed, beginning with the combination of CLIP with high-throughput sequencing (HITS-CLIP, also known as CLIP-seq) (Licatalosi et al., 2008; Chi et al., 2009; Guil et al., 2012). Since the determination of the exact RNA/protein binding site is crucial, some upgraded protocols have been developed which enable mapping of the contact region at single-nucleotide resolution. The Photoactivatable Ribonucleoside-Enhanced CLIP (PAR-CLIP) (Spitzer et al., 2014) takes advantage of the U to C transition induced by UV crosslinking (365 nm) after the incorporation of a nucleoside analog (e.g., 4-thiouridine), which is provided during cell culturing. The weakness of this technique is that it cannot be applied to primary tissues, since the incorporation of nucleotide analogs occurs during cell replication. The identification of binding sites can also be achieved by the use of alternative CLIP approaches, such as crosslinking and analysis of cDNA (CRAC) (Bohnsack et al., 2012) and individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP) (König et al., 2010; Huppertz et al., 2014). In these approaches, the protein fragment that remains bound to the RNA causes either occasional reverse transcription arrest (iCLIP) or reverse transcription errors such as deletions or substitutions (CRAC), which makes it possible to map the protein/RNA binding site at single-nucleotide resolution upon sequencing.
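The two mapping principles described above can be sketched on toy data. The following minimal Python example is an illustration under simplifying assumptions, not a CLIP analysis pipeline: reads are given as ungapped, plus-strand alignments in the form (0-based start, read sequence); the PAR-CLIP-like signal is counted as T-to-C mismatches (the U to C transition as it appears in cDNA reads); and the iCLIP-like signal is assigned to the nucleotide immediately upstream of each read start, where reverse transcription is assumed to have stopped. All names and sequences are hypothetical.

```python
from collections import Counter


def t_to_c_sites(reference: str, alignments):
    """Count T->C mismatches per reference position (PAR-CLIP-like signal).

    `alignments` is an iterable of (start, read) pairs; reads are assumed to be
    ungapped and on the same strand as `reference`.
    """
    counts = Counter()
    for start, read in alignments:
        for offset, base in enumerate(read):
            position = start + offset
            if reference[position] == "T" and base == "C":
                counts[position] += 1
    return counts


def truncation_sites(alignments):
    """Count the position one nucleotide upstream of each read start
    (iCLIP-like signal), assuming cDNAs truncate at the crosslinked nucleotide."""
    counts = Counter()
    for start, _read in alignments:
        if start > 0:  # a read starting at 0 has no upstream position
            counts[start - 1] += 1
    return counts


# Toy reference (DNA alphabet, as in sequencing reads) and made-up alignments.
reference = "GATTACAGGCTTAACG"
alignments = [
    (1, "ATCACAG"),   # T->C at reference position 3
    (2, "TCACAGGC"),  # T->C at reference position 3
    (0, "GATTACA"),   # perfect match
]

print("T->C sites:", dict(t_to_c_sites(reference, alignments)))
print("truncation sites:", dict(truncation_sites(alignments)))
```

In practice these counts would be derived from aligned sequencing files and compared against background mismatch and read-start rates before calling a binding site.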
CLIP-seq approaches are technically demanding and can produce many sequencing artifacts. In order to increase the efficiency of library generation and thus enhance the discovery of bona fide binding sites, an enhanced CLIP protocol (eCLIP) has recently been developed. In iCLIP experiments, occasional reverse transcription arrests are mapped through circular ligation-based methods. Since this step is often inefficient, Van Nostrand and collaborators have replaced it with the addition of two different adapters in two separate steps. The 3' RNA adapter is ligated directly to the crosslinked RNA after immunoprecipitation, while the 3' single-stranded DNA adapter is ligated after reverse transcription to the 3' end of the cDNA (Van Nostrand et al., 2016). These modifications maintain the high resolution of iCLIP while increasing the efficiency of adapter ligation. This in turn improves the library preparation of the purified RNA fragments, resulting in enhanced technical and biological reproducibility.
Finally, in order to share the huge amount of data produced by these experiments, many CLIP-seq databases are now available. These represent a powerful tool for the identification of this type of interaction, taking advantage of the CLIP sequencing data published by the scientific community (Yang et al., 2015). In addition to CLIP experiment datasets, other bioinformatics tools have been developed in recent years that make it possible to explore miRNA/mRNA, RNA/RNA and RNA/protein physical interactions (Anders et al., 2012; Zhang X. et al., 2014).

CONCLUDING REMARKS
In eukaryotes, the expression of genes is controlled at different stages, from the chromatin accessibility of DNA to RNA transcription, processing and translation. Evolutionary pressure has selected a large variety of regulatory means which act to fine-tune gene expression through sophisticated RNA and protein machineries. Overall, these processes help to explain some of the differences observed among species with relatively similar numbers of genes. The advent of large-scale analyses of mammalian transcriptomes expanded these regulatory options and revealed that the transcriptional landscape of all organisms is far more intricate than initially imagined. A great part of the genome is pervasively transcribed into a diverse collection of RNAs. These can be divided into the following categories: protein-coding (mRNAs), structural (e.g., rRNAs, snRNAs, snoRNAs) and regulatory (e.g., miRNAs, lncRNAs, circRNAs) RNAs.
LncRNAs represent the most recently discovered class of regulatory RNAs, which exert their roles through a variety of mechanisms without being translated into proteins. The interaction with proteins enables RNAs to operate through distinct means and to exert a wide range of functions across diverse biological processes (Figure 6). One intriguing aspect of lncRNAs is their modular structure and their capability to act as scaffolds that facilitate different molecular interactions. Thus, mapping lncRNA-protein contacts remains one of the most important steps toward a deeper understanding of their biological roles. However, despite great progress in the interactomic field, discriminating between false positives and true interactors is still a significant challenge that needs to be addressed in order to increase the efficiency and the reproducibility of the different approaches. The use of complementary approaches and multiple replicates still constitutes the best strategy for validation and enhances the robustness of the interactions identified.
In the attempt to find order within the RNA landscape, we could define pervasive transcription as a cocktail of disordered sounds that are selected by evolution to be turned into music (Figure 6). This is in perfect keeping with Dawkins' "selfish gene" theory (Wade and Dawkins, 1978), where the "gene," whatever this means, is considered as the unique substrate for selective pressure throughout evolution.

FIGURE 6 | Functional ribonucleoproteins orchestrating gene expression. Different classes of coding (mRNA), structural (tRNA, rRNA, snRNA, snoRNA) and regulatory (miRNA, piRNA, lncRNA) RNAs produced by pervasive transcription. Some of these transcripts have been positively selected by evolution and turned into functional molecules through interaction with protein machineries. The musical staff shows a fragment of Paganini's "Caprice" no. 24.

AUTHOR CONTRIBUTIONS
AC wrote the manuscript and selected the literature. MB proposed the topic, wrote the manuscript and reviewed the text.

ACKNOWLEDGMENTS
The authors are grateful to Pietro Laneve, Fabio Desideri and Prof. Irene Bozzoni for their helpful contributions and their critical reading of the manuscript and to Christine Tracey for the English proofreading. This work was partially supported by a grant from Sapienza University (prot. RM11715C7C8176C1) to MB.