OPINION article

Front. Genet., 20 October 2020

Sec. Systems Biology Archive

Volume 11 - 2020 | https://doi.org/10.3389/fgene.2020.574737

Small Open Reading Frames: How Important Are They for Molecular Evolution?

  • 1. Institute of Biodiversity and Sustainability, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil

  • 2. National Institute of Science and Technology in Molecular Entomology, Rio de Janeiro, Brazil

Introduction

Small Open Reading Frames (small ORFs/sORFs/smORFs) are important sources of putative peptides that early gene prediction methods dismissed as non-functional or junk DNA. In fact, smORFs of <100 codons are potential coding sequences, but they are short enough to arise very frequently and by chance in genomes; thus, detecting their coding potential and assessing their function is like walking in the dark. Furthermore, while dozens of smORF peptides have recently been described as essential players in biological processes, many others are reported to be potentially non-functional products of the pervasive translation of junk DNA, which raises the question: from what perspective is this lack of function assessed? In this context, it was recently suggested that non-functional smORF peptides might play a major role in the de novo birth of protein coding genes, but the evolutionary mechanism is still unclear. Thus, the role of pervasive smORF translation in molecular evolution remains puzzling. Here, we present questions for debate and further investigation on the view that non-functional smORF peptides are underappreciated hotspots of molecular evolution in eukaryotes.
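The claim that short ORFs arise frequently by chance can be checked with a small simulation. The sketch below is illustrative only and rests on stated assumptions: uniform base composition, a forward-strand-only scan, and a hypothetical 10-codon minimum length; real smORF annotation pipelines use different, more elaborate criteria. It generates random DNA, pairs each stop codon with the most upstream ATG in the same frame, and counts how many of the resulting ORFs are shorter than 100 codons.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def random_dna(length, seed=0):
    """Uniform random DNA (an assumption: real genomes have biased composition)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def find_orfs(seq, min_codons=10):
    """Forward-strand ATG..stop ORFs as (start, end, n_codons).

    In each frame, a stop codon closes the ORF opened by the first ATG seen
    since the previous in-frame stop; n_codons excludes the stop codon and
    end is the half-open coordinate just after the stop.
    """
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, pos + 3, n_codons))
                start = None
    return orfs


if __name__ == "__main__":
    genome = random_dna(100_000, seed=42)
    orfs = find_orfs(genome)
    small = [orf for orf in orfs if orf[2] < 100]
    print(f"{len(orfs)} ORFs of >= 10 codons, {len(small)} of them shorter than 100 codons")
```

Because 3 of the 64 codons are stops, an in-frame stop is expected roughly every 64/3 ≈ 21 codons in uniform random sequence, so the vast majority of chance ORFs fall well below 100 codons. This is the noisy background against which the coding potential of a genuine smORF has to be assessed.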

Small Open Reading Frames: A Subtopic in the Discussion of Junk DNA Function

With respect to the evolution of molecular function, some DNA elements accumulate mutations by genetic drift; the evolution of these elements is therefore non-adaptive and neutral (Ohta, 2002). In some cases, the neutrally evolving elements in junk DNA are analogous to items on a menu available to natural selection (Knibbe et al., 2007; Faulkner and Carninci, 2009; Lynch et al., 2011). Interestingly, the ENCODE (Encyclopedia of DNA Elements) consortium reported that most human junk DNA exhibits some type of biochemical activity (ENCODE Project Consortium, 2012), but this activity lacks adaptive relevance and is not under selective pressure (Doolittle, 2013; Graur et al., 2013). Importantly, junk DNA is estimated to represent 75–90% of the human genome (Graur, 2017).

Part of the junk DNA menu is composed of neutrally evolving smORF peptides. For instance, extensive transcription across junk DNA generates thousands of non-coding RNAs (ENCODE Project Consortium, 2007), and increasing evidence shows that thousands of smORFs undergo pervasive translation in transcripts annotated as non-coding or in the untranslated regions (UTRs) of mRNAs (e.g., Aspden et al., 2014; Ingolia et al., 2014). Interestingly, non-coding RNAs and ORFs lacking homologs have been reported as candidates for the de novo evolution of protein coding genes (Tautz and Domazet-Lošo, 2011). Moreover, it was recently suggested that neutrally evolving smORF peptides might play a major role in this process (Ruiz-Orera et al., 2018), although the evolutionary mechanism remains to be determined (Ruiz-Orera et al., 2018; Singh and Wurtele, 2020). In this context, two previously proposed concepts for discussing the evolution of molecular function lie at the core of the junk DNA debate: “causal roles” and “selected effects” (Doolittle and Brunet, 2017). We discuss them here in the context of smORFs and protein coding gene birth.

A “causal role” describes an activity performed by chance by a neutrally evolving element. For example, a hypothetical genomic sequence that comes to resemble a TATA box through a random nucleotide mutation may be recognized and bound by transcription factors without triggering gene transcription (Griffiths, 2009; Graur et al., 2013). In other words, “causal roles” are non-adaptive phenotypes: they emerge randomly and tend to disappear rapidly during evolution. In contrast, “selected effects” describe adaptive phenotypes acquired through natural selection (Graur et al., 2013), such as canonical TATA boxes or ORFs that are translated into important proteins. In other words, “selected effects” are functionally relevant for cells.

Importantly, while natural selection drives adaptive evolution (selected effects), it is widely accepted that genetic drift drives the evolution of junk DNA, as well as synonymous substitutions in coding DNA sequences (CDSs) and mutations in the UTRs of mRNAs (Ridley, 2004).

Discussion

Applying the aforementioned evidence and concepts, we discuss here a possible mechanism by which neutrally evolving smORFs could advance proteome evolution in eukaryotes, as well as the evolutionary significance of smORFs.

First, some of the roles performed by neutrally evolving smORF peptides may shift from “causal roles” to “selected effects” under environmental pressure, exposing their previously neutral phenotypes to natural selection and triggering the evolution of new coding genes. Once neutral smORF peptides are selected, they are no longer neutral (Ruiz-Orera et al., 2018). In other words, neutral smORF peptides may be special entrées on the junk DNA menu that are available to natural selection (Figure 1A).
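The contrast between drift and selection that this step relies on can be illustrated with a standard Wright-Fisher simulation. The sketch below is a textbook toy model, not an analysis from this article and not fitted to smORF data: it assumes a haploid population of constant size with no recurrent mutation, and the population size, selection coefficient `s`, and replicate count are arbitrary illustrative values. It estimates how often a single new allele (for example, a variant that alters a smORF peptide) reaches fixation when it is neutral (`s = 0`) compared with when it carries a small advantage (`s > 0`).

```python
import random


def fixation_probability(pop_size, s, replicates=2000, seed=0):
    """Estimate the probability that one new mutant copy fixes in a haploid
    Wright-Fisher population of constant size pop_size.

    s is the relative fitness advantage of the mutant (s = 0 means neutral).
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = 1  # a single new mutant copy
        while 0 < count < pop_size:
            # Selection: weight the mutant by (1 + s) before sampling
            weighted = count * (1 + s)
            p = weighted / (weighted + (pop_size - count))
            # Drift: binomial sampling of pop_size offspring for the next generation
            count = sum(1 for _ in range(pop_size) if rng.random() < p)
        if count == pop_size:
            fixed += 1
    return fixed / replicates


if __name__ == "__main__":
    n = 100
    print("neutral  (s = 0):   ", fixation_probability(n, 0.0), "expected about", 1 / n)
    print("selected (s = 0.05):", fixation_probability(n, 0.05))
```

In this model a neutral allele fixes with probability equal to its starting frequency (1/N for a single copy), so most neutral variants are simply lost, whereas a modest advantage raises the fixation probability several-fold. This is one way to picture why a peptide whose role has become a “selected effect” is no longer evolving neutrally.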

Figure 1

When smORFs are first selected for, their peptides probably have low adaptive relevance because of the non-coding characteristics of their transcripts, such as low translation rates, lack of 3′-end processing, and other suboptimal coding features (non-coding RNA features are reviewed in Quinn and Chang, 2016). This hypothesis is based on the observation that hundreds of smORFs are highly conserved but display low expression and low translation efficiency, and occur in transcripts with non-coding characteristics (Cabili et al., 2011; Aspden et al., 2014; Bazzini et al., 2014). However, the nearly neutral theory (Ohta, 2002) suggests that the non-coding parts of fixed smORF transcripts are modified by random genetic drift, in some cases producing slightly advantageous (or disadvantageous) effects over evolutionary time. We therefore propose that, at some point, these modifications refine and increase the coding potential of smORF transcripts and consequently enhance the adaptive relevance of their peptides, as seen in the many important smORF peptides discovered recently (e.g., Magny et al., 2013; Anderson et al., 2015; Lauressergues et al., 2015; Nelson et al., 2016; Pengpeng et al., 2017; Kim et al., 2018; Polycarpou-Schwarz et al., 2018; Chugunova et al., 2019; Tobias-Santos et al., 2019; Pang et al., 2020; Vassallo et al., 2020). Importantly, the acquisition of optimal coding features might be favored after the smORF has been selected for, because drift-driven modifications that improve the translation efficiency of the newly selected smORF could then be fixed by natural selection. Before the smORF has been selected for, any optimal coding features acquired in its nucleotide sequence could rapidly disappear through genetic drift without being fixed. Alternatively, nucleotide changes may reduce the coding potential and silence a gene.
Optimal coding features include structural stabilization, emergence of a Kozak consensus sequence, internal ribosome entry sites (IRESs), coverage by enhancers and, in some cases, elongation of coding smORFs into larger CDSs (Figure 1B). Recently, Couso and Patraquim (2017) proposed that at least some functional smORFs are potential de novo precursors of large CDSs through a stop codon mutation pattern called “CDS elongation.”
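Two of the coding features named above can be made concrete with simple sequence operations. The sketch below, built on an invented transcript, (i) applies a simplified Kozak check that scores only the two positions usually considered most influential, a purine (A/G) at -3 and a G at +4 relative to the A of the ATG (the fuller consensus is often written gccRccAUGG), and (ii) shows how a single substitution that destroys a stop codon extends the ORF to the next in-frame stop, a toy version of the stop codon mutation idea behind “CDS elongation.” The helper names `orf_length`, `strong_kozak`, and `mutate` are hypothetical and not taken from any library.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length(seq, start):
    """Number of codons from the ATG at `start` up to (not including) the first
    in-frame stop codon, or None if the frame runs off the end without a stop."""
    for pos in range(start, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return (pos - start) // 3
    return None


def strong_kozak(seq, start):
    """Simplified Kozak check (assumption: only two key positions are scored):
    a purine (A/G) at -3 and a G at +4, counting the A of ATG as +1."""
    if start < 3 or start + 3 >= len(seq):
        return False
    return seq[start - 3] in "AG" and seq[start + 3] == "G"


def mutate(seq, pos, base):
    """Return a copy of seq with a single-nucleotide substitution at pos."""
    return seq[:pos] + base + seq[pos + 1:]


if __name__ == "__main__":
    # Invented transcript: Kozak-like context, a tiny ORF, then a downstream in-frame stop
    seq = "GCCACC" + "ATGGCT" + "TGA" + "CAGCTGGAACGT" + "TAA" + "GC"
    atg = 6
    print("strong Kozak context:", strong_kozak(seq, atg))
    print("smORF length (codons):", orf_length(seq, atg))
    # A T->C substitution turns the TGA stop into CGA (Arg), extending the ORF
    elongated = mutate(seq, 12, "C")
    print("after stop codon loss:", orf_length(elongated, atg))
```

Run as a script, it reports a strong context, a 2-codon ORF, and a 7-codon ORF after the T-to-C substitution at the stop codon. Real elongation events would of course also depend on the downstream sequence and on selection acting on the longer product.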

Assuming that evolution acts gradually, we propose calling this process “coding potential maturation” (Figure 1B). For example, smORF translation is widely reported in transcripts with long non-coding RNA (lncRNA) characteristics (Crappé et al., 2013; Ingolia et al., 2014; Ji et al., 2015; Mackowiak et al., 2015; Li et al., 2018; Lu et al., 2019). The smORFs in these lncRNAs are conserved across divergent species, which hints at fixation by natural selection, whereas the non-coding features of their transcripts indicate coding immaturity.

Another potential pathway of coding gene generation involves alternative smORFs located in UTRs or overlapping the reference CDS of canonical mRNAs. In this scenario, alternative smORFs undergo pervasive translation, or the act of translation itself serves a cis-regulatory purpose (Vanderperre et al., 2013; Wu et al., 2020). If the “causal roles” performed by these neutrally evolving smORF peptides become “selected effects,” the alternative smORFs could either give rise to independent gene units through retrotransposition or be fixed as alternative smORFs in their original transcripts (Figure 1B). Hence, at least some retrotransposed transcripts currently interpreted as products of pseudogenization may in fact represent new coding genes undergoing maturation, as suggested by a report that pseudogenes can be translated into highly conserved smORF peptides (Ji et al., 2015).
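The positional classes mentioned here (smORFs in the 5′ UTR, smORFs overlapping the reference CDS in a different frame, and smORFs in the 3′ UTR) can be sketched as simple interval logic on a transcript. The classifier below is a simplified illustration under stated assumptions: it scans only the forward strand, keeps the longest ATG-initiated ORF per stop codon, and uses an invented transcript with a hypothetical CDS annotation. The labels uORF and dORF follow common usage in the smORF literature, but these rules are a simplification, not those of any published annotation pipeline.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def forward_orfs(seq, min_codons=1):
    """ATG..stop ORFs in the three forward frames, as half-open (start, end)
    intervals whose end includes the stop codon. Each stop is paired with the
    most upstream ATG since the previous in-frame stop (longest ORF per stop)."""
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return sorted(orfs)


def classify(orf, cds_start, cds_end):
    """Place an ORF relative to the annotated main CDS [cds_start, cds_end)."""
    start, end = orf
    if (start, end) == (cds_start, cds_end):
        return "main CDS"
    if end <= cds_start:
        return "uORF (5' UTR)"
    if start >= cds_end:
        return "dORF (3' UTR)"
    if (start - cds_start) % 3 == 0:
        return "in frame with main CDS"
    return "overlapping ORF (alternative frame)"


if __name__ == "__main__":
    # Invented transcript: 5' UTR with a uORF, an 18-nt main CDS, 3' UTR with a dORF
    tx = "GG" + "ATGCCCTAA" + "GC" + "ATGAATGCTCTAGCATAA" + "C" + "ATGGGCTGA" + "AA"
    cds_start, cds_end = 13, 31  # hypothetical annotation of the main CDS
    for orf in forward_orfs(tx):
        print(orf, classify(orf, cds_start, cds_end))
```

On this example transcript, the four ORFs found are labeled, in order, as a uORF, the main CDS, an overlapping ORF in an alternative frame (its ATG sits inside the AAT-GCT codons of the main CDS), and a dORF.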

smORFs might be sequence reservoirs that can be activated during the evolution of new phenotypic variation, especially during speciation. Importantly, speciation events have been associated with the evolution of new molecular phenotypes and new relationships with the environment (Bao et al., 2018). Thus, the junk DNA and lncRNAs present in cells deserve investigation not only as random accumulations of sequences and sources of translational noise, but also as a repository of substrates for the evolution of new coding genes. Interestingly, polyploidization, or whole genome duplication (WGD), has been correlated with an increase in the adaptive potential of cells and organisms exposed to stressful conditions (Van De Peer et al., 2017). Unfortunately, studies of WGD have so far neglected the role and retention of smORFs during evolution, probably because of methodological difficulties in smORF identification.

However, the sequencing of many genomes has recently opened new avenues for comparative smORF research. For instance, recent evolutionary studies by our group on the smORFs in the mille-pattes/tarsalless/polished rice (mlpt) gene, the best-known smORF-containing gene in insects (Savard et al., 2006; Kondo et al., 2007; Pueyo and Couso, 2008, 2011; Cao et al., 2017; Ray et al., 2019), showed that a new ~80-amino-acid smORF (smHemiptera) appeared during Hemiptera evolution (Tobias-Santos et al., 2019). This smORF in the polycistronic mlpt mRNA has been conserved in the group for over 250 million years and is absent from the genomes of other insect orders. We expect future comparative genome analyses to yield additional examples of order-specific smORFs, which might constitute an underappreciated reservoir of new genes and evolutionary innovations.

In summary, research on smORFs has increased considerably over the last 5 years following the discovery of important smORF peptides. In addition, ribosome profiling has enabled the discovery of many neutrally evolving and potentially non-functional smORFs undergoing pervasive translation, whose significance remains to be determined (Crappé et al., 2013; Aspden et al., 2014; Bazzini et al., 2014; Olexiouk et al., 2016). This raises an intriguing question: why would cells spend energy on the transcription and translation of neutral, non-functional elements? There is probably more than one answer; however, considering the subjects discussed in this paper, we propose the following perspective: what if the pervasive translation of neutrally evolving smORF peptides constitutes an elegant mechanism for advancing proteome evolution, especially during speciation events? If so, “non-functional” smORF peptides would have an important function in an evolutionary sense. Based on this discussion, we suggest that the concept of functionality be revised in the context of smORFs.

Statements

Author contributions

DG-A and RN contributed equally to the writing of this manuscript. RN contributed to funding acquisition. All authors contributed to the article and approved the submitted version.

Funding

RN was supported by CNPq (307952/2017-7 and 431354/2016-2) and FAPERJ (E-26/210-150/2016, E-26/203.298/2016, E-26/202.605/2019, and E-26/211.169/2019). DG-A was a master's student of PPG-PRODBIO-UFRJ/Macaé (CAPES scholarship).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Anderson, D. M., Anderson, K. M., Chang, C. L., Makarewich, C. A., Nelson, B. R., McAnally, J. R., et al. (2015). A micropeptide encoded by a putative long noncoding RNA regulates muscle performance. Cell 160, 595–606. doi: 10.1016/j.cell.2015.01.009

Aspden, J. L., Eyre-Walker, Y. C., Phillips, R. J., Amin, U., Mumtaz, M. A. S., Brocard, M., et al. (2014). Extensive translation of small open reading frames revealed by poly-ribo-seq. eLife 3:e03528. doi: 10.7554/eLife.03528

Bao, R., Dia, S. E., Issa, H. A., Alhusein, D., and Friedrich, M. (2018). Comparative evidence of an exceptional impact of gene duplication on the developmental evolution of Drosophila and the higher Diptera. Front. Ecol. Evol. 6:63. doi: 10.3389/fevo.2018.00063

Bazzini, A. A., Johnstone, T. G., Christiano, R., Mackowiak, S. D., Obermayer, B., Fleming, E. S., et al. (2014). Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. EMBO J. 33, 981–993. doi: 10.1002/embj.201488411

Cabili, M., Trapnell, C., Goff, L., Koziol, M., Tazon-Vega, B., Regev, A., et al. (2011). Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 25, 1915–1927. doi: 10.1101/gad.17446611

Cao, G., Gong, Y., Hu, X., Zhu, M., Liang, Z. C., Huang, L., et al. (2017). Identification of tarsal-less peptides from the silkworm Bombyx mori. Appl. Microbiol. Biotechnol. 102, 1809–1822. doi: 10.1007/s00253-017-8708-4

Chugunova, A., Loseva, E., Mazin, P., Mitina, A., Navalayeu, T., Bilan, D., et al. (2019). LINC00116 codes for a mitochondrial peptide linking respiration and lipid metabolism. Proc. Natl. Acad. Sci. U.S.A. 116, 4940–4945. doi: 10.1073/pnas.1809105116

Couso, J., and Patraquim, P. (2017). Classification and function of small open reading frames. Nat. Rev. Mol. Cell Biol. 18, 575–589. doi: 10.1038/nrm.2017.58

Crappé, J., Criekinge, W. V., Trooskens, G., Hayakawa, E., Luyten, W., Baggerman, G., et al. (2013). Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs. BMC Genomics 14:648. doi: 10.1186/1471-2164-14-648

Doolittle, W. F. (2013). Is junk DNA bunk? A critique of ENCODE. Proc. Natl. Acad. Sci. U.S.A. 110, 5294–5300. doi: 10.1073/pnas.1221376110

Doolittle, W. F., and Brunet, T. D. P. (2017). On causal roles and selected effects: our genome is mostly junk. BMC Biol. 15:116. doi: 10.1186/s12915-017-0460-9

ENCODE Project Consortium (2007). Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816. doi: 10.1038/nature05874

ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74. doi: 10.1038/nature11247

Faulkner, G. J., and Carninci, P. (2009). Altruistic functions for selfish DNA. Cell Cycle 8, 2895–2900. doi: 10.4161/cc.8.18.9536

Graur, D. (2017). An upper limit on the functional fraction of the human genome. Genome Biol. Evol. 9, 1880–1885. doi: 10.1093/gbe/evx121

Graur, D., Zheng, Y., Price, N., Azevedo, R. B. R., Zufall, R. A., Elhaik, E., et al. (2013). On the immortality of television sets: Function in the human genome according to the evolution-free gospel of ENCODE. Genome Biol. Evol. 5, 578–590. doi: 10.1093/gbe/evt028

Griffiths, P. E. (2009). In what sense does nothing make sense except in the light of evolution? Acta Biotheor. 57:11. doi: 10.1007/s10441-008-9054-9

Ingolia, N. T., Brar, G. A., Stern-Ginossar, N., Harris, M. S., Talhouarne, G. J. S., Jackson, S. E., et al. (2014). Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Rep. 8, 1365–1379. doi: 10.1016/j.celrep.2014.07.045

Ji, Z., Song, R., Regev, A., and Struhl, K. (2015). Many lncRNAs, 5'UTRs, and pseudogenes are translated and some are likely to express functional proteins. eLife 4:e08890. doi: 10.7554/eLife.08890

Kim, K. H., Son, J. M., Benayoun, B. A., and Lee, C. (2018). The mitochondrial-encoded peptide MOTS-c translocates to the nucleus to regulate nuclear gene expression in response to metabolic stress. Cell Metab. 28, 516.e7–524.e7. doi: 10.1016/j.cmet.2018.06.008

Knibbe, C., Coulon, A., Mazet, O., Fayard, J. M., and Beslon, G. (2007). A long-term evolutionary pressure on the amount of noncoding DNA. Mol. Biol. Evol. 24, 2344–2353. doi: 10.1093/molbev/msm165

Kondo, T., Hashimoto, Y., Kato, K., Inagaki, S., Hayashi, S., and Kageyama, Y. (2007). Small peptide regulators of actin-based cell morphogenesis encoded by a polycistronic mRNA. Nat. Cell Biol. 9, 660–665. doi: 10.1038/ncb1595

Lauressergues, D., Couzigou, J. M., Clemente, H. S., Martinez, Y., Dunand, C., Bécard, G., et al. (2015). Primary transcripts of microRNAs encode regulatory peptides. Nature 520, 90–93. doi: 10.1038/nature14346

Li, H., Xiao, L., Zhang, L., Wu, J., Wei, B., Sun, N., et al. (2018). FSPP: a tool for genome-wide prediction of smORF-encoded peptides and their functions. Front. Genet. 9:96. doi: 10.3389/fgene.2018.00096

Lu, S., Zhang, J., Lian, X., Sun, L., Meng, K., Chen, Y., et al. (2019). A hidden human proteome encoded by 'non-coding' genes. Nucleic Acids Res. 47, 8111–8125. doi: 10.1093/nar/gkz646

Lynch, M., Bobay, L. M., Catania, F., Gout, J. F., and Rho, M. (2011). The repatterning of eukaryotic genomes by random genetic drift. Annu. Rev. Genomics Hum. Genet. 12, 347–366. doi: 10.1146/annurev-genom-082410-101412

Mackowiak, S. D., Zauber, H., Bielow, C., Thiel, D., Kutz, K., Calviello, L., et al. (2015). Extensive identification and analysis of conserved small ORFs in animals. Genome Biol. 16:179. doi: 10.1186/s13059-015-0742-x

Magny, E. G., Pueyo, J. I., Pearl, F. M. G., Cespedes, M. A., Niven, J. E., Bishop, S. A., et al. (2013). Conserved regulation of cardiac calcium uptake by peptides encoded in small open reading frames. Science 341, 1116–1120. doi: 10.1126/science.1238802

Nelson, B. R., Makarewich, C. A., Anderson, D. M., Winders, B. R., Troupes, C. D., Wu, F., et al. (2016). A peptide encoded by a transcript annotated as long noncoding RNA enhances SERCA activity in muscle. Science 351, 271–275. doi: 10.1126/science.aad4076

Ohta, T. (2002). Near-neutrality in evolution of genes and gene regulation. Proc. Natl. Acad. Sci. U.S.A. 99, 16134–16137. doi: 10.1073/pnas.252626899

Olexiouk, V., Crappé, J., Verbruggen, S., Verhegen, K., Martens, L., and Menschaert, G. (2016). sORFs.org: a repository of small ORFs identified by ribosome profiling. Nucleic Acids Res. 44, D324–D329. doi: 10.1093/nar/gkv1175

Pang, Y., Liu, Z., Han, H., Wang, B., Li, W., Mao, C., et al. (2020). Peptide SMIM30 promotes HCC development by inducing SRC/YES1 membrane anchoring and MAPK pathway activation. J. Hepatol. doi: 10.1016/j.jhep.2020.05.028. [Epub ahead of print].

Pengpeng, B., Ramirez-Martinez, A., Li, H., Cannavino, J., McAnally, J. R., Shelton, J. M., et al. (2017). Control of muscle formation by the fusogenic micropeptide myomixer. Science 356, 323–327. doi: 10.1126/science.aam9361

Polycarpou-Schwarz, M., Groß, M., Mestdagh, P., Schott, J., Grund, S. E., Hildenbrand, C., et al. (2018). The cancer-associated microprotein CASIMO1 controls cell proliferation and interacts with squalene epoxidase modulating lipid droplet formation. Oncogene 37, 4750–4768. doi: 10.1038/s41388-018-0281-5

Pueyo, J. I., and Couso, J. P. (2008). The 11-aminoacid long tarsal-less peptides trigger a cell signal in Drosophila leg development. Dev. Biol. 324, 192–201. doi: 10.1016/j.ydbio.2008.08.025

Pueyo, J. I., and Couso, J. P. (2011). Tarsal-less peptides control Notch signalling through the shavenbaby transcription factor. Dev. Biol. 355, 183–193. doi: 10.1016/j.ydbio.2011.03.033

Quinn, J. J., and Chang, H. Y. (2016). Unique features of long non-coding RNA biogenesis and function. Nat. Rev. Genet. 17, 47–62. doi: 10.1038/nrg.2015.10

Ray, S., Rosenberg, M. I., Chanut-Delalande, H., Decaras, A., Schwertner, B., Toubiana, W., et al. (2019). The mlpt/Ubr3/Svb module comprises an ancient developmental switch for embryonic patterning. eLife 8:e39748. doi: 10.7554/eLife.39748

Ridley, M. (2004). Evolution, 3rd Edn. Oxford: Blackwell Pub.

Ruiz-Orera, J., Verdaguer-Grau, P., Villanueva-Cañas, J. L., Messeguer, X., and Albà, M. M. (2018). Translation of neutrally evolving peptides provides a basis for de novo gene evolution. Nat. Ecol. Evol. 2, 890–896. doi: 10.1038/s41559-018-0506-6

Savard, J., Marques-Souza, H., Aranda, M., and Tautz, D. (2006). A segmentation gene in Tribolium produces a polycistronic mRNA that codes for multiple conserved peptides. Cell 126, 559–569. doi: 10.1016/j.cell.2006.05.053

Singh, U., and Wurtele, E. S. (2020). How new genes are born. eLife 9:e55136. doi: 10.7554/eLife.55136

Tautz, D., and Domazet-Lošo, T. (2011). The evolutionary origin of orphan genes. Nat. Rev. Genet. 12, 692–702. doi: 10.1038/nrg3053

Tobias-Santos, V., Guerra-Almeida, D., Mury, F., Ribeiro, L., Berni, M., Araujo, H., et al. (2019). Multiple roles of the polycistronic gene Tarsal-Less/Mille-Pattes/Polished-Rice during embryogenesis of the kissing bug Rhodnius prolixus. Front. Ecol. Evol. 7:379. doi: 10.3389/fevo.2019.00379

Van De Peer, Y., Mizrachi, E., and Marchal, K. (2017). The evolutionary significance of polyploidy. Nat. Rev. Genet. 18, 411–424. doi: 10.1038/nrg.2017.26

Vanderperre, B., Lucier, J. F., Bissonnette, C., Motard, J., Tremblay, G., Vanderperre, S., et al. (2013). Direct detection of alternative open reading frames translation products in human significantly expands the proteome. PLoS ONE 8:e70698. doi: 10.1371/journal.pone.0070698

Vassallo, A., Palazzotto, E., Renzone, G., Botta, L., Faddetta, T., Scaloni, A., et al. (2020). The Streptomyces coelicolor small ORF trpM stimulates growth and morphological development and exerts opposite effects on actinorhodin and calcium-dependent antibiotic production. Front. Microbiol. 11:224. doi: 10.3389/fmicb.2020.00224

Wu, Q., Wright, M., Gogol, M. M., Bradford, W. D., Zhang, N., and Bazzini, A. (2020). Translation of small downstream ORFs enhances translation of canonical main open reading frames. EMBO J. 39:e104763. doi: 10.15252/embj.2020104763

Keywords

small ORF, junk DNA, coding potential maturation, causal roles, selected effects, non-coding RNA, alternative ORF, pervasive translation

Citation

Guerra-Almeida D and Nunes-da-Fonseca R (2020) Small Open Reading Frames: How Important Are They for Molecular Evolution?. Front. Genet. 11:574737. doi: 10.3389/fgene.2020.574737

Received

21 June 2020

Accepted

25 August 2020

Published

20 October 2020

Volume

11 - 2020

Edited by

Fatemeh Maghuly, University of Natural Resources and Life Sciences Vienna, Austria

Reviewed by

Benedikt Obermayer, Charité Medical University of Berlin, Germany; Mona Wu Orr, Amherst College, United States

*Correspondence: Diego Guerra-Almeida, Rodrigo Nunes-da-Fonseca

This article was submitted to Systems Biology, a section of the journal Frontiers in Genetics

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
