Micropeptides Encoded in Transcripts Previously Identified as Long Noncoding RNAs: A New Chapter in Transcriptomics and Proteomics

Integrative analysis using omics-based technologies has identified a large number of putative short open reading frames (sORFs) with protein-coding capacity within transcripts previously identified as long noncoding RNAs (lncRNAs) or transcripts of unknown function (TUFs). sORFs were previously overlooked because of their diminutive size and the difficulty of identifying them by bioinformatics analyses. There is now growing evidence that potentially functional micropeptides are produced from sORFs in cells of diverse species. Recent characterization of a few of these revealed significant and diverse roles in many fundamental biological processes, and some also show important relationships with pathogenesis. Recent works therefore provide new insights for exploring the wealth of information that may lie within sORF-encoded short proteins. Here, we summarize the current progress and view of micropeptides encoded in sORFs of protein-coding genes.


INTRODUCTION
Identification of a large number of RNA transcripts by genome-wide analysis suggests a complex network of transcripts that includes tens of thousands of long noncoding RNAs (lncRNAs) and transcripts of unknown function (TUFs) (Carninci et al., 2005; Willingham et al., 2006; Birney et al., 2007; Kapranov et al., 2007). Recent studies have suggested that lncRNAs and TUFs in the human genome represent the greatest source of short open reading frames (sORFs), which were previously overlooked because of their small size and the lack of evidence for "codingness" (Frith et al., 2006; Cohen, 2014; Pauli et al., 2015). As a result, sORFs embedded in lncRNAs and TUFs have not been adequately studied.
sORF-encoded micropeptides first attracted attention during a study of an lncRNA (Rohrig et al., 2002). Since then, many studies have sought to identify additional sORF candidates and to determine whether they encode functional micropeptides. Recent advances in bioinformatics, proteomics, and transcriptomics have revealed that the traditional computational algorithms used to search for ORFs overlooked many of them: hundreds of non-annotated sORFs with coding potential for micropeptides have now been identified (Ingolia et al., 2011; Slavoff et al., 2013; Bazzini et al., 2014), from yeast to plants (Hanada et al., 2013; Lauressergues et al., 2015) and humans (Ingolia et al., 2014; Ma et al., 2014). sORF-encoded proteins have emerged as a new functional class because of their roles in many biological activities (Crappé et al., 2014). The diverse biological functions of this new group of short proteins have attracted the attention of the scientific community and increased interest in studying them in more detail (Saghatelian and Couso, 2015; Makarewich and Olson, 2017).
Here, we give a brief overview of the various approaches recently used to identify sORF-encoded micropeptides and their biological functions. Based on the results of previous studies, we also outline ideas and strategies that can be implemented to characterize the functions of other micropeptides. Finally, we review the diverse biological functions of the micropeptides found to date, from plants to animals. These findings suggest that many biologically significant micropeptides may be concealed in the hidden world of proteomes.

MORE DEVELOPED TECHNIQUES IDENTIFY MORE POTENT sORF-ENCODED MICROPEPTIDES
Traditional computational prediction of protein-coding ORFs relies on a number of stringent criteria to remove meaningless ORFs, such as size cutoff of 300 nucleotides, AUG start codon usage, and sequence conservation (Gish and States, 1993; Kochetov, 2005), rendering them inappropriate for sORF detection. Hunting for these tiny treasures has therefore posed a great challenge. However, with the advancement of technology, the challenge has begun to be addressed effectively. Both computational and experimental approaches have made it easier to explore the complexity of the small proteome. Several approaches have been taken to systematically annotate sORFs with coding potential. Along with other conventional strategies, such as cross-species comparison, examination of codon content and coding features used to identify ORFs, various metrics and methods have been developed and are playing prominent roles in identifying putative sORFs (Table 1).

TABLE 1 | Metrics and methods to identify sORF (including both computational and experimental).

Method | Description | References
Computing-based method
sORFfinder, HAltORF, uPEPperoni | Web based tools to locate sORF having coding potential | Hanada et al., 2010; Vanderperre et al., 2012; Skarshewski et al., 2014
PhyloCSF | A computational method examines evolutionary conservation of a sORF across species | Lin et al., 2011
Transcriptomic-based method
Ribosome profiling | A deep sequencing-based tool of ribosome protected mRNA fragments to obtain a global snapshot of translation | Ingolia et al., 2011
Poly-ribo seq | A combination of ribosome profiling and polysome to enrich more potent protein coding ORFs | Aspden et al., 2014
Ribosome releasing scores (RRS); Fragment length organization similarity score (FLOSS); ORF regression algorithm for translation evaluation of RPFs (ribosome-protected mRNA fragments) (ORF-RATER) | These three metrics are developed and combined with ribosome profiling to assist in identification of true protein coding ORFs | Guttman et al., 2013; Ingolia et al., 2014; Fields et al., 2015 (respectively)
Proteomics-based
Proteogenomics | A combined approach of proteomics and genomics | Slavoff et al., 2013
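To make the effect of the traditional criteria concrete, the following is a minimal sketch of a forward-strand ORF scan with an AUG start requirement and a minimum-length cutoff. The function name `find_orfs`, its parameters, and the toy transcript are illustrative assumptions rather than any published tool; the sketch only shows why a 300-nucleotide cutoff silently discards sORFs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_nt=300, start_codons=("ATG",)):
    """Return (start, end, frame) for forward-strand ORFs of at least min_nt.

    Length is counted from the first base of the start codon to the last
    base of the stop codon. Only the first in-frame start before each stop
    is used (hypothetical simplification).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3                    # close it at the stop codon
                if end - start >= min_nt:
                    orfs.append((start, end, frame))
                start = None
    return orfs


# Toy transcript carrying a single 36-nt sORF (ATG + 10 codons + stop).
toy = "GGG" + "ATG" + "GCT" * 10 + "TAA" + "CC"
print(find_orfs(toy))             # traditional cutoff: the sORF is filtered out
print(find_orfs(toy, min_nt=30))  # relaxed cutoff: the sORF is reported
```

Relaxing the cutoff recovers the sORF but, on real transcripts, also admits many spurious ORFs, which is why the additional metrics listed in Table 1 are needed.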
Ribosome profiling has emerged as a technique for comprehensively and quantitatively measuring translation (Ingolia et al., 2014; Smith et al., 2014). A modification of ribosome footprinting, it relies on deep sequencing of ribosome-protected mRNA fragments to obtain a global snapshot of translation. Application of ribosome profiling has provided several key findings, including widespread use of non-ATG initiation codons, as well as identification of polycistronic genes, upstream ORFs and overlapping ORFs. Hundreds of putative non-annotated protein-coding sORFs have recently been identified in eukaryotic genomes by using this technique (Ingolia et al., 2011; Bazzini et al., 2014).
However, ribosome occupancy does not always mean true translation, as indicated by the identification of many well-characterized nuclear lncRNAs in a ribosome profiling assay (Brannan et al., 1990; Guttman et al., 2013). Many ORFs associate with ribosomes to regulate the translation of downstream ORFs. Ribosome occupancy alone is therefore not sufficient evidence of protein synthesis. To better distinguish protein-coding transcripts from noncoding RNAs, several algorithms and metrics have been developed based on ribosome-profiling characteristics, including RRS (Guttman et al., 2013), FLOSS (Ingolia et al., 2014), ORF-RATER (Fields et al., 2015), and RiboTaper (Calviello et al., 2016).
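As an illustration of how such a metric can separate translation from mere ribosome occupancy, the sketch below computes a simplified release-type score: footprint density inside a candidate ORF relative to density downstream of its stop codon, normalized by the same ratio for RNA-seq reads. This is our simplified reading of the idea behind RRS, not the published implementation; the function names, the pseudocount, and the read counts are assumptions made for illustration.

```python
def density(read_count, length_nt):
    """Reads per nucleotide over a region."""
    if length_nt <= 0:
        raise ValueError("region length must be positive")
    return read_count / length_nt


def ribosome_release_score(ribo_orf, ribo_utr3, rna_orf, rna_utr3,
                           orf_len, utr3_len, pseudocount=1.0):
    """Simplified release-type score (hypothetical sketch).

    A translated ORF is expected to lose ribosomes at its stop codon, so
    footprints drop sharply downstream while RNA-seq coverage does not.
    The pseudocount avoids division by zero for empty regions.
    """
    ribo_ratio = (density(ribo_orf + pseudocount, orf_len)
                  / density(ribo_utr3 + pseudocount, utr3_len))
    rna_ratio = (density(rna_orf + pseudocount, orf_len)
                 / density(rna_utr3 + pseudocount, utr3_len))
    return ribo_ratio / rna_ratio


# Invented counts: footprints vanish after the stop codon in the first case
# and continue unchanged in the second.
coding_like = ribosome_release_score(199, 1, 99, 99, orf_len=100, utr3_len=100)
noncoding_like = ribosome_release_score(49, 49, 99, 99, orf_len=100, utr3_len=100)
print(coding_like, noncoding_like)
```

A high score indicates that ribosomes are released at the stop codon, as expected for genuine translation; a score near 1 indicates occupancy that ignores the putative stop.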
Poly-Ribo-Seq, a modification of a ribosome-profiling method, enriches polysomes that are more likely to be actively translating mRNA into proteins. Poly-Ribo-Seq was successfully used to identify several sORFs in the Drosophila genome (Galindo et al., 2007;Aspden et al., 2014).
Mass spectrometry (MS) peptidomics and proteomics experiments have recently been applied to identify sORF-encoded micropeptides. MS has an advantage over ribosome profiling in that it directly detects the peptides generated from ORFs and therefore validates their production. However, MS is biased toward abundant proteins, so it detects only peptides that are abundant in cells. Analysis of tandem mass spectrometry (MS/MS) data that mapped expressed peptides to their encoding genomic loci and to transcriptome data generated by ENCODE identified 85 unique peptides that match 69 lncRNAs (Bánfai, 2012). Slavoff et al. developed a modified proteomic strategy, known as proteogenomics, to identify and validate sORFs, in which they compiled a custom polypeptide database derived from mRNA-seq data and used it to assign MS fragmentation spectra. In this approach, the proteome is enriched for small polypeptides before proteomic analysis. Through this strategy, 90 SEPs (sORF-encoded polypeptides), 86 of them previously uncharacterized, were identified in K562 cells. Some difficulties remain. The average tissue content of micropeptides is very low, and they are often degraded or lost during sample preparation, which further impedes their identification. As a result, many micropeptides produced in cells may be absent from MS analyses. New and alternative extraction methods may prove more effective in extracting and identifying micropeptides. For example, Schwaid et al. described an affinity-based approach that could enrich and identify cysteine-containing human sORF-encoded polypeptides (ccSEPs) in cells. They were able to identify 16 novel ccSEPs from previously uncharacterized sORFs. MS-based methods have thus, to date, identified a limited number of microproteins.
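The core of the proteogenomic idea described above, a custom peptide database built from transcript sequences and searched with observed peptides, can be sketched as follows. This is a toy illustration under strong assumptions: spectra are taken to be already converted into peptide sequences, only forward frames and AUG-initiated ORFs are considered, and trypsin is modeled as cleaving after every K or R. All function names and the example transcript are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate a nucleotide string codon by codon ('X' for unknown codons)."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X")
                   for i in range(0, len(nt) - 2, 3))


def candidate_proteins(transcript, min_aa=5):
    """Collect Met-initiated, stop-terminated peptides from three forward frames."""
    proteins = set()
    for frame in range(3):
        aa = translate(transcript[frame:].upper())
        for segment in aa.split("*")[:-1]:   # keep only segments ending at a stop
            m = segment.find("M")
            if m != -1 and len(segment) - m >= min_aa:
                proteins.add(segment[m:])
    return proteins


def tryptic_peptides(protein):
    """Simplified trypsin digest: cleave after every K or R."""
    peptides, current = [], ""
    for residue in protein:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def match_peptides(observed, transcripts):
    """Map each observed peptide to the transcripts whose sORFs could produce it."""
    index = {}
    for name, seq in transcripts.items():
        for protein in candidate_proteins(seq):
            for pep in tryptic_peptides(protein):
                index.setdefault(pep, set()).add(name)
    return {pep: sorted(index[pep]) for pep in observed if pep in index}


transcripts = {"lnc1": "CCATGGCTAAAGGTTGGCGTTAAG"}   # toy "lncRNA" with one sORF
print(match_peptides(["GWR", "PEPTIDE"], transcripts))
```

Real pipelines score fragmentation spectra against the database and control the false discovery rate, steps that this sketch omits.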

sORF-ENCODED MICROPEPTIDES: INSIGHTS INTO THEIR FUNCTION
Small peptides are widely recognized for their important roles in diverse biological processes (Fricker, 2005; Boonen et al., 2009; Cabrera-Quio et al., 2016). The largest and most extensively studied class of small peptides is the classical bioactive peptides, which are derived from larger precursor proteins and contain N-terminal signal sequences. Hormones and neuropeptides are considered the best examples of bioactive molecules (Hashimoto et al., 2001; Cunha et al., 2008). Most of these peptides act as ligands of membrane receptors (Boonen et al., 2009). Micropeptides differ from these bioactive small peptides in that they are not processed from larger precursors but are translated directly from sORFs in transcripts previously identified as lncRNAs and TUFs. Four initial studies (Rohrig et al., 2002; Savard et al., 2006; Galindo et al., 2007; Kondo et al., 2007) were pioneering in opening up new avenues for sORF research. These studies showed how a sORF can be involved in different developmental contexts with apparently different biological roles during morphogenesis.
As described above, advancements in technologies over the past few years have led to the discovery of several hundred putative coding sORFs in various species. However, it is still unknown how many of these newly discovered sORF-encoded peptides are functional. Existence of a peptide does not always imply it has a function. Experimental demonstration is important in revealing their biological effects. Several approaches can be used to validate candidate translated sORFs (Housman and Ulitsky, 2016). Recently some micropeptides have been characterized and found to play important roles in fundamental biological processes such as RNA decapping (D'Lima et al., 2017), DNA repair, stress signaling (Matsumoto et al., 2017), apoptosis (Guo et al., 2003), muscle formation (Bi et al., 2017), metabolic homeostasis (Lee et al., 2015), and calcium homeostasis (Magny et al., 2013; Anderson et al., 2015, 2016; Nelson et al., 2016; Figure 1). The following section briefly explains commonly used strategies for deciphering the functions of short proteins that are necessary for their characterization (Figure 2).

IN SILICO (OR COMPUTATIONAL) CHARACTERIZATION
Evolutionary conservation is an important sign that a gene is functional. One hallmark of the sORFs studied thus far is the evolutionary conservation of their micropeptides. An evolutionarily conserved micropeptide called polished rice (pri) or tarsal-less (tal) was identified in Drosophila, while the Tribolium orthologue is known as mille-pattes (mlpt) (Savard et al., 2006; Galindo et al., 2007; Kondo et al., 2007). These micropeptides were characterized based on their conservation. Homology-based searching among species for unannotated micropeptides may be performed to predict any conserved biological function (Figure 2). The best example of homology-based characterization is the identification of a group of micropeptides, namely, myoregulin (MLN), phospholamban (PLN), and sarcolipin (SLN). They share peptide sequences conserved from flies to vertebrates and are involved in Ca2+ homeostasis in muscle through inhibition of SERCA activity (Magny et al., 2013). These peptides are similar in both sequence and structure. Later, two further micropeptides, endoregulin (ELN) and another-regulin (ALN), were also characterized based on their shared amino acids, and found to show functions similar to MLN/PLN/SLN, but in nonmuscle cell types.
FIGURE 1 | Diverse biological function of recently annotated micropeptides. Micropeptides are found to be involved in many biological processes. Myoregulin (MLN), phospholamban (PLN), sarcolipin (SLN), and another-regulin (ALN) are a group of peptides that interact with the protein SERCA (a Ca2+ pump) in sarcoplasmic and endoplasmic reticulum (S/ER) and maintain Ca2+ homeostasis in the cell. MOTS-c and humanin are mitochondrial sORF-encoded micropeptides that display important roles in metabolic homeostasis and apoptosis, respectively. Humanin suppresses apoptosis by preventing the translocation of an apoptosis inducing protein, Bax (Bcl2-associated X protein), from cytoplasm to mitochondria. Another micropeptide named MRI-2 is found to enhance non-homologous end joining (NHEJ) of double-strand DNA breaks (DSBs) by associating with other DNA end-binding proteins (Ku proteins). Myomixer, minion, SPAR, and NoBody, four other micropeptides that have been recently discovered, have distinct biological roles wherein myomixer and minion stimulate the fusion of myoblast to form myofiber during muscle formation by participating with another protein, myomaker. The micropeptide SPAR is localized into lysosome where it interacts with the lysosomal v-ATPase complex and regulates mTORC1 protein activation during stress signaling. NoBody, a p-body (processing-body, which is involved in mRNA turnover) dissociating micropeptide, shows its function by interacting with the mRNA decapping complex.
Thus, identification and characterization based on sequence features is a reasonable approach for deciphering the biological function of new unannotated micropeptides. Computational predictions of functional sORFs use several key features to identify potential sORFs.
Canonical protein-coding ORFs show a striking sequence signature as measured by the Ka/Ks ratio (the ratio of nonsynonymous to synonymous substitution rates): a value of Ka/Ks < 1 suggests that canonical protein-coding genes are under purifying selection during evolution. For very short sequences, in contrast, it is difficult to obtain statistically significant values because the number of possible changes is low (Ladoukakis et al., 2011). Mackowiak and colleagues introduced a computational approach to identify conserved sORFs using comparative genomics (Mackowiak et al., 2015). Three qualitative features of coding sequence conservation specific to known micropeptides and canonical proteins were analyzed in their study. The first is the conservation of amino acid sequences, assessed by phylogenetic codon substitution frequencies (PhyloCSF). The second is the conservation of the reading frame, that is, the conservation of in-frame start and stop codons in related species. The third is a drop in nucleotide sequence conservation around the start and stop codons, measured using PhastCons (Siepel et al., 2005). The combination of these three features identified about 2,000 sORFs in five species: human, mouse, zebrafish, fruit fly, and the nematode Caenorhabditis elegans. Translation and protein expression of some of these predicted sORFs have also been confirmed experimentally.
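The difficulty with short sequences can be seen in a small worked example. The sketch below counts synonymous and nonsynonymous differences between two aligned, in-frame coding sequences and divides them by the number of available sites, in the style of the Nei-Gojobori counting method. It is a deliberate simplification and an assumption on our part, not the procedure of any cited study: codons that differ at more than one position are ignored when counting differences, no multiple-hit correction is applied, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Expected number of synonymous sites in one codon (0 to 3)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                s += 1 / 3          # one of the three possible changes at this site
    return s


def pn_ps(seq_a, seq_b):
    """Return (pN, pS, syn_diffs, nonsyn_diffs) for two aligned coding sequences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    syn_sites = nonsyn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, len(seq_a), 3):
        a, b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if "*" in (CODON_TABLE[a], CODON_TABLE[b]):
            continue                # skip stop codons
        s = (synonymous_sites(a) + synonymous_sites(b)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        if sum(x != y for x, y in zip(a, b)) == 1:
            if CODON_TABLE[a] == CODON_TABLE[b]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
        # codons with two or three mismatches are not counted in this sketch
    pn = nonsyn_diffs / nonsyn_sites if nonsyn_sites else float("nan")
    ps = syn_diffs / syn_sites if syn_sites else float("nan")
    return pn, ps, syn_diffs, nonsyn_diffs


# Two toy 4-codon sORFs differing by a single synonymous change (GCT -> GCC).
print(pn_ps("ATGGCTAAAGGT", "ATGGCCAAAGGT"))
```

In this toy four-codon comparison there are only about 2.3 synonymous sites in total, so a single substitution moves pS from 0 to roughly 0.43. Estimates resting on so few sites cannot reach statistical significance, which is why conservation-based approaches combine several independent signals.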
Although functional characterization of sORFs based on sequence conservation is useful, it is not applicable to all of them. Some non-conserved sORFs may have evolved recently as new coding ORFs and may nonetheless have regulatory functions.

FUNCTIONAL PROTEOMICS
Although some sORFs are found to be highly conserved across species, most show relatively low sequence conservation compared with known protein-coding genes (Carvunis et al., 2012; Slavoff et al., 2013). Therefore, although homology-based functional characterization is reasonable, as mentioned above, it has difficulty finding species-specific functional peptides. Several of the micropeptides characterized thus far exert their functions by interacting with other proteins. Several studies have applied functional proteomics successfully to identify the interacting partners. For example, Matsumoto and colleagues employed functional proteomics to study a LINC00961-encoded short protein. This micropeptide interacts with the lysosomal v-ATPase complex to regulate activation of mTORC1 (mechanistic target of rapamycin complex 1) (Figure 1) and muscle regeneration. This interaction with the v-ATPase complex and regulation of mTORC1 is specific to the amino acid response. It is therefore known as a small regulatory polypeptide of the amino acid response, or SPAR (Matsumoto et al., 2017).

FIGURE 2 | (A) Homology-based searching among species thus can be performed to identify whether the target peptide sequence shares any functional similarity with other proteins. Here the blue and red boxes indicate the conserved sequences among species. (B) Functional proteomics is a commonly used approach for identifying the interacting proteins of a target protein. In this method, first, immunoprecipitation is conducted by using an antibody (Ab) that is designed either against the epitope tagged with a target micropeptide or directly against the micropeptide. Western blot is then performed followed by mass spectrometry analysis to separate and identify the interacting proteins. Red brackets indicate the bands of interacting proteins that are separated by western blot analysis. A negative control (NC) denotes an empty vector that also runs for comparison. The nature of the interacting protein will thus provide clues about the function of the target micropeptide. (C) CRISPR-cas9 mediated gene editing approaches can also be used to check the coding potential of sORFs. To verify the coding potential, an epitope tag (FLAG) can be inserted at the downstream of the sORF into the endogenous locus. CRISPR-cas9 mediated gene editing is started by the recognition of the target site, which is mediated by a guide RNA (gRNA). Guide RNA guides the cas9 endonuclease to a specific location in the genome sequence, which is immediately adjacent to a protospacer adjacent motif (PAM). Upon recognition, the cas9 creates a double strand break (DSB) at the target site. This DSB can then be repaired either by non-homologous end joining (NHEJ) or by homology directed repair (HDR). HDR is used to insert an epitope tag at the target site where a donor vector with homology to the targeted locus must be provided. The donor vector must contain the epitope tag that has to be knocked-in at the target site. Expression of the engineered fusion protein can then be verified by western blot analysis.
By employing functional proteomics, another group characterized a previously unreported micropeptide, named NoBody, and identified its biological significance (D'Lima et al., 2017). By performing immunoprecipitation and MS analysis, the researchers found NoBody to be a component of the mRNA decapping protein complex that cross-links to EDC4 (enhancer of mRNA decapping 4). The mRNA decapping complex removes the 5′ cap from mRNAs to promote 5′-3′ decay. Molecular components of this pathway localize to P-bodies. NoBody expression levels are anticorrelated with P-body number: NoBody regulates the number of P-bodies in cells by interacting with decapping proteins. This micropeptide is therefore called the non-annotated P-body dissociating polypeptide (NoBody).
However, traditional immunoprecipitation methods very often enrich for many nonspecific interactors of micropeptides. For example, functional proteomics analysis of a micropeptide named modulator of retroviral infection (MRI) revealed that it is associated with Ku70 and Ku80, two essential proteins involved in the non-homologous end joining DNA repair mechanism. Association of MRI with Ku70/Ku80 suggests that it is involved in cellular DNA repair. Although the immunoprecipitation of MRI also enriched for members of the heat shock protein 70 family, imaging studies ruled out cytosolic heat shock proteins as bona fide interactors; such associations might form after the cells are lysed during immunoprecipitation (Grundy et al., 2016). This problem demands a better approach for identifying micropeptide-associated proteins and protein complexes. Recently Chu and colleagues applied an in situ proximity tagging method to elucidate microprotein-protein interactions (MPIs) for an uncharacterized microprotein called c11orf98. This method relies on an engineered ascorbate peroxidase (APEX) (Rhee et al., 2013). When an APEX fusion protein is expressed in cells and the cells are treated with hydrogen peroxide (H2O2) in the presence of biotin-phenol, the proteins proximal to the APEX fusion protein are labeled with biotin. The biotinylated proteins can then be enriched and analyzed by MS. The analysis of biotinylated proteins thus provides valuable information about the protein environment of the fusion protein. Since the interactions take place in the context of a living cell, the enrichment of nonspecific interactors is reduced. By applying this approach, it was revealed that c11orf98 interacts with the nucleolar proteins nucleophosmin and nucleolin, which suggests that APEX tagging is useful for characterizing previously uncharacterized micropeptides.
These studies suggest that functional proteomics may be implemented to understand the function and biological nature of an unannotated short protein through identifying direct binding partners or components (Figure 2).

GENE EDITING APPROACHES
The recently developed clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (cas9) gene editing technology has become a powerful approach for studying a gene's function. CRISPR-cas9 mediated gene editing strategies can also be used for identifying and verifying the coding potential of sORF-encoded peptides. An epitope tag can be knocked in at the endogenous locus of a micropeptide, in-frame with the predicted sORF, to produce a fusion protein using CRISPR/cas9-mediated homologous recombination (Figure 2). Detection of the engineered fusion protein by western blot analysis provides evidence that the mRNA is translated into a stable peptide. This powerful knock-in technique also simplifies many downstream applications that are important for the functional characterization of a gene, for example, immunoprecipitation to identify binding partners of the target protein. Immunocytochemistry can also be performed on epitope-tagged samples to check the subcellular localization of the fusion protein, which may provide important information about its involvement in biological processes. Recently some research groups have implemented this technology to verify sORF-encoded peptides (Galindo et al., 2007; Slavoff et al., 2014; Anderson et al., 2015). By using CRISPR-cas9 mediated homologous recombination, an epitope tag was inserted downstream of the sORF to confirm that the sORF-containing gene was actively transcribed from its native chromosomal context and translated into a stable peptide. The identification and validation of some sORF-encoded peptides by CRISPR-cas9 mediated gene editing thus indicates that the approach can be applied to identify and verify other sORF-encoded peptides.
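A first practical step in such a knock-in experiment is choosing a guide RNA whose cut site lies close to the end of the sORF, where the tag is to be inserted. The sketch below scans a locus for candidate sites, assuming the commonly used SpCas9 conventions (an NGG protospacer adjacent motif, a 20-nt guide, and a cut about 3 bp upstream of the PAM); none of these parameters is specified in the studies discussed here, only the forward strand is searched, and the function name and toy locus are hypothetical.

```python
import re


def find_pam_sites(locus, stop_index, window=30, guide_len=20):
    """List (guide, pam, cut_position) for forward-strand NGG PAMs whose
    predicted cut site lies within `window` nt of the sORF stop codon.

    stop_index is the 0-based position of the first base of the stop codon.
    """
    locus = locus.upper()
    sites = []
    # The lookahead lets overlapping PAM candidates be reported.
    for m in re.finditer(r"(?=([ACGT]GG))", locus):
        pam_start = m.start()
        guide_start = pam_start - guide_len
        if guide_start < 0:
            continue                      # not enough sequence for a full guide
        cut = pam_start - 3               # assumed SpCas9 cut position
        if abs(cut - stop_index) <= window:
            sites.append((locus[guide_start:pam_start],
                          locus[pam_start:pam_start + 3],
                          cut))
    return sites


# Toy locus: 25 nt of upstream sequence, a TGG PAM, then a TAA stop codon at index 28.
locus = "ATGCA" * 5 + "TGG" + "TAACATCA"
for guide, pam, cut in find_pam_sites(locus, stop_index=28):
    print(guide, pam, cut)
```

In practice, candidate guides would additionally be screened for off-target matches and the reverse strand would be searched as well; the donor vector then supplies the epitope tag flanked by homology arms around the chosen cut site.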

In Plants
The first eukaryotic micropeptide was identified in plants by a group of researchers studying legumes. A gene called early nodulin 40 (Enod40), previously annotated as an lncRNA, was found to encode two short peptides of 12 and 24 amino acids (AAs), which interact with a sucrose-synthesizing enzyme during root nodule organogenesis (Rohrig et al., 2002). Since the discovery of this first micropeptide in plants, others have also been functionally characterized. The 36 AAs peptide encoded by the POLARIS (PLS) gene in Arabidopsis has been shown to affect root growth and leaf vascular patterning (Casson et al., 2002; Chilley et al., 2006). Another two micropeptides, the 76 AAs Brick1 (Brk1) and the 53 AAs ROTUNDIFOLIA4 (ROT4), were also found to be involved in leaf morphogenesis. In maize, the recessive mutation of Brk1 results in several morphological defects of leaf epithelia (Frank and Smith, 2002). ROT4, in turn, regulates polar cell proliferation in lateral organs and leaf morphogenesis in Arabidopsis (Narita et al., 2004). Two other well-characterized micropeptides have been reported in Arabidopsis: the 51 AAs ROT18/DVL1 and the 25 AAs kiss of death (KOD), which are involved in plant organogenesis (Wen et al., 2004; Valdivia et al., 2012; Guo et al., 2015) and programmed cell death regulation (Blanvillain et al., 2011), respectively. More recently, two further micropeptides have been identified in maize, Zm401p10 and Zm908p11, with 89 and 97 AAs, respectively, which are involved in pollen development (Ma et al., 2008; Wang et al., 2009; Dong et al., 2013). Characterization of these micropeptides indicates their functional diversity, ranging from plant development and growth to nodulation, organogenesis, pollen development, and cell death.

In Animals
The first identification of micropeptides in animals came from the study of lncRNAs in Drosophila. The sORFs of the long noncoding RNA polished rice (pri), also known as tarsal-less (tal), encode four micropeptides of 11 to 32 AAs that are required during the embryonic development of flies (Galindo et al., 2007; Kondo et al., 2007, 2010). By triggering proteasome-mediated protein processing, the pri micropeptide converts a transcription factor, shavenbaby (Svb), from a repressor into an activator (Zanet et al., 2015). Since then, a handful of micropeptides have been functionally characterized (Table 2). In searching for signaling molecules among non-annotated translated sORFs, the Pauli group identified a micropeptide, Toddler, which acts as a motogen, a signal that promotes cell migration. Toddler exerts this function by activating signaling through the G-protein-coupled APJ (apelin) receptor (Pauli et al., 2014). AGD3, previously classified as a TUF, encodes a small protein of 63 AAs and has been found to be involved in human stem cell differentiation (Kikuchi et al., 2009). Recently a group of micropeptides was found to play a prominent role in calcium homeostasis, both in skeletal and nonskeletal muscle cells, by binding and inhibiting a well-known Ca2+-ATPase pump, SERCA, thereby influencing regular muscle contraction (Magny et al., 2013; Anderson et al., 2015). Nelson et al. described the opposite activity of another lncRNA-derived micropeptide in mammalian muscle, called DWORF (dwarf open reading frame). This micropeptide enhances SERCA activity by displacing those inhibitory proteins and boosts muscle performance. DWORF is abundantly expressed in the mouse heart and is suppressed in ischemic human heart tissue, suggesting a possible link with heart failure (Nelson et al., 2016). Myomixer, a micropeptide of 84 AAs, also functions in muscle, but unlike DWORF and the other micropeptides in this group, it controls muscle formation by associating with a fusogenic membrane protein, myomaker, and favors formation of multinucleated myofibers in mice (Bi et al., 2017). Recently, another peptide known as minion (microprotein inducer of fusion), which is specific for skeletal muscle, has been identified. Functional characterization of this microprotein revealed that, like myomixer, minion also controls cell fusion and muscle formation by associating with myomaker (Zhang et al., 2017). Micropeptides have also been found to function in DNA repair. For example, a 69 AAs small peptide, MRI-2, has been identified as a novel factor in non-homologous end joining (NHEJ). MRI-2 stimulates NHEJ by interacting with Ku protein, a DNA end-binding protein. As more micropeptides are characterized, more hidden functions are revealed, as exemplified by another micropeptide that is encoded by a putative lncRNA, HOXB-AS3. This conserved 53 AAs HOXB-AS3 peptide inhibits tumorigenesis by regulating PKM alternative splicing and the metabolic reprogramming of colon cancer cells (Huang et al., 2017). NoBody and SPAR, described above, are two additional examples of recently characterized functional micropeptides with distinct biological significance. According to Weissman, some micropeptides might also be immunogenic without a clear functional role. For example, micropeptides derived from the human cytomegalovirus (HCMV) lncRNA β2.7 were found to robustly stimulate T cell memory responses only in humans with a history of HCMV infection (Fields et al., 2015). Very recently, another group identified some micropeptides that exhibited differential regulation upon viral infection (Razooky et al., 2017). These findings indicate that there may be more sORFs that are involved in certain diseases. Thus, translation of some ORFs that have been previously overlooked may contribute in important ways to cell biology.

TABLE 2 | Animal | Polished rice (Pri) | Insects | Mutation analysis | Fly embryogenesis | 11-32 | Galindo et al., 2007; Kondo et al., 2007, 2010
Biologically significant micropeptides are not encoded only by nuclear transcripts. Mitochondrial genomes also contribute to the proteome by producing biologically important micropeptides. Humanin, a signaling peptide encoded by a mitochondrial sORF, is functionally involved in programmed cell death. It inhibits translocation of an apoptosis-inducing protein, Bax (Bcl2-associated X protein), from the cytoplasm to mitochondria, and thereby regulates apoptosis (Guo et al., 2003). Humanin also shows neuroprotective effects and is known as a protective peptide in neurotoxicity-related diseases (Matsuoka et al., 2006). Another micropeptide of 16 AAs, named MOTS-c, was found to be encoded within the mitochondrial 12S rRNA. MOTS-c shows endocrine-like effects on muscle metabolism, insulin sensitivity and weight regulation (Lee et al., 2015). Identification of the mitochondrial-encoded peptides humanin and MOTS-c suggests that mitochondria may harbor more functional sORFs that act as regulators of biological processes.
The diverse biological functions of these micropeptides serve as an indication that we are at the very beginning of exploring the mystery of micropeptides.

CONCLUSIONS
Technological advances have uncovered the existence of several hundred putative sORF-encoded micropeptides throughout the genomes. Recent identification and characterization of a small number of sORF-encoded micropeptides and their biological role indicate that there is a hidden world of active peptides waiting to be explored. A great deal of effort is still needed to validate whether each of these peptides is biologically important or if they are just transcriptional/translational noise. Some widely used approaches, such as homology-based functionality search, functional proteomics, gene editing technologies, and massive sequencing-based approach, can be implemented on uncharacterized micropeptides to reveal their biological relevance. Tiny size, low abundance, rapid degradation and loss during sample preparation often make it difficult to work with micropeptides, demanding more sensitive and sophisticated methods. Thus, there are many technical challenges in facilitating the study of micropeptides.
Functional studies of micropeptides in a wide range of species demonstrate that they have important biological functions, including involvement in human pathogenesis. HOXB-AS3, DWORF and humanin are some examples of this group, which show involvement in cancer, heart diseases, and neurotoxicity-related diseases, respectively. In addition, the differential regulation of a group of newly identified micropeptides upon viral infection also suggests that more micropeptides may be involved in certain human diseases. These findings indicate that micropeptides may represent new opportunities for drug therapies.
Although some of the micropeptides are functionally characterized, the exact mechanism of their mode of action is unclear. Complete understanding of their action may play an important role in therapeutic purposes, where a drug may be designed by modulating or mimicking their function to regulate any biological pathway they may be involved in.
These recent findings provide new insights into sORF-encoded micropeptides as a new and important class of biological molecules and offer new avenues of research in the proteomics world.