Exploring Evidence of Non-coding RNA Translation With Trips-Viz and GWIPS-Viz Browsers

Detection of translation in so-called non-coding RNA provides an opportunity for identification of novel bioactive peptides and microproteins. The main methods used for these purposes are ribosome profiling and mass spectrometry. A number of publicly available datasets already exist for a substantial number of different cell types grown under various conditions, and public data mining is an attractive strategy for identification of translation in non-coding RNAs. Since the analysis of publicly available data requires intensive data processing, several data resources have been created recently for exploring processed publicly available data, such as OpenProt, GWIPS-viz, and Trips-Viz. In this work we provide a detailed demonstration of how to use the latter two tools for exploring experimental evidence for translation of RNAs hitherto classified as non-coding. For this purpose, we use a set of transcripts with substantially different patterns of ribosome footprint distributions. We discuss how certain features of these patterns can be used as evidence for or against genuine translation. During our analysis we concluded that the MTLN mRNA, previously misannotated as lncRNA LINC00116, likely encodes only a short proteoform expressed from shorter RNA transcript variants.


INTRODUCTION
Ribosome profiling, or footprinting (a.k.a. Ribo-seq), has allowed for a detailed assessment of whole cellular transcriptomes (Ingolia et al., 2009). The Ribo-seq technique enables this by generating a snapshot of active ribosome locations at a given moment by only sequencing the parts of RNA molecules protected by the ribosome during translation, which are termed ribosome protected fragments (RPFs) or ribosome footprints (Ingolia, 2014). These data are used for inferring parameters of translation, including translation rates of individual mRNAs, differential translation, ribosome pause detection and identification of translated open reading frames (ORFs), among others (Ingolia, 2014;Brar and Weissman, 2015;Andreev et al., 2017). A plethora of computational approaches and software tools have been developed for the analysis of ribosome profiling data (Kiniry et al., 2020). Among many findings made with the use of ribosome profiling were observations of translation of some of the RNA molecules that were previously classified as non-coding RNA (ncRNA).
The evidence that lncRNAs can be translated was initially provided by Ingolia et al. (2011). Later, by analyzing available data, Chew et al. demonstrated that the high ribosomal occupancy in many lncRNAs resembles that in 5′ leaders of protein coding mRNAs (Chew et al., 2013). The 5′ leader sequences often contain translated short open reading frames, providing an argument in support of translation within lncRNAs. A counter argument was made by Guttman et al., who used ribosome footprint density at stop codons as a signature of genuine translation and developed the ribosome release score (RRS) to measure it (Guttman et al., 2013). High RRSs are observed for long protein coding ORFs, but not for short ORFs in 5′ leaders and lncRNAs. This argument, however, is flawed, as it only shows that re-initiation and leaky scanning are infrequent downstream of long protein coding ORFs. Indeed, translation of downstream ORFs is observed only in rare cases downstream of relatively short ORFs lacking ATG codons within the entire coding sequence (Benitez-Cantos et al., 2020) or during equally infrequent stop codon readthrough (Loughran et al., 2014). However, when ORFs are short and their translation is inefficient, re-initiation (Munzarová et al., 2011) and leaky scanning (Michel et al., 2014a) are possible, so that the 5′ leaders and lncRNAs could have multiple, often overlapping ORFs that are translated. Subsequently, Ingolia et al. (2014) developed an approach for discriminating genuine translation from aberrant RNA protection by the ribosome or other large ribonucleoprotein complexes, based on the analysis of the distribution of ribosome footprint lengths and called the fragment length organization similarity score (FLOSS). FLOSS scores were similar for protein coding ORFs, 5′ leaders and lncRNAs, but were distinct for the protected fragments derived from RNAs with known non-coding functions.
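The idea behind a length-distribution score such as FLOSS can be illustrated with a minimal Python sketch. It assumes the score is half the summed absolute difference between the normalized fragment length distribution of a region of interest and that of a reference set of annotated coding sequences; this formula, the 26 to 34 nt length window and all function names are assumptions made for illustration and should be checked against Ingolia et al. (2014) before any real use.

```python
from collections import Counter


def length_fractions(read_lengths, min_len=26, max_len=34):
    """Return the fraction of reads at each length in [min_len, max_len]."""
    counts = Counter(n for n in read_lengths if min_len <= n <= max_len)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads within the requested length range")
    return {n: counts.get(n, 0) / total for n in range(min_len, max_len + 1)}


def floss(region_lengths, reference_lengths, min_len=26, max_len=34):
    """Assumed FLOSS-like score: 0 for identical length distributions,
    1 for completely non-overlapping ones."""
    region = length_fractions(region_lengths, min_len, max_len)
    reference = length_fractions(reference_lengths, min_len, max_len)
    return 0.5 * sum(abs(region[n] - reference[n]) for n in region)
```

Under this assumed definition, a region whose protected fragments have the same length distribution as fragments from annotated coding sequences scores near zero, whereas fragments protected by a non-ribosomal complex with a different size profile score closer to one.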
While there is an overwhelming body of evidence that many lncRNAs have translated ORFs, it is unlikely that many of them code for stable protein products because of the lack of long ORFs and of nucleotide substitution patterns typical for protein coding evolution. Although the functional significance of the translated ORFs remains largely unclear, emerging data suggest certain possibilities, such as ribosome assisted RNA processing (Sun et al., 2020).
Several mRNAs coding for small proteins were initially misclassified as lncRNAs, and some of them were "upgraded" to the status of mRNAs after their products have been identified and characterized. An example is LINC00116 that was found to code for a 56-amino acid functional microprotein found in mitochondria (Catherman et al., 2013). Later, it was independently rediscovered and characterized by several groups, named mitoregulin and assigned the protein coding gene symbol MTLN (Stein et al., 2018;Chugunova et al., 2019;Lin et al., 2019). Mitoregulin has been shown to enhance mitochondrial respiratory activity (Chugunova et al., 2019) and play a regulatory role in adipocyte metabolism (Friesen et al., 2020).
There are many other examples of microprotein encoding mRNAs misclassified as non-coding; a 46-amino acid microprotein, myoregulin encoded by LINC00948 (Anderson et al., 2015); the 7-kilodalton microprotein, non-annotated P-body dissociating polypeptide (NoBody) encoded by LINC01420 (D'Lima et al., 2017); the microprotein, CIP2A-BP encoded by LINC00665 that inhibits triple negative breast cancer progression (Guo et al., 2020), and the small endogenous peptide, SMIM30, which promotes hepatocellular cancer tumorigenesis, encoded by LINC00998 (Pang et al., 2020) to name a few. Nonetheless, it is likely that more await discovery, and therefore analysis of lncRNA translation and protein coding potential is an active area of research.
A number of tools have been developed for automatic detection of translated ORFs using Ribo-seq data (Crappé et al., 2015; Fields et al., 2015; Ji et al., 2015; Raj et al., 2016; Reuter et al., 2016; Erhard et al., 2018; Xiao et al., 2018; Brunet et al., 2019). Their predictions vary and, in the absence of a gold standard, their accuracies are difficult to estimate (Baranov and Michel, 2016). RNA protection from nuclease digestion could also arise from large RNA-protein complexes other than the ribosome. In fact, the tool Rfoot has been developed specifically for identification of such RNase protection due to RNA-binding proteins (Ji, 2018). It has been discussed that non-ribosomal protected fragments can be differentiated from ribosomal footprints by differences in fragment length and by their lack of triplet periodicity (Ji et al., 2016; Ingolia et al., 2019). Therefore, for accurate and reliable detection of genuinely translated ORFs and protein-coding potential, it is often necessary to carefully examine available data manually. Here, we demonstrate how publicly available ribosome profiling data can be explored using two resources from the RiboSeq.Org portal: Trips-Viz (TRanscriptome-wide Information on Protein Synthesis-Visualized) and GWIPS-viz (Genome Wide Information on Protein Synthesis-visualized).
Trips-Viz is an on-line platform with a graphical user interface (GUI) that allows for interactive analysis and visualization of Ribo-seq and shotgun RNA sequencing (RNA-seq) data aligned to transcriptomes (Kiniry et al., 2019, 2021). To date Trips-Viz contains 2064 Ribo-seq files and 752 RNA-seq files from 114 studies across nine organisms. In the section "Setup and Configurations," we describe in detail how to use the relevant functionalities of Trips-Viz. In the section "Data exploration in the context of individual RNA sequences," we examine a selection of transcripts that illustrate different patterns of ribosomal footprints aligned to them and evaluate these patterns for genuine translation (see Table 1). The GWIPS-viz browser provides visualization of footprints unambiguously mapped to reference genomes (Michel et al., 2014b), and its use is necessary in order to evaluate how well transcript annotations and gene structures are supported by available data. In addition, we use the codon alignment viewer (CodAlignView), which is helpful for visualization of codon substitution patterns that can reflect evolutionary selection acting on protein coding sequences (Jungreis et al., 2021).

SETUP AND CONFIGURATIONS
Trips-Viz provides data aligned to the transcriptomes of several organisms and a rich repertoire of functional visualizations for the analysis of ribosome profiling data. Here we focus on Homo sapiens and the function "Single transcript plot" to manually examine transcripts of interest. For further explanation of the other analyses available within Trips-Viz, please refer to the detailed instructions and videos available within the Trips-Viz platform (Kiniry et al., 2019, 2021). Using prior knowledge, we selected the translated ORF on MTLN mRNA (formerly LINC00116) to serve as an example of genuine translation. As an example of a ncRNA whose ORF translation is unlikely, we chose RPPH1, which encodes the RNA component of RNase P. We further explored ribosome footprints aligned to SNHG8, ZFAS1, and XIST. The translation of all three lncRNAs has been reported previously (Ji et al., 2015; Calviello et al., 2016; Martinez et al., 2020); the translation of SNHG8 and ZFAS1 was also reported in additional studies (van Heesch et al., 2019; Gaertner et al., 2020), and the translation of ZFAS1 was also reported by Chen et al. (2020).
While the default options for "Single transcript plot" are usually adequate for initial analysis, there are several parameters that could affect the analysis and their meaning needs to be explained. "Min triplet periodicity score" is a threshold used to filter the data based on the strength of the triplet periodicity signal. Triplet periodicity can be used for identification of the reading frame of translation (Michel et al., 2012). Triplet periodicity, as well as other parameters of ribosome profiling data, varies considerably across different studies (O'Connor et al., 2016). Therefore, not all data offer the same power to accurately identify the translated reading frame. To improve the reliability of frame assignment, we used a triplet periodicity score cutoff of 0.5, meaning that reads of any length with a score below this value are not displayed. In addition to improving detection of the footprints' frame of origin, good triplet periodicity is also an indirect signature of good data quality. Although reducing the number of reads analyzed does reduce the coverage and may preclude the detection of certain lowly translated ORFs, a reasonably large number of Ribo-seq datasets pass the 0.5 threshold (see Table 2).
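The effect of such a cutoff can be sketched in a few lines of Python. The sketch assumes that the score for one read length is one minus the ratio of the second-highest to the highest subcodon frame count; this definition, the function names and the example counts are assumptions made for illustration and are not taken from the Trips-Viz source code.

```python
def periodicity_score(frame_counts):
    """Assumed score for one read length: 1 - (second highest / highest)
    of the read counts assigned to the three subcodon frames."""
    if len(frame_counts) != 3:
        raise ValueError("expected one count per subcodon frame")
    ordered = sorted(frame_counts, reverse=True)
    if ordered[0] == 0:
        return 0.0  # no reads, hence no periodicity signal
    return 1.0 - ordered[1] / ordered[0]


def passing_read_lengths(counts_by_length, min_score=0.5):
    """Read lengths whose score is not below the cutoff."""
    return sorted(
        length
        for length, counts in counts_by_length.items()
        if periodicity_score(counts) >= min_score
    )


# Hypothetical frame counts per read length, for illustration only.
example = {28: [80, 20, 10], 29: [40, 35, 25], 30: [10, 60, 30]}
kept = passing_read_lengths(example)
```

With these made-up counts, reads of 29 nt are spread almost evenly across the three frames and are dropped, while the 28 nt and 30 nt reads are retained.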
Another important parameter is the use of ambiguously mapped reads. Ribosome footprints are short and therefore often cannot be unambiguously aligned. Enabling such multimapping creates uncertainty regarding the true origin of a footprint. However, disabling multimapping results in a reduction of footprint density in the areas that share similarity with other sequences from the same genome. A number of approaches to mitigate this issue have been developed; see Kiniry et al. (2020) for a review. Trips-Viz, however, can only enable or disable the mapping of ambiguous reads. Here we disable multimapping by default to maximize specificity, but sometimes explore ribosome profiling density plots under both modes, as this may help in interpretation of data for genes occurring in multiple copies and for closely related paralogs. In addition, when available, the corresponding RNA-seq studies were also enabled. The distribution of RNA-seq reads can be used to assess whether the annotation of a transcript is supported by the data, as well as to assess the mappability of corresponding regions. Changes in RNA-seq coverage could indicate regions that are difficult to sequence or to align, although RNA-seq data can exhibit their own specific biases, such as an increase of density toward the 3′ end due to preferential capture of polyadenylated RNA fragments when poly-dT is used for mRNA capture (Weinberg et al., 2016). We visualized exon locations by enabling "Exon Junctions" on the generated plot legends tab, which makes it easier to track features in conjunction with genomic alignments. Finally, in some individual cases, we also used mass spectrometry data available in Trips-Viz.
For visualization of genomic alignments, assessment of gene structures and selection of the most appropriate transcript isoforms, we used the GWIPS-viz browser. Unlike Trips-Viz, GWIPS-viz provides ribosome profiling data aligned to the reference genome sequences, instead of transcriptome sequences. GWIPS-viz is based on the UCSC genome browser (Navarro Gonzalez et al., 2021) and is easy to use for anyone familiar with the latter. In addition to ribosome profiling data tracks, GWIPS-viz provides a number of auxiliary tracks that are helpful in the interpretation of ribosome profiling data, such as annotation tracks. Here we used the following tracks: "Basic Annotation Set from Gencode Version 25"; "mRNA-seq coverage from all studies," which is a global aggregate of the number of RNA-seq reads aligned to each coordinate; "Ribosome profiles from all studies," which is the visualization of inferred coordinates of ribosome A-sites (elongating ribosomes); and "Initiating ribosome profiles from all studies," which is the track for P-sites of ribosomes captured with translation inhibitors that preferentially arrest initiating ribosomes (Ingolia et al., 2011; Lee et al., 2012). Finally, we also enabled "Basewise Conservation by PhyloP100way" for assessment of nucleotide sequence conservation (Pollard et al., 2010). In the default color scheme, elongating ribosome profiles are shown in red, initiating ribosome profiles in blue, and RNA-seq data in green (Supplementary Figure 1A). It is important to note that while Trips-Viz alignments are strand-specific since transcripts are single stranded, GWIPS-viz alignments are not strand-specific. Strand-specificity is provided only for bacterial genomes where a large proportion of genes overlap and the data interpretation would be difficult otherwise.
Translation of overlapping antisense lncRNAs has been reported in mammals (Ruiz-Orera and Albà, 2019); hence, to properly analyze the corresponding loci, it is important to explore the corresponding RNAs in Trips-Viz.
In addition to ribosome profiling data resources, we took advantage of CodAlignView, which differentially colors synonymous and non-synonymous codon substitutions, while also differentially coloring the latter depending on whether they lead to similar or radical changes according to BLOSUM62 (Jungreis et al., 2021). Such visualizations enable manual exploration of evolutionary selection acting on potential protein coding sequences, as synonymous and conservative non-synonymous substitutions are more frequent in protein coding sequences than radical non-synonymous substitutions (Lin et al., 2011). The tool also differentially highlights stop codons and ATG codons, and visualizes other features such as predicted splice sites (Supplementary Figure 1B).
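The classification of codon substitutions described above can be made concrete with a small Python sketch. It uses the standard genetic code and labels a non-synonymous change as conservative when the pair of amino acids has a positive substitution score and as radical otherwise. The positive-score threshold, the function names and the four-entry score table (a hand-copied excerpt of BLOSUM62 that should be replaced by the full matrix) are assumptions for illustration; this is not CodAlignView's actual implementation.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Illustrative excerpt only; load the complete BLOSUM62 matrix for real use.
BLOSUM62_EXCERPT = {("I", "V"): 3, ("D", "E"): 2, ("K", "R"): 2, ("G", "W"): -2}


def classify_substitution(ref_codon, alt_codon, scores=BLOSUM62_EXCERPT):
    """Label a codon change as synonymous, conservative, radical or stop."""
    ref_aa = CODON_TABLE[ref_codon.upper().replace("U", "T")]
    alt_aa = CODON_TABLE[alt_codon.upper().replace("U", "T")]
    if ref_aa == alt_aa:
        return "synonymous"
    if "*" in (ref_aa, alt_aa):
        return "stop"  # gain or loss of a stop codon
    pair = (ref_aa, alt_aa) if (ref_aa, alt_aa) in scores else (alt_aa, ref_aa)
    if pair not in scores:
        raise KeyError(f"no score available for {ref_aa}/{alt_aa}")
    # Assumed threshold: a positive score counts as a conservative change.
    return "conservative" if scores[pair] > 0 else "radical"
```

For instance, GAA to GAG leaves glutamate unchanged, ATT to GTT swaps isoleucine for the similar valine, and GGG to TGG replaces glycine with tryptophan, which this sketch labels as radical.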

DATA EXPLORATION IN THE CONTEXT OF INDIVIDUAL RNA SEQUENCES
MTLN mRNA as an Example of Genuine Protein-Producing Translation
As mentioned earlier, MTLN was previously misannotated as lncRNA LINC00116. Since its productive translation has been extensively characterized, we used it as a "gold standard" example. Figure 1A shows RNA-seq and Ribo-seq data aligned to the sequence of longest MTLN mRNA isoform (ENST00000414416).
However, there appears to be no RNA-seq coverage for most of the annotated sequence up to ∼1500 nucleotides (nt) downstream of the annotated transcript 5′ end. For reference, we have also included a Trips-Viz visualization with ambiguously mapped reads enabled (Supplementary Figure 2A). When ambiguous mapping is allowed, only an isolated peak of RNA-seq reads emerges in the region of the second exon, at ∼500 nt. The discontinuous RNA-seq coverage strongly suggests that this peak is an artifact of ambiguous mapping. Thus, the data suggest that a much shorter transcript is transcribed in all cells that were used for producing these data (Table 2). Consistent with that, ribosome profiling data appear only downstream of the fifth ATG in the annotated CDS (third in the CDS frame). Indeed, if the entire annotated transcript were to be translated, how would preinitiation ribosome complexes reach the annotated coding sequence (CDS), bypassing ∼25 ATGs upstream? The existence of a shorter transcript explains this conundrum, as the fifth ATG in the annotated CDS appears to be the first ATG in the truncated transcript supported by both the RNA-seq and Ribo-seq data. Furthermore, initiation at this ATG would be expected under the classic scanning model of translation initiation. Triplet periodicity strongly supports translation of the annotated CDS frame (frame three), indicating the genuine "translational" nature of the Ribo-seq reads. Trips-Viz also contains proteomics data on the peptide masses that can be matched in mass-spectrometry datasets using MSFragger (Kong et al., 2017) and Philosopher (da Veiga Leprevost et al., 2020). Supplementary Figure 2B shows a screenshot of available data. Interestingly, while the most abundant peptides match the MTLN CDS (in blue), there are also peptides whose masses match products of conceptual translation of other reading frames (green and red); these may represent false positives.
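The scanning argument can be checked on any transcript sequence by counting the ATG codons that a scanning preinitiation complex would encounter before a proposed start codon. The Python sketch below does this with 0-based coordinates; the function names and the toy sequence are hypothetical and do not correspond to the MTLN transcript.

```python
def atg_positions(transcript):
    """0-based start positions of every ATG in the transcript, any frame."""
    seq = transcript.upper().replace("U", "T")
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def upstream_atgs(transcript, start_codon_pos):
    """ATGs that scanning would meet before the proposed start codon."""
    return [p for p in atg_positions(transcript) if p < start_codon_pos]


def first_atg_from(transcript, five_prime_end):
    """First ATG at or downstream of an alternative transcript 5' end."""
    for pos in atg_positions(transcript):
        if pos >= five_prime_end:
            return pos
    return None


# Toy sequence with ATGs at positions 2, 7 and 16 (illustration only).
toy = "CCATGCCATGAAACCCATGGGTAA"
bypassed = upstream_atgs(toy, 16)
first_in_truncated = first_atg_from(toy, 10)
```

In the toy case, a start codon at position 16 lies behind two upstream ATGs in the full-length sequence, but becomes the first ATG once the transcript is assumed to begin at position 10, which mirrors the reasoning applied to the shorter MTLN isoforms.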
GWIPS-viz can be used to further explore whether the annotated transcript is supported by available RNA-seq and Ribo-seq data. For example, it is possible that some of the annotated introns are retained in mature RNA transcripts and would not be represented in Gencode and subsequently in Trips-Viz. Since the data are aligned to the genomes in GWIPS-viz, such problems with transcript annotations can be spotted. GWIPS-viz also provides an easy way to examine which RNA isoform is best supported by the data when multiple isoforms are present. The analysis of the MTLN locus in GWIPS-viz (Figure 1B) did not reveal the presence of RNA-seq or Ribo-seq reads in addition to what is seen in Trips-Viz. Further, it can be seen that in addition to the long isoform, there are two short isoforms (ENST00000426713 and ENST00000611969) whose annotated CDSs start at the same start codon that we proposed based on the analysis of data in Trips-Viz. Figure 1C shows an enlarged view of this area. A high peak of footprints obtained by enriching ribosomes at initiation sites can be seen to match the same ATG. The same region also displays high nucleotide conservation in the PhyloP track, with a pattern of triplet periodicity typical of protein coding regions due to a higher frequency of substitutions in the third subcodon position relative to the first and second subcodon positions. The substitution patterns can be explored more reliably with CodAlignView (Figure 2), where white indicates absolute nucleotide conservation, while a predominance of green (synonymous or conservative substitutions) is reflective of protein coding evolution. Yet again, the "green" region coincides with the region of high Ribo-seq density observed in the CDSs of the shorter RNA isoforms. For reference, the alignment set used in CodAlignView was the 24-mammal subset of the 100-way vertebrate alignment using the hg38 human genome assembly (Rosenbloom et al., 2015).
Of note, it was the shorter proteoform that was detected and characterized in previous studies (Catherman et   In summary, the translation of a short proteoform from the short RNA isoforms of MTLN gene is supported by all types of data explored here. This provides a good reference point for the case of genuine translation resulting in production of a stable protein.

RNase P RNA as an Example of Untranslated RNA
RNase P is a large nucleoprotein complex responsible for processing many RNA molecules (Evans et al., 2006). The RNA component of RNase P is transcribed by polymerase III and therefore is not capped (Schramm and Hernandez, 2002). Thus, it is extremely unlikely to be translated, yet fragments of RNase P RNA could contaminate Ribo-seq data due to protection within the complex and co-isolation with ribosomes. Therefore, we chose the RNA component of RNase P as an example of an untranslated non-coding RNA. In humans it is encoded by the RPPH1 gene. Figure 3A shows a Trips-Viz screenshot displaying the data aligned to the long RNA isoform of RPPH1 (ENST00000554988). As in the previous case, only part of the annotated transcript is supported by RNA-seq data as visualized in the GWIPS-viz browser (Figure 3C), indicating the presence of the shorter transcript isoform (ENST00000516869). There are several isolated peaks of ribosome footprint density across the transcript that do not correspond to a single ORF. One of the longest ORFs, with the largest number of footprints aligned to it, is in the second (green) reading frame and is depicted within a blue rectangle in Figure 3A. It can be explored at higher magnification in Figure 3B. The mapped reads do not show any triplet periodicity, indicating there is no preferential support for a specific reading frame. The PhyloP track in GWIPS-viz (Figure 3C) indicates high nucleotide conservation, as expected for the sequence of this important housekeeping RNA molecule. However, it does not exhibit a pattern characteristic of protein coding evolution (prevalence of synonymous and positive nonsynonymous codon substitutions over radical non-synonymous substitutions, see Figure 3D). Thus, RPPH1 represents a genuine example of an untranslated non-coding RNA, with aligned Ribo-seq data that most likely have origins other than protection by translating ribosomes.

Examples of Translation That Are Unlikely to Produce Proteins
For the exploration of translation of lncRNAs whose translational status is less clear, we chose SNHG8, ZFAS1, and XIST. Their translation has been previously reported by several independent ribosome profiling studies (Ji et al., 2015;Calviello et al., 2016;van Heesch et al., 2019;Chen et al., 2020;Gaertner et al., 2020;Martinez et al., 2020) using different methods for automatic detection of translated ORFs (Fields et al., 2015;Calviello et al., 2016;Ji, 2018). For this analysis, we started with small nucleolar RNA host gene 8 (SNHG8), a lncRNA located on human chromosome 4q26. This lncRNA hosts the H/ACA-box small nucleolar RNA (snoRNA), SNORA24. Non-coding genes that host snoRNAs were found to have short, poorly conserved ORFs and were believed to serve little function outside of carrying snoRNAs in their introns (Tycowski et al., 1996;Smith and Steitz, 1998).
Examination of SNHG8 in GWIPS-viz reveals three isoforms: ENST00000602414, ENST00000602483, and ENST00000602819 (Figure 4A). The first two ATGs match high footprint peaks of initiating ribosomes and are outlined in blue. Nucleotide conservation at this locus is poor, and a signature of accelerated evolution is seen on the PhyloP track. RNA-seq data suggest that the long isoform ENST00000602414 is most likely transcribed. The eighth ATG (outlined in orange) also matches a high footprint peak of initiating ribosomes. Nucleotide conservation for this ORF is similarly poor, as visualized on the PhyloP track. It should be noted that all three RNA isoforms contain this ORF. However, initiation at the eighth ATG is more likely under the classical scanning model on the shorter isoform ENST00000602483, as it is the second ATG from the 5′ end. We also note another footprint peak of initiating ribosomes that matches the ATG (sixth ATG site) located on SNORA24 (ENST00000384096); yet, elongating ribosome footprints do not fully encompass the ORF situated at this locus. The PhyloP track reveals high nucleotide conservation at SNORA24, indicating its important functional role.
Based on the features seen in GWIPS-viz, we first examined transcript ENST00000602414 in Trips-Viz (Figure 4B). Footprints aligned at the first ATG show good triplet periodicity, with the reads biased to reading frame one (red). This signal is better visualized in Figure 4C, where the footprints supporting other reading frames have been removed. The corresponding region is shown in CodAlignView in the reading frame matching the ORF (Figure 4D); the high density of radical codon substitutions is not supportive of protein coding evolution.
For the ORF at the eighth ATG noted in GWIPS-viz, we examined transcript ENST00000602483 in Trips-Viz (Figure 5A). Ribosomal footprints appear aligned to the second ATG and support reading frame three (blue). This region is shown at close zoom in Figure 5B. However, the codon substitution pattern (Figure 5C) is not supportive of protein coding evolution.
For completeness, we visualized SNORA24 in Trips-Viz (Supplementary Figure 3A). Although there are footprints aligned to the ATG site that are biased to a single reading frame (blue), they do not encompass the length of the ORF. Small nucleolar RNAs function in ribosome biogenesis and therefore are likely to be isolated as parts of inactive ribosomal complexes. It is also possible that they are protected within other RNA-protein complexes (Ji et al., 2016).
The next RNA examined was zinc finger antisense 1 transcript (ZFAS1), a lncRNA located on human chromosome 20q13.13. It is positioned on the antisense strand at the 5′ end of the protein coding ZNFX1 gene. ZFAS1 also hosts three C/D-box snoRNAs, namely SNORD12C, SNORD12B, and SNORD12, in sequential introns (Askarian-Amiri et al., 2011).
In Figure 6A, we observed that there is a lack of RNA-seq data corresponding to the 5′ end of the longer ZFAS1 isoforms (ENST00000450535, ENST00000441722, ENST00000417721, and ENST00000371743). To explore whether this is potentially due to mapping artifacts, we enabled the track "Multi-read mappability with 24mers." This Umap track represents the probability that a randomly selected read of length k (24 base pairs is the default) that overlaps a given position in the unconverted genome is uniquely mappable (Karimzadeh et al., 2018). According to the track, the mappability is high in this region. We also noted high footprint peaks of initiating ribosomes at the first two exons of the shorter isoforms (ENST00000428008 and ENST00000326677). The 5′ parts of these transcripts are visualized at a closer zoom in Figure 6B. The first high footprint peak of initiating ribosomes occurs on the first exon of the shorter isoforms but does not appear to match any ATG sites. The second peak of initiating ribosomes matches the sixth ATG, which is located on SNORD12C (ENST00000386307). There also appears to be a high peak of elongating ribosomes at this locus. As expected for a snoRNA, the SNORD12C sequence is highly conserved, as can be seen in the PhyloP track.
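The meaning of a multi-read mappability value can be illustrated with a toy Python sketch: for each position, it reports the fraction of overlapping k-mers that occur exactly once in the sequence. The sketch considers only the forward strand and exact matches, which is a simplifying assumption; the actual Umap track is computed genome-wide on both strands with a read aligner, and the function name here is hypothetical.

```python
from collections import Counter


def multi_read_mappability(genome, k):
    """Per-position fraction of overlapping k-mers that are unique
    (forward strand and exact matches only, a simplification)."""
    genome = genome.upper()
    n = len(genome)
    if k <= 0 or k > n:
        raise ValueError("k must be between 1 and the sequence length")
    kmers = [genome[i:i + k] for i in range(n - k + 1)]
    counts = Counter(kmers)
    unique = [counts[kmer] == 1 for kmer in kmers]
    scores = []
    for pos in range(n):
        first = max(0, pos - k + 1)  # leftmost k-mer start covering pos
        last = min(pos, n - k)       # rightmost k-mer start covering pos
        overlapping = unique[first:last + 1]
        scores.append(sum(overlapping) / len(overlapping))
    return scores


# Toy sequence in which ACGT occurs twice (illustration only).
toy_scores = multi_read_mappability("ACGTACGTTT", 4)
```

Positions covered mostly by repeated k-mers receive low values, so a drop in read coverage that coincides with low mappability points to an alignment artifact, whereas a drop in a region of high mappability, as reported for ZFAS1, is more likely to reflect genuine absence of the RNA.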
The third high footprint peak of initiating ribosomes matches the eighth ATG site. Multiple transcript isoforms appear to contain this ORF, which is outlined in a blue rectangle (Figure 6B). Poor nucleotide conservation is seen for this sequence on the PhyloP track. RNA-seq data support the existence of the ENST00000428008 and ENST00000326677 transcript isoforms. This ATG is the first ATG from the 5′ end in these transcripts. We elected to examine transcript ENST00000428008 in Trips-Viz (Figure 6C). There is good support for translation of the ORF that starts with this ATG in the corresponding reading frame (green). The corresponding region is shown at a closer zoom in Figure 7A (RNA-seq data disabled and only reading frame two enabled). However, codon substitution patterns do not support selection typical for protein coding evolution for this ORF (Figure 7B). Additionally, there were footprints upstream of the first ATG that were biased to reading frame three (blue) in Trips-Viz and that match the first high footprint peak of initiating ribosomes seen in GWIPS-viz (Figure 6B). However, when looking only at footprints supporting frame three (Figure 7C; short black dashes show positions of the near-cognate start codons CTG and GTG), it is clear that these footprints are not contained within a single ORF and span an area containing a stop codon in this reading frame. Thus, these protected fragments are unlikely to derive from actively translating ribosomes. We further visualized sequencing reads aligned to SNORD12C using Trips-Viz (Supplementary Figure 3B). The distribution of sequencing reads does not exhibit good triplet periodicity and their positions do not match a particular ORF. As with other snoRNAs, these fragments are unlikely to be genuine ribosomal footprints. It is more likely that they originate from ribonucleoprotein or ribosomal complexes, given the role of snoRNAs in ribosomal RNA processing (Sloan et al., 2017).
Frontiers in Cell and Developmental Biology | www.frontiersin.org
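The observation that frame-three footprints span a stop codon can be turned into a simple check, sketched here in Python. Footprint positions are taken to be 0-based inferred ribosome positions on the transcript, and a footprint is assigned to a frame by its position modulo three; this coordinate convention, the function names and the toy sequence are assumptions for illustration only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(transcript, frame, start, end):
    """0-based positions of stop codons in `frame` (0, 1 or 2) that lie
    entirely within the interval [start, end)."""
    seq = transcript.upper().replace("U", "T")
    first = start + (frame - start) % 3  # first codon start >= start in frame
    return [i for i in range(first, end - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def footprints_span_stop(transcript, frame, footprint_positions):
    """True if footprints assigned to `frame` lie on both sides of an
    in-frame stop codon, so they cannot all come from a single ORF."""
    same_frame = sorted(p for p in footprint_positions if p % 3 == frame)
    if len(same_frame) < 2:
        return False
    return bool(in_frame_stops(transcript, frame, same_frame[0], same_frame[-1]))


# Toy transcript with a frame-0 TAG stop codon at position 6.
toy_transcript = "ATGAAATAGCCCATGAAACCC"
```

Footprints at positions 3 and 15 of the toy transcript flank the in-frame stop codon and therefore cannot belong to one translated ORF, while footprints at positions 9 and 15 lie within a single stop-free stretch.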
Lastly, we examined X Inactive Specific Transcript (XIST), a nuclear lncRNA with over 19,000 nucleotides (19,296) in humans and located on the q arm of the X chromosome. Previous work proposed that XIST evolved in eutherians from the pseudogenization of a protein coding gene (Duret et al., 2006). Following this, another study suggested XIST had dual origins, namely pseudogenization of a protein coding gene and a set of transposable elements. Specifically, the XIST promoter region and four exons in eutherians retained homology to exons of the protein coding LNX3 gene, while the other six exons were similar to different transposable elements (Elisaphenko et al., 2008). The authors further suggest that the XIST gene lost the coding functions of LNX3 gene, but due to transposon insertions and subsequent partial amplification, formed new functional  domains. These new domains are now believed to be necessary for its role in the silencing of X-chromosome genes (Elisaphenko et al., 2008;Romito and Rougeulle, 2011).
Examining XIST in GWIPS-viz revealed a long transcript on the reverse strand (Figure 8A). Dense ribosomal peaks and footprints are noted at the 5′ end of the long isoform ENST00000429829. Available RNA-seq data further support that the long isoform is transcribed. Zooming in to the area of dense footprints (Figure 8B) showed a high footprint peak of initiating ribosomes that matches the first ATG. The ORF at this locus, outlined in blue, is very short (30 nt), and the distribution of footprints is consistent with its translation. The second ATG also matches a footprint peak of initiating ribosomes. The ORF at this locus is outlined in orange and shows elongating ribosomes mapped to it.
Trips-Viz visualization of the data for this region of transcript ENST00000429829 is shown in Figure 8C. Translation of ORFs initiated at the first and second ATGs is supported by good triplet periodicity matching the expected reading frame three (blue) in both cases. Figure 9A shows the distribution of footprints that support only these reading frames. We could see an increase of footprint densities in these ORFs that exceeds background. However, the codon substitution patterns do not support protein coding evolution for either ORF (Figures 9B, C, respectively).

FINAL THOUGHTS
Here we used examples of lncRNAs with reported translated ORFs to guide the manual examination of publicly available ribosome profiling data using Trips-Viz and GWIPS-viz. CodAlignView was then used for detailed examination of codon substitution patterns as evidence for evolutionary selection acting on potential protein coding sequences. We used MTLN as an example of a genuine protein coding RNA and illustrated typical features of ribosome profiling data and codon substitution patterns associated with genuine ORF translation and protein coding evolution. Expression of lncRNAs is highly specific (Hon et al., 2017; Douka et al., 2021); therefore, a long RNA isoform of MTLN (ENST00000414416) may be expressed in some cells. However, translation of such an mRNA is unlikely to produce MTLN proteoforms, since its start codon cannot be reached by a scanning preinitiation complex.
The RNA component of RNase P, encoded by RPPH1, was used as a negative example to demonstrate the patterns that are inconsistent with translation and protein evolution. Finally, we examined the data available for other lncRNAs with reported translated ORFs, i.e., SNHG8, ZFAS1, and XIST, and concluded that they contain multiple short ORFs that are likely translated even though they do not exhibit signatures of protein coding evolution. We can only speculate on the biological significance of translation of these short ORFs. We do not know if they code for any stable and biologically active peptides, as there is no support for their evolutionary selection. Yet it is possible that they could be used by the immune system as antigens for self-recognition. Additionally, the translation of these ORFs may influence processing, stability, localization and structural folding of the corresponding lncRNAs irrespective of the biological significance of the products of this translation.
Because of the complexity of translation and of ribosome profiling data, it is very difficult to design highly accurate automatic tools for translation detection. Thus, we hope that manual examination of individual cases using the tools described here will benefit researchers in examining the translation status of individual ORFs in non-coding RNAs.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. These data can be found in Trips-Viz, which is freely available at https://trips.ucc.ie. Source code for Trips-Viz is available at https://github.com/skiniry/Trips-viz. GWIPS-viz is freely available at https://gwips.ucc.ie/. CodAlignView is freely available at https://data.broadinstitute.org/compbio1/cav.php.

AUTHOR CONTRIBUTIONS
OZ: data curation, visualization, and writing. SK: software, data curation, and writing. PB: conceptualization, supervision, writing, project administration, and funding acquisition. KD: supervision, writing, project administration, and funding acquisition. All authors contributed to the article and approved the submitted version.