AUTHOR=Bar Amir , Argaman Liron , Altuvia Yael , Margalit Hanah TITLE=Prediction of Novel Bacterial Small RNAs From RIL-Seq RNA–RNA Interaction Data JOURNAL=Frontiers in Microbiology VOLUME=Volume 12 - 2021 YEAR=2021 URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2021.635070 DOI=10.3389/fmicb.2021.635070 ISSN=1664-302X ABSTRACT=The genomic revolution and the large-scale genomic and transcriptomic technologies that followed have highlighted hidden genomic treasures, including genomic regions previously considered non-functional and now shown to be expressed and functional. Prominent among them are non-coding small RNAs (sRNAs), identified in RNA-seq and CLIP-seq studies and shown to play important roles in post-transcriptional regulation of gene expression. Yet large-scale data are noisy, calling for approaches that separate the wheat from the chaff and pinpoint the functional RNAs. Here we address this challenge for bacterial sRNAs, which constitute only a small fraction of the bacterial genome and are expressed under specific conditions. While sRNA-encoding genes were initially identified in intergenic regions, recent evidence suggests that they are also encoded within other, well-defined genomic elements. This notion was strongly supported by data generated by RIL-seq, an RNA-seq-based methodology we recently developed for deciphering the sRNA-target network. Applying RIL-seq to Escherichia coli, we found that ~64% of the detected RNA pairs involved known sRNAs but ~36% of the pairs did not involve a previously reported sRNA, suggesting that potential novel sRNAs were captured. In the current study we described each RNA by quantitative features that can be extracted from RIL-seq data and showed that these features distinguish between the known sRNAs and “other RNAs”. We incorporated these features in a machine learning-based algorithm that predicts novel sRNAs from RIL-seq data.
Two types of features appear to contribute to the prediction of an RNA as an sRNA: features that favor the RNA itself being an sRNA and features that disfavor its interacting partners being sRNAs. Using this algorithm, we identified high-scoring candidates encoded mostly in intergenic regions and 3’ untranslated regions, but also in coding sequences and 5’ untranslated regions, several of which were further tested and verified experimentally. Our study reinforces the emerging concept that sRNAs are encoded within various genomic elements, and provides a computational framework for detecting additional sRNAs in RIL-seq data from cells grown under different conditions and from other bacteria.