AUTHOR=Liu Lian , Lei Xiujuan , Fang Zengqiang , Tang Yujiao , Meng Jia , Wei Zhen TITLE=LITHOPHONE: Improving lncRNA Methylation Site Prediction Using an Ensemble Predictor JOURNAL=Frontiers in Genetics VOLUME=Volume 11 - 2020 YEAR=2020 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2020.00545 DOI=10.3389/fgene.2020.00545 ISSN=1664-8021 ABSTRACT=N6-methyladenosine (m6A) is one of the most widely studied epigenetic modifications. It plays an important role in many biological processes, such as splicing, RNA localization and degradation. Studies have shown that m6A on lncRNA has important functions, including regulating the expression and functions of lncRNA, regulating the synthesis of pre-mRNA, promoting the proliferation of cancer cells and affecting cell differentiation. Although a number of methods have been proposed for predicting m6A RNA methylation sites, most of them target general m6A site prediction and do not account for the distinct nature of the lncRNA methylation prediction problem. Because many lncRNAs lack a polyA tail, they are not captured by polyA selection, the most widely adopted RNA-seq library preparation protocol. lncRNA methylation sites are therefore likely to be significantly under-represented in existing experimental data, which in turn affects the accuracy of predictors built on these data when they are applied to lncRNA methylation site prediction. In this paper, we proposed a new computational framework, LITHOPHONE, which stands for long noncoding RNA methylation site prediction from sequence characteristics and genomic information with an ensemble predictor. We showed that lncRNA and mRNA methylation sites exhibit different patterns in the extracted features and should be handled differently when making predictions.
Because the number of known lncRNA m6A sites is limited by the experimental protocols used and is insufficient for training a reliable lncRNA methylation site predictor, performance can be improved by combining lncRNA and mRNA data in an ensemble predictor. We showed that LITHOPHONE achieved good performance when tested on independent datasets (AUC: 0.966 and 0.835 under full transcript and mature mRNA modes, respectively), a substantial improvement in lncRNA methylation site prediction over existing methods: SRAMP (AUC: 0.827 and 0.749), Gene2vec (AUC: 0.865 and 0.806) and MethyRNA (AUC: 0.801 and 0.679). Additionally, LITHOPHONE was applied to scan the entire human lncRNAome for all possible lncRNA m6A sites, and the results are freely accessible at: http://lith.rnamd.com.
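
The abstract does not say how the ensemble combines its component predictors, so the following is only a minimal sketch of the general idea, not the LITHOPHONE implementation. It assumes two hypothetical per-site probability scores, one from a model trained on lncRNA sites and one from a model trained on mRNA sites, merges them with a weighted average, and evaluates the result with AUC computed from pairwise comparisons. The function names, the weight `w_lnc`, and the toy scores are all illustrative assumptions.

```python
from typing import Sequence


def ensemble_score(p_lnc: float, p_mrna: float, w_lnc: float = 0.5) -> float:
    """Weighted average of two site-level methylation probabilities.

    p_lnc and p_mrna are hypothetical outputs of an lncRNA-trained and an
    mRNA-trained predictor; w_lnc is an assumed mixing weight.
    """
    if not 0.0 <= w_lnc <= 1.0:
        raise ValueError("w_lnc must lie in [0, 1]")
    return w_lnc * p_lnc + (1.0 - w_lnc) * p_mrna


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative site")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data only (not from the paper): 1 = methylated site, 0 = unmethylated.
labels = [1, 1, 0, 0, 1, 0]
p_lnc = [0.9, 0.4, 0.5, 0.2, 0.7, 0.1]
p_mrna = [0.6, 0.8, 0.3, 0.4, 0.5, 0.6]
combined = [ensemble_score(a, b) for a, b in zip(p_lnc, p_mrna)]

print("lncRNA-only AUC:", auc(labels, p_lnc))
print("mRNA-only AUC:  ", auc(labels, p_mrna))
print("ensemble AUC:   ", auc(labels, combined))
```

On this toy set the averaged score separates the classes better than either input, which illustrates why combining a data-poor lncRNA model with a data-rich mRNA model can help; it says nothing about the figures reported above.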