AUTHOR=Zhang Xinru, Wang Shutao, Xie Lina, Zhu Yuhui TITLE=PseU-ST: A new stacked ensemble-learning method for identifying RNA pseudouridine sites JOURNAL=Frontiers in Genetics VOLUME=Volume 14 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2023.1121694 DOI=10.3389/fgene.2023.1121694 ISSN=1664-8021 ABSTRACT=Background: Pseudouridine (Ψ) is one of the most abundant RNA modifications, occurring in a variety of RNA types, and it plays a significant role in many biological processes. The key to studying the biochemical functions and mechanisms of Ψ is identifying the corresponding Ψ sites. However, identifying Ψ sites experimentally is expensive and time-consuming, so computational methods that can accurately predict Ψ sites from RNA sequence information are needed.
Methods: In this study, we proposed a new model, PseU-ST, to identify Ψ sites in Homo sapiens (H. sapiens), Saccharomyces cerevisiae (S. cerevisiae), and Mus musculus (M. musculus). Based on a comprehensive test of nearly all RNA sequence encoding schemes in the iLearnPlus software package, we selected the six best encoding schemes and four machine-learning algorithms, and we selected the optimal features for each encoding scheme using the chi-square test and the incremental feature selection (IFS) algorithm. We then selected the optimal feature combination and the best base-classifier combination for each species through an extensive performance comparison, and employed a stacking strategy to build the predictive model.
Results: Benchmarking tests demonstrated that PseU-ST achieved better prediction performance than other existing models. PseU-ST’s accuracy scores were 93.64%, 87.74%, and 89.64% on H_990, S_628, and M_944, respectively, representing improvements of 13.94%, 6.05%, and 0.26% over the best existing methods on the same benchmark training datasets.
Conclusion: The data indicate that PseU-ST is a very competitive prediction model for identifying RNA Ψ sites in H. sapiens, M. musculus, and S. cerevisiae. In addition, we found that the Position-specific trinucleotide propensity based on single strand (PSTNPss) and Position-specific of three nucleotides (PS3) features play an important role in Ψ site identification. The source code for PseU-ST and data are available in our GitHub repository (https://github.com/jluzhangxinrubio/PseU-ST).
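The feature-selection step in the Methods, chi-square ranking followed by incremental feature selection (IFS), can be illustrated with a minimal, standard-library-only Python sketch. It is not the PseU-ST source code. It assumes non-negative encoded features (for example, nucleotide-frequency encodings) and binary labels (1 = Ψ site, 0 = non-site), and it computes the chi-square statistic with the same observed/expected construction as scikit-learn's `chi2`. The names `chi2_scores`, `incremental_feature_selection`, and `loo_nearest_centroid_accuracy` are hypothetical, and the leave-one-out nearest-centroid evaluator is a toy stand-in for the cross-validated machine-learning classifiers a real pipeline would score.

```python
# Hypothetical sketch of chi-square ranking + IFS; not the PseU-ST implementation.
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def chi2_scores(X: Matrix, y: Sequence[int]) -> List[float]:
    """Chi-square statistic of each (non-negative) feature against the labels.

    Observed = per-class sum of the feature; expected = class prior * total
    sum of the feature, mirroring scikit-learn's chi2 construction.
    """
    n = len(X)
    classes = sorted(set(y))
    priors = {c: sum(1 for label in y if label == c) / n for c in classes}
    scores: List[float] = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        stat = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = priors[c] * total
            if expected > 0:  # an all-zero feature simply scores 0 here
                stat += (observed - expected) ** 2 / expected
        scores.append(stat)
    return scores


def loo_nearest_centroid_accuracy(X: Matrix, y: Sequence[int]) -> float:
    """Hypothetical toy evaluator: leave-one-out nearest-centroid accuracy."""
    correct = 0
    for i, (x_i, y_i) in enumerate(zip(X, y)):
        # Class centroids computed without the held-out sample i.
        centroids: Dict[int, List[float]] = {}
        for c in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == c]
            if rows:
                centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x_i, centroids[c])),
        )
        correct += int(pred == y_i)
    return correct / len(X)


def incremental_feature_selection(
    X: Matrix,
    y: Sequence[int],
    evaluate: Callable[[Matrix, Sequence[int]], float] = loo_nearest_centroid_accuracy,
) -> Tuple[List[int], float]:
    """Rank features by chi-square, grow the top-k prefix, keep the best k."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    best_subset: List[int] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        X_sub = [[row[j] for j in subset] for row in X]
        score = evaluate(X_sub, y)
        if score > best_score:  # strict '>' prefers the smaller subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score
```

On a toy matrix where only the first column separates the two classes, this sketch ranks that column first and IFS keeps it alone, because adding lower-ranked columns never beats its score. Passing a different `evaluate` callable (for example, a cross-validated classifier score) changes the selection criterion without touching the ranking logic.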