AUTHOR=Zhuang Jujuan, Liu Danyang, Lin Meng, Qiu Wenjing, Liu Jinyang, Chen Size TITLE=PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm JOURNAL=Frontiers in Genetics VOLUME=Volume 12 - 2021 YEAR=2021 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2021.773882 DOI=10.3389/fgene.2021.773882 ISSN=1664-8021 ABSTRACT=Background: Pseudouridine (Ψ) is a common ribonucleotide modification that plays a significant role in many biological processes. Identifying Ψ modification sites is important for research on disease mechanisms and biological processes, and machine learning algorithms are attractive for this task because experimental techniques are expensive and time-consuming. Results: In this work, we propose a deep learning framework, called PseUdeep, to identify Ψ sites in three species: H. sapiens, S. cerevisiae and M. musculus. The method uses three encodings to extract features from RNA sequences: one-hot encoding, K-tuple nucleotide frequency pattern (KNFP) and position-specific nucleotide composition (PSNP). The three feature matrices are convolved twice and then fed into a capsule neural network and a bidirectional gated recurrent unit (BiGRU) network with a self-attention mechanism for classification. Conclusion: Compared with other state-of-the-art methods, our model achieves the highest prediction accuracy; on the independent testing dataset s-200, accuracy improves by 12.38%. Moreover, the dimension of the features derived from the RNA sequences is only 109, which is much smaller than that used in traditional algorithms. In 5-fold cross-validation and an independent test, PseUdeep outperforms the best available traditional machine learning model. PseUdeep source code and data sets are available at http://github.com/liudanyang/PseUdee
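
The three sequence encodings named in the abstract can be illustrated with a minimal Python sketch. This is not the PseUdeep implementation: the function names (`one_hot`, `ktuple_frequencies`, `psnp`), the A/C/G/U column order, and the simplified definitions are assumptions made for illustration. In particular, the k-tuple function computes plain overlapping k-mer frequencies, and the PSNP function takes the per-position difference in nucleotide frequency between a positive and a negative sample set, both of which may differ in detail from the formulations used in the paper.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGU"  # assumed column order; not taken from the paper


def one_hot(seq):
    """Encode each nucleotide as a 4-element indicator row (A, C, G, U).

    Characters outside ACGU become an all-zero row.
    """
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq]


def ktuple_frequencies(seq, k):
    """Relative frequency of every possible k-tuple over overlapping windows.

    A simplified stand-in for a K-tuple nucleotide frequency pattern.
    """
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    return {
        "".join(t): counts["".join(t)] / total
        for t in product(NUCLEOTIDES, repeat=k)
    }


def psnp(seq, positives, negatives):
    """Position-specific score for each nucleotide in seq.

    For position i holding base b, the score is the frequency of b at i
    in the positive samples minus its frequency at i in the negative samples.
    All sequences are assumed to have the same length.
    """
    def freq(samples, pos, base):
        return sum(s[pos] == base for s in samples) / len(samples)

    return [
        freq(positives, i, b) - freq(negatives, i, b)
        for i, b in enumerate(seq)
    ]
```

For a sequence of length L, `one_hot` yields an L×4 matrix, `ktuple_frequencies` yields a dictionary with 4^k entries, and `psnp` yields a vector of length L; a model such as the one described would take matrices built from encodings of this kind as input to its convolutional layers.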