AUTHOR=Zhang Yanqin, Li Zhiyuan TITLE=RF_phage virion: Classification of phage virion proteins with a random forest model JOURNAL=Frontiers in Genetics VOLUME=Volume 13 - 2022 YEAR=2023 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2022.1103783 DOI=10.3389/fgene.2022.1103783 ISSN=1664-8021 ABSTRACT=Phages play essential roles in biological processes, and the virion proteins encoded by the phage genome are critical components of the assembled phage particle. This study focuses on classifying phage virion proteins with machine learning methods. We propose a novel approach, RF_phage virion, that effectively distinguishes virion from non-virion proteins. The model uses four protein sequence encoding methods to generate features and applies a random forest algorithm as the classifier. We evaluated RF_phage virion by comparing its performance with that of classical machine learning methods. The proposed method achieved a specificity (Sp) of 93.37%, sensitivity (Sn) of 90.30%, accuracy (Acc) of 91.84%, Matthews correlation coefficient (MCC) of 0.8371, and an F1 score of 0.9196.
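The abstract reports Sp, Sn, Acc, MCC, and F1 without defining them. The sketch below computes all five from a binary confusion matrix using the standard definitions, assuming virion proteins are the positive class. The `ConfusionMatrix` and `binary_metrics` names are hypothetical, and the counts in the usage line are made up for illustration; they are not data from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # virion proteins predicted as virion (positive class, assumed)
    tn: int  # non-virion proteins predicted as non-virion
    fp: int  # non-virion proteins predicted as virion
    fn: int  # virion proteins predicted as non-virion


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def binary_metrics(cm: ConfusionMatrix) -> dict:
    tp, tn, fp, fn = cm.tp, cm.tn, cm.fp, cm.fn
    sn = _safe_div(tp, tp + fn)                      # sensitivity (recall)
    sp = _safe_div(tn, tn + fp)                      # specificity
    acc = _safe_div(tp + tn, tp + tn + fp + fn)      # accuracy
    f1 = _safe_div(2 * tp, 2 * tp + fp + fn)         # harmonic mean of precision and recall
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_den)      # Matthews correlation coefficient
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc, "F1": f1}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(binary_metrics(ConfusionMatrix(tp=90, tn=95, fp=5, fn=10)))
```

Sn and Sp each look at only one class, whereas MCC uses all four cells of the confusion matrix. This is why MCC is often reported alongside accuracy for binary protein classifiers.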