AUTHOR=Chen Dong, Li Sai, Chen Yu
TITLE=ISTRF: Identification of sucrose transporter using random forest
JOURNAL=Frontiers in Genetics
VOLUME=Volume 13 - 2022
YEAR=2022
URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2022.1012828
DOI=10.3389/fgene.2022.1012828
ISSN=1664-8021
ABSTRACT=Sucrose transporters (SUTs) are transmembrane proteins that occur widely in plants and play a significant role in sucrose transport and in sucrose-specific signal sensing. Identifying sucrose transporters is therefore important for studying seed development and plant flowering and growth. In this paper, a random forest-based model named ISTRF is proposed to identify sucrose transporters. First, a dataset containing 382 SUT proteins and 911 non-SUT proteins was constructed from the UniProt and PFAM databases. Second, k-separated-bigrams-PSSM was used to represent protein sequences. Third, the Borderline-SMOTE algorithm was used to counter the effect of class imbalance in the training data on identification performance. Finally, the random forest algorithm was used to train the identification model. The 10-fold cross-validation results showed that k-separated-bigrams-PSSM was the most discriminative feature for identifying sucrose transporters and that Borderline-SMOTE improved the performance of the identification model. Furthermore, random forest outperformed the other classifiers on almost all indicators. Compared with other identification models, ISTRF had the best overall performance and substantially improved the identification of sucrose transporter proteins.
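
The feature-extraction step named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of k-separated bigrams over a PSSM, in which entry (m, n) is the sum over positions i of P[i][m] * P[i+k][n]; the abstract does not give the formula, so this definition, the function name `k_separated_bigrams`, and the toy matrix are all assumptions for illustration and not the authors' code. The oversampling and random forest steps are not shown.

```python
from typing import List


def k_separated_bigrams(pssm: List[List[float]], k: int = 1) -> List[float]:
    """Return the flattened C x C matrix T for a PSSM with C columns.

    Assumed definition: T[m][n] = sum over i of pssm[i][m] * pssm[i + k][n],
    for i from 0 to L - k - 1, where L is the number of rows (residues).
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    length = len(pssm)
    n_cols = len(pssm[0]) if length else 0
    features = [0.0] * (n_cols * n_cols)
    # Pair each row with the row k positions further along the sequence.
    for i in range(length - k):
        row_a, row_b = pssm[i], pssm[i + k]
        for m in range(n_cols):
            for n in range(n_cols):
                features[m * n_cols + n] += row_a[m] * row_b[n]
    return features


# Hypothetical toy "PSSM": 3 residues, 2 columns (a real PSSM has 20 columns).
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feats_k1 = k_separated_bigrams(toy, k=1)
feats_k2 = k_separated_bigrams(toy, k=2)
```

With a real 20-column PSSM, each value of k yields a 400-dimensional vector regardless of sequence length, which is what lets proteins of different lengths feed a fixed-width classifier input.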