AUTHOR=Ma Dong, Chen Zhihua, He Zhanpeng, Huang Xueqin
TITLE=A SNARE Protein Identification Method Based on iLearnPlus to Efficiently Solve the Data Imbalance Problem
JOURNAL=Frontiers in Genetics
VOLUME=Volume 12 - 2021
YEAR=2022
URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2021.818841
DOI=10.3389/fgene.2021.818841
ISSN=1664-8021
ABSTRACT=Machine learning has been widely used to solve complex problems in engineering applications and scientific fields, and many machine learning-based methods have achieved good results in different fields. SNAREs are key elements of membrane fusion and required for the fusion process of stable intermediates. They are also associated with the formation of some psychiatric disorders. This study processes the original sequence data with the synthetic minority oversampling technique (SMOTE) to solve the problem of data imbalance and produces the most suitable machine learning model with the iLearnPlus platform for the identification of SNARE proteins. Ultimately, a sensitivity of 66.67%, specificity of 93.63%, accuracy of 91.33%, and MCC of 0.528 were obtained in the cross-validation dataset, and a sensitivity of 66.67%, specificity of 93.63%, accuracy of 91.33%, and MCC of 0.528 were obtained in the independent dataset (the adaptive skip dipeptide composition descriptor was used for feature extraction, and LightGBM with proper parameters was used as the classifier). These results demonstrate that this combination can perform well in the classification of SNARE proteins and is superior to other methods.
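
The abstract reports sensitivity, specificity, accuracy, and MCC. As a reading aid, the sketch below shows how these four metrics are conventionally computed from the counts of a binary confusion matrix. It is not the authors' code: the function name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from the paper's datasets.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn count the positive class (here: SNARE proteins),
    tn/fp count the negative class (non-SNARE proteins).
    Assumes at least one positive and one negative sample.
    """
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    accuracy = (tp + tn) / (tp + fn + tn + fp)

    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical, imbalanced counts chosen only for illustration.
    for name, value in binary_metrics(tp=8, fn=2, tn=85, fp=5).items():
        print(f"{name}: {value:.4f}")
```

On imbalanced data such as this, accuracy is dominated by the majority class, which is why sensitivity and MCC are usually read alongside it.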