AUTHOR=Dong Gai-Fang , Zheng Lei , Huang Sheng-Hui , Gao Jing , Zuo Yong-Chun TITLE=Amino Acid Reduction Can Help to Improve the Identification of Antimicrobial Peptides and Their Functional Activities JOURNAL=Frontiers in Genetics VOLUME=Volume 12 - 2021 YEAR=2021 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2021.669328 DOI=10.3389/fgene.2021.669328 ISSN=1664-8021 ABSTRACT=Antimicrobial peptides (AMPs) are considered potential substitutes for antibiotics in the design of new anti-infective drugs. Several machine learning algorithms and web servers have been developed for identifying antimicrobial peptides and their functional activities. However, there is still room for improvement in prediction algorithms and feature extraction methods. Reduced amino acid alphabets are effective at simplifying protein complexity and recognizing structurally conserved regions. This article evaluates in detail the performance of more than 5000 reduced amino acid descriptors, generated from 74 types of reduced amino acid alphabets, in both the first and the second stage, in order to construct a two-stage classifier, iAMP-RAAC, for identifying antimicrobial peptides and their functional activities. The results showed that the first-stage AMP classifier achieved accuracies of 97.21% and 97.11% on the training data set and the independent test data set, respectively. In the second stage, our classifier still showed good performance: at least three of the four metrics, sensitivity (SN), specificity (SP), accuracy (ACC), and Matthews correlation coefficient (MCC), exceeded the results reported in the literature. Further, ANOVA with incremental feature selection (IFS) was used to improve prediction performance. After feature selection at each stage, the classifier achieved its highest accuracy with the fewest features. 
Finally, a user-friendly web server for iAMP-RAAC was established at http://bioinfor.imu.edu.cn/iampraac to provide prediction services for query sequences of interest.
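
As a rough illustration of what a reduced amino acid descriptor might look like, the following Python sketch maps each residue to a cluster representative and then computes a dipeptide composition over the reduced alphabet. The five-cluster grouping, the function names, and the example peptide are all assumptions made for illustration; they are not taken from the article and do not correspond to any of its 74 alphabets or to the actual iAMP-RAAC implementation.

```python
from collections import Counter
from itertools import product

# Hypothetical 5-cluster reduced alphabet, for illustration only.
# It is NOT one of the 74 alphabets evaluated in the article.
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DEGP"]


def build_mapping(clusters):
    """Map every residue to the first letter of its cluster."""
    mapping = {}
    for group in clusters:
        representative = group[0]
        for residue in group:
            mapping[residue] = representative
    return mapping


def reduce_sequence(sequence, mapping):
    """Rewrite a peptide in the reduced alphabet, skipping unknown symbols."""
    return "".join(mapping[aa] for aa in sequence.upper() if aa in mapping)


def kmer_composition(reduced, alphabet, k=2):
    """Return normalized k-mer frequencies in a fixed, reproducible order."""
    total = len(reduced) - k + 1
    counts = Counter(reduced[i:i + k] for i in range(max(total, 0)))
    keys = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts[key] / total if total > 0 else 0.0 for key in keys]


MAPPING = build_mapping(CLUSTERS)
ALPHABET = [group[0] for group in CLUSTERS]

# Arbitrary example peptide (an assumption, not data from the article).
reduced = reduce_sequence("GIGKFLHSAKKF", MAPPING)
features = kmer_composition(reduced, ALPHABET, k=2)
```

With five clusters, the dipeptide descriptor has 5 x 5 = 25 dimensions instead of the 400 needed for the full 20-letter alphabet, which is the kind of simplification the abstract refers to. A feature vector like this could then be ranked and pruned by a feature selection procedure such as ANOVA with IFS.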