AUTHOR=Gao Yue , Li Henan , Zhao Chunjiang , Li Shuguang , Yin Guankun , Wang Hui TITLE=Machine learning and feature extraction for rapid antimicrobial resistance prediction of Acinetobacter baumannii from whole-genome sequencing data JOURNAL=Frontiers in Microbiology VOLUME=14 YEAR=2024 URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2023.1320312 DOI=10.3389/fmicb.2023.1320312 ISSN=1664-302X ABSTRACT=Background

Whole-genome sequencing (WGS) has contributed significantly to advancements in machine learning methods for predicting antimicrobial resistance (AMR). However, a comparison of different methods for AMR prediction that do not require prior knowledge of resistance has yet to be conducted.

Methods

We aimed to predict the minimum inhibitory concentrations (MICs) of 13 antimicrobial agents against Acinetobacter baumannii using three machine learning algorithms (random forest, support vector machine, and XGBoost) combined with k-mer features extracted from WGS data.
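As an illustration of what k-mer feature extraction from an assembled genome can look like, the following is a minimal sketch and not the authors' pipeline. The function names `count_kmers` and `kmer_matrix` are hypothetical, k = 11 is taken from the 11-mers mentioned in the Results, and the sketch assumes that windows containing ambiguous bases are skipped and that forward and reverse-complement k-mers are not merged; the abstract does not state either choice.

```python
from collections import Counter


def count_kmers(sequence: str, k: int = 11) -> Counter:
    """Count every overlapping k-mer in a nucleotide sequence.

    Windows containing characters other than A, C, G, T are skipped
    (an assumption of this sketch, not a detail from the study).
    """
    seq = sequence.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts


def kmer_matrix(genomes: dict[str, str], k: int = 11):
    """Build a shared k-mer vocabulary and one count row per isolate.

    `genomes` maps a (hypothetical) isolate name to its sequence.
    Returns the sorted vocabulary and a dict of count vectors that can
    be fed to a classifier or regressor as a feature matrix.
    """
    per_isolate = {name: count_kmers(seq, k) for name, seq in genomes.items()}
    vocabulary = sorted(set().union(*per_isolate.values()))
    rows = {
        name: [counts[kmer] for kmer in vocabulary]
        for name, counts in per_isolate.items()
    }
    return vocabulary, rows
```

With k = 11 there are at most 4^11 (about 4.2 million) distinct features, which is one reason a later feature-selection step can reduce training time.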

Results

A cohort of 339 isolates was used for model construction. The average essential agreement and category agreement of the best models exceeded 90.90% (95% CI, 89.03–92.77%) and 95.29% (95% CI, 94.91–95.67%), respectively, with the exceptions of levofloxacin, minocycline and imipenem. The very major error rates ranged from 0.0% to 5.71%. We applied feature selection pipelines to extract the top-ranked 11-mers to optimise training time and computing resources. This approach slightly improved the prediction performance and enabled us to obtain prediction results within 10 min. Notably, when employing these top-ranked 11-mers in an independent test dataset (120 isolates), we achieved an average accuracy of 0.96.
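The three evaluation metrics reported above can be sketched as follows, using their conventional definitions in susceptibility testing: essential agreement counts predictions within one two-fold dilution of the reference MIC, category agreement counts matching susceptible/intermediate/resistant calls, and the very major error rate is the share of truly resistant isolates predicted as susceptible. This is a minimal sketch, not the study's evaluation code; the function names and the breakpoint parameters `s_max` and `r_min` are hypothetical, and real breakpoints are drug-specific.

```python
import math


def essential_agreement(ref_mics, pred_mics):
    """Fraction of predictions within +/- 1 two-fold dilution of the reference MIC."""
    hits = sum(
        abs(math.log2(p) - math.log2(r)) <= 1 + 1e-9
        for r, p in zip(ref_mics, pred_mics)
    )
    return hits / len(ref_mics)


def categorise(mic, s_max, r_min):
    """Map an MIC to S/I/R using hypothetical breakpoints (assumed, drug-specific)."""
    if mic <= s_max:
        return "S"
    if mic >= r_min:
        return "R"
    return "I"


def category_agreement(ref_mics, pred_mics, s_max, r_min):
    """Fraction of isolates whose predicted category matches the reference category."""
    hits = sum(
        categorise(r, s_max, r_min) == categorise(p, s_max, r_min)
        for r, p in zip(ref_mics, pred_mics)
    )
    return hits / len(ref_mics)


def very_major_error_rate(ref_mics, pred_mics, s_max, r_min):
    """Share of reference-resistant isolates predicted as susceptible.

    Returns 0.0 when there are no resistant isolates (a choice made for this sketch).
    """
    pairs = [
        (categorise(r, s_max, r_min), categorise(p, s_max, r_min))
        for r, p in zip(ref_mics, pred_mics)
    ]
    resistant = [pred for ref, pred in pairs if ref == "R"]
    if not resistant:
        return 0.0
    return sum(pred == "S" for pred in resistant) / len(resistant)
```

The very major error rate uses resistant isolates as its denominator because it measures the clinically riskiest mistake: reporting a resistant isolate as treatable.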

Conclusion

Our study is the first to demonstrate that AMR prediction for A. baumannii using machine learning methods based on k-mer features performs competitively with traditional workflows; hence, sequence-based AMR prediction and its application could be further promoted. The k-mer-based workflow developed in this study demonstrated high recall (sensitivity) and specificity, making it a dependable tool for MIC prediction in clinical settings.