ORIGINAL RESEARCH article
Front. Cell. Infect. Microbiol.
Sec. Clinical and Diagnostic Microbiology and Immunology
Volume 15 - 2025 | doi: 10.3389/fcimb.2025.1682764
This article is part of the Research Topic Machine Learning and AI-Driven Insights into Microbial Pathogenesis and Drug Resistance
An interpretable machine learning model for Early Prediction of Escherichia coli Infection in ICU Patients
Provisionally accepted. Fujian Medical University Union Hospital, Fuzhou, China
Background: Early and accurate identification of Escherichia coli (E. coli) infection in intensive care unit (ICU) patients remains challenging, yet achieving it may improve clinical outcomes. This study aimed to develop and validate an interpretable machine learning model for early prediction of E. coli infection at ICU admission.

Methods: This retrospective study used the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. Adult patients (aged 18-100 years) on their first ICU admission with a length of stay ≥24 hours were included. E. coli infection was identified from microbiological results and diagnostic codes. Missing data were imputed with the missForest algorithm. Feature selection was performed with Boruta and the least absolute shrinkage and selection operator (LASSO), and the variables selected by both methods (their intersection) were used for model construction. Eight machine learning models were developed: logistic regression, k-nearest neighbors, decision tree, random forest, extreme gradient boosting, light gradient boosting machine, support vector machine (SVM), and neural network. Model performance in the validation cohort was assessed using the area under the receiver operating characteristic curve (AUC) with its 95% confidence interval (CI), sensitivity, specificity, F1 score, calibration curves, decision curve analysis (DCA), and clinical impact curves (CIC). Model interpretability was evaluated with Shapley additive explanations (SHAP).

Results: A total of 52,554 ICU patients were analyzed, of whom 4,157 (7.9%) had E. coli infection. Twenty-eight variables selected by both Boruta and LASSO were used for modeling. Among all models, the SVM achieved the highest discrimination (AUC = 0.745, 95% CI: 0.726-0.764), followed by random forest (AUC = 0.742) and extreme gradient boosting (AUC = 0.739). Calibration curves, DCA, and CIC indicated good model calibration and clinical utility.
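The abstract reports each AUC with a 95% CI but does not say how the interval was obtained. One common approach is a percentile bootstrap over patients. The sketch below assumes that approach; it is not the authors' code. It computes the AUC as the Mann-Whitney probability that a randomly chosen infected patient receives a higher score than a randomly chosen uninfected one. The function names `auc` and `bootstrap_auc_ci` and the toy data are hypothetical. The sketch uses only the standard library so it stays self-contained.

```python
import random


def auc(y_true, scores):
    """AUC as the Mann-Whitney probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:          # O(n_pos * n_neg): fine for a toy example only
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) CI (assumed method).

    Patients are resampled with replacement; resamples containing a single
    class are skipped because AUC is undefined for them.
    """
    point = auc(y_true, scores)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y_b = [y_true[i] for i in idx]
        if 0 < sum(y_b) < n:  # labels assumed to be coded 0/1
            stats.append(auc(y_b, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper


# Toy data (hypothetical): 1 = E. coli infection, 0 = no infection
y = [0, 0, 1, 1, 0, 1, 0, 1]
s = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.50, 0.90]
point, lower, upper = bootstrap_auc_ci(y, s, n_boot=500)
print(point)  # 0.875 (14 of 16 positive/negative pairs correctly ordered)
```

The pairwise loop is quadratic, so it is suitable only for illustration. A cohort the size of the one analyzed here (52,554 patients) would call for a rank-based AUC or a library routine. A DeLong interval is another frequently used alternative to the bootstrap.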
SHAP analysis identified gender, age, sepsis, sedative use, and potassium level as the most influential predictors. A web-based tool was developed to provide real-time risk estimates with individualized explanations.

Conclusions: An interpretable SVM-based machine learning model was developed and validated for early prediction of E. coli infection in ICU patients, demonstrating good discrimination, calibration, and potential clinical benefit. The associated online tool provides transparent, individualized risk predictions and may facilitate timely clinical decision-making.
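Decision curve analysis, which underlies the clinical-benefit claims above, compares a model's net benefit across threshold probabilities pt against two reference strategies: treating every patient and treating none. The sketch below implements the standard net-benefit formula, NB = TP/n - (FP/n) * pt/(1 - pt). The function names and toy data are hypothetical, and this is an illustration of the general technique rather than the study's own analysis.

```python
def net_benefit(y_true, probs, pt):
    """Net benefit of treating patients whose predicted risk is >= pt:
    NB = TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie in (0, 1)")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= pt and y == 0)
    return tp / n - (fp / n) * (pt / (1.0 - pt))


def net_benefit_treat_all(y_true, pt):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (pt / (1.0 - pt))


def decision_curve(y_true, probs, thresholds):
    """Rows of (pt, model NB, treat-all NB); treat-none NB is always 0."""
    return [(pt, net_benefit(y_true, probs, pt), net_benefit_treat_all(y_true, pt))
            for pt in thresholds]


# Toy data (hypothetical), not from the study
y = [0, 0, 1, 1, 0, 1, 0, 1]
p = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.50, 0.90]
for pt, nb_model, nb_all in decision_curve(y, p, [0.10, 0.25, 0.50]):
    print(f"pt={pt:.2f}  model={nb_model:.3f}  treat-all={nb_all:.3f}")
```

The treat-all net benefit falls to exactly 0 when pt equals the outcome prevalence, which is about 7.9% in this cohort. A clinically useful model should therefore lie above both the treat-all and treat-none curves over the range of thresholds clinicians would actually act on.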
Keywords: Escherichia coli infection, machine learning, Support vector machine, predictive model, Intensive Care Unit
Received: 09 Aug 2025; Accepted: 14 Oct 2025.
Copyright: © 2025 Yang, Zou, Liang, Xu and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Xiaohong Xu, vancy1988@163.com
Xiaoling Chen, 13365917973@163.com
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.