
ORIGINAL RESEARCH article

Front. Cell. Infect. Microbiol.

Sec. Clinical and Diagnostic Microbiology and Immunology

Volume 15 - 2025 | doi: 10.3389/fcimb.2025.1682764

This article is part of the Research Topic: Machine Learning and AI-Driven Insights into Microbial Pathogenesis and Drug Resistance

An interpretable machine learning model for Early Prediction of Escherichia coli Infection in ICU Patients

Provisionally accepted
Shu Yang, Laiyu Zou, Huixin Liang, Xiaohong Xu*, Xiaoling Chen*
  • Fujian Medical University Union Hospital, Fuzhou, China

The final, formatted version of the article will be published soon.

Background: Early and accurate identification of Escherichia coli (E. coli) infection in intensive care unit (ICU) patients remains challenging, yet it may improve clinical outcomes. This study aimed to develop and validate an interpretable machine learning model for early prediction of E. coli infection at ICU admission.

Methods: This retrospective study was conducted using the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. Adult patients (aged 18-100 years) with their first ICU admission and a length of stay ≥24 hours were included. E. coli infection was identified based on microbiological results and diagnostic codes. Missing data were imputed using the missForest algorithm. Feature selection was performed with Boruta and least absolute shrinkage and selection operator (LASSO), and the variables selected by both methods were used for model construction. Eight machine learning models were developed: logistic regression, k-nearest neighbors, decision tree, random forest, extreme gradient boosting, light gradient boosting machine, support vector machine (SVM), and neural network. Model performance in the validation cohort was assessed using area under the receiver operating characteristic curve (AUC) with 95% confidence interval (CI), sensitivity, specificity, F1 score, calibration curves, decision curve analysis (DCA), and clinical impact curves (CIC). Model interpretability was evaluated with Shapley additive explanations (SHAP).

Results: A total of 52,554 ICU patients were analyzed, of whom 4,157 (7.9%) had E. coli infection. Twenty-eight intersecting variables were selected for modeling. Among all models, the SVM achieved the highest discrimination (AUC=0.745, 95% CI: 0.726-0.764), followed by random forest (AUC=0.742) and extreme gradient boosting (AUC=0.739). Calibration and decision analyses indicated robust model calibration and clinical utility. SHAP analysis identified gender, age, sepsis, sedative use, and potassium level as the most influential predictors. A web-based tool was developed to enable real-time clinical risk estimation and individualized interpretability.

Conclusions: An interpretable SVM-based machine learning model was developed and validated for early prediction of E. coli infection in ICU patients, demonstrating good discrimination, calibration, and potential clinical benefit. The associated online tool provides transparent, individualized risk predictions and may facilitate timely clinical decision-making.
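The discrimination metrics named in the Methods (AUC with a 95% CI, sensitivity, specificity, F1 score) can be illustrated with a short, dependency-free Python sketch. It is not the authors' code: the abstract does not say how the confidence interval was computed, so a percentile bootstrap is assumed here, and the function names (`auc_score`, `bootstrap_auc_ci`, `threshold_metrics`), the 0.5 threshold, and the toy data are all hypothetical.

```python
import random


def auc_score(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) identity; tied scores get average ranks."""
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, not from the article)."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(auc_score(ys, [scores[k] for k in idx]))
    aucs.sort()
    lo_i = int((alpha / 2) * n_boot)
    hi_i = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return aucs[lo_i], aucs[hi_i]


def threshold_metrics(y_true, scores, threshold=0.5):
    """Sensitivity, specificity and F1 when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


if __name__ == "__main__":
    # Synthetic labels and risk scores, for illustration only (not study data).
    rng = random.Random(42)
    y = [1 if rng.random() < 0.08 else 0 for _ in range(2000)]
    p = [min(1.0, max(0.0, rng.gauss(0.6 if label else 0.4, 0.2))) for label in y]
    print("AUC:", round(auc_score(y, p), 3))
    print("95% CI:", bootstrap_auc_ci(y, p, n_boot=200))
    print(threshold_metrics(y, p, threshold=0.5))
```

With an outcome prevalence near 8%, as reported in the Results, a fixed 0.5 cutoff is rarely appropriate. Sensitivity, specificity, and F1 all depend on the chosen threshold, whereas the AUC does not.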

Keywords: Escherichia coli infection, machine learning, support vector machine, predictive model, intensive care unit

Received: 09 Aug 2025; Accepted: 14 Oct 2025.

Copyright: © 2025 Yang, Zou, Liang, Xu and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Xiaohong Xu, vancy1988@163.com
Xiaoling Chen, 13365917973@163.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.