ORIGINAL RESEARCH article

Front. Cell. Infect. Microbiol.

Sec. Clinical Infectious Diseases

Volume 15 - 2025 | doi: 10.3389/fcimb.2025.1605485

This article is part of the Research Topic: Clinical prediction models in cancer through bioinformatics

Prediction of bacteremia using routine hematological and metabolic parameters based on logistic regression and random forest models

Provisionally accepted
Shi-Yan Zhang1,2*, Ting-Qiang Wang1,2, Ying Zhuo1,2, Chun-E Lv1,2, Jing Shi1,2, Ling-Hui Yao1,2
  • 1Fuding Hospital, Ningde, China
  • 2Fuding Hospital, Fujian University of Traditional Chinese Medicine, Fuding, China

The final, formatted version of the article will be published soon.

Background: This study evaluated the predictive utility of routine hematological, inflammatory, and metabolic markers for bacteremia and compared the classification performance of logistic regression and random forest models.

Methods: A retrospective study was conducted on 287 inpatients who underwent blood culture testing at Fuding Hospital, Fujian University of Traditional Chinese Medicine, between March and August 2024. Patients were divided into bacteremia (n = 137) and non-bacteremia (n = 150) groups based on blood culture results. Hematological indices, inflammatory markers (e.g., C-reactive protein (CRP) and procalcitonin (PCT)), metabolic indices (e.g., glucose and cholesterol), and nutritional markers (e.g., albumin) were analyzed. Univariate and multivariate binary logistic regression analyses were used to identify independent risk factors. Logistic regression and random forest models were developed using 33 features with a 70:30 train-test split and evaluated using receiver operating characteristic (ROC) curves, confusion matrices, and standard classification metrics.

Results: Hemoglobin, cholesterol, and albumin levels were significantly lower in the bacteremia group, while platelet count, CRP, PCT, glucose, and triglycerides were significantly elevated (all p < 0.05). Logistic regression identified platelet count (odds ratio (OR) = 1.003, 95% confidence interval (CI): 1.001-1.006), PCT (OR = 1.032, 95% CI: 1.004-1.060), triglycerides (OR = 1.740, 95% CI: 1.052-2.879), and low cholesterol (OR = 0.523, 95% CI: 0.383-0.714; an OR below 1 means the odds of bacteremia fell as cholesterol rose) as independent risk factors. The area under the ROC curve (AUC) was 0.75 for the random forest model and 0.74 for logistic regression, with recall rates of 0.69 and 0.60, respectively.

Conclusion: Routine laboratory markers integrated into machine learning models demonstrated potential for early bacteremia prediction. Random forest showed higher sensitivity (recall) than logistic regression at a nearly identical AUC, suggesting its potential utility as a clinical screening tool.
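The reported odds ratios are per-unit effects, which makes values close to 1 (such as 1.003 for platelet count) easy to underread. The following is a minimal sketch, not the authors' analysis, of how such values can be rescaled and converted back to logistic coefficients. The function names are hypothetical, the 100-unit increment is an illustrative choice, and the standard-error recovery assumes the published intervals are Wald intervals that are symmetric on the log-odds scale, which the abstract does not state.

```python
import math


def scale_odds_ratio(or_per_unit: float, delta: float) -> float:
    """Odds ratio for a change of `delta` units, given the per-unit odds ratio.

    In a logistic model the log-odds are linear in the predictor,
    so OR(delta) = exp(delta * ln(OR per unit)).
    """
    return math.exp(delta * math.log(or_per_unit))


def coef_and_se(or_point: float, ci_low: float, ci_high: float, z: float = 1.96):
    """Recover the coefficient and an approximate standard error.

    Assumes (not confirmed by the abstract) a Wald 95% interval:
    exp(beta - z * se) to exp(beta + z * se).
    """
    beta = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


# Odds ratios and 95% CIs as reported in the abstract: (OR, lower, upper).
REPORTED = {
    "platelet_count": (1.003, 1.001, 1.006),
    "pct": (1.032, 1.004, 1.060),
    "triglycerides": (1.740, 1.052, 2.879),
    "cholesterol": (0.523, 0.383, 0.714),
}

for name, (or_point, low, high) in REPORTED.items():
    beta, se = coef_and_se(or_point, low, high)
    print(f"{name}: beta = {beta:.4f}, approx. SE = {se:.4f}")

# Hypothetical increment: 100 units of platelet count (units are not
# given in the abstract, so this is for illustration only).
print(f"platelet OR per 100 units: {scale_odds_ratio(1.003, 100):.2f}")
```

Under these assumptions, a per-unit odds ratio of 1.003 compounds to roughly 1.35 over a 100-unit difference, while the negative coefficient recovered for cholesterol reflects the inverse association described in the results.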

Keywords: Bacteremia, Blood culture, machine learning, random forest, Logistic regression, biomarkers

Received: 03 Apr 2025; Accepted: 03 Jul 2025.

Copyright: © 2025 Zhang, Wang, Zhuo, Lv, Shi and Yao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Shi-Yan Zhang, Fuding Hospital, Ningde, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.