ORIGINAL RESEARCH article

Front. Cell. Infect. Microbiol., 28 July 2025

Sec. Clinical Infectious Diseases

Volume 15 - 2025 | https://doi.org/10.3389/fcimb.2025.1605485

This article is part of the Research Topic: Clinical prediction models in cancer through bioinformatics

Prediction of bacteremia using routine hematological and metabolic parameters based on logistic regression and random forest models

Ting-Qiang Wang1†, Ying Zhuo1, Chun-E Lv1, Jing Shi2, Ling-Hui Yao1, Shi-Yan Zhang1*† and Jinbao Shi3*††
  • 1Department of Clinical Laboratory, Fuding Hospital, Fujian University of Traditional Chinese Medicine, Fuding, Fujian, China
  • 2Department of Anesthesiology, Fuding Hospital, Fujian University of Traditional Chinese Medicine, Fuding, Fujian, China
  • 3Department of Nephrology, Ningde Hospital of Traditional Chinese Medicine, Ningde, Fujian, China

Background: This study aimed to evaluate the predictive utility of routine hematological, inflammatory, and metabolic markers for bacteremia and to compare the classification performance of logistic regression and random forest models.

Methods: A retrospective study was conducted on 287 inpatients who underwent blood culture testing at Fuding Hospital, Fujian University of Traditional Chinese Medicine between March and August 2024. Patients were divided into bacteremia (n = 137) and non-bacteremia (n = 150) groups based on blood culture results. Hematological indices, inflammatory markers (e.g., C-reactive protein (CRP), procalcitonin (PCT)), metabolic indices (e.g., glucose, cholesterol) and nutritional markers (e.g., albumin) were analyzed. Univariate and multivariate binary logistic regression analyses were used to identify independent risk factors. Logistic regression and random forest models were developed using 33 features with a 70:30 train-test split and evaluated using receiver operating characteristic (ROC) curves, confusion matrices, and standard classification metrics.

Results: Hemoglobin, cholesterol, and albumin levels were significantly lower in the bacteremia group, while platelet count, CRP, PCT, glucose, and triglycerides were significantly elevated (all p < 0.05). Logistic regression identified platelet count (odds ratio (OR) = 1.003, 95% confidence interval (CI): 1.001–1.006), PCT (OR = 1.032, 95% CI: 1.004–1.060), triglycerides (OR = 1.740, 95% CI: 1.052–2.879), and low cholesterol (OR = 0.523, 95% CI: 0.383–0.714) as independent risk factors. The area under the ROC curve (AUC) was 0.75 for the random forest model and 0.74 for logistic regression, with recall rates of 0.69 and 0.60, respectively.

Conclusion: Routine laboratory markers integrated into machine learning models demonstrated potential for early bacteremia prediction. Random forest exhibited superior sensitivity compared to logistic regression, suggesting its potential utility as a clinical screening tool.

Background

Bacteremia is a systemic infection resulting from the invasion of pathogenic microorganisms into the bloodstream. Without prompt recognition and treatment, it can progress rapidly to sepsis or septic shock, with mortality rates ranging between 30% and 50% (Rudd et al., 2020). While blood culture remains the diagnostic gold standard, it is hindered by delayed turnaround times (24–72 hours) and reduced sensitivity, especially when prior antibiotic exposure or contamination occurs. These limitations can delay critical treatment decisions (Evans et al., 2021). As a result, there is growing demand for rapid, cost-effective tools to support early bacteremia detection and guide clinical management.

In recent years, routine hematological parameters, such as white blood cell count and the neutrophil-to-lymphocyte ratio, and metabolic markers like glucose and cholesterol have gained attention for their utility in infection detection due to their accessibility and affordability (Zhang et al., 2024). These biomarkers offer partial insight into host inflammatory responses and infection-induced metabolic alterations, with studies demonstrating their moderate diagnostic sensitivity and specificity in infectious disease contexts (Agnello et al., 2021; Gatica et al., 2023; Zhang et al., 2024). However, individual parameters lack predictive power due to complex, nonlinear interactions in systemic infections (Moor et al., 2023). Machine learning models can capture such interactions, and recent studies have explored their use for early bacteremia and sepsis prediction, demonstrating improved risk stratification compared to traditional clinical scores (Moor et al., 2021; Yan et al., 2022; Chua et al., 2025). Nevertheless, these approaches remain underexplored in resource-limited settings.

This study integrated routine hematological, inflammatory, and metabolic markers to develop logistic regression and random forest prediction models. The primary objective was to evaluate the comparative performance of these models in the early identification of bacteremia and to explore a simple, practical tool for clinical screening support. Additionally, the findings may provide a theoretical foundation for the broader application of such predictive models across diverse healthcare settings.

Materials and methods

Study population

This pilot study retrospectively included 287 hospitalized patients who underwent blood culture testing at Fuding Hospital, Fujian University of Traditional Chinese Medicine, between March and August 2024. Based on the results of blood culture, patients were classified into two groups: the bacteremia group (culture-positive, n = 137) and the non-bacteremia group (culture-negative, n = 150). The sample size reflects all eligible patients meeting the inclusion and exclusion criteria during the specified period, representing the entire population available for analysis at the study center.

Inclusion and exclusion criteria

Inclusion criteria

1. Patients who underwent at least one blood culture during hospitalization;

2. Availability of complete clinical and laboratory data.

Exclusion criteria

1. Suspected contamination in blood culture results (e.g., common skin flora);

2. Antibiotic use within the preceding three days;

3. Coexisting severe hematological disorders or immunodeficiency;

4. Missing key clinical or laboratory parameters;

5. Blood culture-positive cases lacking clinical signs of infection (e.g., absence of fever, hypotension, or organ dysfunction).

Diagnostic criteria

Diagnosis of bacteremia followed the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3) (Seymour et al., 2016), and was established if one or more of the following criteria were met: (1) body temperature > 38 °C or <36 °C; (2) systolic blood pressure < 90 mmHg or a drop > 40 mmHg from baseline; (3) isolation of pathogenic organisms from blood cultures (in the case of skin commensals, at least two positive cultures from separate draws were required); (4) evidence of organ dysfunction, such as a Sequential Organ Failure Assessment (SOFA) score ≥ 2.

Ethical approval and compliance

The study protocol was approved by the Ethics Committee of Fuding Hospital, Fujian University of Traditional Chinese Medicine (Approval No. 2023012). All data were anonymized prior to analysis to ensure compliance with ethical and privacy standards. Due to the retrospective nature of the study, the requirement for written informed consent was waived by the ethics committee.

Data collection and laboratory indicators

Clinical and laboratory data were extracted from the hospital information system and laboratory information system. The following indicators were collected:

Hematological parameters: red blood cell count, hemoglobin, white blood cell count, neutrophils, lymphocytes, platelet count, mean platelet volume (MPV), platelet distribution width (PDW).

Inflammatory markers: CRP and PCT.

Metabolic markers: fasting blood glucose, total cholesterol, triglycerides, uric acid.

Nutritional marker: albumin.

Categorical variables: sex, smoking history, drinking history, hypertension, coronary heart disease, tumor history, diabetes, site of infection.

Laboratory procedures and equipment

All blood samples were obtained prior to the initiation of antimicrobial therapy. Hematological analyses were performed using the Sysmex XN-9000 automated hematology analyzer (Japan). Biochemical markers (glucose, cholesterol, triglycerides, uric acid, albumin) were measured with the Beckman AU5800 automated chemistry analyzer. CRP and PCT levels were assessed using electrochemiluminescence immunoassay. Blood cultures were processed using the BacT/ALERT 3D automated system and pathogen identification was performed with the VITEK MS mass spectrometry system (bioMérieux, France).

Blood cultures were collected using a bilateral dual-bottle approach (aerobic and anaerobic pairs from both arms). If common skin flora such as coagulase-negative staphylococci or Propionibacterium species were isolated, at least two positive cultures from separate sites or repeated collections were required to confirm true infection rather than contamination.

All procedures strictly followed standard operating protocols (SOPs), with internal quality control maintained and coefficients of variation kept below 5%. The present study was centered on bacterial pathogens retrieved from blood cultures. To maintain the research’s focus on bacterial infections and ensure the consistency of statistical analysis, fungi, Mycoplasma, Chlamydia, parasites, and viruses were excluded.

Statistical analysis

Normality of continuous variables was assessed using the Shapiro-Wilk test. Normally distributed data were expressed as mean ± standard deviation and compared using independent samples t-test. Non-normally distributed data were expressed as median (interquartile range, P25 - P75) and analyzed using the Mann-Whitney U test. Categorical variables were presented as frequencies and percentages, with between-group comparisons performed using the chi-square (χ2) test.
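The test-selection logic described above can be sketched in Python with SciPy. This is an illustrative sketch only: the study performed these analyses in SPSS, the function names are hypothetical, the continuous values are synthetic, and the sex counts are reconstructed from the reported percentages rather than taken from the study data.

```python
import random
from scipy import stats

ALPHA = 0.05  # two-tailed significance level stated in the paper


def compare_continuous(group_a, group_b, alpha=ALPHA):
    """Run Shapiro-Wilk on both groups, then a t-test or Mann-Whitney U test."""
    _, p_norm_a = stats.shapiro(group_a)
    _, p_norm_b = stats.shapiro(group_b)
    if p_norm_a > alpha and p_norm_b > alpha:
        # Both groups look normal: independent samples t-test.
        _, p_value = stats.ttest_ind(group_a, group_b)
        return "t-test", p_value
    # At least one group deviates from normality: rank-based test.
    _, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    return "mann-whitney", p_value


def compare_categorical(table):
    """Pearson chi-square test on a contingency table, no continuity correction."""
    _, p_value, _, _ = stats.chi2_contingency(table, correction=False)
    return p_value


# Synthetic right-skewed values standing in for a marker such as CRP.
random.seed(0)
bacteremia = [random.lognormvariate(1.0, 1.0) for _ in range(137)]
controls = [random.lognormvariate(0.5, 1.0) for _ in range(150)]
test_name, p_continuous = compare_continuous(bacteremia, controls)

# Sex by group; counts reconstructed from the reported percentages
# (47.4% of 137 and 34.0% of 150), so they are approximate.
# Rows: bacteremia, non-bacteremia. Columns: female, male.
sex_table = [[65, 72], [51, 99]]
p_sex = compare_categorical(sex_table)

print(test_name, p_continuous, p_sex)
```

With the reconstructed sex counts, the uncorrected chi-square p-value lands near the p = 0.020 reported for sex in the baseline comparison, which suggests (but does not confirm) that no continuity correction was applied.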

To identify risk factors for bacteremia, univariate logistic regression was first conducted. Variables with p < 0.20 were included in the multivariate logistic regression model, and backward stepwise elimination was used to determine independent predictors. Odds ratios (ORs) with 95% confidence intervals (CIs) were reported.

Model development and performance evaluation

Both the logistic regression and random forest models included 33 features, comprising 22 continuous variables (e.g., age, complete blood count indices, CRP, PCT, glucose, cholesterol) and 11 one-hot encoded categorical variables (e.g., sex, comorbidities, site of infection).

The dataset was randomly divided into a training set (n = 201) and a testing set (n = 86) in a 70:30 ratio, with stratification to maintain the proportion of bacteremia cases in both subsets.
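A stratified split of this kind can be sketched with Scikit-learn's `train_test_split`. The placeholder feature rows and the seed are assumptions for illustration; only the class counts (137 and 150) and subset sizes come from the text. An integer `test_size` is used here because a fractional `test_size=0.3` is rounded up by Scikit-learn and would give 87 test samples rather than the stated 86.

```python
from sklearn.model_selection import train_test_split

# Labels only: 137 bacteremia (1) and 150 non-bacteremia (0), as in the study.
labels = [1] * 137 + [0] * 150
# Placeholder feature rows; the real 33-feature matrix is not reproduced here.
rows = [[i] for i in range(len(labels))]

X_train, X_test, y_train, y_test = train_test_split(
    rows,
    labels,
    test_size=86,      # integer size, to match the stated n = 86
    stratify=labels,   # keep the bacteremia proportion equal in both subsets
    random_state=42,   # arbitrary seed (assumption)
)

print(len(y_train), len(y_test))
print(sum(y_train) / len(y_train), sum(y_test) / len(y_test))
```

Both printed proportions should sit close to the overall bacteremia rate of 137/287, which is the point of stratification.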

Logistic regression model: L2 regularization was applied. Model tuning included optimization of the regularization parameter C (range: 0.001 - 100) and solver type (liblinear, lbfgs).

Random forest model: Hyperparameter optimization was performed via grid search across n_estimators (50, 100, 200), max_depth (None, 10, 20), min_samples_split (2, 5, 10), and min_samples_leaf (1, 2, 4), using five-fold stratified cross-validation. The final model selected 200 decision trees with a maximum tree depth of 20.
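The tuning procedure for both models can be sketched as follows. The random forest grid and the solver list are taken from the text; the individual C values, the ROC AUC scoring, the seeds, the use of cross-validation for the logistic regression search, and the function names are assumptions made for illustration and are not the authors' code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# The C values are an assumed log-spaced grid spanning the stated range 0.001-100.
LR_GRID = {"C": [0.001, 0.01, 0.1, 1, 10, 100], "solver": ["liblinear", "lbfgs"]}
# Random forest grid as listed in the text.
RF_GRID = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}


def tune(estimator, param_grid, X, y, seed=42):
    """Grid search with five-fold stratified CV, scored by ROC AUC (assumed)."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(estimator, param_grid, scoring="roc_auc", cv=cv)
    search.fit(X, y)
    return search.best_estimator_, search.best_params_


def tune_logistic_regression(X, y, param_grid=LR_GRID):
    # L2 is the default penalty of LogisticRegression, so it is not set explicitly.
    return tune(LogisticRegression(max_iter=1000), param_grid, X, y)


def tune_random_forest(X, y, param_grid=RF_GRID, seed=42):
    return tune(RandomForestClassifier(random_state=seed), param_grid, X, y, seed)


# Smoke run on synthetic data with reduced grids, to keep the example fast.
X_demo, y_demo = make_classification(n_samples=80, n_features=6, random_state=0)
lr_model, lr_params = tune_logistic_regression(
    X_demo, y_demo, {"C": [0.1, 1.0], "solver": ["liblinear", "lbfgs"]}
)
rf_model, rf_params = tune_random_forest(
    X_demo, y_demo, {"n_estimators": [10, 20], "max_depth": [None, 3]}
)
print(lr_params, rf_params)
```

Calling `tune_random_forest(X_train, y_train)` with the default grid would evaluate 81 parameter combinations across five folds, so it runs noticeably longer than the smoke run.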

Performance metrics included accuracy, precision, recall, F1 score, and AUC.

All traditional statistical analyses were performed using SPSS version 22.0. Machine learning model development and evaluation were conducted in Python 3.7 using the Scikit-learn library. A two-tailed p-value < 0.05 was considered statistically significant.

Results

Shapiro–Wilk test for normality

The Shapiro-Wilk test was applied to assess the distribution of all continuous variables. Results indicated that variables such as hemoglobin, MPV, and albumin conformed to a normal distribution in both groups (p > 0.05). In contrast, most other variables, including CRP, blood glucose, and PCT, exhibited significant deviations from normality (p < 0.05). Accordingly, parametric or non-parametric statistical methods were applied as appropriate in subsequent analyses.

Baseline characteristics of study participants

Baseline characteristics of the bacteremia and non-bacteremia groups are summarized in Table 1. The proportion of female patients was significantly higher in the bacteremia group compared to the non-bacteremia group (47.4% vs. 34.0%, p = 0.020). Additionally, the prevalence of diabetes mellitus was significantly elevated in the bacteremia group (32.1% vs. 20.7%, p = 0.027). Other variables, including age, hypertension, coronary artery disease, malignancy, alcohol consumption, and smoking history, showed no statistically significant differences between groups (all p > 0.05), indicating overall comparability of the two populations with respect to most baseline comorbidities.

Table 1. Baseline characteristics of patients in bacteremia and non-bacteremia groups [n (%)].

Comparison of clinical laboratory parameters

As shown in Table 2, patients in the bacteremia group had significantly lower levels of hemoglobin, red blood cell count, cholesterol, and albumin compared to those in the non-bacteremia group (all p < 0.01). Conversely, levels of platelet count, CRP, PCT, blood glucose, and triglycerides were significantly higher in the bacteremia group (all p < 0.05). No statistically significant differences were observed in other parameters such as mean corpuscular hemoglobin concentration (MCHC) and platelet distribution width (PDW).

Table 2. Comparison of laboratory parameters between the bacteremia and Non-bacteremia groups [median (P25–P75)].

Univariate and multivariate logistic regression analysis

As shown in Table 3, univariate logistic regression analysis revealed that several variables, including hemoglobin, platelet count, PCT, blood glucose, cholesterol, albumin, and diabetes mellitus, were significantly associated with bacteremia. These candidate variables were subsequently entered into a multivariate logistic regression model to identify independent predictors (Table 4).

Table 3. Univariate logistic regression analysis for predictors of bacteremia.

Table 4. Multivariate logistic regression analysis for independent predictors of bacteremia.

The multivariate analysis identified the following as statistically significant independent risk factors for bacteremia: platelet count (OR = 1.003, p = 0.010), PCT (OR = 1.032, p = 0.023), triglycerides (OR = 1.740, p = 0.031), and cholesterol (OR = 0.523, p < 0.001).
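Odds ratios this close to 1 are easier to read once rescaled. The sketch below assumes each OR refers to a one-unit change in the measured variable (the units are not stated in this section) and that effects are log-linear, as in a standard logistic regression. The helper names are hypothetical; the numbers are the reported ORs and 95% CIs.

```python
import math

# (OR, CI lower, CI upper) as reported in the multivariate analysis.
REPORTED = {
    "platelet count": (1.003, 1.001, 1.006),
    "PCT": (1.032, 1.004, 1.060),
    "triglycerides": (1.740, 1.052, 2.879),
    "cholesterol": (0.523, 0.383, 0.714),
}


def coefficient(odds_ratio):
    """Logistic regression coefficient implied by an odds ratio."""
    return math.log(odds_ratio)


def standard_error(lower, upper, z=1.96):
    """Approximate SE of the coefficient, recovered from a 95% Wald CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def scaled_or(odds_ratio, delta):
    """Odds ratio for a change of `delta` units, assuming a log-linear effect."""
    return odds_ratio ** delta


for name, (or_value, lower, upper) in REPORTED.items():
    print(name, round(coefficient(or_value), 4), round(standard_error(lower, upper), 4))

# A 100-unit rise in platelet count corresponds to roughly 35% higher odds.
platelet_or_100 = scaled_or(REPORTED["platelet count"][0], 100)
print(round(platelet_or_100, 2))
```

Read this way, the small per-unit OR for platelet count is compatible with a clinically noticeable effect over the range of values seen in practice, while an OR below 1 for cholesterol corresponds to a negative coefficient.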

Performance evaluation of machine learning models

The classification performance of the logistic regression and random forest models on the testing set is presented in Tables 5 and 6. The logistic regression model achieved an accuracy of 0.69, with a recall rate of 0.60 for the bacteremia (positive) group and an area under the ROC curve (AUC) of 0.74. The random forest model demonstrated the same overall accuracy (0.69), but with an improved recall rate of 0.69 for the positive group and a slightly higher AUC of 0.75. The confusion matrices (Figure 1) indicated that while the random forest model enhanced sensitivity, it did so at the expense of a modest reduction in specificity. Comparative analysis of the ROC curves (Figure 2) showed similar overall performance between the two models.

Table 5. Classification report for logistic regression model.

Table 6. Classification report for random forest model.

Figure 1
Two confusion matrices compare logistic regression and random forest models. The logistic regression matrix shows 35 true negatives, 10 false positives, 17 false negatives, and 25 true positives. The random forest matrix shows 34 true negatives, 11 false positives, 13 false negatives, and 29 true positives. Both matrices use a color gradient to indicate frequency.

Figure 1. Confusion matrices of logistic regression and random forest models. This figure illustrates the confusion matrices for the logistic regression model (left) and the random forest model (right) on the test dataset. Compared with logistic regression, the random forest model achieved a slightly higher number of true positives (TP = 29 vs. 25) and fewer false negatives (FN = 13 vs. 17), indicating improved sensitivity. However, the random forest model also showed a modest increase in false positives (FP = 11 vs. 10), suggesting a slight reduction in specificity as a trade-off for higher sensitivity.
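The classification metrics can be recomputed directly from the counts given for Figure 1, as in the short sketch below (the helper name is hypothetical). Note that these counts total 87 per model, one more than the stated test-set size of 86, so values derived here may differ slightly from those reported in the text and in Tables 5 and 6; for example, the random forest accuracy computes to 63/87, about 0.72.

```python
# Counts as described for Figure 1: (TN, FP, FN, TP).
COUNTS = {
    "logistic regression": (35, 10, 17, 25),
    "random forest": (34, 11, 13, 29),
}


def metrics(tn, fp, fn, tp):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # sensitivity
    return {
        "n": total,
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


for name, counts in COUNTS.items():
    print(name, {key: round(value, 2) for key, value in metrics(*counts).items()})
```

The recall values (25/42 and 29/42) reproduce the reported 0.60 and 0.69, and the specificities (35/45 and 34/45) show the modest trade-off described in the caption.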

Figure 2
ROC curve comparison graph showing False Positive Rate on the x-axis and True Positive Rate on the y-axis. Two curves represent Logistic Regression (blue, AUC = 0.74) and Random Forest (red, AUC = 0.75). A diagonal reference line indicates random performance.

Figure 2. Comparison of ROC curves between logistic regression and random forest models. The ROC curves of the two models exhibit similar shapes, indicating that logistic regression and random forest achieved comparable classification performance on this dataset.
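For readers less familiar with the AUC, it can be read as the probability that a randomly chosen bacteremia case receives a higher predicted score than a randomly chosen non-bacteremia case. The sketch below computes it that way from hypothetical scores; these values are illustrative and are not study data.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for pos in pos_scores:
        for neg in neg_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predicted probabilities, for illustration only.
positives = [0.9, 0.7, 0.4]
negatives = [0.6, 0.3, 0.2]
auc = auc_from_scores(positives, negatives)
print(round(auc, 3))  # 8 of 9 pairs are ranked correctly
```

On this reading, AUCs of 0.74 and 0.75 mean that either model ranks a bacteremia case above a non-bacteremia case in roughly three out of four such pairs.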

Discussion

This study investigated the early predictive value of routine hematological and metabolic biomarkers for bacteremia and compared the classification performance of logistic regression and random forest models. Based on clinical data from 287 hospitalized patients, we identified several biomarkers significantly associated with bacteremia and demonstrated that both machine learning models achieved moderate predictive performance. These findings support the potential utility of routine laboratory indicators as tools for early bacteremia screening.

Independent risk factors

Multivariate logistic regression analysis identified elevated platelet count, PCT, and triglycerides, as well as decreased cholesterol levels, as independent predictors of bacteremia. Increased platelet count may reflect the systemic inflammatory response and coagulation activation during acute infection. Previous studies have associated reactive thrombocytosis with poor outcomes in sepsis (Presume et al., 2022). With respect to procalcitonin (PCT), levels ≥ 0.5 ng/mL are generally regarded as indicative of systemic bacterial infection, while levels >2 ng/mL are associated with higher likelihood of sepsis (Jongwutiwes et al., 2009; Kubo et al., 2024). In our cohort, the median PCT level in bacteremia cases was 0.202 ng/mL (IQR: 0.094 - 0.771), suggesting that even modest elevations may carry predictive value when combined with other biomarkers. This underscores the importance of integrating multiple parameters rather than relying on a single threshold.

Our findings that low cholesterol and elevated triglycerides were independently associated with bacteremia are consistent with the phenomenon of infection-induced metabolic dysregulation (Górecka et al., 2022). Inflammatory cytokines such as interleukin-6 (IL-6) and tumor necrosis factor-alpha (TNF-α) are known to promote hypertriglyceridemia through enhanced hepatic lipogenesis and reduced lipoprotein lipase activity, while simultaneously suppressing cholesterol synthesis and increasing cholesterol catabolism (Górecka et al., 2022; Agnello et al., 2021). Similarly, stress-induced hyperglycemia observed in systemic infections reflects cytokine-driven insulin resistance and increased gluconeogenesis (Leonidou et al., 2008). These mechanisms highlight the pathophysiological basis for the observed metabolic alterations.

Although hyperglycemia did not reach statistical significance in the multivariate analysis (p = 0.067), its borderline association suggests that it may still be clinically relevant. Bacteremia-induced insulin resistance and enhanced gluconeogenesis may lead to stress-induced hyperglycemia, which can exacerbate organ dysfunction and negatively impact prognosis (Leonidou et al., 2008). Similarly, decreased levels of hemoglobin and albumin, which were significant in univariate analysis, may reflect chronic inflammation, malnutrition, or bone marrow suppression commonly observed in patients with bacteremia (Allison and Lobo, 2024).

Model performance and clinical applicability

The random forest model demonstrated superior sensitivity in detecting positive cases (recall = 0.69) compared to logistic regression (recall = 0.60), with a slightly higher AUC (0.75 vs. 0.74). This suggests that random forest may be more suitable in clinical settings prioritizing sensitivity, such as early screening. In contrast, logistic regression provides interpretable coefficients and stable performance, which may facilitate risk communication and clinical decision-making (Agnello et al., 2024). Our findings are directionally consistent with previous reports, although with lower discrimination. For example, prior machine learning studies, including those reviewed by Hernandez et al. (2025), incorporated both clinical and laboratory data and achieved AUCs of up to 0.83–0.90, outperforming our models.

Several established tools, such as qSOFA, SIRS, and NEWS, also rely on physiological parameters that are not captured in our laboratory-based model. The primary advantage of our approach lies in its reliance on inexpensive, routine laboratory tests that can be automated and rapidly obtained, particularly useful in resource-limited settings where clinical scoring may be incomplete. However, the exclusion of physiological data limits our model’s discriminative power, highlighting the need for future multimodal model development.

Future perspectives

Further research should aim to: (1) incorporate multicenter datasets to enhance model adaptability and robustness; (2) integrate multimodal data, including clinical scores, microbiological profiles, and imaging features, to improve predictive accuracy; and (3) explore advanced algorithms and interpretability tools, such as XGBoost, LightGBM, SHAP, and LIME, to balance predictive performance with clinical transparency.

Conclusion

In this study, we developed two machine learning models—logistic regression and random forest—to predict bacteremia based on routine hematological and metabolic indicators. Our findings identified elevated platelet count, procalcitonin, and triglycerides, along with decreased cholesterol levels, as independent risk factors for bacteremia. Both models achieved comparable performance with moderate predictive accuracy, while the random forest model demonstrated slightly better sensitivity for identifying positive cases.

The integration of machine learning algorithms with widely available laboratory parameters offers a cost-effective and accessible approach to support early detection of bacteremia, especially in clinical settings where rapid microbiological confirmation is limited. Nonetheless, to improve model generalizability and precision, future work should focus on incorporating high-dimensional clinical, physiological, and molecular data, and validating the models in multicenter prospective cohorts.

Limitations

This study has several limitations. First, as a pilot single-center study with a modest sample size, the statistical power is limited, which may affect the stability and reliability of the predictive models. Second, the restricted time frame may limit the generalizability of the findings, as temporal variability in patient populations or clinical practices was not captured. Third, the single-center design further constrains external validity because patient characteristics and infection epidemiology may differ across institutions and regions.

To address these limitations, we are planning to extend the study period in future research to include a broader range of data over time, which will allow evaluation of temporal trends and enhance model robustness. In addition, we are preparing a multicenter, prospective validation study across hospitals in Fujian Province and beyond, aiming to assess the generalizability, calibration, and clinical utility of our models in diverse healthcare settings. Such external validation will be essential before clinical implementation.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.

Ethics statement

The studies involving humans were approved by the Ethics Committee of Fuding Hospital, Fujian University of Traditional Chinese Medicine (Approval No. 2023012). The studies were conducted in accordance with the local legislation and institutional requirements. Due to the retrospective nature of the study, the requirement for written informed consent was waived by the ethics committee.

Author contributions

S-YZ: Conceptualization, Software, Writing – review & editing, Writing – original draft, Formal analysis. T-QW: Conceptualization, Software, Writing – review & editing, Writing – original draft. YZ: Conceptualization, Writing – review & editing, Software. C-EL: Writing – review & editing, Software. JS: Methodology, Data curation, Writing – original draft. L-HY: Data curation, Writing – original draft. JBS: Writing – review & editing, Writing – original draft, Supervision, Conceptualization, Formal analysis, Software.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by the 2023 Ningde City Natural Science Foundation (health field), China, Grant/Award Number: 2023J46.

Acknowledgments

We express our gratitude to the staff of the Department of Clinical Laboratory, Fuding Hospital, Fujian University of Traditional Chinese Medicine for their dedication and assistance in data collection and analysis.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Agnello, L., Giglio, R. V., Bivona, G., Scazzone, C., Gambino, C. M., Iacona, A., et al. (2021). The value of a complete blood count (CBC) for sepsis diagnosis and prognosis. Diagnostics (Basel) 11. doi: 10.3390/diagnostics11101881

Agnello, L., Vidali, M., Padoan, A., Lucis, R., Mancini, A., Guerranti, R., et al. (2024). Machine learning algorithms in sepsis. Clinica Chimica Acta 553, 117738. doi: 10.1016/j.cca.2023.117738

Allison, S. P. and Lobo, D. N. (2024). The clinical significance of hypoalbuminaemia. Clin. Nutr. 43, 909–914. doi: 10.1016/j.clnu.2024.02.018

Chua, M. T., Boon, Y., Lee, Z. Y., Kok, J. H. J., Lim, C. K. W., Cheung, N. M. T., et al. (2025). The role of artificial intelligence in sepsis in the Emergency Department: a narrative review. Ann. Transl. Med. 13, 4. doi: 10.21037/atm-24-150

Evans, L., Rhodes, A., Alhazzani, W., Antonelli, M., Coopersmith, C. M., French, C., et al. (2021). Surviving Sepsis Campaign: International Guidelines for Management of Sepsis and Septic Shock 2021. Crit. Care Med. 49, e1063–e1143. doi: 10.1097/CCM.0000000000005337

Gatica, S., Fuentes, B., Rivera-Asín, E., Ramírez-Céspedes, P., Sepúlveda-Alfaro, J., Catalán, E. A., et al. (2023). Novel evidence on sepsis-inducing pathogens: from laboratory to bedside. Front. Microbiol. 14, 1198200. doi: 10.3389/fmicb.2023.1198200

Górecka, M., Krzemiński, K., Mikulski, T., and Ziemba, A. W. (2022). ANGPTL4, IL-6 and TNF-α as regulators of lipid metabolism during a marathon run. Sci. Rep. 12, 19940. doi: 10.1038/s41598-022-17439-x

Hernandez, B., Ming, D. K., Rawson, T. M., Bolton, W., Wilson, R., Vasikasin, V., et al. (2025). Advances in diagnosis and prognosis of bacteraemia, bloodstream infection, and sepsis using machine learning: A comprehensive living literature review. Artif. Intell. Med. 160, 103008. doi: 10.1016/j.artmed.2024.103008

Jongwutiwes, U., Suitharak, K., Tiengrim, S., and Thamlikitkul, V. (2009). Serum procalcitonin in diagnosis of bacteremia. J. Med. Assoc. Thai 92 Suppl 2, S79–S87.

Kubo, K., Sakuraya, M., Sugimoto, H., Takahashi, N., Kano, K. I., Yoshimura, J., et al. (2024). Benefits and harms of procalcitonin- or C-reactive protein-guided antimicrobial discontinuation in critically ill adults with sepsis: A systematic review and network meta-analysis. Crit. Care Med. 52, e522–e534. doi: 10.1097/CCM.0000000000006366

Leonidou, L., Michalaki, M., Leonardou, A., Polyzogopoulou, E., Fouka, K., Gerolymos, M., et al. (2008). Stress-induced hyperglycemia in patients with severe sepsis: a compromising factor for survival. Am. J. Med. Sci. 336, 467–471. doi: 10.1097/MAJ.0b013e318176abb4

Moor, M., Bennett, N., Plečko, D., Horn, M., Rieck, B., Meinshausen, N., et al. (2023). Predicting sepsis using deep learning across international sites: a retrospective development and validation study. EClinicalMedicine 62, 102124. doi: 10.1016/j.eclinm.2023.102124

Moor, M., Rieck, B., Horn, M., Jutzeler, C. R., and Borgwardt, K. (2021). Early prediction of sepsis in the ICU using machine learning: A systematic review. Front. Med. (Lausanne) 8, 607952. doi: 10.3389/fmed.2021.607952

Presume, J., Ferreira, J., Ribeiras, R., and Mendes, M. (2022). Achieving higher efficacy without compromising safety with factor XI inhibitors versus low molecular weight heparin for the prevention of venous thromboembolism in major orthopedic surgery-Systematic review and meta-analysis. J. Thromb. Haemost. 20, 2930–2938. doi: 10.1111/jth.15890

Rudd, K. E., Johnson, S. C., Agesa, K. M., Shackelford, K. A., Tsoi, D., Kievlan, D. R., et al. (2020). Global, regional, and national sepsis incidence and mortality, 1990-2017: analysis for the Global Burden of Disease Study. Lancet 395, 200–211. doi: 10.1016/S0140-6736(19)32989-7

Seymour, C. W., Liu, V. X., Iwashyna, T. J., Brunkhorst, F. M., Rea, T. D., Scherag, A., et al. (2016). Assessment of Clinical Criteria for Sepsis: For the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA 315, 762–774. doi: 10.1001/jama.2016.0288

Yan, M. Y., Gustad, L. T., and Nytrø, Ø. (2022). Sepsis prediction, early detection, and identification using clinical text for machine learning: a systematic review. J. Am. Med. Inform. Assoc. 29, 559–575. doi: 10.1093/jamia/ocab236

Zhang, S. Y., Zhuo, Y., Li, B. R., Jiang, Y. Y., Zhang, J., Cai, N., et al. (2024). Identifying key blood markers for bacteremia in elderly patients: insights into bacterial pathogens. Front. Cell Infect. Microbiol. 14, 1472765. doi: 10.3389/fcimb.2024.1472765

Keywords: bacteremia, blood culture, machine learning, random forest, logistic regression, biomarkers

Citation: Wang T-Q, Zhuo Y, Lv C-E, Shi J, Yao L-H, Zhang S-Y and Shi J (2025) Prediction of bacteremia using routine hematological and metabolic parameters based on logistic regression and random forest models. Front. Cell. Infect. Microbiol. 15:1605485. doi: 10.3389/fcimb.2025.1605485

Received: 03 April 2025; Accepted: 03 July 2025;
Published: 28 July 2025.

Edited by:

Wenlin Yang, University of Florida, United States

Reviewed by:

Ruocen Song, University of Florida, United States
Zhaodi Liao, California Institute of Technology, United States

Copyright © 2025 Wang, Zhuo, Lv, Shi, Yao, Zhang and Shi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Shi-Yan Zhang, myebox@139.com; Jinbao Shi, 1301803387@qq.com

Present address: Jinbao Shi, Department of Nephrology, Fuding Hospital, Fujian University of Traditional Chinese Medicine, Ningde, Fujian, China

ORCID: Ting-Qiang Wang, orcid.org/0009-0006-5238-4896
Shi-Yan Zhang, orcid.org/0000-0003-4305-8213
Jinbao Shi, orcid.org/0009-0009-2663-8030
