
ORIGINAL RESEARCH article

Front. Cell. Infect. Microbiol., 07 July 2025

Sec. Clinical Infectious Diseases

Volume 15 - 2025 | https://doi.org/10.3389/fcimb.2025.1569748

This article is part of the Research Topic: Infections in the Intensive Care Unit - Volume III

Development and validation of a multidimensional predictive model for 28-day mortality in ICU patients with bloodstream infections: a cohort study

Jun Jin1,2†, Lei Yu1†, Qingshan Zhou1, Qian Du1, Xiangrong Nie1, Hai-Yan Yin2*, Wan-Jie Gu2*
  • 1Department of Intensive Care Unit, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
  • 2Department of Intensive Care Unit, The First Affiliated Hospital of Jinan University, Guangzhou, China

Background: Bloodstream infections (BSI) are a leading cause of sepsis and death in the intensive care unit (ICU). Traditional severity scores, including the Sequential Organ Failure Assessment (SOFA), Acute Physiology Score III (APSIII), and Simplified Acute Physiology Score II (SAPS II), have limited ability to predict mortality among BSI patients, primarily because they rely on a narrow range of clinical variables. This study aimed to develop and validate a comprehensive nomogram model for 28-day all-cause mortality prediction in BSI patients.

Methods: A retrospective cohort study was conducted using data from 3,615 patients with positive blood cultures from the MIMIC-IV database, divided into training (n=2,532) and validation (n=1,083) cohorts. Through a two-step variable selection process combining LASSO regression and Boruta algorithm, we identified 12 predictive variables from 58 initial clinical parameters. The model’s performance was evaluated using AUROC, net reclassification improvement (NRI), integrated discrimination improvement (IDI), and decision curve analysis (DCA).

Results: The nomogram demonstrated superior discrimination (AUROC: 0.760 vs. 0.671, P<0.001 for SOFA; 0.760 vs. 0.705, P<0.001 for APSIII; 0.760 vs. 0.707, P<0.001 for SAPS II) in the training cohort, with consistent performance in the validation cohort (AUROC: 0.742). Key predictors identified in our model included the need for mechanical ventilation, the presence of malignancy, platelet count, and scores on the Glasgow Coma Scale (GCS). The model showed significant improvements in NRI and IDI, with consistent net benefit across a wide range of threshold probabilities in DCA.

Conclusions: This study developed and validated a predictive model for 28-day mortality in BSI patients that demonstrated superior performance compared to traditional severity scores. By integrating clinical, laboratory, and treatment-related variables, the model provides a more comprehensive approach to risk stratification. These findings highlight its potential for improving early identification of high-risk patients and guiding clinical decision-making, though further prospective validation is needed to confirm its generalizability.

Introduction

Bloodstream infections (BSI) are a major precipitant of sepsis and a significant contributor to mortality in intensive care units (ICUs) worldwide (Wittekamp et al., 2018; Grumaz et al., 2020). Patients with BSI face a heightened risk of adverse outcomes, making early identification and targeted management essential for improving survival rates (Zengin Canalp and Bayraktar, 2021). Traditional severity scores, including the Sequential Organ Failure Assessment (SOFA), Acute Physiology Score III (APSIII), and Simplified Acute Physiology Score II (SAPS II), are commonly employed to evaluate the severity of illness in patients with sepsis. However, these scores have limitations in accurately predicting mortality, particularly in patients with BSI, as they rely on a limited set of clinical variables and may not fully capture the unique pathophysiology of BSI-related sepsis.

Sepsis, characterized by a dysregulated immune response to infection, often leads to life-threatening organ dysfunction (Singer et al., 2016; Cecconi et al., 2018; Meyer and Prescott, 2024), with BSI being a common and severe precipitant. The rising incidence of sepsis, particularly cases involving BSI, underscores the need for more precise risk stratification tools. Current predictive models often fail to account for the distinct clinical and laboratory profiles of BSI-related sepsis, highlighting the importance of a more comprehensive approach. Traditional severity scores may not capture the full spectrum of sepsis pathophysiology, especially in the context of BSI (Tian et al., 2016).

Using data from the Medical Information Mart for Intensive Care (MIMIC) database (Johnson et al., 2023), this study aims to develop and validate a predictive model for 28-day all-cause mortality in patients with positive blood cultures. By incorporating multidimensional patient data, we seek to enhance the accuracy of mortality prediction in this high-risk population. The proposed model has the potential to serve as a valuable clinical tool, enabling early identification of high-risk BSI patients and facilitating targeted interventions to improve outcomes.

Materials and methods

Data source

The data utilized in this study were extracted from the Medical Information Mart for Intensive Care IV (MIMIC-IV) version 3.0 database. This repository, developed at the Massachusetts Institute of Technology and available to credentialed researchers, contains comprehensive medical information from ICU patients at Beth Israel Deaconess Medical Center (Johnson et al., 2023), covering patient stays between 2008 and 2022. Permission to use the database was obtained (Certificate No.: 56161429).

Study population

The study population comprised adult patients (≥18 years) admitted to the ICU for the first time with positive blood cultures, hospital stays exceeding 24 hours, and complete data on key variables. Patients younger than 18 years, those with hospital stays shorter than 24 hours, those with missing data on key variables, and those not admitted to the ICU for the first time were excluded (Figure 1).

Figure 1
Flowchart illustrating a study design for predicting 28-day mortality in ICU patients. It starts with 94,458 patients from the MIMIC-IV 3.0 database, narrowing to 3,615 for analysis due to criteria like positive blood cultures. These are split into training (2,532) and validation (1,083) cohorts using LASSO regression, Boruta algorithm, and a nomogram. The model is validated with SHAP values, AUC, IDI, NRI, calibration plot, and decision curve analysis. Concludes that certain factors predict mortality.

Figure 1. Overall study flowchart.

Study methods

A total of 58 variables were acquired using SQL (Wu et al., 2021), encompassing baseline data (age, gender, race, BMI, hypertension, diabetes mellitus, malignant tumor, CKD, cirrhosis, heart failure, myocardial infarction, hyperlipidemia, COPD), vital signs (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, respiratory rate, pulse oximetry, temperature), laboratory tests (GCS, white blood cell count, red blood cell count, platelet count, hemoglobin, RDW, albumin, sodium, potassium, chloride, glucose, pH, partial pressure of carbon dioxide, partial pressure of oxygen, lactate, prothrombin time, PTT, international normalized ratio, total bilirubin, alanine aminotransferase, aspartate aminotransferase, BUN, creatinine), infection and treatment (microorganism, CRRT, MV, vasopressor, midazolam, dexmedetomidine, propofol), outcome measures (length of stay in hospital, length of stay in ICU, in-hospital mortality, ICU mortality), and severity scores (SOFA, APSIII, SAPS II, Charlson Comorbidity Index).

Statistical methods

Data splitting and imputation

For variables with less than 30% missing values, multiple imputation was performed using a regression model. The 30% threshold was chosen on the premise that restricting imputation to variables with limited missingness helps keep the imputed values valid and reliable, thereby reducing the risk of bias. The imputation process involved iteratively predicting and filling in missing values for each variable, resulting in five complete datasets. One of these datasets was then randomly selected for the final analysis (Zhang, 2016; El Badisy et al., 2024). The patients were then randomly assigned to a training set (70%) and a validation set (30%).
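A random 70/30 assignment of this kind can be sketched as follows. This is a minimal illustration only: the function name, the fixed seed, and the use of integer patient identifiers are assumptions, and the study's own split (performed in R) may have used a different procedure, for example one stratified by outcome.

```python
import random


def split_train_validation(patient_ids, train_percent=70, seed=42):
    """Randomly assign patient IDs to a training and a validation set.

    train_percent and seed are illustrative defaults, not values taken
    from the study.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)      # local generator, so the split is reproducible
    rng.shuffle(ids)
    n_train = len(ids) * train_percent // 100   # integer arithmetic avoids float rounding
    return ids[:n_train], ids[n_train:]


train_ids, validation_ids = split_train_validation(range(10))
print(len(train_ids), len(validation_ids))
```

Because the split is purely random, the resulting cohort sizes depend only on the total count and the chosen percentage, while the composition of each cohort depends on the seed.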

Variable selection

The variable selection process was conducted on the training set to ensure the robustness and accuracy of the predictive model. Initially, LASSO regression was employed to identify significant predictive factors. The optimal value of the regularization parameter λ was determined through 10-fold cross-validation using the 1-standard error (1-SE) criterion, which helps prevent overfitting by selecting a simpler model that retains predictive power. This approach enhances the model’s interpretability and stability, ensuring that only the most meaningful variables are included. Variables with coefficients significantly different from zero (considering the applied penalty) were shortlisted (Hu et al., 2021). Subsequently, the Boruta algorithm was applied to further refine the variable selection process. This algorithm compares the importance of each variable with that of a randomly permuted copy of itself, ensuring that only those variables demonstrating significantly higher importance than their randomized counterparts are selected. In this process, only the “confirmed” variables from Boruta were retained, providing a robust measure of significance (Kursa and Rudnicki, 2010). The final model variables were determined by taking the intersection of the variables selected by both the LASSO and Boruta methods, ensuring that only the most significant and robust predictors, which comprehensively reflect patient outcomes, were included.
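The two selection rules described above can be sketched in a few lines. The sketch assumes that cross-validated mean errors and their standard errors are already available for a grid of λ values; the numbers and variable names below are hypothetical and serve only to show the 1-SE rule (the largest λ whose cross-validated error lies within one standard error of the minimum) and the final intersection step.

```python
def lambda_one_se(lambdas, cv_mean_error, cv_std_error):
    """Return the largest lambda whose mean CV error is within one
    standard error of the minimum mean CV error (the 1-SE rule)."""
    best = min(range(len(lambdas)), key=lambda i: cv_mean_error[i])
    threshold = cv_mean_error[best] + cv_std_error[best]
    eligible = [lam for lam, err in zip(lambdas, cv_mean_error) if err <= threshold]
    return max(eligible)


# Hypothetical cross-validation results, for illustration only.
lambdas = [0.001, 0.01, 0.1, 1.0]
cv_mean_error = [0.62, 0.58, 0.59, 0.70]
cv_std_error = [0.02, 0.02, 0.02, 0.02]
chosen_lambda = lambda_one_se(lambdas, cv_mean_error, cv_std_error)

# Hypothetical selections; the final predictors are those chosen by both methods.
lasso_selected = {"age", "albumin", "lactate", "heart_rate"}
boruta_confirmed = {"age", "albumin", "lactate", "sodium"}
final_variables = sorted(lasso_selected & boruta_confirmed)
print(chosen_lambda, final_variables)
```

Choosing the larger λ trades a small loss in cross-validated fit for a sparser model, and the intersection keeps only variables that both methods support.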

Collinearity assessment

To evaluate the presence of multicollinearity among the selected variables, the Variance Inflation Factor (VIF) was computed. Variables with a VIF value exceeding 5 were excluded from the model to mitigate the adverse effects of multicollinearity on the regression analysis (Vatcheva et al., 2016).
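For reference, the VIF of predictor $j$ follows the standard definition below, where $R_j^2$ is the coefficient of determination obtained by regressing predictor $j$ on all other predictors. The worked values are simple arithmetic on the threshold and on the maximum VIF reported in this article.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\qquad\Longrightarrow\qquad
R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j}

% Exclusion threshold used here:   VIF = 5    <=>  R_j^2 = 1 - 1/5    = 0.80
% Largest VIF reported in Results: VIF = 1.29 <=>  R_j^2 = 1 - 1/1.29 \approx 0.22
```

In other words, a variable would be excluded only if at least 80% of its variance were explained by the other predictors.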

Model construction

A nomogram was developed using the selected variables to predict 28-day all-cause mortality for patients with BSI. The nomogram incorporated a comprehensive set of demographic characteristics and clinical variables, including age, albumin levels, BUN, use of CRRT, GCS, lactate levels, mechanical ventilation status, presence of a malignant tumor, PTT, platelet count, RDW, and vasopressor use. Each variable was assigned a point value based on its relative contribution to the prediction of mortality risk, allowing for a quantitative assessment of individual patient risk.
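A nomogram is a graphical rendering of an underlying regression model. Assuming that model is a logistic regression (the usual choice for a binary outcome, although the text does not state it explicitly), the predicted probability for one patient can be sketched as below. The intercept, the coefficients, and the subset of four predictors are hypothetical values chosen for illustration; they are not the fitted values from this study.

```python
import math

# Hypothetical coefficients (log-odds scale) for illustration only;
# NOT the fitted values from the study.
INTERCEPT = -3.0
COEFFICIENTS = {
    "age_years": 0.02,
    "lactate_mmol_l": 0.15,
    "mechanical_ventilation": 0.60,   # 1 = yes, 0 = no
    "malignant_tumor": 0.70,          # 1 = yes, 0 = no
}


def predicted_mortality(patient):
    """Convert a patient's predictor values to a predicted probability
    via the logistic function applied to the linear predictor."""
    linear_predictor = INTERCEPT + sum(
        COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


example_patient = {
    "age_years": 70,
    "lactate_mmol_l": 3.0,
    "mechanical_ventilation": 1,
    "malignant_tumor": 0,
}
print(round(predicted_mortality(example_patient), 3))
```

The point scale of a nomogram is a rescaling of each term in the linear predictor, so summing points and reading the probability axis is equivalent to the calculation above.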

Model evaluation

The discriminative ability of the nomogram and of the SOFA, APSIII, and SAPS II scores was evaluated using the area under the receiver operating characteristic curve (AUROC). The performance improvement of the nomogram over the SOFA, APSIII, and SAPS II scores was assessed using the Integrated Discrimination Improvement (IDI) and the Net Reclassification Improvement (NRI). Calibration curves and the Hosmer-Lemeshow test were used to evaluate the calibration of the nomogram. The net clinical benefit was determined through decision curve analysis (DCA).
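To make the discrimination metric concrete, the AUROC can be read as the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The sketch below computes it directly from that definition; it is an illustration on toy data, not the pROC implementation used in the study.

```python
def auroc(labels, scores):
    """AUROC from its pairwise definition.

    labels: 1 for patients who died, 0 for survivors.
    scores: predicted risk (or severity score) for each patient.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # correctly ranked pair
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy data: three of the four death/survivor pairs are ranked correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for illustration; production implementations use rank-based formulas.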

Model interpretation

To quantify the importance of each variable in the model, the SHAP (SHapley Additive exPlanations) method was employed. SHAP values provide a measure of the contribution of each feature to the prediction, allowing for the interpretation of the model’s output in terms of the impact of individual variables (Garriga et al., 2022).
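For a model that is linear on the log-odds scale, SHAP values have a simple closed form when features are treated as independent: the contribution of feature j is its coefficient multiplied by the difference between the patient's value and the cohort mean. The sketch below illustrates this special case; the coefficients and means are hypothetical, and the article does not specify which SHAP estimator was actually used.

```python
def linear_shap_values(coefficients, patient, feature_means):
    """SHAP values (log-odds scale) for a linear model under the
    feature-independence assumption: beta_j * (x_j - mean_j)."""
    return {
        name: beta * (patient[name] - feature_means[name])
        for name, beta in coefficients.items()
    }


# Hypothetical coefficients and cohort means, for illustration only.
coefficients = {"lactate": 0.5, "age": 0.25}
feature_means = {"lactate": 2.0, "age": 60.0}
patient = {"lactate": 4.0, "age": 70.0}

contributions = linear_shap_values(coefficients, patient, feature_means)
print(contributions)
# The contributions sum to the difference between this patient's linear
# predictor and the linear predictor evaluated at the cohort means.
print(sum(contributions.values()))
```

A positive value pushes the prediction toward higher mortality risk relative to the average patient, and a negative value pushes it lower.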

Data analysis

The data distribution was analyzed using the Shapiro–Wilk test. Continuous data were represented as mean ± standard deviation or median (interquartile range, IQR), while categorical variables were presented as frequencies and percentages (%). Non-parametric tests (Mann–Whitney U test or Kruskal-Wallis test) were employed for non-normally distributed or heteroscedastic data. Pearson’s chi-square test was used to compare categorical data. All statistical analyses were carried out using R software, utilizing various packages including tableone, mice, rms, pROC, dca, and rmda.

Results

Baseline characteristics

We included 3,615 patients with positive blood cultures, 2,532 in the training cohort and 1,083 in the validation cohort. In the training cohort, 71.8% of patients survived, while 28.2% died. Non-survivors were older (median age 69.0 years [IQR, 59.0-79.0] vs 64.0 years [IQR, 52.0-74.0]; P<0.001) and had a higher prevalence of myocardial infarction (11.5% vs 7.8%; P=0.004), congestive heart failure (35.2% vs 28.7%; P=0.001), chronic obstructive pulmonary disease (10.3% vs 7.2%; P=0.009), malignant tumor (20.3% vs 12.1%; P<0.001), chronic kidney disease (29.7% vs 19.2%; P<0.001), and cirrhosis (16.2% vs 10.1%; P<0.001). Initial vital signs and laboratory findings showed that non-survivors had lower systolic blood pressure (114.0 mm Hg [IQR, 98.0-132.0] vs 117.0 mm Hg [IQR, 101.0-138.0]; P=0.001) and temperature (36.78°C [IQR, 36.44-37.17] vs 36.89°C [IQR, 36.56-37.33]; P<0.001), and higher levels of lactate (2.8 mmol/L [IQR, 1.7-4.9] vs 1.8 mmol/L [IQR, 1.2-2.8]; P<0.001), creatinine (1.8 mg/dL [IQR, 1.1-3.2] vs 1.2 mg/dL [IQR, 0.8-2.1]; P<0.001), and BUN (39 mg/dL [IQR, 24-63] vs 25 mg/dL [IQR, 16-41]; P<0.001). Similar patterns were observed in the validation cohort (Table 1).


Table 1. Baseline characteristics and comparison of training and validation cohorts.

Model development and variable selection

Through a two-step variable selection process combining LASSO regression and Boruta algorithm, we identified 12 predictive variables from the initial set of clinical parameters. LASSO regression initially selected 14 variables (Figures 2A, B), while Boruta algorithm confirmed 30 important features (Figure 2C). The intersection of these methods yielded the final 12 variables: age, albumin, BUN, CRRT, GCS, lactate, mechanical ventilation, malignant tumor, PTT, platelet count, RDW, and vasopressor use (Figure 2D). Multicollinearity assessment demonstrated variance inflation factor values below 2 (range, 1.02-1.29) for all selected variables, indicating minimal collinearity. Based on these variables, we constructed a nomogram to predict 28-day all-cause mortality for patients with BSI (Figure 3). The nomogram incorporated both demographic characteristics and clinical variables, with point values assigned to each predictor based on their relative contribution to mortality risk.

Figure 2
Four-panel image illustrating data analysis and feature selection:  A) A plot showing binomial deviance versus log lambda with error bars and color-coded variables from 0 to 56.  B) A coefficient path plot displaying various clinical variables across log lambda values.  C) Variable importance plot with colored boxes, representing feature importance scores of clinical variables.  D) Feature selection diagram illustrating variables selected by Boruta (orange) and Lasso (blue) with connecting lines.

Figure 2. Process and results of variable selection. (A) Selection of tuning parameter (lambda) in LASSO regression using minimum criteria (left dotted line) and 1-SE criteria (right dotted line). (B) Coefficient distribution created from the log(lambda) sequence. In this study, predictor variables were selected based on the 1-SE criterion (right dotted line), resulting in 14 nonzero coefficients. (C) Importance scores of predictor variables calculated by the Boruta algorithm. The vertical axis represents the importance score in Z-score form, while the horizontal axis lists all predictor variables. (D) Feature selection results showing key variables identified by both Boruta algorithm (orange) and LASSO regression (blue). The final selected variables represent the intersection of both methods, providing high-confidence predictors for the model. LASSO indicates least absolute shrinkage and selection operator; SE, standard error.

Figure 3
Nomogram displaying various medical parameters: Platelet count, Vasopressor use, Glasgow Coma Scale (GCS), Mechanical Ventilation (MV), Albumin, Partial Thromboplastin Time (PTT), presence of Malignant Tumor, Blood Urea Nitrogen (BUN), Continuous Renal Replacement Therapy (CRRT), Lactate, Red Cell Distribution Width (RDW), and Age. Each parameter is associated with a set of points. Total points are shown on a distribution curve indicating a probability of 0.184, aligning to a total score of 370.

Figure 3. Nomogram for predicting the outcome. Nomogram for estimating the probability of the outcome based on selected clinical variables. Each variable contributes points that sum to a total score, which corresponds to the predicted probability on the bottom scale. *p < 0.05; **p < 0.01; ***p < 0.001.

Predictive model performance

The nomogram demonstrated superior discrimination (AUROC, 0.760 [95% CI, 0.740-0.781]) compared with SOFA (0.671 [0.648-0.694]), APSIII (0.705 [0.683-0.728]), and SAPS II (0.707 [0.685-0.729]) (all P<0.001) in the training cohort. In the validation cohort, the nomogram (AUROC, 0.742 [95% CI, 0.709-0.775]) maintained significantly better discrimination than SOFA (0.681 [0.645-0.717], P=0.001) and SAPS II (0.701 [0.665-0.737], P=0.038), although the difference with APSIII (0.715 [0.680-0.750], P=0.129) did not reach statistical significance (Figure 4, Table 2).

Figure 4
Two ROC curve charts compare the performance of four models: SOFA, APSIII, SAPSII, and Nomogram. Chart A shows scores with AUC values of 0.671, 0.707, 0.705, and 0.760, respectively. Chart B shows AUC values of 0.682, 0.707, 0.715, and 0.742, respectively. Sensitivity and 1-Specificity are on the y-axis and x-axis.

Figure 4. ROC curves for predicting 28-day mortality in patients with bloodstream infections. (A) Training Cohort and (B) Validation Cohort compare the performance of the Nomogram, SOFA, APSIII, and SAPSII scores. The Nomogram demonstrates superior predictive ability in both cohorts.


Table 2. Comparison of the performance of four models in predicting 28-day all-cause mortality in patients with positive blood cultures.

Calibration and model reclassification

Calibration was assessed using the Hosmer-Lemeshow test and calibration curves. The Hosmer-Lemeshow test showed good calibration in both the training (χ²=12.39, df=6, P=0.054) and validation cohorts (χ²=11.576, df=6, P=0.072), indicating no significant deviation between predicted and observed outcomes. The calibration curves demonstrated good agreement between predicted and actual probabilities across the entire range of predicted risk (Figure 5).
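The Hosmer-Lemeshow statistic compares observed and expected deaths within groups of patients ordered by predicted risk. A minimal sketch is given below; the grouping into equal-sized bins and the default of ten groups are assumptions, since the article does not state how the groups were formed, and the p-value (from a chi-square distribution) is omitted to keep the example dependency-free.

```python
def hosmer_lemeshow_statistic(labels, probabilities, n_groups=10):
    """Hosmer-Lemeshow chi-square statistic.

    Patients are sorted by predicted probability and cut into n_groups
    bins of (nearly) equal size; each bin contributes
    (observed - expected)^2 / (n * mean_p * (1 - mean_p)).
    """
    pairs = sorted(zip(probabilities, labels))
    n = len(pairs)
    statistic = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        observed = sum(y for _, y in group)      # deaths observed in the bin
        expected = sum(p for p, _ in group)      # deaths predicted in the bin
        mean_p = expected / size
        if mean_p <= 0.0 or mean_p >= 1.0:
            raise ValueError("bin has mean predicted probability of 0 or 1")
        statistic += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return statistic


# Toy data: every patient has predicted risk 0.5 and half of them died.
labels = [0, 1, 0, 1]
probabilities = [0.5, 0.5, 0.5, 0.5]
print(hosmer_lemeshow_statistic(labels, probabilities, n_groups=1))
```

A small statistic relative to its degrees of freedom (and hence a large p-value) indicates no detectable disagreement between predicted and observed risk, which is how the results above are interpreted.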

Figure 5
Two calibration plots labeled A and B compare predicted probabilities to observed probabilities. Both plots feature lines representing Ideal, SOFA, APSIII, SAPSII, and Nomogram models. The y-axis shows observed probabilities from zero to one, while the x-axis displays predicted probabilities, also from zero to one. Lines depict the fit of each model against the ideal line, illustrating the calibration performance of different scoring systems.

Figure 5. Calibration curves for predicting 28-day mortality in patients with bloodstream infections. (A) Training Cohort and (B) Validation Cohort compare the predicted probabilities of the Nomogram, SOFA, APSIII, and SAPSII scores against the observed probabilities. The dashed line represents the ideal calibration (perfect agreement between predicted and observed probabilities). The Nomogram shows the closest alignment to the ideal line in both cohorts, indicating better calibration performance.

The nomogram showed significant improvements in risk reclassification compared with conventional scores. In the training cohort, categorical NRI values were 0.1422 (95% CI, 0.097-0.1873) versus SOFA, 0.0943 (0.054-0.1346) versus APSIII, and 0.0758 (0.0375-0.114) versus SAPS II (all P<0.001). Continuous NRI values showed similar improvements: 0.5859 (0.5023-0.6683) versus SOFA, 0.442 (0.3574-0.5266) versus APSIII, and 0.4175 (0.3325-0.5025) versus SAPS II (all P<0.001) (Table 2). Decision curve analysis demonstrated consistent net benefit across a wide range of threshold probabilities (0.08-0.92 in training; 0.10-0.84 in validation cohorts) (Figure 6). SHAP analysis identified mechanical ventilation, malignancy, platelet count, and GCS as the strongest predictors of mortality (Figure 7).
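The categorical NRI reported above can be illustrated with a short sketch. It sums the net proportion of patients who died and were moved to a higher risk category by the new model and the net proportion of survivors who were moved to a lower category. The risk categories below are hypothetical integer codes; the cut-offs used in the study are not given in this section.

```python
def categorical_nri(labels, old_category, new_category):
    """Categorical net reclassification improvement.

    labels: 1 for patients who died, 0 for survivors.
    old_category / new_category: ordinal risk category under the
    reference score and under the new model (higher = higher risk).
    """
    def net_upward(pairs):
        up = sum(1 for old, new in pairs if new > old)
        down = sum(1 for old, new in pairs if new < old)
        return (up - down) / len(pairs)

    events = [(o, n) for y, o, n in zip(labels, old_category, new_category) if y == 1]
    nonevents = [(o, n) for y, o, n in zip(labels, old_category, new_category) if y == 0]
    if not events or not nonevents:
        raise ValueError("both outcome classes must be present")
    # Upward moves are good for events and bad for non-events.
    return net_upward(events) - net_upward(nonevents)


# Toy data: one of two deaths moves up, one of two survivors moves down.
print(categorical_nri([1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 1]))
```

Positive values indicate that the new model reclassifies patients in the correct direction more often than in the wrong one.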

Figure 6
Two decision curve analysis graphs, labeled A and B, compare standardized net benefits at various high-risk thresholds and cost-benefit ratios for different scoring systems: SOFA, APSIII, SAPSII, Nomogram, All, and None. Each system is represented by a different colored line, showing variations in net benefit across thresholds.

Figure 6. Decision curve analysis for predicting 28-day mortality in patients with bloodstream infections. (A) Training Cohort and (B) Validation Cohort compare the net benefit of the Nomogram, SOFA, APSIII, and SAPSII scores. The Nomogram shows higher clinical utility across a wider range of threshold probabilities in both cohorts.
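The net benefit plotted in a decision curve follows the standard definition: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The sketch below applies that definition on toy data; it is an illustration and not the R implementation used to produce Figure 6.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of treating every patient whose predicted risk is at
    or above the threshold probability (0 < threshold < 1)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    true_pos = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


# Toy data: at a threshold of 0.5, one death and one survivor are flagged.
labels = [1, 0, 1, 0]
probabilities = [0.9, 0.8, 0.2, 0.1]
print(net_benefit(labels, probabilities, threshold=0.5))
```

Evaluating this function over a grid of thresholds for each score, and for the "treat all" and "treat none" strategies, yields the curves compared in a decision curve analysis.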

Figure 7
Panel A shows a violin plot for various features like Age and RDW against SHAP values, indicating impact on predictions. High feature values are in yellow; low in purple. Panel B is a SHAP waterfall plot depicting contributions of features like Albumin and PTT to prediction scores, with positive contributions in yellow and negative in purple.

Figure 7. SHAP analysis for predicting 28-day mortality in bloodstream infection patients. The beeswarm plot (A) shows the distribution of SHAP values for each feature, with color intensity indicating feature values. The force plot (B) illustrates the contribution of individual features to a specific prediction, showing how each feature affects the model’s output.

Discussion

In this cohort study of 3,615 patients with BSI, we developed and validated a predictive model for 28-day all-cause mortality that demonstrated superior performance compared with conventional severity scores, such as SOFA, APSIII, and SAPS II. By integrating 12 key clinical and laboratory variables spanning multiple pathophysiological domains, the model highlights the importance of a multidimensional approach to risk stratification in sepsis. These findings underscore the critical role of combining metabolic, neurological, and immunological indicators with therapeutic interventions to enhance prognostic accuracy.

The superior performance of our model can be attributed to several factors. First, the inclusion of both laboratory and clinical variables provided a more comprehensive assessment of disease severity than traditional scoring systems. SHAP analysis revealed that mechanical ventilation, malignancy, and platelet count were among the strongest predictors, emphasizing the importance of combining intervention requirements, comorbidity burden, and physiological derangements to better capture mortality risk. Second, the model was robustly validated, demonstrating strong discriminative ability in both training and validation cohorts, supporting its potential generalizability across similar populations. The clinical utility of the model is further underscored by its superior performance compared with conventional severity scores and its ability to improve risk stratification across the spectrum of disease severity. Metrics such as NRI and IDI demonstrated significant enhancements in risk prediction, while DCA confirmed consistent net benefit across a wide range of threshold probabilities. These results suggest the model could serve as a valuable tool for early risk stratification, guiding clinical decision-making, and optimizing resource allocation in patients with BSI.

The identified predictors align with current understanding of sepsis pathophysiology. Neurological dysfunction, represented by the GCS, emerged as a key determinant of mortality. Lower GCS scores, indicative of sepsis-associated encephalopathy (SAE) (Jin et al., 2024), were strongly associated with poor outcomes (Bourhy et al., 2022; Fang et al., 2022; Zou et al., 2022), consistent with prior studies highlighting the prognostic significance of neurological status in sepsis. Elevated lactate levels, a marker of tissue hypoperfusion and metabolic dysfunction, were similarly predictive of mortality (Khosravani et al., 2009; Attana et al., 2012; Wright et al., 2022), reinforcing the established role of lactate as a key indicator of disease severity. Other laboratory predictors, including hypoalbuminemia, elevated BUN, thrombocytopenia, and prolonged PTT, reflect the systemic derangements characteristic of sepsis. These findings align with known mechanisms where hypoalbuminemia signals systemic inflammation and malnutrition (Furukawa et al., 2019; Mahmud et al., 2021), elevated BUN reflects renal dysfunction (Hu et al., 2021; Harazim et al., 2023; Li et al., 2024), and thrombocytopenia and coagulopathy are markers of disseminated intravascular coagulation (DIC) and severe systemic inflammation (Valladolid et al., 2020; Harmon et al., 2021; Jahn et al., 2022). Notably, the prolonged PTT underscores coagulopathic changes, acting as an indicator of the severity of coagulopathy in sepsis and its correlation with poorer clinical outcomes (Guo et al., 2022). RDW was also retained as a predictor. Elevated RDW levels suggest increased inflammation and oxidative stress, positioning RDW as a prognostic marker that indicates a heightened risk of mortality in patients with sepsis (Crook et al., 2022; Wu et al., 2022). Collectively, these variables offer a more nuanced understanding of the complex pathophysiology of sepsis and its implications for mortality risk, emphasizing the necessity of monitoring these parameters in clinical practice.

The inclusion of therapeutic interventions as predictors—mechanical ventilation, continuous renal replacement therapy, and vasopressor use—merits particular attention. While these variables may partly reflect disease severity, their independent contribution to the model suggests they capture unique aspects of the clinical trajectory not fully represented by physiological parameters alone (Baghdadi et al., 2020; Evans et al., 2021; Bhavani et al., 2022). The SHAP analysis highlighted the substantial impact of these interventions on the model’s predictions, suggesting that treatment-related variables may serve as critical markers of disease progression and prognosis. However, careful interpretation is needed to distinguish between markers of severity and potentially modifiable risk factors.

Comorbidities also played a significant role in mortality prediction. Malignancy, in particular, emerged as a strong predictor, likely reflecting the immunosuppressive effects of both the disease and its treatments (Danahy et al., 2019; Hensley et al., 2019; Cooper et al., 2020). The SHAP analysis underscored the substantial contribution of malignancy to the model’s predictive power, highlighting the importance of accounting for comorbid conditions in risk stratification for BSI patients.

In our study, we compared our nomogram model with three traditional severity scores (SOFA, APSIII, and SAPS II) that are commonly used to assess critically ill patients but are limited in predicting mortality in BSI (Vincent and Moreno, 2010). The SOFA score, while a cornerstone for evaluating organ dysfunction, relies on a narrow set of physiological parameters and excludes key factors like comorbidities, treatment interventions, and laboratory markers, reducing its predictive accuracy for BSI-related mortality (Gershengorn et al., 2021). Similarly, APSIII and SAPS II, though incorporating more variables, fail to address the unique pathophysiology of BSI, omitting critical predictors such as mechanical ventilation, malignancy, platelet count, and lactate levels (Le Gall et al., 1993; Nassar et al., 2014). In contrast, our nomogram model adopts a multidimensional approach, integrating demographic, clinical, laboratory, and treatment variables to provide a more comprehensive assessment of disease severity. This holistic design captures the complex interplay of factors influencing mortality, significantly enhancing predictive accuracy and outperforming traditional scores.

This study has several strengths. The use of a large, well-characterized dataset and robust statistical methods for variable selection and model validation enhances the reliability and generalizability of the findings. By integrating diverse clinical and laboratory variables, the model achieves improved discriminatory power and clinical relevance compared with existing severity scores. Several limitations must be acknowledged. The retrospective design and reliance on data from a single healthcare system may limit the generalizability of our findings. Additionally, our model does not account for dynamic changes in variables over time, which could enhance risk prediction. To address these limitations, future research should focus on validating the model prospectively across diverse populations and incorporating longitudinal data to improve predictive accuracy. Furthermore, studies should explore the causal relationships between key predictors and outcomes, identifying modifiable factors for targeted interventions. Incorporating serial measurements of critical variables, such as lactate and platelets, could enhance the model’s ability to capture the evolving clinical trajectory of sepsis, paving the way for more personalized approaches to risk assessment and management for patients with BSI.

Conclusion

This study developed and validated a predictive model for 28-day all-cause mortality in patients with BSI, demonstrating superior performance compared to traditional severity scores. By integrating clinical, laboratory, and treatment-related variables, the model provides a more comprehensive approach to risk stratification. These findings highlight its potential for improving early identification of high-risk patients and guiding clinical decision-making, though further prospective validation is needed to confirm its generalizability.

Data availability statement

The study utilized data sourced from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. Access to this dataset is restricted to credentialed users who have completed the necessary training (e.g., CITI Data or Specimens Only Research) and signed the data use agreement. Researchers interested in accessing the data can submit requests through PhysioNet at https://physionet.org/.

Ethics statement

The database was approved by the Institutional Review Board of Beth Israel Deaconess Medical Center. The studies were conducted in accordance with the local legislation and institutional requirements. The ethics committee/institutional review board waived the requirement of written informed consent for participation from the participants or the participants’ legal guardians/next of kin, in compliance with national laws and institutional policies. No additional ethical review or approval was necessary for this study.

Author contributions

JJ: Conceptualization, Data curation, Formal Analysis, Funding acquisition, Methodology, Writing – original draft. LY: Conceptualization, Formal Analysis, Writing – original draft. QZ: Funding acquisition, Supervision, Writing – review & editing. QD: Data curation, Validation, Writing – original draft. XN: Data curation, Software, Validation, Writing – original draft. H-YY: Supervision, Validation, Writing – review & editing. W-JG: Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by the High-level hospital program, China (HKUSZH202207002), the Shenzhen Fundamental Research Program (JCYJ20220530142411026), and the Health Commission of Guangdong Province, China (A2024193).

Acknowledgments

The authors would like to thank the Massachusetts Institute of Technology and the Beth Israel Deaconess Medical Center for their contributions to the MIMIC project.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The author(s) declared that they were an editorial board member of Frontiers at the time of submission. This had no impact on the peer review process or the final decision.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcimb.2025.1569748/full#supplementary-material

Abbreviations

BSI, Bloodstream Infections; ICU, Intensive Care Unit; SOFA, Sequential Organ Failure Assessment; APSIII, Acute Physiology Score III; SAPS II, Simplified Acute Physiology Score II; AUROC, Area Under the Receiver Operating Characteristic Curve; NRI, Net Reclassification Improvement; IDI, Integrated Discrimination Improvement; DCA, Decision Curve Analysis; SHAP, SHapley Additive exPlanations; GCS, Glasgow Coma Scale; CRRT, Continuous Renal Replacement Therapy; MV, Mechanical Ventilation; PTT, Partial Thromboplastin Time; RDW, Red Cell Distribution Width; BUN, Blood Urea Nitrogen; CKD, Chronic Kidney Disease; COPD, Chronic Obstructive Pulmonary Disease; MIMIC, Medical Information Mart for Intensive Care; SQL, Structured Query Language; BMI, Body Mass Index; pH, Potential of Hydrogen; PaCO2, Partial Pressure of Carbon Dioxide; PaO2, Partial Pressure of Oxygen; INR, International Normalized Ratio; ALT, Alanine Aminotransferase; AST, Aspartate Aminotransferase; CCI, Charlson Comorbidity Index; SAE, Sepsis-Associated Encephalopathy; DIC, Disseminated Intravascular Coagulation.

References

Attanà, P., Lazzeri, C., Picariello, C., Dini, C. S., Gensini, G. F., and Valente, S. (2012). Lactate and lactate clearance in acute cardiac care patients. Eur. Heart J. Acute. Cardiovasc. Care 1, 115–121. doi: 10.1177/2048872612451168

Baghdadi, J. D., Brook, R. H., Uslan, D. Z., Needleman, J., Bell, D. S., Cunningham, W. E., et al. (2020). Association of a care bundle for early sepsis management with mortality among patients with hospital-onset or community-onset sepsis. JAMA Intern. Med. 180, 707–716. doi: 10.1001/jamainternmed.2020.0183

Bhavani, S. V., Semler, M., Qian, E. T., Verhoef, P. A., Robichaux, C., Churpek, M. M., et al. (2022). Development and validation of novel sepsis subphenotypes using trajectories of vital signs. Intensive Care Med. 48, 1582–1592. doi: 10.1007/s00134-022-06890-z

Bourhy, L., Mazeraud, A., Costa, L. H. A., Levy, J., Rei, D., Hecquet, E., et al. (2022). Silencing of amygdala circuits during sepsis prevents the development of anxiety-related behaviours. Brain 145, 1391–1409. doi: 10.1093/brain/awab475

Cecconi, M., Evans, L., Levy, M., and Rhodes, A. (2018). Sepsis and septic shock. Lancet 392, 75–87. doi: 10.1016/S0140-6736(18)30696-2

Cooper, A. J., Keller, S. P., Chan, C., Glotzbecker, B. E., Klompas, M., Baron, R. M., et al. (2020). Improvements in Sepsis-associated Mortality in Hospitalized Patients with Cancer versus Those without Cancer. A 12-Year Analysis Using Clinical Data. Ann. Am. Thorac. Soc. 17, 466–473. doi: 10.1513/AnnalsATS.201909-655OC

Crook, J. M., Horgas, A. L., Yoon, S. L., Grundmann, O., and Johnson-Mallard, V. (2022). Vitamin C plasma levels associated with inflammatory biomarkers, CRP and RDW: results from the NHANES 2003–2006 surveys. Nutrients 14 (6). doi: 10.3390/nu14061254

Danahy, D. B., Jensen, I. J., Griffith, T. S., and Badovinac, V. P. (2019). Cutting edge: polymicrobial sepsis has the capacity to reinvigorate tumor-infiltrating CD8 T cells and prolong host survival. J. Immunol. 202, 2843–2848. doi: 10.4049/jimmunol.1900076

El Badisy, I., Graffeo, N., Khalis, M., and Giorgi, R. (2024). Multi-metric comparison of machine learning imputation methods with application to breast cancer survival. BMC Med. Res. Methodol. 24, 191. doi: 10.1186/s12874-024-02305-3

Evans, L., Rhodes, A., Alhazzani, W., Antonelli, M., Coopersmith, C. M., French, C., et al. (2021). Surviving sepsis campaign: international guidelines for management of sepsis and septic shock 2021. Intensive Care Med. 47, 1181–1247. doi: 10.1007/s00134-021-06506-y

Fang, H., Wang, Y., Deng, J., Zhang, H., Wu, Q., He, L., et al. (2022). Sepsis-induced gut dysbiosis mediates the susceptibility to sepsis-associated encephalopathy in mice. mSystems 7, e0139921. doi: 10.1128/msystems.01399-21

Furukawa, M., Kinoshita, K., Yamaguchi, J., Hori, S., and Sakurai, A. (2019). Sepsis patients with complication of hypoglycemia and hypoalbuminemia are an early and easy identification of high mortality risk. Intern. Emerg. Med. 14, 539–548. doi: 10.1007/s11739-019-02034-2

Garriga, R., Mas, J., Abraha, S., Nolan, J., Harrison, O., Tadros, G., et al. (2022). Machine learning model to predict mental health crises from electronic health records. Nat. Med. 28, 1240–1248. doi: 10.1038/s41591-022-01811-5

Gershengorn, H. B., Holt, G. E., Rezk, A., Delgado, S., Shah, N., Arora, A., et al. (2021). Assessment of disparities associated with a crisis standards of care resource allocation algorithm for patients in 2 US hospitals during the COVID-19 pandemic. JAMA Netw. Open 4, e214149. doi: 10.1001/jamanetworkopen.2021.4149

Grumaz, C., Hoffmann, A., Vainshtein, Y., Kopp, M., Grumaz, S., Stevens, P., et al. (2020). Rapid next-generation sequencing-based diagnostics of bacteremia in septic patients. J. Mol. Diagn. 22, 405–418. doi: 10.1016/j.jmoldx.2019.12.006

Guo, F., Zhu, X., Wu, Z., Zhu, L., Wu, J., Zhang, F., et al. (2022). Clinical applications of machine learning in the survival prediction and classification of sepsis: coagulation and heparin usage matter. J. Transl. Med. 20, 265. doi: 10.1186/s12967-022-03469-6

Harazim, M., Tan, K., Nalos, M., and Matejovic, M. (2023). Blood urea nitrogen - independent marker of mortality in sepsis. BioMed. Pap. Med. Fac. Univ. Palacky Olomouc. Czech Repub. 167, 24–29. doi: 10.5507/bp.2022.015

Harmon, M. B. A., Heijnen, N. F. L., de Bruin, S., Sperna Weiland, N. H., Meijers, J. C. M., de Boer, A. M., et al. (2021). Induced normothermia ameliorates the procoagulant host response in human endotoxaemia. Br. J. Anaesth. 126, 1111–1118. doi: 10.1016/j.bja.2021.02.033

Hensley, M. K., Donnelly, J. P., Carlton, E. F., and Prescott, H. C. (2019). Epidemiology and outcomes of cancer-related versus non-cancer-related sepsis hospitalizations. Crit. Care Med. 47, 1310–1316. doi: 10.1097/CCM.0000000000003896

Hu, J., Wang, Y., Tong, X., and Yang, T. (2021). When to consider logistic LASSO regression in multivariate analysis? Eur. J. Surg. Oncol. 47, 2206. doi: 10.1016/j.ejso.2021.04.011

Hu, B., Xu, G., Jin, X., Chen, D., Qian, X., Li, W., et al. (2021). Novel prognostic predictor for primary pulmonary hypertension: focus on blood urea nitrogen. Front. Cardiovasc. Med. 8, 724179. doi: 10.3389/fcvm.2021.724179

Jahn, K., Shumba, P., Quach, P., Musken, M., Wesche, J., Greinacher, A., et al. (2022). Group B streptococcal hemolytic pigment impairs platelet function in a two-step process. Cells 11 (10). doi: 10.3390/cells11101637

Jin, J., Yu, L., Zhou, Q., and Zeng, M. (2024). Improved prediction of sepsis-associated encephalopathy in intensive care unit sepsis patients with an innovative nomogram tool. Front. Neurol. 15, 1344004. doi: 10.3389/fneur.2024.1344004

Johnson, A. E. W., Bulgarelli, L., Shen, L., Gayles, A., Shammout, A., Horng, S., et al. (2023). MIMIC-IV, a freely accessible electronic health record dataset. Sci. Data 10, 1. doi: 10.1038/s41597-022-01899-x

Khosravani, H., Shahpori, R., Stelfox, H. T., Kirkpatrick, A. W., and Laupland, K. B. (2009). Occurrence and adverse effect on outcome of hyperlactatemia in the critically ill. Crit. Care 13, R90. doi: 10.1186/cc7918

Kursa, M. B. and Rudnicki, W. R. (2010). Feature selection with the boruta package. J. Stat. Software 36, 1–13. doi: 10.18637/jss.v036.i11

Le Gall, J. R., Lemeshow, S., and Saulnier, F. (1993). A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study. JAMA 270, 2957–2963. doi: 10.1001/jama.1993.03510240069035

Li, M., Han, S., Liang, F., Hu, C., Zhang, B., Hou, Q., et al. (2024). Machine learning for predicting risk and prognosis of acute kidney disease in critically ill elderly patients during hospitalization: internet-based and interpretable model study. J. Med. Internet Res. 26, e51354. doi: 10.2196/51354

Mahmud, N., Fricker, Z., Hubbard, R. A., Ioannou, G. N., Lewis, J. D., Taddei, T. H., et al. (2021). Risk prediction models for post-operative mortality in patients with cirrhosis. Hepatology 73, 204–218. doi: 10.1002/hep.31558

Meyer, N. J. and Prescott, H. C. (2024). Sepsis and septic shock. N Engl. J. Med. 391, 2133–2146. doi: 10.1056/NEJMra2403213

Nassar, A. P., Malbouisson, L. M. S., and Moreno, R. (2014). Evaluation of Simplified Acute Physiology Score 3 performance: a systematic review of external validation studies. Crit. Care 18, R117. doi: 10.1186/cc13911

Singer, M., Deutschman, C. S., Seymour, C. W., Shankar-Hari, M., Annane, D., Bauer, M., et al. (2016). The third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA 315, 801–810. doi: 10.1001/jama.2016.0287

Tian, L., Tan, R., Chen, Y., Sun, J., Liu, J., Qu, H., et al. (2016). Epidemiology of Klebsiella pneumoniae bloodstream infections in a teaching hospital: factors related to the carbapenem resistance and patient mortality. Antimicrob. Resist. Infect. Control 5, 48. doi: 10.1186/s13756-016-0145-0

Valladolid, C., Martinez-Vargas, M., Sekhar, N., Lam, F., Brown, C., Palzkill, T., et al. (2020). Modulating the rate of fibrin formation and clot structure attenuates microvascular thrombosis in systemic inflammation. Blood Adv. 4, 1340–1349. doi: 10.1182/bloodadvances.2020001500

Vatcheva, K. P., Lee, M., McCormick, J. B., and Rahbar, M. H. (2016). Multicollinearity in regression analyses conducted in epidemiologic studies. Epidemiol. (Sunnyvale) 6 (2). doi: 10.4172/2161-1165.1000227

Vincent, J. and Moreno, R. (2010). Clinical review: scoring systems in the critically ill. Crit. Care 14, 207. doi: 10.1186/cc8204

Wittekamp, B. H., Plantinga, N. L., Cooper, B. S., Lopez-Contreras, J., Coll, P., Mancebo, J., et al. (2018). Decontamination strategies and bloodstream infections with antibiotic-resistant microorganisms in ventilated patients: A randomized clinical trial. JAMA 320, 2087–2098. doi: 10.1001/jama.2018.13765

Wright, S. W., Hantrakun, V., Rudd, K. E., Lau, C., Lie, K. C., Chau, N. V. V., et al. (2022). Enhanced bedside mortality prediction combining point-of-care lactate and the quick Sequential Organ Failure Assessment (qSOFA) score in patients hospitalised with suspected infection in southeast Asia: a cohort study. Lancet Glob. Health 10, e1281–e1288. doi: 10.1016/S2214-109X(22)00277-7

Wu, W., Li, Y., Feng, A., Li, L., Huang, T., Xu, A., et al. (2021). Data mining in clinical big data: the frequently used databases, steps, and methodological models. Mil. Med. Res. 8, 44. doi: 10.1186/s40779-021-00338-z

Wu, H., Liao, B., Cao, T., Ji, T., Huang, J., Ma, K., et al. (2022). Diagnostic value of RDW for the prediction of mortality in adult sepsis patients: A systematic review and meta-analysis. Front. Immunol. 13, 997853. doi: 10.3389/fimmu.2022.997853

Zengin Canalp, H. and Bayraktar, B. (2021). Direct rapid identification from positive blood cultures by MALDI-TOF MS: specific focus on turnaround times. Microbiol. Spectr. 9, e0110321. doi: 10.1128/spectrum.01103-21

Zhang, Z. (2016). Multiple imputation with multivariate imputation by chained equation (MICE) package. Ann. Transl. Med. 4, 30. doi: 10.3978/j.issn.2305-5839.2015.12.63

Zou, L., He, J., Gu, L., Shahror, R. A., Li, Y., Cao, T., et al. (2022). Brain innate immune response via miRNA-TLR7 sensing in polymicrobial sepsis. Brain Behav. Immun. 100, 10–24. doi: 10.1016/j.bbi.2021.11.007

Keywords: bloodstream infections, predictive model, nomogram, 28-day all-cause mortality, sepsis, intensive care unit, MIMIC-IV database

Citation: Jin J, Yu L, Zhou Q, Du Q, Nie X, Yin H-Y and Gu W-J (2025) Development and validation of a multidimensional predictive model for 28-day mortality in ICU patients with bloodstream infections: a cohort study. Front. Cell. Infect. Microbiol. 15:1569748. doi: 10.3389/fcimb.2025.1569748

Received: 01 February 2025; Accepted: 18 June 2025;
Published: 07 July 2025.

Edited by:

Yuetian Yu, Shanghai Jiao Tong University, China

Reviewed by:

Bin Lin, Changxing People’s Hospital, China
Jingchao Shi, Jinhua Central Hospital, China

Copyright © 2025 Jin, Yu, Zhou, Du, Nie, Yin and Gu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Hai-Yan Yin, haiyanyin1867@126.com; Wan-Jie Gu, wanjiegu@hotmail.com

†These authors have contributed equally to this work
