- Department of Intensive Care Unit, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
Objective: The aim of this study was to develop a reliable model for predicting mortality in patients with catheter-associated urinary tract infection (CAUTI) in the intensive care unit (ICU).
Methods: The MIMIC-IV database was used for model development and validation. Data from the first 24 h of ICU admission were collected; 70% of the data were used to train the model and 30% to validate it. Four machine learning models, XGBoost, Decision Tree (DT), Logistic Regression (LR), and Random Forest (RF), were used to construct the prediction model. The SHAP method was used to explain the best-performing model.
Results: A total of 545 patients with CAUTI were included. ICU mortality among patients with CAUTI was 7.89% (43/545). The logistic regression model had an area under the curve (AUC) of 0.871, the best predictive performance of the four models. The decision tree model had limited generalization ability, with an AUC of 0.542 and relatively poor prediction accuracy. SHAP analysis ranked the 13 most important predictors of mortality in order of importance; among them, use of vasoactive drugs, shock index, APSIII score, and concomitant malignancy were identified as variables with high predictive significance.
Conclusion: The interpretable prediction model used in this study can help medical staff improve their ability to predict the risk of death in patients with CAUTI in the ICU.
Introduction
Catheter-associated urinary tract infection (CAUTI) is one of the most common health care-associated infections in critical care settings worldwide. Epidemiological studies (1) have shown that the incidence of CAUTI varies significantly across healthcare systems and economic contexts, ranging from 1.3 to 8.9 per 1,000 catheter days. CAUTI in critically ill patients is closely related to poor outcomes. In low- and middle-income countries (LMICs), the mortality rate of CAUTI is as high as 31.14%. Studies have highlighted that CAUTI prolongs the average length of hospital stay by 17.84 days and generates an additional cost of approximately US $1,006 per case in the United States (1–3). Studies have suggested (4, 5) that once a patient is diagnosed with CAUTI, the risk of related death is about 10%, which imposes a heavy clinical and economic burden. The incidence of infection rises sharply in the ICU and significantly affects patient prognosis. Because the risk of infection is higher in critically ill patients, accurate prediction of CAUTI-related mortality is essential for clinical decision making and appropriate resource allocation (6, 7).
In recent years, the effectiveness of machine learning (ML) in healthcare prediction has been well demonstrated. Examples include a predictive model for prolonged length of stay (LOS) in very premature infants (VPI), designed for risk management and decision support in the early postpartum period (8), and machine learning analysis to diagnose and predict pneumonia in patients undergoing elective cardiac surgery (9). Given the inherent ability of machine learning algorithms to capture non-linear relationships, more and more researchers advocate the development of new predictive models to improve patient treatment outcomes.
The purpose of our study was to use the Medical Information Mart for Intensive Care (MIMIC-IV) database to integrate key clinical variables and develop an interpretable model to predict the risk of death in patients with catheter-associated urinary tract infection (CAUTI) in the intensive care unit (ICU). In addition, the SHapley Additive exPlanations (SHAP) method was used to explain the model and explore the prognostic factors of CAUTI. By exploring the risk factors related to death in depth, our study provides a reference for clinical staff. Identifying patients with a poor prognosis at an early stage of the disease allows timely interventions that can improve survival and, ultimately, clinical decision-making and patient outcomes (10, 11).
Materials and methods
Data source
This retrospective study used the Medical Information Mart for Intensive Care (MIMIC-IV) database (v3.1), the successor to MIMIC-III. The database complies with HIPAA security regulations and ensures anonymization of the data. MIMIC-IV contains a large amount of clinical data from 70,000 adult intensive care unit (ICU) patients at the Beth Israel Deaconess Medical Center (BIDMC) in Boston between 2008 and 2019 (12).
All patient data in the database are anonymized, obviating the need for informed consent. The study was conducted in adherence to the ethical standards of the 1964 Declaration of Helsinki and its subsequent amendments. Access to the database was obtained after completion of the National Institutes of Health Web-based training course and the Protecting Human Research Participants examination (No. 43258214).
Participant selection
Patients were screened from the MIMIC-IV database (version 3.1) according to the following inclusion criteria:
(1) patients were diagnosed with CAUTI according to International Classification of Diseases codes (ICD-9 or ICD-10);
(2) for patients with multiple ICU admissions, only the first ICU admission was considered;
(3) patients were aged 18 years or older.
Patients who had more than 30% missing values were excluded (13). Ultimately, 545 patients were enrolled in this study (Figure 1).
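The inclusion and exclusion rules above can be sketched as a simple filter. This is an illustrative sketch only: the record fields (`has_cauti_code`, `missing_fraction`, and so on) are hypothetical and do not reproduce the MIMIC-IV schema or the SQL used in the study, and it assumes that the first ICU stay is chosen among CAUTI-coded stays.

```python
from datetime import datetime

# Hypothetical, simplified ICU-stay records (not the MIMIC-IV schema).
stays = [
    {"subject_id": 1, "intime": datetime(2015, 3, 1), "age": 74, "has_cauti_code": True, "missing_fraction": 0.05},
    {"subject_id": 1, "intime": datetime(2016, 1, 9), "age": 75, "has_cauti_code": True, "missing_fraction": 0.10},
    {"subject_id": 2, "intime": datetime(2012, 7, 4), "age": 17, "has_cauti_code": True, "missing_fraction": 0.00},
    {"subject_id": 3, "intime": datetime(2018, 5, 2), "age": 66, "has_cauti_code": True, "missing_fraction": 0.45},
    {"subject_id": 4, "intime": datetime(2010, 2, 11), "age": 81, "has_cauti_code": False, "missing_fraction": 0.02},
    {"subject_id": 5, "intime": datetime(2011, 9, 30), "age": 59, "has_cauti_code": True, "missing_fraction": 0.20},
]


def select_cohort(records, max_missing=0.30, min_age=18):
    """Keep the first CAUTI-coded ICU stay per patient, then apply age and missingness rules."""
    first_stay = {}
    for rec in records:
        if not rec["has_cauti_code"]:
            continue  # criterion (1): CAUTI diagnosis code required
        current = first_stay.get(rec["subject_id"])
        if current is None or rec["intime"] < current["intime"]:
            first_stay[rec["subject_id"]] = rec  # criterion (2): earliest ICU admission
    return [
        r for r in first_stay.values()
        if r["age"] >= min_age and r["missing_fraction"] <= max_missing  # criterion (3) and exclusion
    ]


cohort = select_cohort(stays)
print([r["subject_id"] for r in cohort])
```

In this toy example, patient 2 is excluded for age, patient 3 for missing data, and patient 4 for lacking a CAUTI code.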
Data extraction, preparation, and definitions
The predicted outcome was the probability of death during the intensive care unit (ICU) stay. Baseline demographic variables, comorbidities, vital signs, length of hospital stay, severity scores, and laboratory data were selected on the basis of previous studies and expert input and were extracted from the MIMIC database using Structured Query Language (SQL). With the exception of length of stay, vital signs were collected within the first 24 h after each ICU admission, whereas other variables were measured at admission. In addition, to avoid overfitting, the least absolute shrinkage and selection operator (LASSO) method was used for variable selection, with the optimal regularization parameter λ chosen by 10-fold cross-validation (14).
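LASSO selection with a cross-validated penalty can be sketched as follows. This is a minimal illustration on synthetic data, not the study's pipeline: it assumes scikit-learn's `LogisticRegressionCV` with an L1 penalty as the LASSO implementation, and scikit-learn expresses the penalty through `C`, the inverse of the regularization strength, so a larger λ corresponds to a smaller `C`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the study cohort): 400 patients, 20 candidate predictors.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, n_redundant=0, random_state=0
)
X = StandardScaler().fit_transform(X)  # L1 penalties assume comparable feature scales

# L1-penalized (LASSO) logistic regression; the penalty strength is chosen by
# 10-fold cross-validation over a grid of 20 candidate values of C.
lasso = LogisticRegressionCV(
    Cs=20, cv=10, penalty="l1", solver="liblinear", scoring="roc_auc", random_state=0
)
lasso.fit(X, y)

best_C = float(np.ravel(lasso.C_)[0])             # inverse of the selected penalty strength
selected = np.flatnonzero(np.ravel(lasso.coef_))  # predictors with non-zero coefficients
print(f"best C = {best_C:.4g}; {selected.size} of {X.shape[1]} predictors retained")
```

Predictors whose coefficients are shrunk exactly to zero are dropped; the remaining ones are the candidates passed on to model building.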
Management of missing data
Missing data are common in the MIMIC-IV database, and ignoring them during the analysis may bias the results. We therefore used multiple imputation by chained equations (MICE) to handle missing values, generating five imputed datasets (15). The proportion of missing values in each of the selected variables was less than 30%.
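A MICE-style procedure with five imputations can be sketched as below. This is an assumption-laden illustration: it uses scikit-learn's `IterativeImputer` with posterior sampling as a stand-in for the MICE implementation used in the study, and the data are synthetic.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the import below)
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)

# Synthetic data with one correlated column, then roughly 15% of values set to missing.
X_full = rng.normal(size=(200, 4))
X_full[:, 3] = 0.8 * X_full[:, 0] + 0.2 * X_full[:, 3]
mask = rng.random(X_full.shape) < 0.15
X_missing = X_full.copy()
X_missing[mask] = np.nan

# Five imputed datasets: each run draws imputations from the posterior with a different seed.
imputed_sets = []
for seed in range(5):
    imputer = IterativeImputer(sample_posterior=True, max_iter=10, random_state=seed)
    imputed_sets.append(imputer.fit_transform(X_missing))

# Between-imputation spread of the filled-in cells reflects imputation uncertainty.
spread = np.std(imputed_sets, axis=0)[mask].mean()
print(f"{len(imputed_sets)} imputed datasets; mean between-imputation SD = {spread:.3f}")
```

Observed values are left unchanged in every imputed dataset; only the missing cells differ between the five versions.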
Machine learning explainable tool
The prediction model was interpreted with the SHAP method, which quantifies the contribution and influence of each feature on the final prediction. SHAP analysis was implemented with the SHAP 0.44.0 library in Python 3.8 (16). SHAP values indicate the extent to which each predictor variable affects the target variable, either positively or negatively. Furthermore, each individual prediction can be understood through its own set of SHAP values.
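For a linear model such as logistic regression, SHAP values have a closed form when the features are treated as independent: each feature contributes its coefficient times the distance of its value from the background mean, on the log-odds scale. The sketch below illustrates this without the SHAP library; the coefficients, feature names, and data are hypothetical and are not taken from the study.

```python
def linear_shap(weights, intercept, background, x):
    """SHAP values of a linear (log-odds) model, assuming independent features."""
    n_rows = len(background)
    means = [sum(row[j] for row in background) / n_rows for j in range(len(weights))]
    base_value = intercept + sum(w * m for w, m in zip(weights, means))
    phi = [w * (xj - m) for w, xj, m in zip(weights, x, means)]
    return base_value, phi


# Hypothetical coefficients and data, for illustration only.
features = ["vasopressor_use", "shock_index", "malignancy"]
weights = [1.4, 0.9, 0.7]
intercept = -3.0
background = [
    [0, 0.6, 0],
    [1, 1.1, 0],
    [0, 0.7, 1],
    [0, 0.8, 0],
]
patient = [1, 1.2, 1]

base_value, phi = linear_shap(weights, intercept, background, patient)
log_odds = intercept + sum(w * v for w, v in zip(weights, patient))

for name, value in zip(features, phi):
    print(f"{name}: {value:+.3f}")
print(f"base value {base_value:.3f} + contributions = {base_value + sum(phi):.3f}")
```

The contributions add up exactly to the difference between the patient's log odds and the base value, which is the additivity property that makes SHAP values readable feature by feature.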
Statistical analysis
DecisionLinnc1.0 software was used for data analysis; it is a platform that integrates multiple programming language environments and supports data processing, data analysis, and machine learning through a visual interface (17). Categorical variables were presented as counts and percentages, and the chi-square test or Fisher's exact test was used to compare groups. Continuous variables were expressed as medians and interquartile ranges (IQR), and the two groups were compared with the Wilcoxon rank-sum test.
Four machine learning models, XGBoost, Decision Tree (DT), Logistic Regression (LR), and Random Forest (RF), were used to construct the prediction model. The predictive performance of each model was evaluated by the area under the receiver operating characteristic curve (AUC), together with accuracy, precision, and F1 score. In addition, decision curve analysis (DCA) was performed to evaluate the practical value of each model for decision making by quantifying the net benefit at different threshold probabilities (18).
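The evaluation metrics named above can be written out directly. The following is a minimal sketch in plain Python on made-up labels and predicted probabilities; the 0.5 classification threshold is an assumption, since the text does not state the cutoff used.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, and F1 score at a given probability threshold."""
    pred = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, y_true))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, y_true))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, y_true))
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"accuracy": accuracy, "precision": precision, "f1": f1}


# Made-up example: 1 = died in ICU, 0 = survived.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.60, 0.55, 0.80, 0.40, 0.35]

auc = roc_auc(y_true, scores)
metrics = classification_metrics(y_true, scores)
print(f"AUC = {auc:.3f}", metrics)
```

With a rare outcome such as ICU death, accuracy alone can look high even for a weak model, which is why AUC, precision, and F1 are reported alongside it.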
Results
Patient characteristics
In this study, 545 adult patients diagnosed with CAUTI were included out of a total of 1344 patients with CAUTI in the MIMIC-IV database. The patient screening process is shown in Figure 1.
Table 1 presents the baseline characteristics of the 545 patients who fulfilled the inclusion criteria, categorized into an ICU survival group and a non-survival group. The ICU mortality rate of patients diagnosed with CAUTI was 7.89% (43/545). There were 261 females (47.89%) and 284 males (52.11%), with a median age of 74 (21–99) years; the age difference between groups was not significant (P = 0.803). The median length of hospital stay was 11.94 days for survivors and 15.86 days for non-survivors (P = 0.117), and there was no significant difference in length of ICU stay (P = 0.065). Among the severity of illness scores, the SOFA score, APSIII score, APSII score, OASIS score, and shock index were significantly higher in non-survivors than in survivors (P < 0.05). The duration of mechanical ventilation was significantly longer in non-survivors (P = 0.004), and vital signs such as heart rate, respiratory rate, and blood oxygen saturation differed significantly between groups. Among the laboratory indicators, lactic acid, pH, the prothrombin time international normalized ratio, and creatinine were significantly worse in non-survivors (P < 0.05). In terms of complications, the incidence of acute renal failure was significantly higher in non-survivors, as was the incidence of malignancies. The use of sedatives, analgesics, and vasoactive drugs was significantly higher in non-survivors than in survivors (P < 0.05). The LASSO regularization method selected 13 potential predictors from the training dataset, and these factors were used for model development.
Model building and evaluation
The dataset was randomly divided into two parts: 70% of the data were used to train the models and 30% to validate them. On the training dataset, we built four models: XGBoost, Logistic Regression (LR), Random Forest (RF), and Decision Tree (DT). The AUC values obtained on the test dataset are shown in Figure 2 and Table 2. Among these models, LR showed the best predictive performance with an AUC of 0.871, while DT had the lowest generalization ability with an AUC of 0.542. The net benefit of the best-performing model was compared with alternative approaches to clinical decision making using decision curve analysis (DCA) on the test dataset.

Figure 2. The ROC curve was used to compare the performance of four models in predicting the ICU mortality rate of patients with CAUTI.
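The 70/30 split and model comparison described in this section can be sketched as follows. This is an illustration on synthetic, imbalanced data, not a reproduction of the study: the stratified split, the hyperparameters, and the random seeds are assumptions, and scikit-learn's `GradientBoostingClassifier` is used as a stand-in for XGBoost so that the example needs only one library.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in cohort: 545 patients, 13 predictors, roughly 8% positive outcomes.
X, y = make_classification(
    n_samples=545, n_features=13, n_informative=6, weights=[0.92, 0.08], random_state=0
)

# 70% training / 30% test; stratification (an assumption) keeps the event rate similar in both.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "GB (stand-in for XGBoost)": GradientBoostingClassifier(random_state=0),
}

auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

for name, value in sorted(auc.items(), key=lambda item: -item[1]):
    print(f"{name}: test AUC = {value:.3f}")
```

The ranking produced on synthetic data says nothing about the study's results; the point is only the workflow of fitting on the training split and scoring each model on the held-out split.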
We evaluated the net benefit at different probability thresholds. In Figure 3, the black line represents the assumption that all patients received the intervention, and the dashed line represents the case where no patient received any intervention. Given the diverse nature of the study population, developing a treatment strategy based on any of the four machine-learning models would be preferable to treating all or none of the patients by default.

Figure 3. Decision curve analysis of four models plotting the net benefit at different threshold probabilities.
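The net benefit plotted in a decision curve follows the standard formula: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The sketch below computes it for a model and for the treat-all strategy (treat-none is always zero); the labels and predicted probabilities are made up for illustration.

```python
def net_benefit(y_true, scores, threshold):
    """Net benefit of treating patients whose predicted risk is at or above the threshold."""
    n = len(y_true)
    tp = sum(s >= threshold and y == 1 for y, s in zip(y_true, scores))
    fp = sum(s >= threshold and y == 0 for y, s in zip(y_true, scores))
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening on every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Made-up example: 1 = died in ICU, 0 = survived.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.60, 0.55, 0.80, 0.40, 0.35]

for pt in (0.1, 0.2, 0.3, 0.4, 0.5):
    model_nb = net_benefit(y_true, scores, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold {pt:.1f}: model {model_nb:+.3f}, treat all {all_nb:+.3f}, treat none +0.000")
```

A model is useful at a given threshold when its net benefit lies above both reference strategies.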
Explanation model with the SHAP method
The SHAP algorithm was used to determine the importance of each predictor variable in the predictions of the LR model. The variable importance plot lists the variables ranked from highest to lowest importance.
The use of vasoactive agents had the highest predictive value of all predictors, followed by the shock index, coexisting malignancies, and APSIII score (Figure 4).

Figure 4. The significance of variable weights. SpO2, O2 saturation; APSIII, acute physiology score III; VP, vasopressor; SA, sedative analgesic; PT, prothrombin time; AKI, acute kidney injury; OASIS, Oxford acute severity of illness score; Rdw, red blood cell distribution width; SHAP, SHapley Additive exPlanation.
In addition, SHAP values were used to identify predictor variables that had a significant effect on mortality risk and to determine their positive or negative association with the target outcome. As shown in Figure 5, the horizontal position indicates whether the effect of the value is associated with an increase or decrease in the predicted value, while the color indicates the high or low state of the variable in a particular observation.

Figure 5. The SHapley Additive exPlanation (SHAP) values. SpO2, O2 saturation; APSIII, acute physiology score III; VP, vasopressor; SA, sedative analgesic; PT, prothrombin time; AKI, acute kidney injury; OASIS, Oxford acute severity of illness score; Rdw, red blood cell distribution width; SHAP, SHapley Additive exPlanation.
SHAP force plots
Figure 6 shows the force plots for a patient who did not survive and a patient who survived. The SHAP values provide insight into the predictive factors of individual patients and quantify the impact of each factor on the mortality prediction. The number highlighted in bold is the model output f(x) for the individual patient, expressed as log odds, while the base value is the prediction the model would make without any input features. Red features on the left are associated with an elevated risk of mortality, while blue features are linked to a reduced risk of mortality. The magnitude of each effect on the prediction is shown by the length of its arrow.

Figure 6. SHapley Additive exPlanation (SHAP) force plot. SpO2, O2 saturation; VP, vasopressor; SA, sedative analgesic; AKI, acute kidney injury; SHAP, SHapley Additive exPlanation.
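How a force plot arrives at f(x) can be sketched numerically: the base value plus the sum of the per-feature SHAP values gives the model output, which a logistic function converts to a probability when the output is on the log-odds scale, as described in the text. The base value and SHAP values below are hypothetical and are not read from Figure 6.

```python
import math


def force_plot_summary(base_value, shap_values):
    """Combine a base value and per-feature SHAP values into f(x) and a probability."""
    fx = base_value + sum(shap_values.values())  # model output on the log-odds scale
    probability = 1.0 / (1.0 + math.exp(-fx))    # logistic transform to a probability
    pushes_up = sorted(
        (k for k, v in shap_values.items() if v > 0), key=lambda k: -shap_values[k]
    )
    pushes_down = sorted(
        (k for k, v in shap_values.items() if v < 0), key=lambda k: shap_values[k]
    )
    return fx, probability, pushes_up, pushes_down


# Hypothetical values for one patient, for illustration only.
base_value = -2.5
shap_values = {"VP": 1.2, "shock_index": 0.6, "SpO2": -0.3}

fx, probability, red, blue = force_plot_summary(base_value, shap_values)
print(f"f(x) = {fx:.2f} (log odds), predicted mortality = {probability:.1%}")
print("raise risk (red):", red, "| lower risk (blue):", blue)
```

Features with positive SHAP values correspond to the red arrows that push the prediction above the base value, and negative values to the blue arrows that pull it down.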
Discussion
In this study, we used a comprehensive intensive care unit (ICU) database to perform a retrospective cohort analysis. We developed and validated four machine-learning algorithms to predict mortality in patients diagnosed with catheter-associated urinary tract infection (CAUTI). The logistic regression (LR) model outperformed XGBoost, DT, and RF, with an area under the curve (AUC) of 0.871. The decision tree model had limited generalization ability, with an AUC of 0.542 and relatively poor prediction accuracy. Its poor performance may be related to overfitting, because a complex branch structure generalizes poorly in small samples. Random Forest and XGBoost were also prone to overfitting and calibration drift because the effective number of events was insufficient to stabilize their large parameter spaces. The superior performance of logistic regression may be due to the linear separability of CAUTI mortality prediction and its resistance to overfitting in small samples. To make the logistic regression model interpretable while maintaining its performance, we adopted the SHAP method for interpretation. This enhances healthcare professionals' understanding of the model's decision-making process and facilitates the practical application of the predicted results. In intensive care research, logistic regression has gained significant popularity for predicting patient mortality during hospitalization, potentially helping healthcare professionals to make informed decisions (19–21).
It is essential to evaluate the advantages of early mortality prediction in clinical practice. In this study, 545 adult patients were included from 1344 CAUTI patients in the MIMIC-IV database. ICU mortality among patients with CAUTI was 7.89% (43/545). We used SHAP to explain the LR model and identify key factors associated with in-hospital mortality in CAUTI patients. Shock index, use of vasoactive drugs, concomitant malignancy, and APSIII score were identified as variables with high predictive significance. SHAP-based risk thresholds could help identify high-risk patients early, and we recommend integrating them into the early warning systems of ICU electronic medical records.
Relatively few studies have investigated the risk factors for mortality in patients with catheter-associated urinary tract infection (CAUTI). A high shock index indicates possible hemodynamic instability and is associated with increased mortality in critically ill patients (3). This instability reflects the inability of the body to maintain adequate perfusion and oxygenation of organs, which impairs their function and leads to multiple organ failure, especially in the context of infections such as CAUTI (22). The use of vasoactive drugs usually indicates severe inflammation and significant cardiovascular damage, and may be associated with increased CAUTI mortality (5). Patients with malignancies often have compromised immune systems due to the disease itself or to treatments such as chemotherapy and radiotherapy, making them more susceptible to infections, including CAUTI. Studies have shown (23) that patients with cancer face a high incidence of CAUTI and an increased risk of death from these infections, and that the metabolic activity of the tumor and the potential to develop neutropenia further complicate treatment and increase the risk of serious complications. Malignancy is an independent risk factor for 28-day mortality in patients with CAUTI. APSIII is a scoring system that assesses disease severity based on various physiological parameters; higher scores are associated with an increased risk of death in critically ill patients and can be used as a predictor of clinical outcomes (24). In the future, bedside clinical decision support system (CDSS) tools could be developed to generate death risk scores from physiological parameters entered in real time. However, because an external validation cohort is lacking, further studies are needed to explore the applicability of this approach.
Limitations
A strength of our research is that the data were drawn from the large MIMIC database. However, this study has several limitations. First, because our data were taken from a publicly accessible database, some variables were incomplete. Second, all data originated from ICU patients in the MIMIC database, which raises questions about how well our model can be applied to other populations. Third, our mortality prediction models relied on information available within the first 24 h of each ICU admission; this may overlook subsequent events that could alter prognosis and may introduce confounding to some degree. Lastly, because no external validation cohort was available, the effectiveness of the developed LR model in clinical practice may be limited.
Conclusion
This study provides a methodological basis for the development of a real-time prediction tool for mortality risk in the ICU and demonstrates the utility of machine learning in predicting mortality in patients with catheter-associated urinary tract infection (CAUTI) admitted to the intensive care unit (ICU). We created an interpretable logistic regression prediction model that performed best in assessing the risk of death in patients with CAUTI. Moreover, this interpretable machine learning approach identifies risk factors associated with death in CAUTI patients and can help healthcare providers recognize patients at high mortality risk, enabling them to take timely and effective treatment measures.
Data availability statement
The original contributions presented in this study are included in this article/supplementary material; further inquiries can be directed to the corresponding author.
Author contributions
LL: Writing – original draft, Writing – review & editing. XY: Writing – original draft. ZC: Writing – review & editing. QZ: Writing – original draft. DZ: Writing – review & editing.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by the Wenzhou Science and Technology Bureau Project, Y20220607.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1. Rosenthal V, Memish Z, Nicastri E, Leone S, Bearman G. Preventing catheter-associated urinary tract infections: a position paper of the International society for infectious diseases, 2024 update. Int J Infect Dis. (2025) 151:107304. doi: 10.1016/j.ijid.2024.107304
2. Tedja R, Wentink J, O’Horo J, Thompson R, Sampathkumar P. Catheter-associated urinary tract infections in intensive care unit patients. Infect Control Hosp Epidemiol. (2015) 36:1330–4. doi: 10.1017/ice.2015.172
3. Greene M, Fakih M, Fowler K, Meddings J, Ratz D, Safdar N, et al. Regional variation in urinary catheter use and catheter-associated urinary tract infection: results from a national collaborative. Infect Control Hosp Epidemiol. (2014) 35:S99–106. doi: 10.1086/677825
4. Li F, Song M, Xu L, Deng B, Zhu S, Li X. Risk factors for catheter-associated urinary tract infection among hospitalized patients: a systematic review and meta-analysis of observational studies. J Adv Nurs. (2019) 75:517–27. doi: 10.1111/jan.13863
5. Kelly T, Ai C, Jung M, Yu K. Catheter-associated urinary tract infections (CAUTIs) and non-CAUTI hospital-onset urinary tract infections: relative burden, cost, outcomes and related hospital-onset bacteremia and fungemia infections. Infect Control Hosp Epidemiol. (2024) 45:864–71. doi: 10.1017/ice.2024.26
6. Conway L, Pogorzelska M, Larson E, Stone P. Adoption of policies to prevent catheter-associated urinary tract infections in United States intensive care units. Am J Infect Control. (2012) 40:705–10. doi: 10.1016/j.ajic.2011.09.020
7. Lewis S, Knelson L, Moehring R, Chen L, Sexton D, Anderson D. Comparison of non-intensive care unit (ICU) versus ICU rates of catheter-associated urinary tract infection in community hospitals. Infect Control Hosp Epidemiol. (2013) 34:744–7. doi: 10.1086/671000
8. Yang Y, Yang H, Rong H, Li X, Cheng R, Shen F. Construction and validation of a risk prediction model for prolonged hospitalization of very premature infants. Value Health. (2025). doi: 10.1016/j.jval.2025.06.011 [Epub ahead of print].
9. Endo T, Tran K, Goodin DA, Katsaros G, Xie Z, Fu XA, et al. Predicting and diagnosing pneumonia in patients undergoing elective cardiac surgery through machine learning analysis of exhaled volatile carbonyl compounds. J Thorac Cardiovasc Surg. (2025) S0022-5223(25)00548-3. doi: 10.1016/j.jtcvs.2025.06.028 [Epub ahead of print].
10. Yerzhan A, Razbekova M, Merenkov Y, Khudaibergenova M, Abdildin Y, Sarria-Santamera A, et al. Risk factors and outcomes in critically ill patients with hematological malignancies complicated by hospital-acquired infections. Medicina. (2023) 59:214. doi: 10.3390/medicina59020214
11. Jahani-Sherafat S, Razaghi M, Rosenthal V, Tajeddin E, Seyedjavadi S, Rashidan M, et al. Device-associated infection rates and bacterial resistance in six academic teaching hospitals of Iran: findings from the International nosocomial infection control consortium (INICC). J Infect Public Health. (2015) 8:553–61. doi: 10.1016/j.jiph.2015.04.028
12. Wu C, Li X, Li J, Huo R, Zhao H, Ying Y. Association between serum calcium and prognosis in patients with acute ischemic stroke in ICU: analysis of the MIMIC-IV database. BMC Anesthesiol. (2024) 24:139. doi: 10.1186/s12871-024-02528-3
13. Li K, Shi Q, Liu S, Xie Y, Liu J. Predicting in-hospital mortality in ICU patients with sepsis using gradient boosting decision tree. Medicine. (2021) 100:e25813. doi: 10.1097/MD.0000000000025813
14. Cheng X, Zhang Q, Fu Z, Shi Z, Xia P, Zhang Y, et al. Establishment of a predictive model for purulent meningitis in preterm infants. Transl Pediatr. (2022) 11:1018–27. doi: 10.21037/tp-22-236
15. Cro S, Morris T, Kenward M, Carpenter J. Sensitivity analysis for clinical trials with missing continuous outcome data using controlled multiple imputation: a practical guide. Stat Med. (2020) 39:2815–42. doi: 10.1002/sim.8569
16. Ponce-Bobadilla A, Schmitt V, Maier C, Mensing S, Stodtmann S. Practical guide to SHAP analysis: explaining supervised machine learning model predictions in drug development. Clin Transl Sci. (2024) 17:e70056. doi: 10.1111/cts.70056
17. Chen H, Wu C, Cao L, Wang R, Zhang T, He Z. The association between the neutrophil-to-lymphocyte ratio and type 2 diabetes mellitus: a cross-sectional study. BMC Endocr Disord. (2024) 24:107. doi: 10.1186/s12902-024-01637-x
18. Van Calster B, Wynants L, Verbeek J, Verbakel J, Christodoulou E, Vickers A, et al. Reporting and interpreting decision curve analysis: a guide for investigators. Eur Urol. (2018) 74:796–804. doi: 10.1016/j.eururo.2018.08.038
19. Zuo W, Yang X. A machine learning model predicts stroke associated with blood cadmium level. Sci Rep. (2024) 14:14739. doi: 10.1038/s41598-024-65633-w
20. Rahman M, Islam K, Prithula J, Kumar J, Mahmud M, Alam M, et al. Correction: machine learning-based prognostic model for 30-day mortality prediction in Sepsis-3. BMC Med Inform Decis Mak. (2024) 24:264. doi: 10.1186/s12911-024-02685-y
21. Zhang Z, Ho K, Hong Y. Machine learning for the prediction of volume responsiveness in patients with oliguric acute kidney injury in critical care. Crit Care. (2019) 23:112. doi: 10.1186/s13054-019-2411-z
22. Apostolopoulou E, Raftopoulos V, Filntisis G, Kithreotis P, Stefanidis E, Galanis P, et al. Surveillance of device-associated infection rates and mortality in 3 Greek intensive care units. Am J Crit Care. (2013) 22:e12–20. doi: 10.4037/ajcc2013324
23. Bursle E, Dyer J, Looke D, McDougall D, Paterson D, Playford E. Risk factors for urinary catheter associated bloodstream infection. J Infect. (2015) 70:585–91. doi: 10.1016/j.jinf.2015.01.001
Keywords: CAUTI, mortality, intensive care unit, prediction, logistic regression, SHAP
Citation: Liu L, Yu X, Chen Z, Zhang Q and Zhuang D (2025) Predicting mortality in intensive care unit patients with CAUTI using an interpretable machine learning model: a retrospective cohort study from MIMIC-IV database. Front. Med. 12:1665035. doi: 10.3389/fmed.2025.1665035
Received: 13 July 2025; Accepted: 25 August 2025;
Published: 09 September 2025.
Edited by:
Paolo Monardo, Papardo Hospital, Italy
Reviewed by:
Sukrit Kanchanasurakit, University of Phayao, Thailand
Junfan Wei, Guangzhou University of Chinese Medicine, China
Copyright © 2025 Liu, Yu, Chen, Zhang and Zhuang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Danwen Zhuang, NDgyMzE2MTRAcXEuY29t