Interpretable machine learning for early predicting the risk of ventilator-associated pneumonia in ischemic stroke patients in the intensive care unit

Cao, Heshan; Wei, Junying; Hua, Ping; Yang, Songran

doi:10.3389/fneur.2025.1513732

ORIGINAL RESEARCH article

Front. Neurol., 07 May 2025

Sec. Artificial Intelligence in Neurology

Volume 16 - 2025 | https://doi.org/10.3389/fneur.2025.1513732

This article is part of the Research Topic: AI's Transformative Role in Neuro-Intervention: Enhancing Diagnosis and Treatment Strategies

Heshan Cao¹†

Junying Wei²†

Ping Hua³*

Songran Yang¹,⁴*

¹Department of Neurology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
²Department of Anaesthesiology, The First Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, China
³Department of Cardio-Vascular Surgery, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
⁴Department of Biobank and Bioinformatics, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China

Background: Ventilator-associated pneumonia (VAP) in ischemic stroke (IS) patients is linked to a variety of detrimental outcomes. Current approaches for the early identification of individuals at high risk of developing VAP are limited and often lack clinical interpretability. The goal of this study was to develop and validate an interpretable machine learning (ML) model for the early prediction of VAP risk in IS patients in the intensive care unit (ICU).

Methods: Data on IS patients were extracted from versions 2.2 and 3.0 of the Medical Information Mart for Intensive Care-IV database, with version 2.2 being used for model training and internal validation and version 3.0 for external testing. The primary outcome was the incidence of VAP post-ICU admission. The Boruta algorithm was used to select features prior to developing 10 ML models. The Shapley Additive Explanation (SHAP) method was employed to assess the global and local interpretability of the model’s decision-making process. The final model and Streamlit were used for developing and launching an online web application.

Results: A total of 519 IS patients were included, with 401 in the derivation cohort and 118 in the test cohort. Following feature selection, seven clinical characteristics were incorporated in the ML model: systolic and diastolic blood pressure, international normalized ratio, length of stay before mechanical ventilation, dysphagia, antibiotic counts, and suctioning counts. Among the 10 evaluated ML models, the Random Forest (RF) model outperformed the others, achieving an internal validation AUC of 0.776, accuracy of 0.704, sensitivity of 0.900, and specificity of 0.588. In external testing, performance dropped to an AUC of 0.644, accuracy of 0.610, sensitivity of 0.688, and specificity of 0.519, raising concerns about the model’s generalizability.

Conclusion: The RF model is reliable for the early identification of IS patients at high risk of VAP. The SHAP method offers clear and intuitive explanations for individual risk assessment. The web-based tool has the potential to improve clinical outcomes by promptly recognizing patients at increased VAP risk and facilitating early intervention. Further multicenter prospective studies are required to validate its generalizability and practical utility.

1 Introduction

According to World Stroke Organization statistics for 2022, stroke continues to be the second leading cause of mortality and the third leading cause of disability worldwide, thereby posing a significant threat to public health (1). Ischemic stroke (IS) is the most prevalent type, accounting for 60%–70% of all instances (2). Mechanical ventilation (MV) is frequently essential to prevent potentially fatal respiratory failure or apnea in IS patients, particularly in the intensive care unit (ICU), due to the significant neurological abnormalities these patients frequently suffer.

Stroke patients undergoing MV are at increased risk for a serious pulmonary complication known as ventilator-associated pneumonia (VAP), which may have a devastating impact on their respiratory function and overall prognosis (3). Early and accurate identification of IS patients at high risk for VAP remains a critical yet challenging aspect of clinical management, as delayed diagnosis can result in worsened patient outcomes and increased clinical burden (4, 5).

Although machine learning (ML) methods have demonstrated promising results in predictive modeling within clinical research (6, 7), early predictive models specifically targeting VAP risk in IS patients remain scarce. To address this gap, we developed and validated an interpretable ML model utilizing stroke-related clinical data from a large public database. The SHapley Additive exPlanation (SHAP) method (8) was employed to enhance the interpretability of predictions. We also constructed an accessible web-based tool designed to assist clinicians in rapidly identifying IS patients at increased VAP risk.

2 Methods

2.1 Study population

The Medical Information Mart for Intensive Care IV (MIMIC-IV) database, specifically versions 2.2 and 3.0, was used for this retrospective analysis (9, 10). MIMIC-IV is a publicly accessible, large-scale intensive care database organized and maintained by the Laboratory for Computational Physiology at the Massachusetts Institute of Technology (MIT). This study utilized version 2.2, which includes medical data from approximately 300,000 patients treated at the Beth Israel Deaconess Medical Center (BIDMC) from 2008 to 2019, for model training and internal validation. Version 3.0, which includes data from 2020 to 2022, was utilized for external testing of the model. The use of MIMIC-IV data was ethically approved by the Institutional Review Boards of BIDMC and MIT. Since all personal data in the database are anonymized, informed consent was waived. One author (Heshan Cao) was granted access to the database (certification number: 63137030). Reporting of this study followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD+AI) guidelines (Supplementary Table 1) (11).

This study included patients aged 18 and above who were admitted to the ICU primarily for IS and received MV for more than 48 h. The primary outcome measure was the incidence of VAP. Patients with IS and VAP were identified in the MIMIC-IV database using the International Classification of Diseases, Ninth Revision (ICD-9), or Tenth Revision (ICD-10) codes. By restricting the diagnostic sequence to IS > VAP, we ensured that the diagnosis of IS was prioritized over that of VAP. Supplementary Table 2 lists the relevant ICD codes. Patients were excluded if the VAP diagnosis preceded the IS diagnosis or if they died within 7 days of ICU admission. For patients who had multiple ICU admissions, only the initial admission was considered. Figure 1 depicts a flowchart of inclusion and exclusion criteria.

Figure 1. Inclusion and exclusion flowchart and study workflow.

2.2 Data collection and feature selection

The MIMIC-IV database was queried using Structured Query Language (SQL) to extract features such as demographics, comorbidities, vital signs, laboratory test indicators, and ventilator settings. Records within the first 24 h of MV were used to extract data on vital signs, laboratory indicators, and ventilator settings; variables with multiple records were averaged. Records were also acquired for antibiotic usage, suctioning procedures, and invasive catheter placements during the first 24 h of MV. We encoded comorbidities and VAP incidence as binary values.
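The windowing and averaging step described above can be sketched in a few lines of Python. This is a minimal illustration only: the record layout (variable name, chart time, value), the function name `average_first_24h`, and the choice to include the start of the window but exclude its end are assumptions, not details taken from the study's SQL queries.

```python
from datetime import datetime, timedelta
from statistics import mean


def average_first_24h(records, mv_start, window_hours=24):
    """Average each variable over records charted within the window after MV start.

    records: iterable of (variable_name, chart_time, value) tuples (hypothetical layout).
    Variables with no record inside the window are omitted from the result.
    """
    window_end = mv_start + timedelta(hours=window_hours)
    grouped = {}
    for name, chart_time, value in records:
        # Assumed convention: start of window inclusive, end exclusive.
        if mv_start <= chart_time < window_end:
            grouped.setdefault(name, []).append(value)
    return {name: mean(values) for name, values in grouped.items()}


# Made-up example values; the dates only mimic the shifted dates used in MIMIC.
mv_start = datetime(2150, 1, 1, 8, 0)
records = [
    ("sbp", datetime(2150, 1, 1, 9, 0), 120.0),
    ("sbp", datetime(2150, 1, 1, 21, 0), 130.0),
    ("sbp", datetime(2150, 1, 2, 9, 0), 160.0),  # 25 h after MV start: excluded
    ("inr", datetime(2150, 1, 1, 10, 0), 1.1),
]
features = average_first_24h(records, mv_start)
```

Count-type features such as antibiotic or suctioning counts would use the same window with a count in place of the mean.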

A total of 59 features were obtained. Features with more than 20% missing data were first excluded to reduce missing data bias. Features with a missing data rate below 20% were addressed using multiple imputation methods, as indicated in Supplementary Figure 1. Following that, a correlation analysis was conducted on all features, and those with a correlation coefficient greater than 0.7 were excluded to prevent multicollinearity from impacting model performance (Supplementary Figure 2). Finally, the Boruta algorithm was applied to select the most relevant features. Boruta is an all-relevant feature selection method that uses a random forest (RF) classifier to compare the importance of original features with that of randomly permuted “shadow” features. By iteratively eliminating features that do not outperform their shadow counterparts, the algorithm robustly identifies truly informative features (12). In this study, Boruta was executed with a confidence level of 0.01, iterated 500 times to exclude rejected features, as presented in Figure 2.
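The first two filtering steps (missingness above 20%, pairwise correlation above 0.7) can be illustrated with a small standard-library sketch. The column names and values are invented, and the rule for which member of a correlated pair to drop (here, the later column) is an assumption, since the text does not state it. Multiple imputation is not shown.

```python
from math import sqrt


def missing_rate(values):
    """Fraction of entries that are None."""
    return sum(v is None for v in values) / len(values)


def pearson(x, y):
    """Pearson correlation over positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)


def prefilter(table, max_missing=0.20, max_abs_corr=0.70):
    """Drop sparse columns, then drop the later column of each highly correlated pair."""
    kept = [c for c in table if missing_rate(table[c]) <= max_missing]
    selected = []
    for col in kept:
        if all(abs(pearson(table[col], table[s])) <= max_abs_corr for s in selected):
            selected.append(col)
    return selected


# Toy table with invented values.
table = {
    "sbp": [120, 135, 110, 150, 128, 142],
    "map": [90, 101, 84, 112, 96, 106],            # tracks sbp closely
    "inr": [1.4, 1.0, 1.1, 1.2, 1.0, 1.3],
    "lactate": [None, None, 2.1, None, 1.8, 2.5],  # 50% missing
}
selected = prefilter(table)
```

In this toy table, `lactate` is removed for missingness and `map` for its near-perfect correlation with `sbp`, leaving `sbp` and `inr`.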

Figure 2. Results of feature selection using the Boruta algorithm.
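The shadow-feature idea behind Boruta, as described in Section 2.2, can be sketched as follows. This is a simplified illustration and not the Boruta package: absolute Pearson correlation with the outcome stands in for random forest importance, and the statistical test that Boruta applies to the hit counts is omitted. The function names and toy data are hypothetical.

```python
import random
from math import sqrt


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / sqrt(sxx * syy)


def shadow_hits(features, outcome, n_iter=20, seed=0):
    """Count how often each real feature beats the best shuffled 'shadow' copy."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        best_shadow = 0.0
        for values in features.values():
            shadow = values[:]   # copy, then shuffle to break any link to the outcome
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, abs_corr(shadow, outcome))
        for name, values in features.items():
            if abs_corr(values, outcome) > best_shadow:
                hits[name] += 1
    return hits


outcome = [0, 1] * 20                    # 40 toy labels
features = {
    "informative": [0, 1] * 20,          # identical to the outcome
    "uninformative": [0, 0, 1, 1] * 10,  # exactly uncorrelated with the outcome
}
hits = shadow_hits(features, outcome)
```

A feature that beats the best shadow in nearly every iteration would be a candidate for confirmation; one that rarely or never does would be a candidate for rejection.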

2.3 Model development

The MIMIC-IV version 2.2 data (2008–2019) were randomly divided into a training set (80%) and a validation set (20%) using stratified sampling. To predict the risk of VAP in IS patients, 10 widely recognized ML models based on different principles were constructed: adaptive boosting (AdaBoost), category boosting (CatBoost), extra trees (ET), light gradient boosting machine (LightGBM), logistic regression (LR), multilayer perceptron (MLP), naive Bayes (NB), RF, support vector machine (SVM), and extreme gradient boosting (XGBoost). Comparing these models allowed us to identify the one with the best overall performance for early VAP prediction in IS patients. To optimize the prediction models and avoid overfitting, the final hyperparameters for each model were determined using a combination of five-fold cross-validation and Bayesian search. For external testing, the trained models were evaluated on MIMIC-IV version 3.0, which covers the years 2020–2022.
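A stratified 80/20 split can be sketched with the standard library alone. The function below is an illustrative assumption, not the study's code; the seed and the rounding rule are arbitrary choices. The class counts come from the derivation cohort reported in Section 3.1 (149 VAP among 401 patients).

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=42):
    """Return (train_idx, valid_idx) with each class split at roughly train_fraction."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)  # rounding rule is an assumption
        train_idx.extend(indices[:cut])
        valid_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(valid_idx)


# 149 VAP and 252 non-VAP patients, as in the derivation cohort.
labels = [1] * 149 + [0] * 252
train_idx, valid_idx = stratified_split(labels)
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
valid_rate = sum(labels[i] for i in valid_idx) / len(valid_idx)
```

Splitting within each class keeps the VAP proportion nearly identical in the two subsets, which matters when the outcome is present in only about a third of patients.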

The models’ performance was evaluated using metrics such as the area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The optimal cutoff value was determined by maximizing the Youden index (sensitivity + specificity − 1). Calibration and decision curves were used to evaluate the model’s calibration and clinical decision-making capability.
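The metrics above and the Youden-index cutoff can be made concrete with a small sketch. The labels and scores are invented; classifying a score at or above the cutoff as positive, and searching only the observed scores as candidate cutoffs, are assumed conventions.

```python
def confusion(y_true, scores, cutoff):
    """Counts of TP, FP, TN, FN when predicting positive for score >= cutoff."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= cutoff)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= cutoff)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < cutoff)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < cutoff)
    return tp, fp, tn, fn


def metrics_at(y_true, scores, cutoff):
    """Sensitivity, specificity, PPV, NPV, and accuracy at a given cutoff."""
    tp, fp, tn, fn = confusion(y_true, scores, cutoff)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
        "accuracy": (tp + tn) / len(y_true),
    }


def youden_cutoff(y_true, scores):
    """Observed score that maximizes sensitivity + specificity - 1."""
    def j(cutoff):
        m = metrics_at(y_true, scores, cutoff)
        return m["sensitivity"] + m["specificity"] - 1
    return max(sorted(set(scores)), key=j)


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 4 positives, 6 negatives.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.70, 0.75, 0.90]
cutoff = youden_cutoff(y_true, scores)
report = metrics_at(y_true, scores, cutoff)
```

For this toy data the Youden-optimal cutoff is 0.40, giving a sensitivity of 1.0 and a specificity of 4/6. Because the Youden index weights sensitivity and specificity equally, the chosen cutoff can favor sensitivity at the cost of specificity, as in this example.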

2.4 Model explanation

The SHAP method quantifies the contribution of each input feature to the final prediction by leveraging concepts from cooperative game theory, addressing the “black-box” nature of ML models (8). This approach incorporates global and local explanations. Global explanations reveal the impact of features on the overall model, whereas local explanations examine the contribution of features in specific samples. The decision-making process of the final model is visually depicted, with both global and local explanations provided via the SHAP method.
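The game-theoretic idea can be illustrated by computing exact Shapley values for a single sample through brute-force enumeration of feature coalitions. This is a teaching sketch under stated assumptions: the risk function `toy_risk`, its coefficients, and the baseline patient are hypothetical, and the SHAP library uses far more efficient algorithms for tree models than the enumeration shown here.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating every feature coalition.

    Features outside a coalition are set to their baseline value, which is one
    common (but not the only) way to represent a 'missing' feature.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi


def toy_risk(f):
    """Hypothetical risk score with an interaction term; not the study's RF model."""
    score = 0.20 + 0.05 * f["suctioning"] - 0.04 * f["antibiotics"]
    if f["dysphagia"] and f["suctioning"] > 3:
        score += 0.10
    return score


patient = {"suctioning": 6, "antibiotics": 1, "dysphagia": 1}
baseline = {"suctioning": 2, "antibiotics": 2, "dysphagia": 0}
phi = shapley_values(toy_risk, patient, baseline)
```

The local explanation is additive: the values in `phi` sum to the difference between the patient's score and the baseline score, which is the property that waterfall and force plots visualize. The interaction bonus of 0.10 is split equally between `suctioning` and `dysphagia`.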

2.5 Webpage deployment

To facilitate its application in clinical contexts, the final prediction model was integrated into a web application built with the Streamlit Python library. When values are entered into the required feature fields, the application generates a risk score for VAP in an individual IS patient, as well as a force plot showing the effect of each feature on the risk assessment.

2.6 Statistical analysis

Data preprocessing, model construction, performance evaluation, and result visualization were conducted using Python (version 3.9.18) and R (version 4.3.2). The variance inflation factor (VIF) was employed to assess potential multicollinearity among the selected features, as detailed in Supplementary Table 3. Continuous variables with a normal distribution were presented as means and standard deviations, while non-normally distributed data were reported as medians (m) and inter-quartile range (IQR). Categorical variables were presented as numbers (n) and percentages (%). Differences in continuous variables were compared using Student’s t-tests or Wilcoxon rank-sum tests, while categorical variables were analyzed using Chi-square tests or Fisher’s exact tests. Statistically significant differences were defined as two-tailed p values <0.05.

3 Results

3.1 Patient characteristics

Using ICD-9/10 codes and the defined inclusion–exclusion criteria, we extracted data on 401 and 118 IS patients from versions 2.2 and 3.0 of the MIMIC-IV database, respectively. In the derivation cohort, 149 IS patients developed VAP, while 64 patients in the test cohort experienced VAP. Table 1 outlines the baseline characteristics of both cohorts. The baseline characteristics of the VAP and non-VAP groups in the derivation cohort are detailed in Supplementary Table 4.

Table 1. Baseline characteristics of the derivation and test cohorts.

3.2 Selection of features

Six of the initial 59 potential predictive features were removed due to a missing rate of more than 20%, while 7 were excluded due to a correlation coefficient greater than 0.7, as shown in Supplementary Table 5 and Supplementary Figure 2. Additionally, 39 features were removed during the Boruta algorithm phase. After considering clinical practicality, 7 features were eventually included to construct the ML models, including Systolic Blood Pressure (SBP), Diastolic Blood Pressure (DBP), International Normalized Ratio (INR), Length of Stay before Mechanical Ventilation (LOS Before MV), Dysphagia, Antibiotic counts, and Suctioning counts, as illustrated in Figure 2.

3.3 Model performance

The performance of the 10 ML models is presented in Table 2 and Supplementary Table 6. In the derivation cohort, the NB model performed best with an AUC of 0.790, followed by the RF and LightGBM models, both of which had an AUC of 0.776. In the test cohort, the RF model exhibited the best generalization capability with an AUC of 0.644. Figure 3A compares the AUC values of the 10 ML models in the internal validation and external test sets, while Figures 3B–D present the ROC curve, calibration curve, and decision curve of the final model.

Table 2. Performance of 10 machine learning models in validation and test cohort.

Figure 3. Selection and performance visualization of the final model. (A) AUC values for 10 machine learning models in the validation and test sets, with the final selected model indicated by a dashed line. (B) ROC curve for the final model. (C) Calibration curve for the final model. (D) Decision curve for the final model.

3.4 Model explanation

The SHAP method was employed to interpret the final model. Figure 4A presents a bar plot of features ranked by their mean absolute SHAP values. Figure 4B shows a beeswarm plot, illustrating the relationship between each feature’s value and the predicted risk of VAP. These plots indicate that antibiotic counts, LOS before MV, and INR are the top three contributors. Figure 4C shows SHAP dependence plots that further reveal the distribution of each feature and its global relationship with VAP risk.

Figure 4. Global SHAP interpretation of the final model. (A) Global bar plot of SHAP values. (B) Global beeswarm plot of SHAP values. (C) Global dependence plots for individual features.

In addition, the SHAP method was used to conduct local interpretation of the final model. Figure 5 shows detailed local interpretations using SHAP waterfall plots and force plots. As shown in Figure 5A, for a patient who did not eventually develop VAP, the SHAP analysis indicates that higher INR, the administration of antibiotics twice, a shorter LOS before MV, normal SBP and DBP, the absence of dysphagia, and fewer suctioning operations negatively contribute to the model’s prediction of VAP, resulting in a low-risk classification. In contrast, as depicted in Figure 5B, for a patient who eventually developed VAP, the SHAP analysis reveals that multiple suctioning operations, higher SBP, and the presence of dysphagia positively support the prediction of VAP, while the administration of antibiotics four times and a shorter LOS before MV have a negative impact. The aggregated contributions of these features incline the model toward predicting a high risk of VAP for the patient.

Figure 5. Local SHAP interpretations of the final model. (A,C) Representative waterfall and force plots for ischemic stroke patients without VAP. (B,D) Representative waterfall and force plots for ischemic stroke patients with VAP. Red indicates that the feature positively contributes to the risk of VAP; blue indicates a negative contribution.

3.5 Online application

Based on the final RF model, an interactive web-based tool was developed to facilitate clinical application.¹ Clinicians can input patient-specific clinical parameters to obtain an individualized prediction of VAP risk, along with a SHAP force plot clearly depicting each feature’s contribution. As illustrated in Figure 6, red features on the left side like LOS Before MV, suctioning counts, INR, and dysphagia push the prediction toward “VAP,” while the blue features on the right side like antibiotic counts, DBP, and SBP drive the prediction toward “non-VAP.” For the selected scenario shown in Figure 6, the model predicts a 68.51% probability of VAP occurrence, indicating a high risk for VAP.

Figure 6. The web application deployment based on the final model.

4 Discussion

This study developed and validated an ML model using clinical features to predict the risk of VAP in IS patients in the ICU, based on an open-source database. We employed the Boruta algorithm for feature selection before building the model, which enabled the identification of seven predictive parameters and the use of a limited number of clinical variables to enhance clinical practicality. SBP, DBP, INR, suctioning counts, and antibiotic counts were extracted within the first 24 h of MV, allowing for a short predictive window to identify IS patients at high risk of developing VAP. To the best of our knowledge, this study is the first to predict VAP risk in ICU IS patients using an interpretable ML approach.

Our study identified dysphagia as a significant risk factor for VAP in stroke patients, which is consistent with prior research (13, 14). Stroke patients often experience dysphagia, with incidence rates ranging from 30% to 50% (15, 16). Dysphagia is associated with prolonged hospital stays, increased healthcare costs, and an elevated risk of persistent disability and mortality (17–21). Due to impaired swallowing function and a diminished cough reflex, patients with dysphagia struggle to clear oral secretions, rendering them more susceptible to aspiration events and subsequent pneumonia. Previous studies have further confirmed that stroke-related dysphagia significantly increases the risk of pulmonary infection, underscoring the clinical importance of early dysphagia screening in stroke patients (22, 23). Large-scale prospective studies have reported similar findings, demonstrating that dysphagia following acute ischemic stroke markedly increases the risk of pneumonia and is independently associated with poor outcomes and higher mortality (24, 25). Furthermore, early dysphagia screening (within 24 h of admission) has been reported to reduce the risk of stroke-associated pneumonia (24).

Suctioning of secretions, primarily subglottic secretion drainage, is a commonly recommended treatment for clearing lower airway secretions using an endotracheal tube to prevent VAP. Multiple studies have demonstrated that subglottic suctioning can significantly reduce the incidence of VAP by limiting microbial colonization around the endotracheal tube (26, 27). However, our study observed a significant link between increased suctioning frequency and a higher risk of VAP in IS patients, which is consistent with the findings of Abdallah et al. (28). Although this may seem contradictory, frequent invasive suctioning can impair mucociliary function and compromise airway mucosal integrity, thereby weakening the natural immune barrier and increasing susceptibility to bacterial colonization (29, 30). Moreover, excessive suctioning may introduce exogenous pathogens into the respiratory tract, further elevating the risk of infection (30). A higher frequency of suctioning may also reflect a greater secretion burden, suggesting the presence of a subclinical or early-stage infection. Our findings highlight the importance of carefully balancing suctioning frequency while effectively managing airway secretions to minimize harm, especially in vulnerable populations such as IS patients.

INR, a critical indicator of coagulation function, was also identified as a significant predictor of VAP in IS patients. In the context of a stroke, an increased INR usually indicates the use of vitamin K antagonists, especially warfarin. Warfarin has been shown to have a protective effect against community-acquired pneumonia, probably due to its effect on disturbed thrombin formation and alveolar fibrin deposition (31). Our findings revealed that IS patients with higher INR had a lower risk of VAP, which may be due to warfarin medication. The underlying mechanism deserves further investigation in larger populations.

According to current guidelines on VAP treatment (32), prophylactic antibiotic use is generally not recommended to prevent VAP, because long-term or unnecessary antibiotic use can foster resistant bacterial strains and increase the burden of antibiotic resistance for both patients and hospitals. However, our findings suggest that administering antibiotics within the first 24 h of MV may lower the incidence of VAP. We speculate that this effect may be attributed to the serious and intricate nature of stroke as well as the existence of concomitant infectious diseases in the ICU. Early and appropriate antibiotic use may reduce the incidence of VAP by preventing the colonization and proliferation of potential pathogens in the respiratory tract.

Our research indicates that a longer ICU stay before MV may increase the incidence of VAP, which is consistent with previous studies (33, 34), presumably due to the positive link between ICU stay duration and infection risk. Furthermore, the SHAP dependence plots in this study suggest a potential non-linear relationship between DBP, SBP, and the risk of VAP. Higher values of DBP and SBP are associated with an increased risk of VAP. In the lower blood pressure range, low blood pressure also tends to elevate the risk of VAP, but the limited number of data points prevents us from drawing a definitive conclusion. Future studies should incorporate more data from patients with low blood pressure to further investigate and clarify this potential non-linear relationship.

ML excels in processing and analyzing complex multimodal and high-dimensional data. However, the complexity of ML algorithms makes it difficult to understand how they arrive at predictions and decisions, presenting a “black box” issue that hinders their wide use in healthcare. As highlighted by Stinear et al. (35), developing operational and interpretable ML models is crucial for clinical practice. In this study, we utilized the SHAP method to address the “black box” problem of ML models. SHAP, a unified framework for ML interpretability proposed by Lundberg et al. (8), quantifies the contribution of each feature in the model to the final prediction, aiming to enhance user understanding of decision-making processes and increase confidence and trust in the predictive model’s outcomes. In addition, based on the final RF model, we deployed a publicly available web-based application that medical staff can use to predict VAP in IS patients.

This study has several limitations. Firstly, it is a single-center retrospective study, utilizing health data from multiple time periods at this center for model training, validation, and testing. External validation at additional medical centers is necessary to further evaluate the model’s generalizability. Relying on a single database may also introduce potential data quality issues and selection bias. Secondly, identifying VAP patients using ICD codes in the MIMIC-IV database presents challenges in retrospectively determining the exact timing of VAP diagnosis. Lastly, our study focused primarily on the average or cumulative occurrence of clinical features within the first 24 h of MV, ignoring the impact of dynamic changes in clinical features during ICU stay. Therefore, future multicenter prospective studies conducted across diverse clinical settings are needed to comprehensively validate the robustness and applicability of our predictive model.

5 Conclusion

We developed and evaluated multiple ML algorithms to determine the risk of VAP in ICU patients with IS. Both the internal validation and the external testing showed that the RF model performed reliably. Our findings indicate that SBP, DBP, INR, antibiotic usage, frequency of suctioning within the first 24 h of MV, LOS Before MV, and dysphagia have a substantial impact on risk assessment. The ML model and online web tool developed in this study can help clinicians identify high-risk IS patients for VAP effectively at an early stage. Further multicenter prospective studies are warranted to validate the model’s generalizability and practical utility.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary material.

Author contributions

HC: Writing – original draft, Data curation, Methodology, Validation, Visualization. JW: Writing – original draft, Investigation, Methodology, Validation. PH: Writing – review & editing, Formal analysis, Funding acquisition, Investigation, Methodology, Supervision, Validation. SY: Writing – review & editing, Conceptualization, Formal analysis, Funding acquisition, Methodology, Supervision, Validation, Writing – original draft.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This research was funded by the National Natural Science Foundation of China (NSFC no. 81771165), the Natural Science Foundation Project in Guangdong province, China (grant nos. 2020A1515010233 and 2018A030313172), and Guangzhou Science and Technology project of Major Special Research Topics on International Collaborative Innovation (grant nos. 201704030032 and 201807010010).

Acknowledgments

We express our deep gratitude to the staff and administrators of the MIMIC database.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Gen AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fneur.2025.1513732/full#supplementary-material

Footnotes

1. https://isvaprisk.streamlit.app/

References

1. Feigin, VL, Brainin, M, Norrving, B, Martins, S, Sacco, RL, Hacke, W, et al. World stroke organization (WSO): global stroke fact sheet 2022. Int J Stroke. (2022) 17:18–29. doi: 10.1177/17474930211065917

2. Hilkens, NA, Casolla, B, Leung, TW, and De Leeuw, FE. Stroke. Lancet. (2024) 403:2820–36. doi: 10.1016/S0140-6736(24)00642-1

3. Papazian, L, Klompas, M, and Luyt, CE. Ventilator-associated pneumonia in adults: a narrative review. Intensive Care Med. (2020) 46:888–906. doi: 10.1007/s00134-020-05980-0

4. Kasuya, Y, Hargett, JL, Lenhardt, R, Heine, MF, Doufas, AG, Remmel, KS, et al. Ventilator-associated pneumonia in critically ill stroke patients: frequency, risk factors, and outcomes. J Crit Care. (2011) 26:273–9. doi: 10.1016/j.jcrc.2010.09.006

5. Yang, CC, Shih, NC, Chang, WC, Huang, SK, and Chien, CW. Long-term medical utilization following ventilator-associated pneumonia in acute stroke and traumatic brain injury patients: a case-control study. BMC Health Serv Res. (2011) 11:289. doi: 10.1186/1472-6963-11-289

6. Fernandes, JND, Cardoso, VEM, Comesaña-Campos, A, and Pinheira, A. Comprehensive review: machine and deep learning in brain stroke diagnosis. Sensors. (2024) 24:4355. doi: 10.3390/s24134355

7. Daidone, M, Ferrantelli, S, and Tuttolomondo, A. Machine learning applications in stroke medicine: advancements, challenges, and future prospectives. Neural Regen Res. (2024) 19:769–73. doi: 10.4103/1673-5374.382228

8. Lundberg, SM, and Lee, SI. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17). Curran Associates Inc. (2017) 4768–4777

9. Johnson, A, Bulgarelli, L, Pollard, T, Gow, B, Moody, B, Horng, S, et al. MIMIC-IV (version 3.0). PhysioNet. (2024). doi: 10.13026/hxp0-hg59

10. Johnson, AEW, Bulgarelli, L, Shen, L, Gayles, A, Shammout, A, Horng, S, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. (2023) 10:1. doi: 10.1038/s41597-022-01899-x

11. Collins, GS, Moons, KGM, Dhiman, P, Riley, RD, Beam, AL, Van Calster, B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. (2024) 385:e078378. doi: 10.1136/bmj-2023-078378

12. Kursa, MB, and Rudnicki, WR. Feature selection with the Boruta package. J Stat Softw. (2010) 36:1–13. doi: 10.18637/jss.v036.i11

13. Finlayson, O, Kapral, M, Hall, R, Asllani, E, Selchen, D, and Saposnik, G. Canadian stroke network; stroke outcome research Canada (SORCan) working group. Risk factors, inpatient care, and outcomes of pneumonia after ischemic stroke. Neurology. (2011) 77:1338–45. doi: 10.1212/WNL.0b013e31823152b1

14. You, Q, Bai, D, Wu, C, Wang, H, Chen, X, Gao, J, et al. Risk factors for pulmonary infection in elderly patients with acute stroke: a meta-analysis. Heliyon. (2022) 8:e11664. doi: 10.1016/j.heliyon.2022.e11664

15. Smithard, DG, O'Neill, PA, Parks, C, and Morris, J. Complications and outcome after acute stroke. Does dysphagia matter? Stroke. (1996) 27:1200–4. doi: 10.1161/01.str.27.7.1200

16. Mann, G, Hankey, GJ, and Cameron, D. Swallowing disorders following acute stroke: prevalence and diagnostic accuracy. Cerebrovasc Dis. (2000) 10:380–6. doi: 10.1159/000016094

17. Cloud, G, Hoffman, A, and Rudd, A. Intercollegiate stroke working party. National sentinel stroke audit 1998-2011. Clin Med. (2013) 13:444–8. doi: 10.7861/clinmedicine.13-5-444

18. Ramsey, DJ, Smithard, DG, and Kalra, L. Early assessments of dysphagia and aspiration risk in acute stroke patients. Stroke. (2003) 34:1252–7. doi: 10.1161/01.STR.0000066309.06490.B8

19. Sharma, JC, Fletcher, S, Vassallo, M, and Ross, I. What influences outcome of stroke – pyrexia or dysphagia? Int J Clin Pract. (2001) 55:17–20. doi: 10.1111/j.1742-1241.2001.tb10970.x

20. Saposnik, G, Hill, MD, O'Donnell, M, Fang, J, Hachinski, V, and Kapral, MK. Registry of the Canadian Stroke Network for the Stroke Outcome Research Canada (SORCan) Working Group. Variables associated with 7-day, 30-day, and 1-year fatality after ischemic stroke. Stroke. (2008) 39:2318–24. doi: 10.1161/STROKEAHA.107.510362

21. Boaden, E, Burnell, J, Hives, L, Dey, P, Clegg, A, Lyons, MW, et al. Screening for aspiration risk associated with dysphagia in acute stroke. Cochrane Database Syst Rev. (2021) 2021:CD012679. doi: 10.1002/14651858.CD012679.pub2

22. Martino, R, Foley, N, Bhogal, S, Diamant, N, Speechley, M, and Teasell, R. Dysphagia after stroke: incidence, diagnosis, and pulmonary complications. Stroke. (2005) 36:2756–63. doi: 10.1161/01.STR.0000190056.76543.eb

23. Liang, J, Yin, Z, Li, Z, Gu, H, Yang, K, Xiong, Y, et al. Predictors of dysphagia screening and pneumonia among patients with acute ischaemic stroke in China: findings from the Chinese Stroke Center Alliance (CSCA). Stroke Vasc Neurol. (2022) 7:294–301. doi: 10.1136/svn-2020-000746

24. Al-Khaled, M, Matthis, C, Binder, A, Mudter, J, Schattschneider, J, Pulkowski, U, et al. Dysphagia in patients with acute ischemic stroke: early dysphagia screening may reduce stroke-related pneumonia and improve stroke outcomes. Cerebrovasc Dis. (2016) 42:81–9. doi: 10.1159/000445299

25. Yuan, M, Li, Q, Zhang, R, Zhang, W, Zou, N, Qin, X, et al. Risk factors for and impact of poststroke pneumonia in patients with acute ischemic stroke. Medicine. (2021) 100:e25213. doi: 10.1097/MD.0000000000025213

26. Caroff, DA, Li, L, Muscedere, J, and Klompas, M. Subglottic secretion drainage and objective outcomes: a systematic review and meta-analysis. Crit Care Med. (2016) 44:830–40. doi: 10.1097/CCM.0000000000001414

27. Damas, P, Frippiat, F, Ancion, A, Canivet, JL, Lambermont, B, Layios, N, et al. Prevention of ventilator-associated pneumonia and ventilator-associated conditions: a randomized controlled trial with subglottic secretion suctioning. Crit Care Med. (2015) 43:22–30. doi: 10.1097/CCM.0000000000000674

28. Abdallah, HO, Weingart, MF, Fuller, R, Pegues, D, Fitzpatrick, R, and Kelly, BJ. Subglottic suction frequency and adverse ventilator-associated events during critical illness. Infect Control Hosp Epidemiol. (2021) 42:826–32. doi: 10.1017/ice.2020.1298

29. Maggiore, SM, Lellouche, F, Pignataro, C, Girou, E, Maitre, B, Richard, J-CM, et al. Decreasing the adverse effects of endotracheal suctioning during mechanical ventilation by changing practice. Respir Care. (2013) 58:1588–97. doi: 10.4187/respcare.02265

30. Favretto, DO, Silveira, RCCP, Canini, SRMS, Garbin, LM, Martins, FTM, et al. Endotracheal suction in intubated critically ill adult patients undergoing mechanical ventilation: a systematic review. Rev Lat Am Enfermagem. (2012) 20:997–1007. doi: 10.1590/s0104-11692012000500023

31. Gouder, C, Agius, M, Gamoudi, D, Gamoudi, N, Farrugia, D, Borg, M, et al. Does previous warfarin treatment effect complications and outcome in hospitalised patients with community-acquired pneumonia? Eur Respir J. (2014) 44:P2573.

32. Torres, A, Niederman, MS, Chastre, J, Ewig, S, Fernandez-Vandellos, P, Hanberger, H, et al. International ERS/ESICM/ESCMID/ALAT guidelines for the management of hospital-acquired pneumonia and ventilator-associated pneumonia: guidelines for the management of hospital-acquired pneumonia (HAP)/ventilator-associated pneumonia (VAP) of the European Respiratory Society (ERS), European Society of Intensive Care Medicine (ESICM), European Society of Clinical Microbiology and Infectious Diseases (ESCMID) and Asociación Latinoamericana del Tórax (ALAT). Eur Respir J. (2017) 50:1700582. doi: 10.1183/13993003.00582-2017

33. Shamsizadeh, M, Fathi Jouzdani, A, and Rahimi-Bashar, F. Incidence and risk factors of ventilator-associated pneumonia among patients with delirium in the intensive care unit: a prospective observational study. Crit Care Res Pract. (2022) 2022:4826933–6. doi: 10.1155/2022/4826933

34. Myny, D, Depuydt, P, Colardyn, F, and Blot, S. Ventilator-associated pneumonia in a tertiary care ICU: analysis of risk factors for acquisition and mortality. Acta Clin Belg. (2005) 60:114–21. doi: 10.1179/acb.2005.022

35. Stinear, CM, Smith, MC, and Byblow, WD. Prediction tools for stroke rehabilitation. Stroke. (2019) 50:3314–22. doi: 10.1161/STROKEAHA.119.025696

Keywords: ischemic stroke, ventilator-associated pneumonia, machine learning, SHAP, MIMIC-IV database

Citation: Cao H, Wei J, Hua P and Yang S (2025) Interpretable machine learning for early predicting the risk of ventilator-associated pneumonia in ischemic stroke patients in the intensive care unit. Front. Neurol. 16:1513732. doi: 10.3389/fneur.2025.1513732

Received: 19 October 2024; Accepted: 21 April 2025; Published: 07 May 2025.

Edited by:

Wu Qiu, Huazhong University of Science and Technology, China

Reviewed by:

Fermin García-Muñoz Rodrigo, Complejo Hospitalario Universitario Insular-Materno Infantil, Spain
Erika Paola Plata Menchaca, Hospital Clinic of Barcelona, Spain

Copyright © 2025 Cao, Wei, Hua and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Ping Hua, huaping@mail.sysu.edu.cn; Songran Yang, yangsr@mail.sysu.edu.cn

^†These authors have contributed equally to this work and share first authorship

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.