AUTHOR=Xu Jixiang, Li Yuan, Zhu Fumin, Han Xiaoxiao, Chen Liang, Qi Yinliang, Zhou Xiaomei TITLE=Construction of a risk prediction model for pulmonary infection in patients with spontaneous intracerebral hemorrhage during the recovery phase based on machine learning JOURNAL=Frontiers in Neurology VOLUME=Volume 16 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/neurology/articles/10.3389/fneur.2025.1571755 DOI=10.3389/fneur.2025.1571755 ISSN=1664-2295 ABSTRACT=Objective: Pulmonary infection (PI) remains a prevalent and severe complication in patients recovering from spontaneous deep subcortical intracerebral hemorrhage (deep SICH). Accurate prediction of PI risk is crucial for early intervention and optimized clinical management. The aim of this study was to develop a machine learning (ML) model for predicting PI risk in patients during the recovery phase of deep SICH and to investigate the contributions of individual risk factors through explainable artificial intelligence techniques. Methods: We conducted a retrospective study involving 649 patients diagnosed with PI during the recovery phase of deep SICH between 2021 and 2023. The cohort was divided into a training set (70%, n = 454) and a testing set (30%, n = 195). Eight key clinical features were identified using the Boruta algorithm: mechanical ventilation, nasogastric feeding, tracheotomy, antibacterial drug use, hyperbaric oxygen therapy, procalcitonin levels, sedative drug use, and consciousness scores. Seven ML algorithms were employed to build predictive models, with performance evaluated based on the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy. The best-performing model was selected, and SHAP (Shapley Additive Explanations) analysis was performed to interpret feature importance. Results: Among 649 patients with deep SICH, no significant baseline differences were found between the training (n = 454) and testing (n = 195) sets.
The Boruta algorithm identified eight key predictors of pulmonary infection (PI). The random forest (RF) model achieved the highest AUCs: 0.994 (95% CI: 0.989–0.998) in training and 0.931 (95% CI: 0.899–0.963) in testing. DeLong tests showed RF significantly outperformed several models (DT, SVM, LightGBM), while performance differences with XGBoost (p = 0.95), KNN (p = 0.80), and LR (p = 0.22) were not significant. SHAP analysis revealed mechanical ventilation, nasogastric feeding, and tracheotomy as key risk factors, with hyperbaric oxygen therapy and higher consciousness scores showing protective effects. Conclusions: This study provides a high-performing and interpretable ML-based risk stratification tool for pulmonary infection in patients during the recovery phase of deep SICH. The integration of SHAP enhances clinical applicability by demystifying complex model outputs, thereby supporting individualized preventive strategies. These findings underscore the promise of explainable AI in advancing neurocritical care and call for prospective multicenter validation and real-time dynamic model adaptation in future research.
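
The abstract evaluates each model by AUC, sensitivity, specificity, and accuracy. As a rough illustration of how those four quantities relate to a model's predicted risks, the sketch below computes them in plain Python. It is not the authors' pipeline: the function names `roc_auc` and `threshold_metrics`, the 0.5 decision threshold, and the eight toy labels and scores are all assumptions made up for this example and carry no study data.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy after labelling a case
    positive when its score is at or above `threshold` (assumed 0.5)."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical toy data: 1 = pulmonary infection, 0 = no infection.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

auc = roc_auc(y_true, scores)
metrics = threshold_metrics(y_true, scores)
print(f"AUC={auc:.4f}", metrics)
```

AUC is threshold-free, while the other three metrics depend on the chosen cut-off, which is why a model comparison of this kind typically reports them together.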