AUTHOR=He Menghui, Lu Zhongsheng, Lv Yiwei, Cheng Zihai, Zhang Qiang, Jin Xiaoqing, Han Pei

TITLE=Machine learning-based prediction of 6-month functional recovery in hypertensive cerebral hemorrhage: insights from XGBoost and SHAP analysis

JOURNAL=Frontiers in Neurology

VOLUME=Volume 16 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/neurology/articles/10.3389/fneur.2025.1608341

DOI=10.3389/fneur.2025.1608341

ISSN=1664-2295

ABSTRACT=Background: The rate of poor prognosis in hypertensive cerebral hemorrhage (HICH) remains high. The period of 3–6 months after onset is the most rapid phase of neurological recovery in hemorrhagic stroke patients. Accurate early prediction of 6-month functional outcomes is critical for optimizing therapeutic strategies. This study compared the predictive efficacy of multiple machine learning models to identify the optimal model for forecasting long-term prognosis in HICH patients. Methods: We conducted a retrospective analysis of clinical data from 807 HICH patients admitted to Qinghai Provincial People's Hospital's Neurosurgery Department between June 2020 and June 2024. After data preprocessing, data from June 2020 to December 2023 (n = 716) were randomly split into training (n = 497) and test sets (n = 219) at a 7:3 ratio. Data from January to June 2024 (n = 91) served as an external validation set. Recursive Feature Elimination (RFE) was performed to identify optimal features, and repeated five-fold cross-validation was used to reduce the risk of overfitting. Model performance was evaluated using Area Under the Curve (AUC) and Decision Curve Analysis (DCA) across XGBoost, Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). The optimal model was interpreted via SHapley Additive exPlanations (SHAP). Results: The 6-month poor prognosis rate among the 807 HICH patients was 27.51%. The XGBoost model exhibited optimal performance in the training set (AUC = 0.921, 95% CI: 0.896–0.944) and remained stable in the external validation set (AUC = 0.813, 95% CI: 0.728–0.899). DCA showed that the XGBoost model provided higher net benefit than the other models across threshold probabilities of 0%–20% and 56%–100%. SHAP analysis identified hematoma volume as the most critical predictor, with secondary contributions from Glasgow coma score, white blood cell count, age, serum albumin, and systolic blood pressure, among others. Conclusion: XGBoost models demonstrate high accuracy in predicting the long-term prognosis of HICH patients. The SHAP framework quantifies the contributions of key pathophysiological indicators to each patient's model prediction, enabling individualized risk stratification and strategic allocation of medical resources.
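
The abstract compares models by net benefit across threshold probabilities in Decision Curve Analysis. As a rough illustration of that quantity, the sketch below computes the standard net benefit, TP/n - (FP/n) * pt/(1 - pt), for a model and for the "treat all" reference strategy. It is not the authors' code: the function names `net_benefit` and `net_benefit_treat_all` are hypothetical, and the ten labels and predicted risks are synthetic toy values, not study data.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of flagging patients whose predicted risk is >= threshold."""
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    n = len(y_true)
    # True positives: flagged patients who actually had a poor outcome (label 1).
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    # False positives: flagged patients who had a good outcome (label 0).
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # Odds of the threshold weight the harm of a false positive.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of the reference strategy that flags every patient."""
    prevalence = sum(y_true) / len(y_true)
    weight = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * weight


# Synthetic toy data (assumption, for illustration only): 1 = poor outcome.
y_true = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1, 0.3, 0.3, 0.05, 0.15]

for pt in (0.2, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

A decision curve plots these values over a grid of thresholds; a model is preferred in the threshold ranges where its curve lies above both the competing models and the "treat all" and "treat none" (net benefit 0) references.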