
ORIGINAL RESEARCH article

Front. Neurol.

Sec. Artificial Intelligence in Neurology

This article is part of the Research Topic: Leveraging Big Data Mining to Advance Neurological Research

Explainable Machine Learning for Stroke Risk Prediction: A Comparative Study with SHAP-Based Interpretation

Provisionally accepted
Xiaoyu Tang¹, Min Tang², Wu Liu²*, Shaoyang Cui¹*
  • ¹Guangzhou University of Chinese Medicine, Guangzhou, China
  • ²Hubei University of Chinese Medicine, Wuhan, China

The final, formatted version of the article will be published soon.

Background: Stroke is one of the leading causes of death and disability worldwide, making early screening and risk prediction crucial. Traditional methods have limitations in handling nonlinear relationships between variables, class imbalance, and model interpretability.

Methods: Logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), categorical boosting (CatBoost), a multi-layer perceptron (MLP) neural network, and ensemble models were constructed and compared. Their performance in stroke risk prediction was systematically evaluated, and feature contributions were interpreted using SHapley Additive exPlanations (SHAP). Confusion matrices and precision-recall (PR) curves were used to compare how well the models identified the positive class (stroke patients), and training time was recorded to quantify computational cost.

Results: The ensemble model and the neural network showed better overall predictive performance than the traditional algorithms, with the MLP performing particularly well in terms of recall. SHAP analysis identified hypertension, average blood glucose level, and age as key contributing factors. Confusion matrices and PR curves revealed differences among the models in identifying positive (stroke) cases. The training-time analysis provides a basis for assessing resource requirements for subsequent deployment.

Conclusions: Machine learning methods show advantages in stroke risk prediction. Incorporating interpretability analysis can enhance the clinical credibility of the models and provides data and a methodological reference for risk-stratified management and early warning of stroke.
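The abstract compares the models on how well they recover the positive (stroke) class using confusion matrices and precision-recall curves. The following is a minimal, dependency-free Python sketch of those quantities, assuming binary labels with 1 = stroke. The function names, toy labels, scores, and 0.5 cut-off are hypothetical illustrations, not the authors' code, data, or results. In practice, `confusion_matrix`, `precision_recall_curve`, and `average_precision_score` in scikit-learn's `sklearn.metrics` compute the same quantities.

```python
"""Positive-class metrics sketch (hypothetical helper names; toy data only)."""
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp), treating 1 (stroke) as the positive class."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1  # stroke case the model missed
        elif p == 1:
            fp += 1  # false alarm
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 when undefined."""
    _tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pr_curve(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """(threshold, precision, recall) using each distinct score as a cut-off, highest first."""
    points = []
    for thr in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= thr else 0 for s in scores]
        p, r = precision_recall(y_true, y_pred)
        points.append((thr, p, r))
    return points


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the PR curve: sum over thresholds of (R_n - R_{n-1}) * P_n."""
    ap, prev_recall = 0.0, 0.0
    for _thr, p, r in pr_curve(y_true, scores):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Toy labels and predicted probabilities for illustration only (not study data).
y_true = [0, 0, 1, 0, 1, 1, 0, 0]
scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.05, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 cut-off

print(confusion_counts(y_true, y_pred))  # (4, 1, 1, 2) as (tn, fp, fn, tp)
print(round(average_precision(y_true, scores), 4))  # 0.8667
```

Because the PR curve and average precision are built only from positive-class counts, they stay informative when stroke cases are rare, which is the class-imbalance setting the abstract describes; accuracy, by contrast, can look high for a model that never predicts stroke.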
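SHAP attributes a single prediction to its input features using Shapley values. For a feature i in the feature set F and a value function f over feature coalitions S, φ_i = Σ_{S ⊆ F∖{i}} |S|!(|F| − |S| − 1)!/|F|! · [f(S ∪ {i}) − f(S)]. The Python sketch below computes these values exactly by enumerating every coalition, which is only feasible for a handful of features. As a simplifying assumption, features absent from a coalition take a single baseline value; SHAP implementations typically average over a background dataset instead, and the `shap` package's `TreeExplainer` uses tree-specific algorithms for models such as RF, XGBoost, and CatBoost. The risk function, feature names, and patient values are hypothetical and only echo the three factors the abstract highlights; they are not the study's model.

```python
"""Exact Shapley values by coalition enumeration (single-baseline simplification; hypothetical model)."""
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List, Set

Features = Dict[str, float]


def shapley_values(model: Callable[[Features], float], x: Features, baseline: Features) -> Features:
    """Shapley value of each feature; features outside a coalition use their baseline value."""
    names: List[str] = list(x)
    n = len(names)

    def value(coalition: Set[str]) -> float:
        # Coalition members keep the patient's value; all other features use the baseline.
        z = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(z)

    phi: Features = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for every coalition S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_risk(z: Features) -> float:
    """Hypothetical risk score (NOT the study's model) with one hypertension-by-age interaction."""
    return (0.8 * z["hypertension"] + 0.01 * z["avg_glucose"] + 0.03 * z["age"]
            + 0.5 * z["hypertension"] * (z["age"] > 60))


patient = {"hypertension": 1.0, "avg_glucose": 180.0, "age": 70.0}    # hypothetical patient
reference = {"hypertension": 0.0, "avg_glucose": 100.0, "age": 45.0}  # hypothetical baseline

phi = shapley_values(toy_risk, patient, reference)
print({k: round(v, 4) for k, v in phi.items()})
# {'hypertension': 1.05, 'avg_glucose': 0.8, 'age': 1.0}
```

The values add up to f(patient) − f(reference) = 5.2 − 2.35 = 2.85 (the efficiency property), and the interaction term's contribution of 0.5 is split equally between hypertension and age, 0.25 each.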

Keywords: Machine learning, Model interpretability, Neural network, SHAP, Stroke prediction

Received: 24 Oct 2025; Accepted: 12 Dec 2025.

Copyright: © 2025 Tang, Tang, Liu and Cui. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Wu Liu, Shaoyang Cui

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.