ORIGINAL RESEARCH article
Front. Neurol.
Sec. Artificial Intelligence in Neurology
This article is part of the Research TopicLeveraging Big Data Mining to Advance Neurological ResearchView all 7 articles
Explainable Machine Learning for Stroke Risk Prediction: A Comparative Study with SHAP-Based Interpretation
Provisionally accepted- 1Guangzhou University of Chinese Medicine, Guangzhou, China
- 2Hubei University of Chinese Medicine, Wuhan, China
Select one of your emails
You have multiple emails registered with Frontiers:
Notify me on publication
Please enter your email address:
If you already have an account, please login
You don't have a Frontiers account ? You can register here
Background: Stroke is one of the leading causes of death and disability worldwide, making early screening and risk prediction crucial. Traditional methods have limitations in handling nonlinear relationships between variables, class imbalance, and model interpretability. Methods: Logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), categorical boosting (CatBoost), multi-layer perceptron (MLP) neural network, and ensemble models were constructed and compared. Their performance in stroke risk prediction was systematically evaluated, and feature contributions were interpreted using SHapley Additive exPlanations (SHAP). Confusion matrices and Precision-Recall (PR) curves were used to compare the differences in recognition of the positive class (stroke patients) among the models, and training time was calculated to quantify resource consumption. Results: The ensemble model and neural network demonstrated superior overall predictive ability to traditional algorithms, with the MLP performing particularly well in terms of recall. SHAP results revealed that "hypertension," "average blood glucose level," and "age" were key influencing factors. Confusion matrices and PR curves indicated differences in positive classification among the models. Training time analysis provided a basis for resource assessment for subsequent deployment. Conclusions: Machine learning methods have advantages in stroke risk prediction. Incorporating interpretability analysis can enhance the clinical credibility of the models, providing data and methodological reference for stroke risk stratification management and early warning.
Keywords: machine learning, Model interpretability, Neural Network, Shap, Stroke prediction
Received: 24 Oct 2025; Accepted: 12 Dec 2025.
Copyright: © 2025 Tang, Tang, Liu and Cui. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Wu Liu
Shaoyang Cui
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
