ORIGINAL RESEARCH article
Front. Neurol.
Sec. Movement Disorders
Volume 16 - 2025 | doi: 10.3389/fneur.2025.1678463
SHAP-Based Interpretable Machine Learning for Parkinson's Disease Severity Prediction: Integrated Analysis of Clinical and Environmental Features
Provisionally accepted
- 1 Kaifeng Central Hospital, Kaifeng, China
- 2 Yellow River Conservancy Technical University, Kaifeng, China
Parkinson's disease (PD) is the second most common neurodegenerative disease after Alzheimer's disease. More than 10 million people worldwide live with PD, a number projected to reach 25.2 million by 2050, and accelerating population aging has made PD severity assessment a core issue in clinical treatment and resource allocation. Traditional assessment methods such as the Unified Parkinson's Disease Rating Scale (UPDRS) and Hoehn-Yahr staging, though widely used, are limited by strong subjectivity and difficulty in quantifying interacting factors. To achieve more precise and transparent prediction, this study constructed a multi-dimensional machine learning model integrating clinical and environmental features and used SHAP (SHapley Additive exPlanations) to enhance model interpretability. Based on data from 500 patients with Parkinson's disease, the study integrated 7 standardized clinical phenotypes (excluding UPDRS to prevent data leakage) and 8 environmental exposure factors (including temperature, PM2.5, and UV intensity), compared the performance of 10 machine learning algorithms, and selected the optimal model through 5-fold cross-validation with a comprehensive evaluation of sampling strategies. Following rigorous target variable reconstruction using independent clinical dimensions, XGBoost with SMOTE sampling achieved realistic discriminative performance (AUC = 0.781, precision = 0.548, recall = 0.750), demonstrating clinically meaningful capability appropriate for screening applications. SHAP interpretability analysis revealed non-motor symptoms as the primary predictor (SHAP value = 2.76), followed by serum dopamine concentration (2.39) and age (2.16), while environmental factors such as PM2.5 concentration, wind speed, and sunshine duration made modest but statistically significant contributions.
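The SHAP attributions described above are built on Shapley values, which average a feature's marginal contribution over every possible coalition of the other features. The following is a minimal sketch of that idea only. It assumes a hypothetical three-feature toy scoring function in place of the study's XGBoost model, and the feature names, coefficients, and patient values are illustrative and are not taken from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every coalition of features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


# Hypothetical reference point and patient (illustrative values only).
BASELINE = {"non_motor": 0.0, "dopamine": 0.0, "age": 0.0}
PATIENT = {"non_motor": 2.0, "dopamine": -1.0, "age": 1.5}


def toy_score(x):
    """Hypothetical severity score with one interaction term."""
    return (1.2 * x["non_motor"] - 0.8 * x["dopamine"] + 0.5 * x["age"]
            + 0.3 * x["non_motor"] * x["age"])


def value_fn(coalition):
    """Score when only the features in `coalition` take the patient's values."""
    x = {f: (PATIENT[f] if f in coalition else BASELINE[f]) for f in PATIENT}
    return toy_score(x)


phi = shapley_values(value_fn, list(PATIENT))
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

In this toy setting the attributions sum to the difference between the patient's score and the baseline score, and the interaction term is split evenly between the two features involved. Tree-based SHAP implementations compute the same quantity far more efficiently than this brute-force enumeration.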
Cross-validation results confirmed model stability (AUC: 0.781 ± 0.016), with balanced sensitivity (75.0%) and specificity (83.8%) suitable for clinical decision support. This proof-of-concept study developed an interpretable PD severity prediction framework with methodological safeguards against data leakage, integrating disease phenotypes, biochemical markers, and environmental factors. While the framework shows promising screening potential with realistic performance expectations (AUC = 0.781), the cross-sectional, single-center design limits generalizability, and external validation and longitudinal studies are required before clinical deployment. This preliminary framework provides a foundation for evidence-based severity screening approaches, pending broader validation studies.
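Precision, unlike sensitivity and specificity, depends on how common the positive class is, which matters when a screening model is moved to a population with a different case mix. The sketch below illustrates that relationship using the reported recall (0.750), specificity (0.838), and precision (0.548). It assumes all three figures describe a single confusion matrix, which cross-validated averages need not do exactly, so the back-calculated prevalence is a rough consistency check rather than a reported result. The function names are hypothetical.

```python
def implied_prevalence(precision, recall, specificity):
    """Positive-class share that reconciles precision with recall and specificity."""
    fpr = 1.0 - specificity
    return precision * fpr / (recall * (1.0 - precision) + precision * fpr)


def precision_at(prevalence, recall, specificity):
    """Precision expected at a given prevalence, holding recall and specificity fixed."""
    true_pos = recall * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Figures quoted in the abstract.
RECALL, SPECIFICITY, PRECISION = 0.750, 0.838, 0.548

prevalence = implied_prevalence(PRECISION, RECALL, SPECIFICITY)
print(f"implied prevalence: {prevalence:.3f}")

# Hypothetical lower-prevalence settings, for comparison only.
for p in (0.05, 0.10, prevalence):
    print(f"prevalence {p:.3f} -> precision {precision_at(p, RECALL, SPECIFICITY):.3f}")
```

Holding sensitivity and specificity fixed, precision falls as prevalence falls, which is one reason external validation in other populations matters for a screening tool.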
Keywords: Parkinson's disease, machine learning, disease severity, environmental factors, SHAP, interpretable artificial intelligence
Received: 05 Aug 2025; Accepted: 09 Sep 2025.
Copyright: © 2025 Jin, Li, Han, Qiu, Zhang, Xu and Han. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Xiang Li, Yellow River Conservancy Technical University, Kaifeng, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.