ORIGINAL RESEARCH article
Front. Aging Neurosci.
Sec. Parkinson’s Disease and Aging-related Movement Disorders
Volume 17 - 2025 | doi: 10.3389/fnagi.2025.1687925
This article is part of the Research Topic: Machine Learning Revolutionizing Aging-Related Movement Disorder Diagnostics
Diagnostic Classification of Mild Cognitive Impairment in Parkinson's Disease Using Subject-Level Stratified Machine-Learning Analysis
Provisionally accepted
- 1Xinyang Normal University, Xinyang, China
- 2Shanghai Institute of Technology, Shanghai, China
- 3Shandong Academy of Eye Disease Prevention and Therapy, Jinan, China
- 4Nanjing Xiaozhuang University, Nanjing, China
Background: Timely identification of mild cognitive impairment (MCI) in Parkinson's disease (PD) is essential for early intervention and clinical management, yet it remains challenging in practice.
Methods: We analyzed 3,154 clinical visits from 896 participants in the Parkinson's Progression Markers Initiative (PPMI) cohort. Participants were divided into two groups: cognitively normal (PD-NC, MoCA ≥ 26) and MCI (PD-MCI, 21 ≤ MoCA ≤ 25). To prevent visit-level information leakage, the data were split by subject-level stratified sampling into training (70%) and hold-out test (30%) sets, so that all visits from a given participant fell in the same set. From an initial set of twelve routinely assessed clinical features, seven were selected using least absolute shrinkage and selection operator (LASSO) logistic regression: age, sex, years of education, disease duration, UPDRS-I, UPDRS-III, and Geriatric Depression Scale (GDS). Four machine learning models, namely logistic regression (LR), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost), were trained using subject-level stratified 10-fold cross-validation with Bayesian optimization. Probabilistic outputs were dichotomized using three thresholding strategies: the default threshold of 0.5, F1-score maximization, and Youden index maximization.
Results: On the independent test set, SVM achieved the highest overall performance, with an AUC-ROC of 0.7252 and an AUC-PR of 0.5008. LR also performed competitively despite its simplicity. RF achieved the highest sensitivity, at 0.8150. Feature importance analysis consistently highlighted age, years of education, and disease duration as the most informative predictors for distinguishing PD-MCI. A more stringent site-level split validation yielded slightly lower overall performance, although LR showed improved AUC-PR. Importantly, the core feature importance ranking remained largely consistent across validation strategies.
Conclusion: This study developed and validated robust machine learning models for PD-MCI classification using standard clinical assessments alone. Through subject-level or site-level stratified cross-validation combined with Bayesian optimization, we achieved rigorous model evaluation while minimizing overfitting risk. These findings demonstrate the potential for implementing data-driven, interpretable diagnostic tools to enhance early cognitive impairment screening in routine PD care.
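As an illustration of two steps named in the Methods (subject-level stratified splitting and threshold selection by F1-score or Youden index maximization), the following is a minimal sketch using only the Python standard library. It is not the authors' implementation. All function names are hypothetical, and the rule used to assign a stratum to each subject (a subject counts as PD-MCI if any of their visits is labelled PD-MCI) is an assumption made here for illustration, since the abstract does not specify it.

```python
import random
from collections import defaultdict


def split_by_subject(visits, test_fraction=0.3, seed=0):
    """Split (subject_id, label) visit records so that every subject's
    visits land entirely in train or entirely in test.

    Assumption (not from the article): a subject's stratum is 1 if any
    of their visits is labelled 1 (PD-MCI), otherwise 0 (PD-NC).
    """
    stratum = defaultdict(int)
    for subject_id, label in visits:
        stratum[subject_id] = max(stratum[subject_id], label)

    rng = random.Random(seed)
    test_subjects = set()
    for value in (0, 1):
        # Sample the test subjects separately within each stratum.
        members = sorted(s for s, v in stratum.items() if v == value)
        rng.shuffle(members)
        test_subjects.update(members[: round(len(members) * test_fraction)])

    train = [v for v in visits if v[0] not in test_subjects]
    test = [v for v in visits if v[0] in test_subjects]
    return train, test


def confusion(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn) when predicting 1 for prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_j(tp, fp, tn, fn):
    """Youden index: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity + specificity - 1.0


def f1(tp, fp, tn, fn):
    """F1-score: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob, score_fn):
    """Pick the observed probability that maximizes score_fn.

    Ties are resolved in favour of the lowest threshold.
    """
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: score_fn(*confusion(y_true, y_prob, t)))
```

In a sketch like this, `best_threshold` would normally be applied to training or cross-validation predictions rather than to the hold-out test set, so that the chosen cut-off does not leak test information into the reported metrics.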
Keywords: mild cognitive impairment, Parkinson's disease, machine learning, stratified sampling, Bayesian optimization, feature importance
Received: 18 Aug 2025; Accepted: 06 Oct 2025.
Copyright: © 2025 Wang, Chen, Xie, Wang, Hu, Han, Wang and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Jing Wang, wangjing@xynu.edu.cn
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.