ORIGINAL RESEARCH article

Front. Aging Neurosci.

Sec. Parkinson’s Disease and Aging-related Movement Disorders

Volume 17 - 2025 | doi: 10.3389/fnagi.2025.1687925

This article is part of the Research Topic: Machine Learning Revolutionizing Aging-Related Movement Disorder Diagnostics

Diagnostic Classification of Mild Cognitive Impairment in Parkinson's Disease Using Subject-Level Stratified Machine-Learning Analysis

Provisionally accepted
Jing Wang1*, Yanfang Chen1, Xiao Xie1, Pengwei Wang1, Hang Hu1, Hongfang Han2, Lihan Wang3, Li Zhang4
  • 1Xinyang Normal University, Xinyang, China
  • 2Shanghai Institute of Technology, Shanghai, China
  • 3Shandong Academy of Eye Disease Prevention and Therapy, Jinan, China
  • 4Nanjing Xiaozhuang University, Nanjing, China

The final, formatted version of the article will be published soon.

Background: Timely identification of mild cognitive impairment (MCI) in Parkinson's disease (PD) is essential for early intervention and clinical management, yet it remains challenging in practice.
Methods: We analyzed 3,154 clinical visits from 896 participants in the Parkinson's Progression Markers Initiative (PPMI) cohort. Participants were divided into two groups by Montreal Cognitive Assessment (MoCA) score: cognitively normal (PD-NC, MoCA ≥ 26) and MCI (PD-MCI, 21 ≤ MoCA ≤ 25). To prevent visit-level information leakage, the data were split into training (70%) and hold-out test (30%) sets by subject-level stratified sampling, so that all visits from a given participant fell in the same set. From an initial set of twelve routinely assessed clinical features, seven were selected using least absolute shrinkage and selection operator (LASSO) logistic regression: age, sex, years of education, disease duration, UPDRS-I, UPDRS-III, and Geriatric Depression Scale (GDS). Four machine learning models, namely logistic regression (LR), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost), were trained using subject-level stratified 10-fold cross-validation with Bayesian optimization. Probabilistic outputs were dichotomized using three thresholding strategies: the default threshold of 0.5, F1-score maximization, and Youden index maximization.
Results: On the independent test set, SVM achieved the best overall performance, with an AUC-ROC of 0.7252 and an AUC-PR of 0.5008. LR performed competitively despite its simplicity. RF achieved the highest sensitivity (0.8150). Feature importance analysis consistently identified age, years of education, and disease duration as the most informative predictors for distinguishing PD-MCI. Under a more stringent site-level split, overall performance decreased slightly, although LR showed an improved AUC-PR. Importantly, the core feature importance ranking remained largely consistent across validation strategies.
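The three thresholding strategies named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the toy labels and probabilities, and the helper names `confusion`, `youden_j`, `f1` and `best_threshold`, are all hypothetical. The Youden index is computed as sensitivity + specificity - 1, and each observed probability is tried as a candidate cut-off. The abstract does not state which data split was used to tune the thresholds; tuning on training or validation data rather than the test set is the usual practice.

```python
from typing import Callable, Sequence, Tuple


def confusion(y_true: Sequence[int], y_prob: Sequence[float],
              threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when predicting 1 for probabilities >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_j(tp: int, fp: int, tn: int, fn: int) -> float:
    """Youden index J = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity + specificity - 1.0


def f1(tp: int, fp: int, tn: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); true negatives are not used."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true: Sequence[int], y_prob: Sequence[float],
                   metric: Callable[[int, int, int, int], float]
                   ) -> Tuple[float, float]:
    """Try every observed probability as a cut-off; keep the first best one."""
    best_t, best_v = 0.5, float("-inf")
    for t in sorted(set(y_prob)):
        value = metric(*confusion(y_true, y_prob, t))
        if value > best_v:
            best_t, best_v = t, value
    return best_t, best_v


# Hypothetical toy data: 1 = PD-MCI, 0 = PD-NC (not taken from the study).
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]

default_counts = confusion(y_true, y_prob, 0.5)           # strategy 1: fixed 0.5
t_f1, v_f1 = best_threshold(y_true, y_prob, f1)           # strategy 2: max F1
t_youden, v_youden = best_threshold(y_true, y_prob, youden_j)  # strategy 3: max J
print(default_counts, (t_f1, v_f1), (t_youden, v_youden))
```

With an imbalanced outcome such as PD-MCI versus PD-NC, a tuned cut-off often sits below 0.5, which trades specificity for sensitivity.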
Conclusion: This study developed and validated robust machine learning models for PD-MCI classification using standard clinical assessments alone. Through subject-level or site-level stratified cross-validation combined with Bayesian optimization, we achieved rigorous model evaluation while minimizing overfitting risk. These findings demonstrate the potential for implementing data-driven, interpretable diagnostic tools to enhance early cognitive impairment screening in routine PD care.
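A subject-level stratified split of the kind described above can be sketched as follows. This is an illustrative, standard-library-only version under stated assumptions, not the authors' implementation: the function name `subject_level_split`, the toy visits, and the rule that a subject counts as PD-MCI if any of their visits is labeled MCI are all hypothetical choices. The property it demonstrates is that every visit from a given subject lands on one side of the split.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def subject_level_split(visits: Sequence[Tuple[str, int]],
                        test_fraction: float = 0.3,
                        seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split visit indices so that no subject appears in both sets.

    visits: sequence of (subject_id, label) pairs, one per clinical visit.
    """
    # Assumption: a subject's stratum is 1 if any of their visits is labeled 1.
    subject_label: Dict[str, int] = {}
    for subject_id, label in visits:
        subject_label[subject_id] = max(subject_label.get(subject_id, 0), label)

    # Group subjects by stratum so each class is sampled separately.
    by_class: Dict[int, List[str]] = defaultdict(list)
    for subject_id, label in subject_label.items():
        by_class[label].append(subject_id)

    rng = random.Random(seed)
    test_subjects = set()
    for label in sorted(by_class):
        subject_ids = sorted(by_class[label])  # sort first for reproducibility
        rng.shuffle(subject_ids)
        n_test = round(len(subject_ids) * test_fraction)
        test_subjects.update(subject_ids[:n_test])

    train_idx = [i for i, (s, _) in enumerate(visits) if s not in test_subjects]
    test_idx = [i for i, (s, _) in enumerate(visits) if s in test_subjects]
    return train_idx, test_idx


# Hypothetical toy cohort: 20 subjects with 3 visits each; the first 6 are MCI.
visits = [(f"S{s:02d}", 1 if s < 6 else 0) for s in range(20) for _ in range(3)]
train_idx, test_idx = subject_level_split(visits, test_fraction=0.3, seed=0)

train_subjects = {visits[i][0] for i in train_idx}
test_subjects = {visits[i][0] for i in test_idx}
print(len(train_idx), len(test_idx), train_subjects & test_subjects)
```

For cross-validation rather than a single hold-out split, scikit-learn's `StratifiedGroupKFold` provides the same grouping guarantee when the subject identifier is passed as `groups`; whether the authors used it is not stated in the abstract.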

Keywords: mild cognitive impairment, Parkinson's disease, machine learning, stratified sampling, Bayesian optimization, feature importance

Received: 18 Aug 2025; Accepted: 06 Oct 2025.

Copyright: © 2025 Wang, Chen, Xie, Wang, Hu, Han, Wang and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Jing Wang, wangjing@xynu.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.