ORIGINAL RESEARCH article
Front. Neurol.
Sec. Neurorehabilitation
Predicting Cognitive Impairment in Parkinson's Disease: A Machine Learning Approach Based on Clinical and Neuropsychological Data
Provisionally accepted
The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China
Abstract

Background
Cognitive impairment (CI) is a common and disabling non-motor symptom of Parkinson's disease (PD) that significantly reduces quality of life and increases caregiver burden. Despite extensive research, early prediction of CI remains challenging owing to diverse clinical presentations, inconsistent treatment adherence, and the limited sensitivity of conventional biomarkers and cognitive assessment tools.

Methods and materials
This retrospective cohort study included 514 patients with PD who had complete baseline data and at least 6 months of follow-up. Participants were randomly divided into a training cohort (n = 359) and a test cohort (n = 155). Demographic, clinical, biochemical, and neuropsychological data were obtained at baseline. CI was defined by Mini-Mental State Examination scores below education-adjusted thresholds and validated using the Montreal Cognitive Assessment (MoCA). Multiple machine learning (ML) models, including random forest (RF), logistic regression (LR), gradient boosting, CatBoost, and support vector machines, were developed and evaluated using the area under the receiver operating characteristic curve (AUROC), accuracy, recall, F1-score, calibration, and decision curve analysis. Feature importance analysis identified the key predictive variables.

Results
Patients who developed CI during follow-up were significantly older, had a longer disease duration, had lower levels of albumin, hematocrit (HCT), and blood lipids, and had a higher prevalence of hypertension. Feature selection identified age, platelet count, time from diagnosis to baseline visit, apolipoprotein B level, and HCT level as the key predictors. The RF model demonstrated the best overall performance, with an AUROC of 0.846, an accuracy of 0.75, and an F1-score of 0.775, followed by CatBoost and LR. Calibration and decision curve analyses confirmed stable probability estimation and greater clinical utility for RF than for the "treat all" or "treat none" strategies. The MoCA score was further used to verify the model's stability.

Conclusions
ML models that integrate multimodal clinical and neuropsychological data demonstrate high accuracy in predicting CI in PD, with RF emerging as the most reliable approach. This framework provides a practical tool for early risk stratification, potentially enabling timely interventions and individualized management to reduce the burden of cognitive decline in PD.
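The evaluation metrics named in the Methods (AUROC, accuracy, recall, F1-score) and the net-benefit quantity that underlies decision curve analysis can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything below is an assumption made for illustration: the function names are hypothetical, the eight labels and scores are toy values rather than study data, and net benefit uses the standard definition TP/n - (FP/n) * pt/(1 - pt) at threshold probability pt.

```python
from typing import Dict, Sequence, Tuple


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as 0.5 (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_counts(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, recall, precision, and F1 at a fixed probability threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


def net_benefit(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float
) -> float:
    """Net benefit of acting on the model at threshold probability pt."""
    tp, fp, _, _ = confusion_counts(y_true, y_score, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of the 'treat all' strategy; 'treat none' is always 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical, not from the study): 1 = developed CI.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

print(auroc(y_true, y_score))                        # 0.9375
print(classification_metrics(y_true, y_score, 0.5))  # all four metrics are 0.75
print(net_benefit(y_true, y_score, 0.5))             # 0.25
print(net_benefit_treat_all(y_true, 0.5))            # 0.0
```

A decision curve plots these net-benefit values over a range of threshold probabilities. A model shows clinical utility at a given threshold when its curve lies above both the "treat all" curve and the zero line for "treat none", as it does in this toy example at a threshold of 0.5.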
Keywords: Parkinson's disease, cognitive impairment, prediction, machine learning, random forest, outpatient management
Received: 24 Sep 2025; Accepted: 25 Nov 2025.
Copyright: © 2025 Yang, Wang, Zhang, Xiao, Chen, Guo, Wang and Huang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Jinzhong Huang
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
