ORIGINAL RESEARCH article

Front. Neurol.

Sec. Neurorehabilitation

Predicting Cognitive Impairment in Parkinson's Disease: A Machine Learning Approach Based on Clinical and Neuropsychological Data

Provisionally accepted
  • The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China

The final, formatted version of the article will be published soon.

Abstract

Background: Cognitive impairment (CI) is a common and disabling non-motor symptom of Parkinson’s disease (PD) that significantly reduces quality of life and increases caregiver burden. Despite extensive research, early prediction of CI remains challenging owing to diverse clinical presentations, inconsistent treatment adherence, and the limited sensitivity of conventional biomarkers and cognitive assessment tools.

Methods and materials: This retrospective cohort study included 514 patients with PD who had complete baseline data and at least 6 months of follow-up. The participants were randomly divided into a training cohort (n = 359) and a test cohort (n = 155). Demographic, clinical, biochemical, and neuropsychological data were obtained at baseline. CI was defined by Mini-Mental State Examination (MMSE) scores below education-adjusted thresholds and validated using the Montreal Cognitive Assessment (MoCA). Multiple machine learning (ML) models, including random forest (RF), logistic regression (LR), gradient boosting, CatBoost, and support vector machines, were developed and evaluated using the area under the receiver operating characteristic curve (AUROC), accuracy, recall, F1-score, calibration, and decision curve analysis. Feature importance analysis identified the key predictive variables.

Results: Patients who developed CI during follow-up were significantly older and had a longer disease duration; lower levels of albumin, hematocrit (HCT), and blood lipids; and a higher prevalence of hypertension. Feature selection identified age, platelet count, time from diagnosis to baseline visit, apolipoprotein B level, and HCT level as the key predictors. The RF model demonstrated the best overall performance, with an AUROC of 0.846, an accuracy of 0.75, and an F1-score of 0.775, followed by CatBoost and LR. Calibration and decision curve analyses confirmed stable probability estimation and superior clinical utility of RF compared with the “treat all” or “treat none” strategies. The MoCA score was further used to verify the model’s stability.

Conclusions: ML models that integrate multimodal clinical and neuropsychological data demonstrate high accuracy in predicting CI in PD, with RF emerging as the most reliable approach. This framework provides a practical tool for early risk stratification, potentially enabling timely interventions and individualized management to reduce the burden of cognitive decline in PD.
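The evaluation metrics named in the abstract (AUROC, accuracy, recall, F1-score, and the net benefit used in decision curve analysis) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the abstract does not state which software was used, the function names are hypothetical, and the eight labels and predicted probabilities are invented toy values, not study data.

```python
from typing import Dict, Sequence, Tuple

# Toy data (invented for illustration, NOT from the study):
# 1 = developed CI, 0 = did not; SCORES are hypothetical predicted probabilities.
Y_TRUE = [1, 1, 1, 0, 1, 0, 0, 0]
SCORES = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random
    negative (Mann-Whitney formulation); tied scores count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion(y_true: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when predicting positive if score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], scores: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, recall, precision, and F1 at a fixed probability threshold."""
    tp, fp, tn, fn = confusion(y_true, scores, threshold)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


def net_benefit(y_true: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit of the model: TP/n - FP/n * pt / (1 - pt)."""
    tp, fp, _, _ = confusion(y_true, scores, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of labelling everyone positive ("treat all").
    The "treat none" strategy has a net benefit of 0 by definition."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    print(auroc(Y_TRUE, SCORES))                   # 0.9375
    print(classification_metrics(Y_TRUE, SCORES))  # all four metrics are 0.75
    print(net_benefit(Y_TRUE, SCORES, 0.5))        # 0.25
    print(net_benefit_treat_all(Y_TRUE, 0.5))      # 0.0
```

On this toy data the model's net benefit at a threshold probability of 0.5 (0.25) exceeds that of both "treat all" (0.0) and "treat none" (0.0). A decision curve plots the same comparison across a range of thresholds rather than at a single value.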

Keywords: Parkinson's disease, cognitive impairment, prediction, machine learning, random forest, outpatient management

Received: 24 Sep 2025; Accepted: 25 Nov 2025.

Copyright: © 2025 Yang, Wang, Zhang, Xiao, Chen, Guo, Wang and Huang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Jinzhong Huang

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.