ORIGINAL RESEARCH article
Front. Aging Neurosci.
Sec. Parkinson’s Disease and Aging-related Movement Disorders
A lightweight cerebrospinal fluid biomarker-based model for first-diagnosis prediction of Parkinson's disease: model development, external validation, and local deployment
Provisionally accepted
- 1 Department of Interventional Neurology, First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- 2 Reproductive Medicine Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- 3 Department of Neurology, Massachusetts General Hospital, Boston, United States
Background: Despite substantial progress in biomarker research, Parkinson's disease (PD) still lacks widely validated, easily deployable diagnostic tests for reliable early-stage detection, particularly in resource-limited settings.
Objective: This study aimed to develop and externally validate a lightweight machine learning model for first-diagnosis prediction of PD using baseline cerebrospinal fluid (CSF) biomarkers from the Parkinson's Progression Markers Initiative (PPMI).
Methods: Baseline CSF data from 665 participants (PD = 415, controls = 190, scans without evidence of dopaminergic deficit [SWEDD] = 60) were used. Five machine learning classifiers were trained and compared: L2-regularized logistic regression (L2-LR), random forest (RF), histogram-based gradient boosting (HistGB), support vector machine with a radial basis function kernel (SVM-RBF), and multilayer perceptron (MLP). Feature selection focused on five core CSF biomarkers: amyloid-β 1-42 (Aβ42), α-synuclein, total tau, phosphorylated tau 181 (p-tau181), and hemoglobin. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (PR-AUC), and the Brier score, followed by isotonic calibration and independent external validation on the University of Pennsylvania dataset.
Results: A lightweight, biomarker-based RF model effectively distinguished first-diagnosis PD cases using a limited set of baseline CSF indicators. Its offline Streamlit deployment offers a practical tool for resource-limited settings, bridging the gap between computational prediction and real-world neurological diagnosis.
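The evaluation and calibration steps named in the Methods (AUC, PR-AUC, Brier score, isotonic calibration) can be made concrete with a minimal, dependency-free sketch. Everything here is an illustrative assumption rather than the study's code: the function names (`brier_score`, `roc_auc`, `average_precision`, `isotonic_fit`, `isotonic_predict`) and the six-sample toy data are hypothetical, PR-AUC is summarised as average precision (one common estimator), and the calibrator is a plain pool-adjacent-violators fit that returns a step function. In practice a calibrator would be fit on held-out or cross-validated predictions, not on the same scores it later adjusts.

```python
"""Hypothetical sketch: evaluation metrics and isotonic calibration in plain Python.

Function names and toy data are illustrative assumptions, not the study's code.
"""
from typing import List, Sequence, Tuple


def brier_score(y_true: Sequence[int], prob: Sequence[float]) -> float:
    """Mean squared error between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], prob: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, prob) if y == 1]
    neg = [p for y, p in zip(y_true, prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], prob: Sequence[float]) -> float:
    """PR-AUC as average precision: sum of (recall gain) * precision over thresholds."""
    order = sorted(range(len(prob)), key=lambda i: -prob[i])
    n_pos = sum(y_true)
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(order):
        j = i
        # Tied scores share one threshold, so consume them together.
        while j < len(order) and prob[order[j]] == prob[order[i]]:
            if y_true[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall, i = recall, j
    return ap


def isotonic_fit(scores: Sequence[float], y_true: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Pool-adjacent-violators fit; returns block upper bounds and non-decreasing block means."""
    agg = {}
    for s, y in zip(scores, y_true):  # aggregate tied scores into one weighted point
        total, count = agg.get(s, (0.0, 0))
        agg[s] = (total + y, count + 1)
    blocks: List[list] = []  # each block: [sum of labels, count, upper score]
    for s in sorted(agg):
        total, count = agg[s]
        blocks.append([total, count, s])
        # Merge backwards while the previous block's mean exceeds the newest one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total2, count2, upper = blocks.pop()
            blocks[-1][0] += total2
            blocks[-1][1] += count2
            blocks[-1][2] = upper
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def isotonic_predict(uppers: Sequence[float], means: Sequence[float], score: float) -> float:
    """Step-function lookup: mean of the first block whose upper bound is >= score."""
    for upper, mean in zip(uppers, means):
        if score <= upper:
            return mean
    return means[-1]


# Hypothetical toy data: 1 = PD, 0 = control; probabilities from some classifier.
y_demo = [0, 0, 1, 0, 1, 1]
p_demo = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7]

if __name__ == "__main__":
    print("Brier:", round(brier_score(y_demo, p_demo), 4))  # Brier: 0.1271
    print("AUC:", round(roc_auc(y_demo, p_demo), 4))  # AUC: 0.8889
    print("PR-AUC (AP):", round(average_precision(y_demo, p_demo), 4))  # PR-AUC (AP): 0.9167
    uppers, means = isotonic_fit(p_demo, y_demo)
    print("calibrated p(0.35):", isotonic_predict(uppers, means, 0.35))  # calibrated p(0.35): 0.5
```

On the toy data, the single out-of-order pair (a PD case scored 0.35 below a control scored 0.4) is pooled into one block with calibrated probability 0.5, which is exactly the monotonicity repair that isotonic calibration performs. Libraries such as scikit-learn provide production versions of these metrics and of isotonic calibration; the hand-written forms are only meant to make each quantity's definition explicit.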
Keywords: Parkinson's disease, cerebrospinal fluid biomarkers, machine learning, early diagnosis, local model deployment
Received: 16 Oct 2025; Accepted: 24 Nov 2025.
Copyright: © 2025 Hu, Liu, CAO, Wei, Liu and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Jing-Hua Yang, jyang@bu.edu
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
