A lightweight cerebrospinal fluid biomarker-based model for first-diagnosis prediction of Parkinson's disease: model development, external validation, and local deployment

Hu, Xinchao; Liu, Yu; CAO, Yuan; Wei, Chunli; Liu, Kun; Yang, Jing-Hua

doi:10.3389/fnagi.2025.1723169

ORIGINAL RESEARCH article

Front. Aging Neurosci.

Sec. Parkinson’s Disease and Aging-related Movement Disorders

Provisionally accepted

Xinchao Hu¹

Yu Liu²

Yuan CAO³

Chunli Wei¹

Kun Liu¹

Jing-Hua Yang¹*

¹Department of Interventional Neurology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
²Reproductive Medicine Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
³Massachusetts General Hospital Department of Neurology, Boston, United States

The final, formatted version of the article will be published soon.

Background: Despite substantial progress in biomarker research, Parkinson's disease (PD) still lacks widely validated, easily deployable diagnostic tests for reliable early-stage detection, particularly in resource-limited settings.

Objective: This study aimed to develop and externally validate a lightweight machine learning model for the first-diagnosis prediction of PD using baseline cerebrospinal fluid (CSF) biomarkers from the Parkinson's Progression Markers Initiative (PPMI).

Methods: Baseline CSF data from 665 participants (PD = 415, controls = 190, SWEDD = 60) were used. Five machine learning classifiers were trained and compared: L2-regularized logistic regression (L2-LR), random forest (RF), histogram-based gradient boosting (HistGB), support vector machine with RBF kernel (SVM-RBF), and multilayer perceptron (MLP). Feature selection focused on five core CSF biomarkers (Aβ42, α-synuclein, total tau, phosphorylated tau181, and hemoglobin). Model performance was evaluated using AUC, PR-AUC, and Brier scores, followed by isotonic calibration and independent validation using the University of Pennsylvania dataset.

Results: A lightweight, biomarker-based RF model effectively distinguishes first-diagnosis PD cases using limited baseline CSF indicators. Its offline Streamlit deployment offers a practical tool for resource-limited settings, bridging the gap between computational prediction and real-world neurological diagnosis.

Keywords: Parkinson's disease, cerebrospinal fluid biomarkers, machine learning, early diagnosis, local model deployment

Received: 16 Oct 2025; Accepted: 24 Nov 2025.

Copyright: © 2025 Hu, Liu, CAO, Wei, Liu and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Jing-Hua Yang, jyang@bu.edu

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.