ORIGINAL RESEARCH article
Front. Aging Neurosci.
Sec. Parkinson’s Disease and Aging-related Movement Disorders
Volume 17 - 2025 | doi: 10.3389/fnagi.2025.1672971
This article is part of the Research Topic: Machine Learning Revolutionizing Aging-Related Movement Disorder Diagnostics
Explainable Machine Learning for Early Detection of Parkinson's Disease in Aging Populations Using Vocal Biomarkers
Provisionally accepted
- 1 Nazarbayev University, Astana, Kazakhstan
- 2 University of Canberra, Canberra, Australia
Parkinson's Disease (PD) is a progressive neurodegenerative disorder that disproportionately affects the aging population and imposes a growing burden on global health systems. Early-stage detection remains clinically challenging because symptoms emerge gradually and are often ambiguous. This study introduces an accurate and interpretable machine-learning framework for early PD identification using non-invasive biomedical voice biomarkers from the UCI Parkinson's dataset (195 sustained phonation recordings from 31 participants: 23 PD and 8 healthy controls, ages 46–85). The pipeline employs subject-level stratified splitting and normalization and applies BorderlineSMOTE to mitigate class imbalance in diagnostically critical regions. An initial XGBoost model selects the top 10 acoustic features, which are then passed to a Bayesian-optimized XGBoost classifier; the decision threshold is tuned by F1-maximization on validation data. On the held-out test set, the model attains 98.0% accuracy, 0.97 macro-F1, and 0.991 ROC-AUC. It outperforms a strong neural baseline (DNN) by +4.0 percentage points (pp) in accuracy (94.0→98.0), +4.3 pp in macro-F1 (92.7→97.0), and +0.050 in AUC (0.941→0.991), and a classical SVM by +7.0 pp in accuracy (91.0→98.0), +6.5 pp in macro-F1 (90.5→97.0), and +0.089 in AUC (0.902→0.991). Model decisions are explained with SHAP, providing global and patient-specific insights into influential voice features. These results demonstrate the feasibility of a scalable, non-invasive, and explainable voice-based tool for early PD screening, with strong potential for integration into mobile or telehealth diagnostic platforms.
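Two steps named in the abstract, subject-level stratified splitting and F1-maximizing threshold tuning, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the default test fraction, and the choice of positive-class F1 (rather than macro-F1) as the tuning criterion are assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import defaultdict


def subject_level_split(subject_ids, labels, test_fraction=0.25, seed=0):
    """Split recording indices so that no subject appears on both sides.

    Subjects are grouped by class label before sampling, so each side
    keeps roughly the same PD/control ratio (a simple stratification).
    Assumes every recording of a subject carries the same label.
    """
    subjects_by_label = defaultdict(set)
    for sid, y in zip(subject_ids, labels):
        subjects_by_label[y].add(sid)

    rng = random.Random(seed)
    test_subjects = set()
    for subjects in subjects_by_label.values():
        ordered = sorted(subjects)
        rng.shuffle(ordered)
        # Hold out at least one subject per class (illustrative choice).
        n_test = max(1, round(test_fraction * len(ordered)))
        test_subjects.update(ordered[:n_test])

    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


def f1_at_threshold(y_true, scores, threshold):
    """Positive-class F1 when predicting 1 for scores >= threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, scores):
    """Scan every observed score as a candidate threshold; keep the best F1."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        f1 = f1_at_threshold(y_true, scores, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Splitting by subject rather than by recording matters here because each participant contributes several recordings; a recording-level split would place the same voice in both training and test data and inflate the reported metrics. The threshold should be chosen on validation scores only and then applied unchanged to the held-out test set.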
Keywords: Parkinson's disease, Aging-related Neurodegeneration, Biomedical Voice Biomarkers, Explainable Machine Learning, Early Diagnosis and Predictive Modeling
Received: 25 Jul 2025; Accepted: 27 Aug 2025.
Copyright: © 2025 Egbo, Nigmetolla, Khan and Jamwal. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Prashant Kumar Jamwal, Nazarbayev University, Astana, Kazakhstan
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.