ORIGINAL RESEARCH article

Front. Aging Neurosci.

Sec. Parkinson’s Disease and Aging-related Movement Disorders

Volume 17 - 2025 | doi: 10.3389/fnagi.2025.1672971

This article is part of the Research Topic: Machine Learning Revolutionizing Aging-Related Movement Disorder Diagnostics

Explainable Machine Learning for Early Detection of Parkinson's Disease in Aging Populations Using Vocal Biomarkers

Provisionally accepted
  • 1 Nazarbayev University, Astana, Kazakhstan
  • 2 University of Canberra, Canberra, Australia

The final, formatted version of the article will be published soon.

Parkinson's disease (PD) is a progressive neurodegenerative disorder that disproportionately affects the aging population and imposes a growing burden on global health systems. Early-stage detection remains clinically challenging because symptoms emerge gradually and are often ambiguous at onset. This study introduces an accurate and interpretable machine-learning framework for early PD identification using non-invasive biomedical voice biomarkers from the UCI Parkinson's dataset (195 sustained phonation recordings from 31 participants: 23 with PD and 8 healthy controls, ages 46–85). The pipeline splits the data at the subject level with stratification, so that recordings from one participant never appear in more than one partition, and then normalizes the features. It applies BorderlineSMOTE to mitigate class imbalance by synthesizing minority-class samples near the decision boundary, where the diagnostically critical cases lie. An initial XGBoost model ranks the acoustic features, and the top 10 are passed to a Bayesian-optimized XGBoost classifier whose decision threshold is tuned by maximizing F1 on validation data. On the held-out test set, the model attains 98.0% accuracy, 97.0% macro-F1, and 0.991 ROC-AUC. It outperforms a strong neural baseline (DNN) by 4.0 percentage points (pp) in accuracy (94.0% to 98.0%), 4.3 pp in macro-F1 (92.7% to 97.0%), and 0.050 in AUC (0.941 to 0.991). It outperforms a classical SVM by 7.0 pp in accuracy (91.0% to 98.0%), 6.5 pp in macro-F1 (90.5% to 97.0%), and 0.089 in AUC (0.902 to 0.991). Model decisions are explained with SHAP, which provides both global and patient-specific insight into the most influential voice features. These results demonstrate the feasibility of a scalable, non-invasive, and explainable voice-based tool for early PD screening, with strong potential for integration into mobile or telehealth diagnostic platforms.
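The abstract does not give implementation details, so the following is only a minimal sketch, in plain Python, of two of the steps it names: a subject-level stratified split and F1-based threshold tuning. The function names (`subject_level_split`, `tune_threshold`), the 25% test fraction, the way the 195 recordings are distributed across the 31 subjects, and the toy validation scores are all assumptions made for illustration; they are not taken from the authors' code or data.

```python
import random
from collections import defaultdict


def subject_level_split(subject_ids, labels, test_fraction=0.25, seed=0):
    """Split recording indices so that no subject appears in both sets.

    Subjects are grouped by class first, so each class contributes roughly
    `test_fraction` of its subjects (at least one) to the test set.
    """
    # One label per subject: all recordings of a subject share a label.
    subject_label = dict(zip(subject_ids, labels))

    by_class = defaultdict(list)
    for sid, y in sorted(subject_label.items()):
        by_class[y].append(sid)

    rng = random.Random(seed)
    held_out = set()
    for sids in by_class.values():
        rng.shuffle(sids)
        n_test = max(1, round(len(sids) * test_fraction))
        held_out.update(sids[:n_test])

    train_idx = [i for i, s in enumerate(subject_ids) if s not in held_out]
    test_idx = [i for i, s in enumerate(subject_ids) if s in held_out]
    return train_idx, test_idx


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); 0.0 if there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(y_true, scores):
    """Return the (threshold, F1) pair that maximizes F1 on validation data.

    Every distinct score is tried as a candidate threshold; a sample is
    predicted positive when its score is >= the threshold.
    """
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(y_true, preds)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Synthetic layout (assumption): 31 subjects, the first 23 labeled PD (1) and
# the last 8 healthy (0); 9 subjects have 7 recordings and 22 have 6,
# which gives 195 recordings in total.
subject_ids, labels = [], []
for s in range(31):
    n_recordings = 7 if s < 9 else 6
    subject_ids += [f"S{s:02d}"] * n_recordings
    labels += [1 if s < 23 else 0] * n_recordings

train_idx, test_idx = subject_level_split(subject_ids, labels)
train_subjects = {subject_ids[i] for i in train_idx}
test_subjects = {subject_ids[i] for i in test_idx}

# Toy validation labels and predicted probabilities (made up for the demo).
val_true = [0, 0, 1, 1, 1]
val_scores = [0.2, 0.6, 0.55, 0.7, 0.9]
best_t, best_f1 = tune_threshold(val_true, val_scores)
```

Splitting by subject rather than by recording matters here because each participant contributes several recordings; a recording-level split would let the model see the same voice in both training and test data and would inflate the reported metrics. In a full pipeline, oversampling and threshold tuning would be fitted on the training and validation partitions only, never on the held-out test set.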

Keywords: Parkinson's disease, Aging-related Neurodegeneration, Biomedical Voice Biomarkers, Explainable Machine Learning, Early Diagnosis and Predictive Modeling

Received: 25 Jul 2025; Accepted: 27 Aug 2025.

Copyright: © 2025 Egbo, Nigmetolla, Khan and Jamwal. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Prashant Kumar Jamwal, Nazarbayev University, Astana, Kazakhstan

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.