AUTHOR=Tan Jie, Huang E., Hao Yang, Wan Hongping, Zhang Qian
TITLE=Risk factors associated with severe progression of Parkinson’s disease: random forest and logistic regression models
JOURNAL=Frontiers in Neurology
VOLUME=Volume 16 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/neurology/articles/10.3389/fneur.2025.1550789
DOI=10.3389/fneur.2025.1550789
ISSN=1664-2295
ABSTRACT=Background and aim: Parkinson’s disease (PD) is a neurodegenerative disorder with significant variability in disease progression. Identifying clinical and environmental risk factors associated with severe progression is essential for early diagnosis and personalized treatment. This study evaluates the performance of Random Forest (RF) and Logistic Regression (LR) models in identifying the major risk factors associated with severe PD progression. Methods: We performed a retrospective analysis of 378 PD patients (aged 40–75 years) with at least 2 years of follow-up. The dataset included patient demographics, clinical features, medication history, comorbidities, and environmental exposures. The data were randomly split into a training set (70%) and a validation set (30%). Both the RF and LR models were trained on the training set, and performance was assessed by accuracy, sensitivity, specificity, and the area under the curve (AUC) derived from receiver operating characteristic (ROC) analysis. Results: Both models identified similar risk factors for severe PD progression, including older age, tremor-dominant motor subtype, long-term levodopa use, comorbid depression, and occupational pesticide exposure. The RF model outperformed the LR model, achieving an AUC of 0.85, accuracy of 82%, sensitivity of 79%, and specificity of 85%. In comparison, the LR model had an AUC of 0.78, accuracy of 76%, sensitivity of 74%, and specificity of 79%.
ROC analysis showed that while both models could distinguish between slow and rapid disease progression, the RF model had stronger discriminatory power, particularly for identifying high-risk patients. Conclusion: The RF model provides better predictive accuracy and discriminatory power than the LR model in identifying risk factors for severe PD progression. This study highlights the potential of machine learning techniques such as Random Forest for early risk stratification and personalized management of PD.
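
The four performance measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are all hypothetical and are not study data. The sketch assumes binary labels (1 = severe progression, 0 = otherwise) with both classes present, and computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]):
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a study can report both kinds of measure side by side.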