ORIGINAL RESEARCH article

Front. Aging Neurosci., 10 March 2026

Sec. Parkinson’s Disease and Aging-related Movement Disorders

Volume 18 - 2026 | https://doi.org/10.3389/fnagi.2026.1730550

Machine learning analysis of population-wide plasma proteins identifies hormonal biomarkers of Parkinson’s disease

  • 1. Tri-Institutional PhD Program in Computational Biology, New York, NY, United States

  • 2. Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, United States

  • 3. Division of Hematology, Department of Interdisciplinary Engineering, Daegu Gyeongbuk Institute of Science and Technology, Daegu, Republic of Korea

  • 4. Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, United States

  • 5. Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medicine, New York, NY, United States

Abstract

With the number of Parkinson’s patients expected to rise due to an aging population, there is an increasing need to identify new diagnostic markers. These markers should be affordable and suitable for routine use to monitor the population, help stratify patients for treatment pathways, and provide new avenues for therapy. Genetic predisposition and familial forms account for approximately 10% of Parkinson’s disease (PD) cases, leaving a large fraction of the population with minimal effective markers for identifying high-risk individuals. The establishment of population-wide omics and longitudinal health monitoring studies provides an opportunity to apply machine learning approaches to these unbiased cohorts to identify novel PD markers. In this study, we present the application of three machine learning models to identify protein plasma biomarkers of PD using plasma proteomic measurements from 43,408 UK Biobank subjects as the training and test set and an additional 103 samples from the Parkinson’s Progression Markers Initiative (PPMI) as external validation. We identified a group of highly predictive protein plasma markers, including known markers Dopa decarboxylase (DDC) and Calbindin 2 (CALB2) as well as new markers involved in the JAK–STAT and PI3K-AKT pathways and hormonal signaling. We further demonstrated that these features are well correlated with UPDRS severity scores and stratified these into protective and risk-associated features that potentially contribute to the pathogenesis of PD.

1 Introduction

Parkinson’s disease (PD) is the second most common neurodegenerative disease, with an estimated prevalence of 10 million people worldwide (Ball et al., 2019). There is a massive economic burden on individuals, families, and the U. S. government, totaling over 50 billion dollars per year (Yang et al., 2020), and the number of PD patients is expected to grow with increased longevity of the general population (Bach et al., 2011). Parkinson’s disease not only diminishes the quality of life but also imposes a substantial societal burden through caregiving needs, lost productivity, and high healthcare costs. PD symptoms are marked by tremors, bradykinesia, and other movement-centered symptoms. This condition can eventually lead to severe loss of basic movements paired with swallowing difficulties. Pathologically, the disease is marked by the death of midbrain dopamine (mDA) neurons in the substantia nigra. This extensive death of mDA neurons in the substantia nigra creates Lewy bodies that impair cell function until death (Roeters et al., 2017) and results in the release of protein products into the serum and cerebrospinal fluid (CSF) (Santacruz et al., 2021). Symptoms often lag disease pathology, and the presymptomatic (prodromal) phase can last approximately 20 years (Roos et al., 2022). The onset of symptoms occurs after the majority of dopamine neurons are lost, presenting a significant challenge for early intervention and treatment options (Mandybur, 2018). A9 neurons are the mDA subtype with the highest loss in PD. Therefore, an increase in blood plasma levels of specific A9 proteins could be strong candidate biomarkers for early detection (Kamath et al., 2022). Single-cell analyses of A9 neurons lay a foundation that could provide deep insights into PD (Ma and Lim, 2021), but they are costly and are not currently used for diagnostics. Due to the long prodromal phase of PD, early detection is essential for intervention therapy (Mahlknecht et al., 2022; Hawkes, 2008). High-throughput proteomic screens allow for fast and cheap identification of impactful proteins. These new minimally invasive blood tests can be widely used in the aging population to diagnose conditions and provide early, targeted treatment, unlike CSF testing or imaging, which are often costly or too invasive.

Previous studies of PD biomarkers have highlighted a strong inflammatory component in PD, including a heavy effect size of dipeptidyl peptidase 7 and granzyme B (Rutledge et al., 2024). Recent evidence has shown that inflammation may contribute to the worsening of PD condition and enhance disease progression through reactive oxygen species that mDA neurons are particularly susceptible to Stojkovska et al. (2015). In addition to inflammatory markers, hormonal markers such as estradiol and testosterone have been shown to have significant correlations with PD progression metrics (Bovenzi et al., 2023), and lower incidence of PD in women may be partially attributed to the protective effect of estrogen (Shulman, 2002).

Previous studies to understand PD risk centered on genetic information using variations of linear models (Escott-Price et al., 2015; Han et al., 2020). Genome-wide association studies (GWAS) are based on the idea that many single-nucleotide polymorphisms, which individually have minor associations with PD, would compound and lead to stronger predictive models for developing PD (Euesden et al., 2015). Nalls et al. have obtained 0.69 area under the receiver operating curve (AUROC) PD classification performance from variant-only models trained on the UK Biobank (Nalls et al., 2019). Previous research using risk models estimates the probability of developing PD over time, helping to identify individuals at high risk for preventive care. Our current study focuses on developing early diagnostic models that diagnose or categorize individuals as having PD or not, aiding in therapeutic decision-making. More recent approaches such as whole genome sequencing, proteomics, or phenotypic evaluation can add significant predictive values. CSF proteomic models have shown strong classification of PD but are invasive and carry some risk of complications, which would exclude their use for large-scale screens (Kwon et al., 2022; Williams and Fost, 1992). More recent methods incorporate machine learning approaches for classification of PD featuring multiomic models spanning transcriptomic (Makarious et al., 2022) to in-depth invasive CSF (Tunbridge et al., 2019), sensor (Um et al., 2017), and imaging data (Tsukita et al., 2022).

The multiomic models presented here use high-throughput proteomic data generated by Olink assays, demographic information, and genetic data. The proteomic data are obtained using minimally invasive and low-cost information using only 1 μL of plasma (Zhang et al., 2024). This information will become more widespread in the future and is ideal for non-invasive combinations for screening and understanding the mechanism of disease pathology.

Using proteomic data across large populations can aid the discovery of more disease insights. Neuronal markers that may leak into CSF and then blood have been shown in studies focused on CSF in limited sample sizes (Phani et al., 2012; Winchester et al., 2022). Proteomic data have been collected from the substantia nigra to identify differences in pathways between PD cases and controls (Jang et al., 2023). Previous state-of-the-art CSF proteomic models for PD prediction yielded an AUROC of 0.80 for held-out sub-cohorts, and combining CSF and plasma proteomics yielded an AUROC of 0.89 (Rutledge et al., 2024; Phani et al., 2012). However, CSF and imaging measurements are not routine and are typically collected from high-risk or diagnosed PD patients, which are less applicable for population screens (Um et al., 2017). Furthermore, it is not clear whether these biomarkers are correlated with the severity of PD, as defined by the Unified Parkinson’s Disease Rating Scale (UPDRS), such that they can be used to track disease progression.

We used a deep learning model, a regression model, and a support vector machine (SVM) model to identify potential novel biomarkers for PD. Our results identified that 9 of our top 20 features from the neural network are hormonally related. We characterized these proteins, such as prolactin (PRL), through protein–protein interaction queries as well as through extensive computational analysis and literature reviews. Additionally, we found enrichment for the JAK–STAT pathway, which plays a strong role in immunity as well as other functions. We further validated some of our top features identified in the UK Biobank on an independent external validation set (PPMI). Following this, we used top features to determine PD severity through the UPDRS score, canonically accepted as a severity metric for PD. To the best of our knowledge, this is one of the largest population-based proteomic studies for Parkinson’s disease, leveraging machine learning approaches. Furthermore, we validated the identified biomarkers by correlating them with Parkinson’s disease severity scores, demonstrating their potential utility for tracking disease progression.

2 Results

2.1 Study design overview

This study was performed in four stages: data acquisition, data cleaning plus alignment, model architecture tuning, and model interpretation (Figure 1). Data were acquired with permission from the UK Biobank and PPMI resources (Biobank, U, 2014; Aleksovski et al., 2018). The two studies used different approaches and technologies for variant detection. The second step consisted of extensive alignment of genomic array results, followed by filtering to match the PPMI genomic feature set to that of the UK Biobank. We developed three models—a neural network, support vector machine (SVM), and ridge regression—to classify PD vs. control using the proteomic data, genetic variants, or a combination of both. Training and testing of the model were performed on the UK Biobank datasets, and external validation was performed on the PPMI data. Model interpretation was performed by extensive literature review of the top features and KEGG pathways to ascertain biological interpretation. To identify the most predictive features, Shapley Additive Explanations (SHAP) (Lundberg and Lee, 2017) were calculated for the top 85 (12.5%, or approximately 250 top features from the top two models) common features between the two top-performing models. These features were used in a linear model to predict the PD severity scores defined by the Unified Parkinson’s Disease Rating Scale (UPDRS) metric. The Unified Parkinson’s Disease Rating Scale (UPDRS) comprises multiple subscales that capture distinct clinical domains of Parkinson’s disease, including non-motor experiences of daily living (Part I), motor experiences of daily living (Part II), motor examination (Part III), and motor complications (Part IV). Among these, the UPDRS III Motor Examination is widely regarded as the most sensitive to disease progression and typically demonstrates the fastest longitudinal change, reflecting worsening motor impairment.

Figure 1

2.2 Dataset populations

In this study, we took advantage of two large population cohorts. The first is the UK Biobank, which is one of the largest biomedical databases that contains genomics and health-related data from half a million UK residents (Bach et al., 2011; Sudlow et al., 2015). The second cohort is patient samples from the Parkinson’s Progression Markers Initiative, an observational study sponsored by the Michael J. Fox Foundation aimed at identifying biomarkers to predict, diagnose, and track Parkinson’s disease progression (Marek et al., 2011). Both study populations are a majority of European ancestry and are skewed toward older patients with a range of 31–83 years and median ages of 59 years and 60 years for the UK Biobank and PPMI, respectively (Table 1). Selecting UK Biobank participants who were profiled by the Olink Explore 1,536 protein panel reduced the number of patients to 43,408. Among those with Olink data, 778 patients were diagnosed with Parkinson’s disease, resulting in a case-to-control ratio of 2%. The UK Biobank was used predominantly to train and benchmark the predictor algorithms, whereas the 103 PPMI samples were used as an independent external validation dataset. The filtered PPMI dataset had a case-to-control ratio higher than the UK Biobank of approximately 24%. This class imbalance is common in a disease setting. Thus, distributions of cases were different, which impacted performance on the PPMI validation. This is primarily due to the PPMI data selecting for PD patients, while the UK Biobank is a population-level study. Deep learning models can be ill-suited for modeling rare case events due to overfitting (Bejani and Ghatee, 2021). Thus, methods were used to overcome this, including dropout, mini-batches, class weighting, and optimized learning rates.

Table 1

CharacteristicUK BiobankPPMI
Unique patients43,408103
Male20,07974
Female23,32929
Age (minimum, medium, and maximum)40, 59, 7031, 60, 83
Parkinson’s disease77826
Controls42,63077
Proteomic serum1,463 proteins1,463 proteins
SNPsAxiom ArrayNeuroX Array and Immunochip Array

Demographics and feature space of UK Biobank and PPMI cohorts profiled with the Olink Explore protein panel.

In terms of gender split, the UK Biobank has 54% female, and the PPMI has 28% female. Sex bias may influence the biomarkers studied, such as PRL and growth hormone (GH1), which are generally more abundant in women (Thapa and Bhusal, 2023; Span et al., 2000).

Another important difference between the two cohorts is their treatment status. The PPMI is selected for levodopa-naïve patients, indicating that the majority of the PD patients have not received dopaminergic therapy yet. This gives a proteomic profile of the disease state without the confounding effect of treatment. In contrast with the UK Biobank, roughly 4%, or 30 of 778 PD patients, of the PD patients in the UK Biobank are on levodopa or similar medications (levodopa, co-careldopa, Sinemet-LS tablets, Half-Sinemet CR M/R tablets, and Sinemet-62.5 tablets). This treatment, although small, is a limitation that can alter proteomic profiles and confound biomarker identification and model performance. Currently, levodopa is known to impact DDC levels, which was one of the features identified in our study (Paslawski et al., 2023).

2.3 Multimodal proteomic and genetic models

Using the subset of harmonized genomic variants and proteomic data, we developed three computational models to classify PD status for individual subjects. We used neural network, SVM, and ridge regression models that input proteomic and variant profiles and produce a probability value for PD diagnosis that is evaluated by the ROC analysis of classifier performance (Figure 1). Neural networks (NN) are a general framework for estimating complex, non-linear functions, making them ideal for capturing intricate relationships in high-dimensional data. Support vector machines (SVM) identify a dividing hyperplane to maximize class separability, while ridge regression is a linear model that uses a penalized linear combination of features to mitigate overfitting. The area under the receiver operating curve (AUC-ROC) measures were comparable across all three models on both the training UK Biobank and the validation PPMI datasets (Table 2; Figure 2). All three methods had comparable performance of ~0.77 AUC-ROC on the held-out test data from the UK Biobank, with the NN having the highest performance of 0.79 (Figure 2). The three models performed similarly on the PPMI external validation with an AUC-ROC of ~0.67. The performance drop on the external validation set is potentially due to differences in serum data capture and differences in case-to-control ratios of datasets. Additional factors impacting validation performance could be related to the PPMI being drug naïve vs. the UK Biobank having a larger number of patients under treatment. When considering sensitivity and specificity, the ridge regression model had the highest sensitivity at 0.76, while the SVM model had the highest specificity at 0.69. The neural network achieved the best overall balance, with sensitivity and specificity of 0.73 and 0.69, respectively. These metrics were both within 3% of the other models, despite a slight reduction in performance on these metrics.

Table 2

Performance metricRidge regressionSVMNeural network
AUC-ROC test set (UK Biobank)0.770.750.79
Sensitivity0.760.670.73
Specificity0.640.700.69
AUC-ROC validation (PPMI)0.690.680.66

Performances of the multiomic prediction models.

The three models’ performance was evaluated by the AUC, sensitivity, and specificity measures for PD classification performance on the UK Biobank held-out data as well as AUC performance on the PPMI external validation set.

Figure 2

Overall, model performances are comparable on both the held-out test set in the UK Biobank and the validation PPMI test set. While the neural network had slightly better performance on the training data, it was lower than the other models in the validation set, suggesting potential overfitting. In contrast, the ridge regression model had better performance on the validation set (Table 2), which, through regularization, often trades off some amount of training accuracy for improved generalization.

A possible confounding factor is the prior Parkinson’s disease (PD) and Levodopa (L-DOPA) treatment history that the majority of PD patients receive. The PPMI cohort did not receive prior treatment, and of the 778 UK Biobank patients, 30 had records of L-DOPA treatment (4%). To investigate the impact of these cases on model performance, we retrained the ridge regression models, excluding the levodopa-treated patients from the UK Biobank training set, and found that AUC-ROC values remained similar at 0.76 in the test for the ridge regression model compared to 0.77 and 0.69 in external validation for the PPMI in both L-dopa naïve and original (Supplementary Figure 1). This suggests that the presence of levodopa treatment did not significantly impact the overall model performance.

2.4 Genetic-only model

We created a variant-only model to analyze baseline performance without proteomic data that is akin to a polygenic risk score model. While all three models had reduced performance the neural network AUROC of 0.62 on the training set is significant reduction by over 25% relative to the multimodal model (Supplementary Table 1). This was expected, as no genetic variants appear in the top 20 feature set in the combined proteomic and single-nucleotide polymorphisms (SNP) neural network (Figure 3A). Although several genetic variants appeared in the top 85 of features, external validation performance on the PPMI cases of the variant-only model declined further to random performance of 0.50.

Figure 3

Overall, the use of only genotypic and demographic information, primarily sex and age, led to substantially reduced prediction performance relative to the multimodal modeling that includes the serum proteomic measurements. This is expected, as these features are typically used for polygenic risk scores to identify individuals with high risk, but their genetic penetrance is low. In contrast, proteomic data is a more immediate marker for PD state diagnostics.

2.5 Feature evaluation

We used saliency techniques to identify and better understand the most impactful features of both the Olink and SNP for the neural network. We investigated the magnitude of feature impact using the Shapley additive explanation (SHAP) score that quantifies feature impact on the model performance (Figures 3A,B). After investigating the top 20 features, we found neuronal-specific markers such as DDC and CALB2 that were expected outcomes, given that they are related to dopamine function (Devos et al., 2014; Mouatt-Prigent et al., 1994). Unexpectedly, we also identified hormonal markers such as prolactin and GH1, which are some of the most influential features. SHAP scores allow us to see individual scores from specific patients as well as groups of individuals, such as all women. For example, in one subject from the UK Biobank study, PRL’s negative value indicates a protective contribution, whereas GH1’s positive value indicates a harmful effect (Figure 3C).

Further biological investigation through KEGG analysis of these top markers yielded strong enrichment for the prolactin signaling pathway and the JAK–STAT and PI3K-Akt pathways (Figure 4A). JAK–STAT and PI3k-Akt pathways were validated in other studies in which it was found that the PI3k-Akt pathway could be one of the most influential in a rapidly progressing PD subtype (Su et al., 2024). The hormonal signaling protein PRL, one of the most widely studied hormones, prolactin, also emerged as a top feature, suggesting involvement of a hormonal component to PD pathogenesis through the activation of JAK–STAT pathways and prolactin signaling pathways. Additionally, the protein–protein interaction enrichment test showed that six of the top features were interrelated hormones that are highly interconnected (p < 6.15 e-11) (Figure 4B; Franceschini et al., 2012). Our findings suggest that the JAK–STAT and PI3K-AKT pathways may act as central hubs in PD pathology, potentially linking neuroinflammation, hormonal signaling, and neuronal survival.

Figure 4

2.6 Feature correlation with PD severity score

We next investigated the shared features that the different models identified as significant to identify the core biological pathways that are predictive of PD. We subset the top 85 (12.5%) features that were common across the ridge regression and neural net models, excluding the SVM model due to slightly lower performance (see Methods section). These 85 common features were used to construct a linear regression model to predict the UPDRS score (Movement Disorder Society Task Force on Rating Scales for Parkinson's Disease, 2003), which is the accepted severity score for PD among physicians (Figure 5; Supplementary Table 2). We selected 85 features as the optimal number of descriptive features without overfitting based on empirical analysis of the coefficient of determination (R2) for various numbers of features across different adjusted R2 values (Supplementary Figure 2). We opted to use a straightforward linear regression model for interpretability and to investigate the relationship between model-derived PD classification scores and PD severity. While more complex predictive models may capture non-linear patterns with higher accuracy, the simplicity of the linear regression model facilitates clearer insights into how model outputs relate to clinical measures. This approach supports the broader aim of demonstrating that classification frameworks can produce outputs that meaningfully reflect disease severity in a clinically relevant and interpretable manner.

Figure 5

We found that these top features are well correlated with UPDRS values in the PPMI dataset (R2 = 0.86). Several SNPs were included in this top feature set as highly predictive of PD that were previously validated in the literature. These include rs12951632, rs11158026, and rs10748818, indicating their established association with Parkinson’s disease and further supporting their potential role in disease susceptibility and progression (Farrow et al., 2022; Fan et al., 2021). Two of the variants, rs12951632 and rs11158026, were identified in related genome-wide association studies as variants that are strongly associated with PD. The third variant, rs10748818, was previously identified as causing changes in the GBF1 gene, which is a critical factor in PD pathogenesis (Fan et al., 2021).

2.7 Feature signatures associated with increased PD

Next, we were interested in investigating the contributions of the features to the likelihood of PD. K-nearest neighbors clustering of the PPMI cohort using the shared 85 features identified three clusters of distinct signatures reflecting their underlying expression patterns. While 85 features were used to model UPDRS severity, patient clustering based on these 85 features was largely driven by 14 proteins that had varying degrees of expression. High abundance in cluster 0 was associated with low PD/control ratio, medium expression in cluster 2, and low expression in cluster 1 associated with high PD/control ratio (Figure 6A). Cluster 1 is strongly distinct from other clusters by a protein signature that is driven by 14 proteins that are negatively correlated with the PD (Figure 6B). One of these proteins is FABP5, a fatty-acid-binding protein, which was recently identified as a PD marker in an independent proteomic study of 99 PD patients (Hällqvist et al., 2024). In addition, a similar study by Hallqvist et al. used a mass spectrometry approach to identify protein markers from CSF and serum that predict PD in the prodromal phase among a small cohort (Hällqvist et al., 2024). While not all significant proteins from their study overlapped with the proteins measured in the Olink assay, four overlapping proteins (CST3, DKK3, PTGDS, and VCAM1) were found as significant PD markers in our study (adjusted p-value < 0.01, by t-test).

Figure 6

2.8 Feature correlations

To gain insights into the cross-feature correlation structure and specifically the 14 proteins predominant in the sample clusters, we performed a protein pairwise correlation analysis using UPDRS scores (Supplementary Figure 3). Two distinct, highly correlated protein groups were identified. The first group of proteins is characterized by cytokine–cytokine enrichment as well as involvement in the PI3K-AKT signaling pathway (Supplementary Figure 4A). This signature has a strong immune and inflammatory component, which has been a hallmark of PD (Ferrari and Tarelli, 2011; McGeer and McGeer, 2004), and every feature was positively correlated with UPDRS scores. The second group (marked in blue, Supplementary Figure 3) is associated with the MAPK signaling pathway as well as other apoptotic pathways (Supplementary Figure 4B). MAPK signaling has been strongly implicated in PD with connections to high-penetrance variants such as LRRK2 (Obergasteiger et al., 2018). This signature is similar to the downregulated signature associated with increased PD cases and is inversely correlated with UPDRS scores, suggesting a protective role in PD pathogenesis (Figure 6B).

The most correlated protein with UPDRS severity is angiopoietin-like 4 (ANGPTL4) (Pearson coefficient = 0.57), a lipid homeostasis modulator where elevated levels were recently associated with neurodegeneration (Supplementary Figure 3; Zhao et al., 2024). One possible mechanism is that hypoxic regions in the brain due to PD result in the upregulation of ANGPTL4 to compensate for this and modulate either inflammatory components or stress (González-Muniesa et al., 2011). Conversely, the most negatively correlated protein (Pearson coefficient = −0.56) is HPGDS, an enzyme that is involved in proteinoid synthesis and anti-inflammatory response (Ouyang et al., 2022). A possible relevance is that lower Hematopoietic prostaglandin D synthase (HPGDS) means less PGD2, leading to more unchecked neuroinflammation and worsening UPDRS scores (Corwin et al., 2018). Overall, feature clustering on the PPMI dataset shows strong protein signatures associated with PD cases and UPDRS scores. The high overlap between the cluster signature and the signature in the feature–feature correlation is possibly a core protein signature that can inform future studies to better understand PD mechanisms.

3 Discussion

Our PD prediction algorithms from plasma proteomics confirmed several established biomarkers as well as potentially novel biomarkers that warrant further investigation. The top features in both the ridge regression and neural network models are DDC and CALB2, which have previously been implicated in PD (Rutledge et al., 2024; Poulin et al., 2014). DDC is involved in dopamine synthesis (Pedersen et al., 2023) and is directly related to massive cell death of dopamine neurons in the striatum (Burkhard et al., 2001). CALB2 is heavily featured in a single-cell analysis of PD as a key marker for A10 dopamine neurons as compared to A9 dopamine neurons (Poulin et al., 2014). Ridge regression coefficients show CALB2 as negatively associated with PD, which is consistent with its expression in A10 mDA cells that are mostly spared from cell death during PD progression (Grealish et al., 2010).

In our study, several other less direct markers, such as hormonal components, were significantly correlated with PD incidence. These hormonal features, especially prolactin, were significantly correlated in multiple models and cluster analyses. Our findings illustrate a strong hormonal component, with 6 of the top 20 proteins identified by the neural network being hormonally related through heavy neurodevelopmental involvement and links to apoptotic functions. Further evidence from a protein interaction analysis identified these six proteins as closely functionally related (p < 6.15e-11; Figure 4B).

Another significant marker, human growth hormone (GH1), has several known functions ranging from growth of various tissues to neurogenesis (Martínez-Moreno and Arámburo, 2020). Notably, GH1 has been used to treat amyotrophic lateral sclerosis in vitro as well as in mouse models, demonstrating strong neurological protection (Chung et al., 2015). This is consistent with our findings, where GH1 is enriched in the control group relative to PD cases, making it a potentially protective marker. Prolactin produced by lactotrophs in the pituitary gland is another hormone that could be critical to understanding PD (Lamberts and Macleod, 1990). Prolactin (PRL) is known for its multifaceted impacts, including neuroinflammation, which is part of the pathogenesis of PD (Çınar et al., 2022), and modulation of PRL may offer a new therapeutic path for PD. Studies have also shown that prolactin release is regulated by dopamine (Binart et al., 2019), and therefore, loss of PRL may indicate loss of dopamine.

One common inflammatory model used to study neuroinflammation in mouse models is through intranasal lipopolysaccharide injections. This model led to selective dopamine substantia nigra (SN) loss and increased alpha synuclein in the SN (He et al., 2013). Furthermore, it was shown that in this lipopolysaccharide model, PRL is part of a full-autocrine loop that enhances inflammatory response that may contribute to further worsening PD through enhanced inflammation (López-Rincón et al., 2013).

PRL and GH1 play a strong role in the brain with widespread expression. This role especially extends past simply neurodevelopmental. PRL and GH1 are highly expressed in the cerebellum (Supplementary Figure 5A). Findings of cerebellar circuitry alteration in PD highlight the importance the cerebellum plays in PD progression and disease pathology (Li et al., 2023). The cerebellum is perhaps more involved in PD pathology than previously thought and may serve as an understudied area that contains biomarkers indicative of PD.

We performed a KEGG pathway enrichment analysis with the top 20 neural network features (see Methods). The core enriched pathways highlight the role of JAK–STAT and AKT pathways in PD pathogenesis. These are broad signaling pathways, but JAK2/STAT5 are commonly associated with developmental embryogenesis (Morris et al., 2018). These pathways are likely associated with PRL and GH1 activities, as both receptor bindings can lead to JAK2 activation (Morris et al., 2018; Supplementary Figure 5B).

While our study focused on PD, it represents a new framework to investigate other diseases through multiomic integration of proteomics and variants in the UK Biobank and other public data resources (Biobank, U, 2014; Denny et al., 2019; Nagai et al., 2017).

We have laid the groundwork to continue building on this multiomic model to provide an analysis framework to integrate proteomic Olink data and variant data in the UK Biobank. This approach can identify new associations with other neurodegenerative diseases. For example, through phenotype–gene enrichment analysis, we found several phenotypic PD features associated with height, which has some suggestive correlation with PD (Supplementary Figure 6; Saari et al., 2022; Ragonese et al., 2007). With an increase in Olink data in the UK Biobank, such new insights into the mechanisms of these diseases can now be investigated.

Plasma proteomics emerges as a powerful, minimally invasive alternative to cerebrospinal fluid-based biomarkers, achieving comparable classification performance while offering broader accessibility for large-scale screening. By leveraging machine learning models, we identified biomarkers, including hormonal markers such as prolactin and GH1, that suggest an underexplored hormonal axis in PD pathogenesis. Additionally, pathway analysis implicates the JAK–STAT and PI3K-AKT pathways in linking neuroinflammation, hormonal signaling, and neuronal survival. These findings not only demonstrate the feasibility of integrating plasma proteomics into cost-effective, population-level screening programs for early diagnosis and intervention but also provide a framework that can be extended to investigate biomarkers for other neurodegenerative diseases.

4 Materials and methods

4.1 Study population

The study population consisted of two groups from the UK Biobank cohort and Parkinson’s Progressive Marker Initiative cohort (Biobank, U, 2014; Marek et al., 2011). The UK Biobank is a large-scale biomedical database containing de-identified information of over 500,000 patients, including genotyping as well as serum analysis. Diagnosis of Parkinson’s disease was made by the International Classification of Diseases coded as ICD-9 and ICD-10 codes. Single-nucleotide polymorphisms were taken from the Axiom Array. The UK Biobank, after processing for missingness and filtering for Olink data, consisted of 43,408 patients: 23,329 were female and 20,079 were male, with a median age of 59 years. The PPMI dataset is an international consortium across 12 countries that is primarily sponsored by the Michael J. Fox Foundation (Marek et al., 2011). The PPMI uses two arrays for SNPs, including Immunochip and NeuroX, specially designed for PD. After processing, there were approximately 103 individuals, consisting of 29 females and 74 males, with a median age of 60 years. Diagnosis of PD was made with the addition of Dopamine Transporter Imaging (DAT) imaging to enhance accuracy (Marek et al., 2011).

We subset these datasets for participants that had Olink serum data collected. This substantially limited the number of participants. Both datasets used Olink Explore 1,536, so feature alignment was possible. A total of 1,463 proteins were measured from serum via the proximity extension assay (PEA), a high-multiplex immunoassay, after filtering. Olink Explore 1,536 contains 4 separate panels: oncology, cardiometabolic, inflammation, and neurology (Zannad et al., 2022). The assay works by double antibody binding to proteins leading to DNA hybridization, which can be PCR amplified, sequenced, and quantified. Proprietary normalization of these values provides protein quantification, allowing comparison across individuals and studies.

4.2 Variant preprocessing and alignment

Genomic variants were preprocessed using PLINK 2.0 software, a genome association toolkit (Chang et al., 2015). We aligned the variants in both arrays by position, major allele, and minor allele to Genome Reference Consortium Human Build 37. The UK Biobank used the Axiom array, while the PPMI dataset used the Immunochip and NeuroX arrays. We filtered for the top 324 overlapping variants and used an additional 311 variants identified in Kim et al. (2024). These were taken from the Polygenic Score (PGS) catalog, as these were previously found to have signal in a non-publicly accessible 23andMe dataset with p < 1.35E-03 (Lambert et al., 2021). This proprietary dataset allowed for more power, leading to the identification of associated variants. In total, 635 filtered variants with overlap across both the UK Biobank and PPMI validation set were used in our prediction models.

Several QC steps were performed at the variant level. We applied a series of standard filters to ensure data quality and reliability. We excluded variants that did not conform to the Hardy–Weinberg equilibrium, applying a p-value threshold of < 1e-6 to maintain population consistency. Variants with a call rate below 95% were filtered out due to missing data concerns, while ambiguous SNPs (A/T or C/G alleles) were removed to avoid strand ambiguity. A minor allele frequency (MAF) threshold of >1% was set to focus on variants with sufficient population representation. Additionally, we filtered variants based on imputation quality, using an Imputation Info score > 0.8 to ensure high-confidence imputation calls. Relatedness was assessed using a kinship coefficient, excluding individuals with relationships stronger than first-degree (>0.125). Finally, we removed samples exhibiting sex discordancy between self-reported and genotypic sex to ensure consistency in a sex-based genetic analysis.

4.3 Study design

The UK Biobank was used for training and testing, and PPMI data were used as an independent external test to measure classifier performance. Patients were filtered for European ancestry and missing features. In addition, features were filtered for missing values. The resulting multiomic dataset combined demographic data, genomic data, and proteomic data. We used several baseline models, such as a ridge regression model, which performs L2 penalization to minimize coefficients (Wu and Xu, 2020). Additionally, an SVM model with a linear kernel was used as a second baseline model. For training and testing, a split of 60 and 40% was used, respectively. The ratio of PD cases to control was maintained in training and test splits. We chose a 60 and 40% split for training and testing to balance a robust training process with a large and independent test set. A larger training set allows the model to learn more generalizable patterns from the data, which is especially important for complex tasks such as disease classification. Given the small number of PD cases in our dataset, using 60% of the data for training maximized the number of cases for the model to learn while still having a large enough test set to give reliable performance metrics. By using the same 60 or 40% split across different models (e.g., SNP-only model, ridge regression model, and SVM model), we were able to compare their performance on the same datasets and ensure consistency and fairness in model evaluation. The genetic single-nucleotide polymorphism-only model used the same split of 60 and 40% for the training and test dataset. Ridge regression and SVM were tuned by grid search to identify the best model. Both used class weighting to address imbalance common with medical data.

The neural network architecture is a hidden two-layer neural network. In this fully connected neural network, the size of the first hidden layer is 120, and the size of the second hidden layer is 40. Activation was done by leaky ReLU. A sigmoid function was then applied for the final output layer binary prediction value. The model uses binary cross entropy (BCE) as the loss function that assigns a larger error for mislabeled cases as compared to mean square loss, which is more appropriate for training data with a relatively low fraction of positive cases.

where N is the number of samples, yi is the label (i.e., PD or control), and pi is the predicted probability of yi. Several steps were taken to deal with imbalanced cases, including class weights and model architecture adjustments. Dropout was used to address overfitting, which was emboldened by imbalanced cases. Furthermore, a large batch size was used to give more positive PD cases for each stochastic gradient descent step using the Adam optimizer. The neural network architecture and training parameters were selected through a grid search, which identified two hidden layers with 120 and 40 units, respectively, a dropout rate of 0.2, a learning rate of 0.001, a weight decay of 1e-4, and a training duration of 40 epochs as the optimal configuration.

4.4 Feature selection and predicting UPDRS scores

A linear model was built to associate the common predictive features from the neural network and ridge regression with the UPDRS severity score. Using an increasing number of common features (from 20 to 200), we found 85 features with optimal R2 value and minimized overfitting (Supplementary Figure 2).

The samples were clustered using these 85 features by the K-means approach, with K = 3, as determined by a silhouette analysis (Supplementary Figure 7).

4.5 Feature correlation analysis

A feature–feature correlation analysis was performed on these top 85 features using Pearson’s correlation. The UPDRS score was included in these correlation calculations and hierarchical clustering of the correlation values (Supplementary Figure 3).

4.6 Model interpretation

Feature weights on the ridge regression model were set through the coefficients placed on the optimized model. The neural network used a game-theoretic approach to feature importance called SHAP scores (Lundberg and Lee, 2017). These scores create a power set of features to determine the marginal contribution of each feature. The mean absolute SHAP scores were taken across all combinations of models on the testing set to determine the feature importance. The distribution was visualized through a bee swarm plot. Additionally, a phenotype–gene enrichment analysis was performed to detect statistically significant phenotypes from our top 20 features. The top 20 features were run for feature enrichment analysis. Ultimately, the results for top KEGG phenotypes were similar across 20–100 feature combinations. To focus on the most impactful features, we used the top 20 and highlighted these using SHAP scores and KEGG analyses.

4.7 KEGG analysis

For the KEGG pathway analysis, we used an overrepresentation analysis with hypergeometric tests to find the pathways associated with the features selected from our models. KEGG pathway analyses were conducted using two sets of features: the top 20 features identified by our neural network model and a broader subset of 85 features jointly selected from both the neural network and ridge regression models. These 85 features represented the most predictive markers shared across models. This approach allowed us to identify enriched pathways that are both statistically significant and mechanistically relevant to Parkinson’s disease pathology. This way we capture a wider range of pathways while maintaining enough statistical power to interpret the results.

4.8 Limitations

Several limitations are present, including GH1 levels that are known to decrease with age (known as somatopause) and PRL levels that are typically higher in women and inhibited by dopamine (Ben-Jonathan and Hnasko, 2001; Camanni et al., 1977; Reyes et al., 1977). These factors represent significant confounders that must be considered in the analysis, although they were mitigated as best as possible. In addition, while we demonstrated that prior L-dopa treatment had minimal impact on our model performance, we cannot exclude the possibility that with more detailed medical history, L-dopa and other treatments can emerge as confounders.

Additionally, age- and endocrine-related confounding were not fully controlled in the current analyses, and, therefore, the observed associations for GH1 and PRL should be interpreted with caution. While cohort-level sex distributions differed between the UK Biobank and the PPMI, sex and age were included as covariates, suggesting robustness to these differences. Future studies with tighter age and gender stratification are possible in the future with a larger caseload, as there was already a heavy class imbalance we were concerned with.

Olink also has a problem of hemolysis, which is the rupture of red blood cells during testing, which can cause intracellular proteins to leak out to plasma and bias results.

In addition, batch effects exist from the UK Biobank to the PPMI dataset. Blood was collected in different facilities, although with the same assay and normalization process. Furthermore, the UK Biobank is a population-level study, while the PPMI is a hand-selected longitudinal PD patient population, creating different underlying distributions.

Statements

Data availability statement

The UK Biobank and PPMI genomic and phenotypic data is restricted and requires prior approval for access and therefore cannot be provided in this publication. Requests to access these datasets should be directed to https://www.ukbiobank.ac.uk/; https://www.ppmi-info.org/.

Ethics statement

The studies involving humans were approved by the UK Biobank, North West Multicentre Research Ethics Committee (REC reference 11/NW/0382). The PPMI study was approved by the Institutional Review Boards of all participating sites. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and institutional requirements.

Author contributions

FC: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Supervision, Validation, Writing – original draft, Writing – review & editing. TK: Investigation, Methodology, Supervision, Visualization, Writing – review & editing. OE: Data curation, Formal analysis, Funding acquisition, Project administration, Resources, Visualization, Writing – review & editing. DB: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Writing – original draft, Writing – review & editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. DB and FC were supported by R01 NS118067 from NIA. FC was supported by NIH grant T32 GM132038. TK was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT) (no. RS-2024-00351442). Data used in the preparation of this article were obtained on 31 January 2023 from the Tier 1 Parkinson’s Progression Markers Initiative (PPMI) database (www.ppmi-info.org/access-data- specimens/download-data), RRID: SCR_006431. For up-to-date information on the study, visit www.ppmi-info.org. PPMI—a public-private partnership—is funded by the Michael J. Fox Foundation for Parkinson’s Research and funding partners, including the Aligning Science Across Parkinson’s (ASAP) initiative. Other partners include a consortium of industry players, non-profit organizations, and private individuals. This research has been conducted using the UK Biobank Resource under Application Number 47137.

Acknowledgments

We thank members of the Betel lab for helpful discussion and members of the Lorenz Studer and OE groups for helpful discussion and data access.

Conflict of interest

OE was a scientific adviser for, and an equity holder in, Freenome, Owkin, Volastra Therapeutics, OneThree Biotech, Genetic Intelligence, Acuamark DX, Harmonic Discovery, and Champions Oncology, and has received funding from Eli Lilly, Johnson & Johnson–Janssen, Sanofi, AstraZeneca, and Volastra.

The remaining author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that Generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnagi.2026.1730550/full#supplementary-material

References

  • 1

    AleksovskiD.MiljkovicD.BraviD.AntoniniA. (2018). Disease progression in Parkinson subtypes: the PPMI dataset. Neurol. Sci.39, 19711976. doi: 10.1007/s10072-018-3522-z,

  • 2

    BachJ. P.ZieglerU.DeuschlG.DodelR.Doblhammer-ReiterG. (2011). Projected numbers of people with movement disorders in the years 2030 and 2050. Mov. Disord.26, 22862290. doi: 10.1002/mds.23878,

  • 3

    BallN.TeoW.-P.ChandraS.ChapmanJ. (2019). Parkinson's disease and the environment. Front. Neurol.10:218. doi: 10.3389/fneur.2019.00218,

  • 4

    BejaniM. M.GhateeM. (2021). A systematic review on overfitting control in shallow and deep neural networks. Artif. Intell. Rev.54, 63916438. doi: 10.1007/s10462-021-09975-1

  • 5

    Ben-JonathanN.HnaskoR. (2001). Dopamine as a prolactin (PRL) inhibitor. Endocr. Rev.22, 724763. doi: 10.1210/edrv.22.6.0451,

  • 6

    BinartN.YoungJ.ChansonP. (2019). “Prolactin assays and regulation of secretion: animal and human data” in Prolactin disorders: from basic science to clinical management, (Cham, Switzerland: Springer Nature) 5578.

  • 7

    Biobank, U. About uk biobank (San Francisco, CA, USA: Public Library of Science) (2014)

  • 8

    BovenziR.SancesarioG. M.ContiM.GrilloP.CerroniR.BissaccoJ.et al. (2023). Sex hormones differentially contribute to Parkinson disease in males: a multimodal biomarker study. Eur. J. Neurol.30, 19831990. doi: 10.1111/ene.15801,

  • 9

    BurkhardP.DominiciP.Borri-VoltattorniC.JansoniusJ. N.MalashkevichV. N. (2001). Structural insight into Parkinson's disease treatment from drug-inhibited DOPA decarboxylase. Nat. Struct. Biol.8, 963967. doi: 10.1038/nsb1101-963

  • 10

    CamanniF.MassaraF.BelforteL.RosatelloA.MolinattiG. M. (1977). Effect of dopamine on plasma growth hormone and prolactin levels in normal and acromegalic subjects. J. Clin. Endocrinol. Metab.44, 465473. doi: 10.1210/jcem-44-3-465,

  • 11

    ChangC. C.ChowC. C.TellierL. C.VattikutiS.PurcellS. M.LeeJ. J. (2015). Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience4:s13742-015-0047-8. doi: 10.1186/s13742-015-0047-8

  • 12

    ChungJ.-Y.SunwooJ. S.KimM. W.KimM. (2015). The neuroprotective effects of human growth hormone as a potential treatment for amyotrophic lateral sclerosis. Neural Regen. Res.10, 12011203. doi: 10.4103/1673-5374.162690,

  • 13

    ÇınarE.TelB. C.ŞahinG. (2022). Neuroinflammation in parkinson’s disease and its treatment opportunities. Balkan Med. J.39:318. doi: 10.4274/balkanmedj.galenos.2022.2022-7-100,

  • 14

    CorwinC.NikolopoulouA.PanA. L.Nunez-SantosM.VallabhajosulaS.SerranoP.et al. (2018). Prostaglandin D2/J2 signaling pathway in a rat model of neuroinflammation displaying progressive parkinsonian-like pathology: potential novel therapeutic targets. J. Neuroinflammation15, 118. doi: 10.1186/s12974-018-1305-3

  • 15

    DennyJ. C.RutterJ. L.GoldsteinD. B.PhilippakisA.SmollerJ. W.JenkinsG.et al. (2019). The “all of us” research program. N. Engl. J. Med.381, 668676. doi: 10.1056/NEJMsr1809937

  • 16

    DevosD.LejeuneS.Cormier-DequaireF.TahiriK.Charbonnier-BeaupelF.RouaixN.et al. (2014). Dopa-decarboxylase gene polymorphisms affect the motor response to L-dopa in Parkinson's disease. Parkinsonism Relat. Disord.20, 170175. doi: 10.1016/j.parkreldis.2013.10.017,

  • 17

    Escott-PriceV.NallsM. A.MorrisH. R.LubbeS.BriceA.GasserT.et al. (2015). Polygenic risk of P arkinson disease is correlated with disease age at onset. Ann. Neurol.77, 582591. doi: 10.1002/ana.24335,

  • 18

    EuesdenJ.LewisC. M.O’reillyP. F. (2015). PRSice: polygenic risk score software. Bioinformatics31, 14661468. doi: 10.1093/bioinformatics/btu848,

  • 19

    FanL.ShiC.HuX.ZhangZ.ZhengH.LuoH.et al. (2021). Analysis of 12 GWAS-linked loci with Parkinson's disease in the Chinese Han population. Front. Neurol.12:623913. doi: 10.3389/fneur.2021.623913,

  • 20

    FarrowS. L.SchierdingW.GokuladhasS.GolovinaE.FadasonT.CooperA. A.et al. (2022). Establishing gene regulatory networks from Parkinson's disease risk loci. Brain145, 24222435. doi: 10.1093/brain/awac022,

  • 21

    FerrariC. C.TarelliR. (2011). Parkinson′ s disease and systemic inflammation. Parkinson’s Disease2011:436813. doi: 10.4061/2011/436813

  • 22

    FranceschiniA.SzklarczykD.FrankildS.KuhnM.SimonovicM.RothA.et al. (2012). STRING v9. 1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res.41, D808D815. doi: 10.1093/nar/gks1094,

  • 23

    González-MuniesaP.de OliveiraC.Pérez de HerediaF.ThompsonM. P.TrayhurnP. (2011). Fatty acids and hypoxia stimulate the expression and secretion of the adipokine ANGPTL4 (angiopoietin-like protein 4/fasting-induced adipose factor) by human adipocytes. J. Nutrigenet. Nutrigenomics4, 146153. doi: 10.1159/000327774,

  • 24

    GrealishS.JönssonM. E.LiM.KirikD.BjörklundA.ThompsonL. H. (2010). The A9 dopamine neuron component in grafts of ventral mesencephalon is an important determinant for recovery of motor function in a rat model of Parkinson’s disease. Brain133, 482495. doi: 10.1093/brain/awp328,

  • 25

    HällqvistJ.BartlM.DaknaM.SchadeS.GaragnaniP.BacaliniM. G.et al. (2024). Plasma proteomics identify biomarkers predicting Parkinson’s disease up to 7 years before symptom onset. Nat. Commun.15:4759. doi: 10.1038/s41467-024-48961-3,

  • 26

    HanY.TeepleE.ShankaraS.SadeghiM.ZhuC.LiuD.et al. (2020). Genome-wide polygenic risk score identifies individuals at elevated Parkinson’s disease risk. MedRxiv:2020.10. 16.20212944. doi: 10.1101/2020.10.16.20212944

  • 27

    HawkesC. H. (2008). The prodromal phase of sporadic Parkinson's disease: does it exist and if so, how long is it?Mov. Disord.23, 17991807. doi: 10.1002/mds.22242,

  • 28

    HeQ.YuW.WuJ.ChenC.LouZ.ZhangQ.et al. (2013). Intranasal LPS-mediated Parkinson’s model challenges the pathogenesis of nasal cavity and environmental toxins. PLoS One8:e78418. doi: 10.1371/journal.pone.0078418,

  • 29

    JangY.PletnikovaO.TroncosoJ. C.PantelyatA. Y.DawsonT. M.RosenthalL. S.et al. (2023). Mass spectrometry– based proteomics analysis of human substantia nigra from Parkinson's disease patients identifies multiple pathways potentially involved in the disease. Mol. Cell. Proteomics22:100452. doi: 10.1016/j.mcpro.2022.100452

  • 30

    KamathT.AbdulraoufA.BurrisS. J.LangliebJ.GazestaniV.NadafN. M.et al. (2022). Single-cell genomic profiling of human dopamine neurons identifies a population that selectively degenerates in Parkinson’s disease. Nat. Neurosci.25, 588595. doi: 10.1038/s41593-022-01061-1,

  • 31

    KimJ. J.VitaleD.OtaniD. V.LianM. M.HeilbronK.23andMe Research Teamet al. (2024). Multi-ancestry genome-wide association meta-analysis of Parkinson’s disease. Nat. Genet.56, 2736. doi: 10.1038/s41588-023-01584-8,

  • 32

    KwonE. H.TennagelsS.GoldR.GerwertK.BeyerL.TöngesL. (2022). Update on CSF biomarkers in Parkinson’s disease. Biomolecules12:329. doi: 10.3390/biom12020329,

  • 33

    LambertS. A.GilL.JuppS.RitchieS. C.XuY.BunielloA.et al. (2021). The polygenic score catalog as an open database for reproducibility and systematic evaluation. Nat. Genet.53, 420425. doi: 10.1038/s41588-021-00783-5,

  • 34

    LambertsS.MacleodR. (1990). Regulation of prolactin secretion at the level of the lactotroph. Physiol. Rev.70, 279318. doi: 10.1152/physrev.1990.70.2.279,

  • 35

    LiT.LeW.JankovicJ. (2023). Linking the cerebellum to Parkinson disease: an update. Nat. Rev. Neurol.19, 645654. doi: 10.1038/s41582-023-00874-3,

  • 36

    López-RincónG.Pereira-SuárezA. L.Del Toro-ArreolaS.Sánchez-HernándezP. E.Ochoa-ZarzosaA.Muñoz-ValleJ. F.et al. (2013). Lipopolysaccharide induces the expression of an autocrine prolactin loop enhancing inflammatory response in monocytes. J. Inflamm.10, 112. doi: 10.1186/1476-9255-10-24

  • 37

    LundbergS. M.LeeS.-I. (2017). “A unified approach to interpreting model predictions” in Advances in neural information processing systems, (Red Hook, NY, USA: Curran Associates, Inc.) vol. 30.

  • 38

    MaS.-X.LimS. B. (2021). Single-cell RNA sequencing in Parkinson’s disease. Biomedicine9:368. doi: 10.3390/biomedicines9040368,

  • 39

    MahlknechtP.MariniK.WerkmannM.PoeweW.SeppiK. (2022). Prodromal Parkinson's disease: hype or hope for disease-modification trials?Transl. Neurodegener.11:11. doi: 10.1186/s40035-022-00286-1,

  • 40

    MakariousM. B.LeonardH. L.VitaleD.IwakiH.SargentL.DaduA.et al. (2022). Multi-modality machine learning predicting Parkinson’s disease. NPJ Parkinsons Dis.8:35. doi: 10.1038/s41531-022-00288-w,

  • 41

    MandyburG.Parkinsons disease. Mayfieldclinic.com (2018)

  • 42

    MarekK.JenningsD.LaschS.SiderowfA.TannerC.SimuniT.et al. (2011). The Parkinson progression marker initiative (PPMI). Prog. Neurobiol.95, 629635. doi: 10.1016/j.pneurobio.2011.09.005

  • 43

    Martínez-MorenoC. G.ArámburoC. (2020). “Growth hormone (GH) and synaptogenesis” in Vitamins and hormones (Amsterdam, Netherlands: Elsevier), 91123.

  • 44

    McGeerP. L.McGeerE. G. (2004). Inflammation and neurodegeneration in Parkinson's disease. Parkinsonism Relat. Disord.10 Suppl 1, S3S7. doi: 10.1016/j.parkreldis.2004.01.005,

  • 45

    MorrisR.KershawN. J.BabonJ. J. (2018). The molecular details of cytokine signaling via the JAK/STAT pathway. Protein Sci.27, 19842009. doi: 10.1002/pro.3519,

  • 46

    Mouatt-PrigentA.AgidY.HirschE. C. (1994). Does the calcium binding protein calretinin protect dopaminergic neurons against degeneration in Parkinson's disease?Brain Res.668, 6270. doi: 10.1016/0006-8993(94)90511-8,

  • 47

    Movement Disorder Society Task Force on Rating Scales for Parkinson's Disease (2003). The unified Parkinson's disease rating scale (UPDRS): status and recommendations. Mov. Disord.18, 738750. doi: 10.1002/mds.10473

  • 48

    NagaiA.HirataM.KamataniY.MutoK.MatsudaK.KiyoharaY. (2017). Overview of the BioBank Japan project: study design and profile. J. Epidemiol.27, S2S8. doi: 10.1016/j.je.2016.12.005

  • 49

    NallsM. A.BlauwendraatC.VallergaC. L.HeilbronK.Bandres-CigaS.ChangD.et al. (2019). Identification of novel risk loci, causal insights, and heritable risk for Parkinson's disease: a meta-analysis of genome-wide association studies. Lancet Neurol.18, 10911102. doi: 10.1016/S1474-4422(19)30320-5

  • 50

    ObergasteigerJ.FrapportiG.PramstallerP. P.HicksA. A.VoltaM. (2018). A new hypothesis for Parkinson’s disease pathogenesis: GTPase-p38 MAPK signaling and autophagy as convergence points of etiology and genomics. Mol. Neurodegener.13:40. doi: 10.1186/s13024-018-0273-5,

  • 51

    OuyangL.QiuD.FuX.WuA.YangP.YangZ.et al. (2022). Overexpressing HPGDS in adipose-derived mesenchymal stem cells reduces inflammatory state and improves wound healing in type 2 diabetic mice. Stem Cell Res Ther13:395. doi: 10.1186/s13287-022-03082-w,

  • 52

    PaslawskiW.KhosousiS.HertzE.MarkakiI.BoxerA.SvenningssonP. (2023). Large-scale proximity extension assay reveals CSF midkine and DOPA decarboxylase as supportive diagnostic biomarkers for Parkinson’s disease. Transl. Neurodegener.12:42. doi: 10.1186/s40035-023-00374-w,

  • 53

    PedersenC. C.UshakovaA.SkogsethR. E.AlvesG.TysnesO. B.AarslandD.et al. (2023). Inflammatory biomarkers in newly diagnosed patients with Parkinson disease and related neurodegenerative disorders. Neurol. Neuroimmunol. Neuroinflamm.10:e200132. doi: 10.1212/NXI.0000000000200132,

  • 54

    PhaniS.LoikeJ. D.PrzedborskiS. (2012). Neurodegeneration and inflammation in Parkinson's disease. Parkinsonism Relat. Disord.18, S207S209. doi: 10.1016/S1353-8020(11)70064-5,

  • 55

    PoulinJ.-F.ZouJ.Drouin-OuelletJ.KimK. Y.CicchettiF.AwatramaniR. B. (2014). Defining midbrain dopaminergic neuron diversity by single-cell gene expression profiling. Cell Rep.9, 930943. doi: 10.1016/j.celrep.2014.10.008,

  • 56

    RagoneseP.D'AmelioM.CallariG.AielloF.MorganteL.SavettieriG. (2007). Height as a potential indicator of early life events predicting Parkinson's disease: a case-control study. Mov. Disord.22, 22632267. doi: 10.1002/mds.21728,

  • 57

    ReyesF.WinterJ.FaimanC. (1977). Pituitary-ovarian relationships preceding the menopause. I. A cross-sectional study of serum follice-stimulating hormone, luteinizing hormone, prolactin, estradiol, and progesterone levels. Am. J. Obstet. Gynecol.129, 557564. doi: 10.1016/0002-9378(77)90097-7,

  • 58

    RoetersS. J.IyerA.PletikapićG.KoganV.SubramaniamV.WoutersenS. (2017). Evidence for intramolecular antiparallel beta-sheet structure in alpha-synuclein fibrils from a combination of two-dimensional infrared spectroscopy and atomic force microscopy. Sci. Rep.7:41051. doi: 10.1038/srep41051,

  • 59

    RoosD. S.KleinM.DeegD. J. H.DotyR. L.BerendseH. W. (2022). Prevalence of prodromal symptoms of Parkinson's disease in the late middle-aged population. J. Parkinsons Dis.12, 967974. doi: 10.3233/JPD-213007,

  • 60

    RutledgeJ.LehallierB.ZarifkarP.LosadaP. M.Shahid-BesantiM.WesternD.et al. (2024). Comprehensive proteomics of CSF, plasma, and urine identify DDC and other biomarkers of early Parkinson’s disease. Acta Neuropathol.147, 118. doi: 10.1007/s00401-024-02706-0

  • 61

    SaariL.BackmanE. A.WahlstenP.GardbergM.KaasinenV. (2022). Height and nigral neuron density in Parkinson’s disease. BMC Neurol.22:254. doi: 10.1186/s12883-022-02775-2,

  • 62

    SantacruzC. A.VincentJ. L.BaderA.Rincón-GutiérrezL. A.Dominguez-CurellC.CommuniD.et al. (2021). Association of cerebrospinal fluid protein biomarkers with outcomes in patients with traumatic and non-traumatic acute brain injury: systematic review of the literature. Crit. Care25:278. doi: 10.1186/s13054-021-03698-z,

  • 63

    ShulmanL. M. (2002). Is there a connection between estrogen and Parkinson's disease?Parkinsonism Relat. Disord.8, 289295. doi: 10.1016/s1353-8020(02)00014-7,

  • 64

    SpanJ.SpanJ. P. T.PietersG. F. F. M.SweepC. G. J.HermusA. R. M. M.SmalsA. G. H. (2000). Gender difference in insulin-like growth factor I response to growth hormone (GH) treatment in GH-deficient adults: role of sex hormone replacement. J. Clin. Endocrinol. Metab.85, 11211125. doi: 10.1210/jcem.85.3.6463,

  • 65

    StojkovskaI.WagnerB. M.MorrisonB. E. (2015). Parkinson's disease and enhanced inflammatory response. Exp. Biol. Med.240, 13871395. doi: 10.1177/1535370215576313,

  • 66

    SuC.HouY.XuJ.XuZ.ZhouM.KeA.et al. (2024). Identification of Parkinson’s disease PACE subtypes and repurposing treatments through integrative analyses of multimodal data. NPJ Digital Med.7:184. doi: 10.1038/s41746-024-01175-9,

  • 67

    SudlowC.GallacherJ.AllenN.BeralV.BurtonP.DaneshJ.et al. (2015). UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med.12:e1001779. doi: 10.1371/journal.pmed.1001779,

  • 68

    ThapaS.BhusalK. (2023). “Hyperprolactinemia” in StatPearls (Treasure Island, FL, USA: StatPearls Publishing).

  • 69

    TsukitaK.Sakamaki-TsukitaH.KaiserS.ZhangL.MessaM.Serrano-FernandezP.et al. (2022). Changes in the cerebrospinal fluid proteome precede and stratify the course of Parkinson’s disease. medRxiv:2022.06. 08.22276035. doi: 10.1101/2022.06.08.22276035

  • 70

    TunbridgeE. M.NarajosM.HarrisonC. H.BeresfordC.CiprianiA.HarrisonP. J. (2019). Which dopamine polymorphisms are functional? Systematic review and meta-analysis of COMT, DAT, DBH, DDC, DRD1–5, MAOA, MAOB, TH, VMAT1, and VMAT2. Biol. Psychiatry86, 608620. doi: 10.1016/j.biopsych.2019.05.014,

  • 71

    UmT.T.PfisterF. M. J.PichlerD.EndoS.LangM.HircheS.et alData augmentation of wearable sensor data for parkinson’s disease monitoring using convolutional neural networks, in Proceedings of the 19th ACM International Conference on Multimodal Interaction. (2017), Association for Computing Machinery: Glasgow, UK. p. 216220.

  • 72

    WilliamsC. T.FostN. (1992). Ethical considerations surrounding first time procedures: a study and analysis of patient attitudes toward spinal taps by students. Kennedy Inst. Ethics J.2, 217231. doi: 10.1353/ken.0.0108,

  • 73

    WinchesterL.BarberI.LawtonM.AshJ.LiuB.EvettsS.et al. (2022). Identification of a possible proteomic biomarker in Parkinson’s disease: discovery and replication in blood, brain and cerebrospinal fluid. Brain Commun.5:fcac343. doi: 10.1093/braincomms/fcac343,

  • 74

    WuD.XuJ. (2020). “On the optimal weighted $\ell_2 $ regularization in overparameterized linear regression” in Advances in neural information processing systems, (Red Hook, NY, USA: Curran Associates, Inc.) vol. 33, 1011210123.

  • 75

    YangW.HamiltonJ. L.KopilC.BeckJ. C.TannerC. M.AlbinR. L.et al. (2020). Current and projected future economic burden of Parkinson’s disease in the US. NPJ Parkinsons Dis.6:15. doi: 10.1038/s41531-020-0117-1

  • 76

    ZannadF.FerreiraJ. P.ButlerJ.FilippatosG.JanuzziJ. L.SuminM.et al. (2022). Effect of empagliflozin on circulating proteomics in heart failure: mechanistic insights into the EMPEROR programme. Eur. Heart J.43, 49915002. doi: 10.1093/eurheartj/ehac495,

  • 77

    ZhangT.HanX.ZhangH.LiX.ZhouX.FengS.et al. (2024). Identification of molecular markers for predicting the severity of heart failure after AMI: an Olink precision proteomic study. Clin. Chim. Acta555:117825. doi: 10.1016/j.cca.2024.117825,

  • 78

    ZhaoJ.ZhangS.WangX.WangM.WangZ.LiR.et al. (2024). Correlation between serum ANGPTL4 levels and white matter hyperintensity and cognitive impairment in patients with cerebral small vessel disease. Brain Behav.14:e3401. doi: 10.1002/brb3.3401,

Summary

Keywords

biomarkers, deep learning, neurodegenerative disease, Parkinson’s disease, proteomics

Citation

Chaudhry F, Kim TW, Elemento O and Betel D (2026) Machine learning analysis of population-wide plasma proteins identifies hormonal biomarkers of Parkinson’s disease. Front. Aging Neurosci. 18:1730550. doi: 10.3389/fnagi.2026.1730550

Received

22 October 2025

Revised

10 February 2026

Accepted

10 February 2026

Published

10 March 2026

Volume

18 - 2026

Edited by

Junhong Zhou, Harvard Medical School, United States

Reviewed by

Lucas Caldi Gomes, NanoTemper Technologies GmbH, Germany

Sharma Pratibha, University of Delhi, India

Updates

Copyright

*Correspondence: Doron Betel,

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Outline

Figures

Cite article

Copy to clipboard


Export citation file


Share article

Article metrics