Variable Effects of PD-Risk Associated SNPs and Variants in Parkinsonism-Associated Genes on Disease Phenotype in a Community-Based Cohort

Genetic risk factors for Parkinson's disease (PD) risk and progression have been identified from genome-wide association studies (GWAS), as well as studies of familial forms of PD, implicating common variants at more than 90 loci and pathogenic or likely pathogenic variants at 16 loci. With the goal of understanding whether genetic variants at these PD-risk loci/genes differentially contribute to individual clinical phenotypic characteristics of PD, we used structured clinical documentation tools within the electronic medical record in an effort to provide a standardized and detailed clinical phenotypic characterization at the point of care in a cohort of 856 PD patients. We analyzed common SNPs identified in previous GWAS studies, as well as low-frequency and rare variants at parkinsonism-associated genes in the MDSgene database for their association with individual clinical characteristics and test scores at baseline assessment in our community-based PD patient cohort: age at onset, disease duration, Unified Parkinson's Disease Rating Scale I-VI, cognitive status, initial and baseline motor and non-motor symptoms, complications of levodopa therapy, comorbidities and family history of neurological disease with one or more than one affected family members. We find that in most cases an individual common PD-risk SNP identified in GWAS is associated with only a single clinical feature or test score, while gene-level tests assessing low-frequency and rare variants reveal genes associated in either a unique or partially overlapping manner with the different clinical features and test scores. Protein-protein interaction network analysis of the identified genes reveals that while some of these genes are members of already identified protein networks others are not. 
These findings indicate that genetic risk factors for PD differentially affect the phenotypic presentation and that genes associated with PD risk are also differentially associated with individual disease phenotypic characteristics at baseline. They raise the intriguing possibility that different SNP/gene effects impact discrete phenotypic characteristics. Furthermore, they support the hypothesis that different gene and protein-protein interaction networks underlie PD risk, the PD phenotype, and the neurodegenerative process leading to the disease phenotype, and they point to the significance of the genetic background for disease phenotype.

INTRODUCTION
Parkinson's disease (PD), the second most common neurodegenerative disease, has an insidious onset and a long pre-symptomatic and symptomatic course. Four cardinal features (resting tremor, bradykinesia, rigidity, and postural instability) define the motor aspects of the disease. The constellation of clinical symptoms, however, is variable both in terms of symptom combination and temporal profile. This variability has led to phenotypic classification according to different disease characteristics. A commonly accepted classification is based on motor symptoms: disease subtypes include a tremor-predominant, akinetic/rigid, and mixed subtype (1). More recently, additional classifications have emerged based on different clinical features such as non-motor features, disease progression, a combination of motor and non-motor features, a combination of clinical features and comorbidities, multimodal imaging and genetic burden. More specifically, Sauerbier et al. (2) in their review proposed the existence of a distinct non-motor symptom (NMS)-dominant subtype of PD based on the burden of non-motor symptoms in early PD, including cognitive dysfunction, anosmia, anxiety, depression, sleep disorders, and autonomic dysfunction observed either alone or in varying combinations. Simuni et al. (3) reported that, for the Parkinson's Progression Markers Initiative (PPMI) PD cohort, higher baseline non-motor scores were associated with female sex and a more severe motor phenotype. Longitudinal increase in non-motor score severity was associated with older age and lower CSF Aβ1-42 at baseline. Lawton et al. (4) identified four phenotypic clusters in their cohort: (1) fast motor progression, (2) mild motor and non-motor disease, (3) severe motor disease, poor psychological well-being and poor sleep with intermediate motor progression, and (4) slow motor progression with tremor-dominant unilateral disease. Mollenhauer et al. (5), in their analysis of the De Novo Parkinson (DeNOPA) cohort, reported that baseline predictors of worse progression of motor symptoms included male sex, orthostatic blood pressure drop, diagnosis of coronary artery disease, arterial hypertension, elevated serum uric acid, and CSF neurofilament light chain.
A variable temporal profile of motor symptom appearance and progression has been reported in different cohorts followed longitudinally for different lengths of time, and these studies have identified predictors of disease progression and phenotypic clusters. In the DeNOPA cohort, predictors of cognitive decline in PD included previous heavy alcohol abuse, current diagnoses of diabetes mellitus, arterial hypertension, elevated periodic limb movement index during sleep, decreased hippocampal volume by MRI, and higher baseline levels of uric acid, C-reactive protein, high density lipoprotein (HDL) cholesterol, and glucose. In the same cohort, risk markers for faster disease progression included cardiovascular risk factors, deregulated blood glucose, uric acid metabolism and inflammation. In the PPMI cohort, Aleksovski et al. (6) reported that the postural instability gait disorder (PIGD) subtype, compared to the tremor-predominant subtype, was characterized by more severe disease manifestations at diagnosis, greater cognitive progression, and more frequent psychosis (5). Also in the PPMI cohort, Latourelle et al. (7) found that higher baseline MDS-UPDRS motor score, male sex, and increased age, as well as a novel Parkinson's disease-specific epistatic interaction, were indicative of faster motor progression. In their retrospective review of a cohort of 100 autopsy-confirmed PD cases, Pablo-Fernandez et al. (8) reported that the presence of autonomic dysfunction, defined as autonomic failure on autonomic testing or the presence of at least two symptoms such as urinary symptoms, constipation, orthostatic hypotension, or sweating abnormalities, was associated with a more rapid progression and shorter survival.
Other classifications of disease subtypes have been proposed in addition to motor, non-motor symptom and disease course-based classifications. Inguanzo et al. (9) employed a radiomics and hybrid machine learning approach to identify mild, intermediate and severe disease subtypes based on a combination of dopaminergic deficit by imaging and escalating motor and non-motor manifestations.
In the last two decades, genome-wide association studies (GWAS) of common genetic variants and dissection of the low-frequency and rare variants contributing to familial forms of PD have implicated an increasing number of genetic loci in disease risk and severity. This has cemented the view that PD is a complex and heterogeneous genetic disorder, with variants at many genes impacting disease phenotype and course. We are just beginning to understand whether PD-risk variants are differentially associated with baseline features or disease subtype. Tan et al. (10) performed a GWAS of motor and cognitive progression in PD and reported that ATP8B2, a phospholipid transporter related to vesicle formation, is associated with motor progression, and that variants at APOE drive cognitive progression, whereas there was no overlap of variants associated with PD risk and PD age-at-onset with those associated with disease progression. Iwaki et al. (11) demonstrated sex-specific SNP associations with features of the PD phenotype: female patients had a higher risk of developing dyskinesias and a lower risk of developing cognitive impairment. Periñán et al. (12) reported an association of the TT genotype at the PICALM SNP rs3851179 with a decreased risk of cognitive impairment in PD. GBA variants have been associated with PD and generally are associated with faster progression and more severe phenotypes (13,14). Blauwendraat et al. (15) reported that in a large PD patient cohort, GBA risk variants decrease age at onset in PD.
Genetic factors that increase the risk of PD and genetic factors that affect disease severity and progression are not necessarily identical. Furthermore, individual genetic factors that influence disease severity and progression may not have an immediately identifiable impact in the clinical practice setting. It is therefore important to consider the predictive ability and significance of the impact of genetic variation on individual phenotypic characteristics and parameters that are clinically relevant and may have treatment implications (16)(17)(18). If one or a set of genetic variants contribute differentially to a particular phenotypic characteristic, it will be challenging to discover them using GWAS or gene-level association tests in a genome-wide screen since phenotypically well-characterized cohorts are typically modest in size, making it unlikely to discover genome-wide significant associations. We have therefore taken a focused approach, choosing to evaluate possible associations with SNPs that have been previously demonstrated to show significant associations with PD using large GWAS and low frequency and rare variants at parkinsonism-associated genes identified in the MDSgene database (19), hypothesizing that these genetic variants may differentially contribute to baseline clinical parameters/symptoms. Under this hypothesis, evaluating their association in a smaller cohort of subjects where individual clinical symptoms and objective test scores are obtained at baseline using structured clinical documentation support (SCDS) tools embedded in the electronic medical record (EMR) in a routine clinical practice setting (20) could allow for the discovery of significant associations. This would not be possible in the context of a case-control GWAS.
Indeed, we find that common SNPs from PD-risk genes identified in GWAS are individually associated with a range of clinical features: family history of dementia, the presence of hallucinations, bradykinesia, depression, orthostatism, disease subtype, and complications of levodopa therapy. When low-frequency and rare variants at PD-risk genes and parkinsonism-associated genes are analyzed in gene-level tests, associations with clinical characteristics such as presence of bradykinesia, depression, autonomic symptoms (orthostatism, constipation), UPDRS motor scores, mentation, complications of therapy scores, H&Y stage, and a family history of dementia are identified. All of the associations we report survive Bonferroni correction and some approach or reach genome-wide significance. It is interesting to note that the gene associations identified from the analysis of individual common SNPs do not always overlap with those identified in gene-level tests using low-frequency and rare variants, suggesting an important role of the genetic background in the phenotypic manifestations.

Subjects and Clinical Information
Eight hundred and fifty-six subjects with clinically definite or clinically probable Parkinson's disease (Bower criteria) (21) enrolled in two previously described patient cohorts [Molecular Epidemiology of Parkinson's Disease, MEPD (22), N = 201; DodoNA (23), N = 655] were included in this study. All patients in these cohorts had a diagnosis of PD at study entry and were residents of Cook and Lake Counties in Illinois, USA. Though both cohorts include individuals with diverse ancestries, the filtering described in the following section restricted the analysis to 786 individuals of European ancestry: 504 males, 282 females. Blood samples were collected in the majority of cases at an initial baseline visit or within a 3-month window following the initial visit. Data on clinical parameters were obtained from SCDS tools developed to standardize clinical assessment and retained within the EMR as described (20,23). Given the community-based practice setting, our cohort included both de novo and previously diagnosed PD patients.
The following phenotypic characteristics were analyzed in our cohort: initial motor and non-motor symptoms as reported by the patient, as well as motor and non-motor symptoms identified by the clinician at their baseline encounter. Objective clinical assessment at the baseline encounter included scores on the Mini-Mental State Examination (MMSE), Montreal Cognitive Assessment (MoCA), or Short Test of Mental Status (STMS) (24)(25)(26). Due to copyright limitations, cognitive status was assessed initially using the MMSE, at a later timepoint using the MoCA, and finally using the STMS. The individual test scores on the MoCA and STMS were converted to MMSE scores using established nomograms prior to analysis (26,27). Objective clinical assessments at the baseline encounter also included scores on the Unified Parkinson's Disease Rating Scale (UPDRS) (28) [I - Mentation, Behavior and Mood; II - Activities of Daily Living; III - Motor Examination; IV - Complications of Therapy; V - Hoehn & Yahr stage; VI - Schwab & England Activities of Daily Living Scale], Epworth Sleepiness Scale (ESS) (29) and Geriatric Depression Scale (GDS) (30), information on family history of PD, dementia, stroke, epilepsy, multiple sclerosis, and neuropathy, as well as information on comorbidities including diabetes, cardiovascular disease, migraine, schizophrenia, anxiety, depression, peripheral neuropathy and sleep apnea. Supplementary Table 1 presents the list of clinical parameters and descriptive statistics for these parameters. Treatment details including medical and surgical therapy were collected but not included in the analysis presented here.

Genotyping and Quality Control Measures
Blood samples were stored at −80 °C until DNA was extracted. Genotypes were obtained by interrogating an Affymetrix Axiom™ genome-wide human array containing 531,674 variants that included custom content, specifically variants at genes associated with PD and other neurological disorders. Prior to imputation using IMPUTE2 (31) against the 1000 Genomes Phase 3 CEU genome, subjects were filtered in PLINK 1.07 (32) or 1.9 (33) for low overall genotyping rates (<95%) and sex discordance, and variants with >5% missing calls were removed from the analysis. Imputed SNPs were retained only if R² ≥ 0.90. Only subjects with European ancestry were retained by using principal components one and two (PC1 and PC2) from a principal components analysis (PCA) with 103 ancestry informative markers (AIMs). For association tests with single variants, variants were also filtered by Hardy-Weinberg test statistic (1 × 10⁻⁴) and to have a minor allele frequency (MAF) > 1%.
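The variant-level filters described above can be summarized as a single predicate. The following is a minimal Python sketch, not the PLINK or IMPUTE2 commands actually used; the `VariantStats` container and the `passes_single_variant_qc` function are hypothetical names introduced for illustration, and treating the Hardy-Weinberg threshold as a p-value cutoff below which variants are excluded is an assumption.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Per-variant summary statistics (hypothetical container for illustration)."""
    variant_id: str
    missing_rate: float    # fraction of subjects without a genotype call
    maf: float             # minor allele frequency
    hwe_p: float           # Hardy-Weinberg equilibrium test p-value
    info_r2: float = 1.0   # imputation R^2; 1.0 for directly genotyped variants


def passes_single_variant_qc(v, max_missing=0.05, min_r2=0.90,
                             min_maf=0.01, hwe_cutoff=1e-4):
    """Return True if a variant survives the thresholds quoted in the text.

    Assumption: variants with an HWE p-value below hwe_cutoff are excluded.
    """
    return (
        v.missing_rate <= max_missing   # drop variants with >5% missing calls
        and v.info_r2 >= min_r2         # keep imputed SNPs only if R^2 >= 0.90
        and v.maf > min_maf             # single-variant tests require MAF > 1%
        and v.hwe_p >= hwe_cutoff       # assumed direction of the HWE filter
    )


# Made-up example variants, not data from the cohort.
variants = [
    VariantStats("var_common_ok", 0.01, 0.20, 0.50),
    VariantStats("var_rare", 0.01, 0.005, 0.50),
    VariantStats("var_poorly_imputed", 0.00, 0.20, 0.50, info_r2=0.80),
]
kept = [v.variant_id for v in variants if passes_single_variant_qc(v)]
print(kept)
```

Note that the MAF > 1% requirement applies only to the single-variant tests; rare and low-frequency variants would need to bypass that condition for gene-level analyses.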

Association Tests
Genes and variants initially identified for testing association with clinical parameters were selected based on a prior demonstrated association with PD/parkinsonism or disease progression [MDSgene.org; (10)(11)(12)(34)(35)(36)(37)(38)], or because the gene harbors pathogenic variants that cause PD/parkinsonism [for review, see (39, 40)]. Of 168 variants with a previously reported association and a MAF > 1%, 138 variants (Supplementary Table 2) were present in our data after filtering as described above. These were tested for association using PLINK, applying logistic regression for binomial variables if at least 3% of subjects (N = 24) displayed the clinical parameter, or linear regression for quantitative variables, which were standardized (mean 0, standard deviation 1); the MMSE was reverse scored so that higher values indicate poorer performance. Associations were evaluated for both sexes jointly and for each sex separately. For associations evaluated in both sexes, sex, age-at-encounter and years-from-diagnosis were included as covariates; years-from-diagnosis was included because our community-based cohort includes both de novo and previously diagnosed patients. For associations evaluated in just one sex, age-at-encounter and years-from-diagnosis were included as covariates. Years-of-education was added as an additional covariate for tests of association with cognitive measures (MMSE).
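The phenotype preparation steps in this paragraph (the 3% prevalence threshold, reverse scoring, and standardization) can be illustrated with a short Python sketch. The function names are hypothetical, the scores are made up, and the use of the sample standard deviation is an assumption; the analysis itself was run in PLINK. With 786 subjects, 3% corresponds to 23.58, which rounds up to the 24 subjects quoted above.

```python
import math
import statistics


def min_subjects_for_logistic(n_subjects, fraction=0.03):
    """Smallest whole number of subjects that reaches the given fraction."""
    return math.ceil(n_subjects * fraction)


def reverse_score(scores, max_score=30):
    """Reverse a scale so that higher values indicate poorer performance.

    max_score=30 is the maximum of the MMSE.
    """
    return [max_score - s for s in scores]


def standardize(values):
    """Scale values to mean 0 and standard deviation 1.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


print(min_subjects_for_logistic(786))  # 24

mmse = [30, 28, 25, 22, 29]            # made-up scores, not cohort data
z = standardize(reverse_score(mmse))
print([round(value, 2) for value in z])
```

After this transformation, a positive regression coefficient on the standardized, reverse-scored MMSE corresponds to poorer cognitive performance per copy of the tested allele.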

Protein-Protein Interaction Network Evaluation
To evaluate whether the genes whose variants exhibited significant associations with clinical parameters identify protein products that are members of a functional protein-protein interaction network, those genes were entered into the Search Tool for the Retrieval of Interacting Genes, STRING, v.11 (43).

RESULTS
We hypothesized that SNPs which have been previously demonstrated to show significant associations with PD-risk using large GWAS and low-frequency and rare variants at parkinsonism-associated genes identified in the MDSgene database (19) differentially contribute to discrete baseline clinical parameters/symptoms. To test this hypothesis, we evaluated their association in two well-characterized patient cohorts [MEPD (22) and DodoNA (23)] where individual clinical symptoms and objective test scores were obtained at baseline using SCDS tools embedded in the EMR. The findings are presented in the following two sections.

Single SNP Association Analyses
We initially evaluated whether common SNPs that have been previously associated with PD-risk in large GWAS are also associated with distinct binomial clinical phenotypic features of PD at their baseline presentation. We find significant associations that are at times sex-specific, and that the significant SNPs are typically located in non-overlapping genes/regions (Table 1).
Using an additive model, female PD patients carrying the minor allele (T) at SNP rs429358 at APOE, or having an APOE ε4 allele, have an ∼8-fold increased risk of having a positive family history of more than one family member with dementia. Individuals with the minor allele (T) at SNP rs3431186 at TMEM175, which encodes a potassium channel that regulates lysosomal membrane potential and pH stability in neurons (44), are about twice as likely to have reduced arm swing, a manifestation of bradykinesia. Male PD patients with the minor allele (T) at SNP rs5396167 in KPNA1, which encodes importin α5 and is involved in lysosomal biogenesis and autophagy (45,46), have a 2.8-fold increased risk of having hallucinations at baseline. Males with the minor (T) allele at SNP rs12528068, located 108.6 kb from the RIMS1 gene, which encodes one of four isoforms of presynaptic scaffolding proteins involved in synaptic transmission (47), have a 2.1-fold increased risk of a history of essential tremor. Individuals carrying the minor allele (G) at SNP rs186798 in ELOVL7 have a 3.8-fold increased risk of also having a prior diagnosis of peripheral neuropathy. The ELOVL7 gene is a PD risk factor that also confers regional vulnerability, i.e., it is a Braak stage-related gene with an altered expression pattern in the brains of PD cases, with downregulated expression in endothelial cells and oligodendrocytes (48) (Table 1).
Additional associations are identified using the GENO-2DF model, which considers both additive and dominance effects. SNP rs2280194 in BIN3 and rs10253857 in an intergenic region near SNX13 are associated with a family history of dementia. SNP rs2074404 in WNT3 is associated with a family history of stroke. SNP rs2694528 in NDUFAF2, which is near ELOVL7, is associated with the presence of neuropathy. SNPs rs8192591 in NOTCH4 and rs1293298 in CTSB are associated with bradykinesia as an initial motor symptom. SNPs rs117615688 in CRHR1 and rs382940 in SLC44A1 are associated with the non-motor symptoms of insomnia and restless leg syndrome (RLS), respectively (Table 1).
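The difference between the additive and GENO-2DF models lies in how a genotype enters the regression. The sketch below assumes the usual coding for a two-degree-of-freedom genotypic test: an additive term counting minor alleles plus a dominance-deviation term that flags heterozygotes. The function names are hypothetical and this is an illustration of the coding, not the PLINK implementation.

```python
def additive_coding(minor_allele_count):
    """Additive model: one predictor equal to the number of minor alleles (0, 1, 2)."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor allele count must be 0, 1 or 2")
    return (minor_allele_count,)


def genotypic_2df_coding(minor_allele_count):
    """2-df genotypic model: additive term plus dominance deviation.

    The dominance-deviation term is 1 for heterozygotes and 0 otherwise, so the
    heterozygote effect need not lie midway between the two homozygotes.
    """
    (add,) = additive_coding(minor_allele_count)
    dom_dev = 1 if minor_allele_count == 1 else 0
    return (add, dom_dev)


for count in (0, 1, 2):
    print(count, additive_coding(count), genotypic_2df_coding(count))
```

Under this coding, a SNP can be significant in the two-degree-of-freedom test but not in the additive test when heterozygotes depart from the midpoint of the two homozygous groups.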
We also identified significant associations between common SNPs conferring risk of PD in GWAS and test scores that reflect an objective assessment of the PD patient (Table 2). The minor allele (T) at SNP rs12528068 in an intergenic region 108.6 kb from RIMS1 that is associated with a history of essential tremor in males is also associated with increased dyskinesia scores in females. SNPs rs113343 and rs6497339 at SYT17, which encodes synaptotagmin-17, are associated with higher GDS scores. SNP rs12283611 at DLG2, which functions in the clustering of receptors, ion channels and associated signaling proteins, is associated with lower UPDRS-VI scores.
We included years-from-diagnosis as a covariate in the above analyses since our community-based cohort includes previously diagnosed patients. It is interesting that some results that trended toward significance survive Bonferroni correction if this measure of disease duration is not included as a covariate (Supplementary Tables 4, 5). Individuals with the minor (A) allele at the SNP rs9468199 in an intergenic region 3.2 kilobases (kb) from LOC1005071, an uncharacterized non-coding RNA, are twice as likely to present with the tremor-predominant PD subtype rather than the akinetic/rigid or mixed disease subtype. The minor (C) allele at SNP rs12813102 in GPR19, which encodes a proton-sensing G-protein coupled receptor abundant in skin and brain (49), has a relatively strong effect on higher H&Y stage (β ∼1.7 on standardized H&Y scores) in both sexes or in males only.
In contrast, other SNPs have smaller effect sizes (β range 0.24-0.47 on standardized scores). The presence of the minor allele (C) at SNP rs823118 in NUCKS1, which is involved in homologous recombination DNA repair (50), is associated with higher MMSE baseline scores only in males. Its small effect is not unexpected given that early in the disease process, cognitive impairment is not prominent in typical PD. Finally, the minor allele (T) at SNP rs224750 located 167.5 kb from PARD3 is associated with higher UPDRS-IVc scores only in females. PARD3 is a gene involved in the regulation of cellular junction formation in ependymal cells, cilia, and tumor suppression (49). It will be useful to evaluate these variants in longitudinal follow-up studies.
In summary, these results collectively demonstrate that some of the PD-risk SNPs identified in case-control GWAS are also associated with the differential presentation of PD and discrete phenotypic characteristics at baseline.

Gene-Level Association Analysis
We employed gene-level association tests (sequence kernel association tests) to evaluate whether the set of rare (MAF < 1%) or both rare and less common (MAF < 5%) variants present in the PD-associated genes of our cohorts also exert differential effects on baseline clinical features. Significant findings from these gene-level association tests in our cohorts are presented in Table 3. The following findings are notable: LRRK2 is associated with a prior diagnosis of essential tremor (ET). NUCKS1, a gene that shows allele-specific gene expression in the human brain (51), is significantly associated with UPDRS-III motor scores and with UPDRS-V (H&Y stage). Of note, the PD-risk SNP rs823118 in the same gene was associated with higher MMSE scores in males when disease duration was not included as a covariate (Supplementary Table 5). TOX3, a transcriptional co-activator (52) previously associated with periodic leg movements during sleep (53), and SULT1C2, a cytosolic sulfotransferase (54), are associated with the UPDRS-IV total score. TRIM40, a gene whose protein product may function as an E3 ubiquitin-protein ligase (55) and inhibit NF-κB activity, is associated with UPDRS-VI (Schwab & England score). SNCA and FAM184A are associated with dyskinesias at baseline encounter, and SNCA is also associated with cognitive impairment. CHD9, which encodes a transcriptional activator (56), and GPNMB, which encodes a transmembrane glycoprotein (57), are associated with the initial motor symptoms of micrographia (a manifestation of bradykinesia) and rigidity, respectively. GPNMB, demonstrating genome-wide significance, is also associated with bradykinesia at baseline, as are THSD4, which attenuates TGFβ signaling, and MCCC1, which is involved in NF-κB signaling (58). STK39, which encodes a protein kinase that may mediate stress-activated signals (51), is associated with RLS.
Certain comorbid conditions often seen in PD patients are associated with different genes. BIN3, which encodes a protein involved in cytokinesis (59), is associated with anxiety disorder. VAMP4 (60) is associated with sleep apnea. PET117, which encodes a mitochondrial protein homolog (61), is associated with traumatic brain injury.
Similar results are obtained when sequence kernel association tests are performed without including sex, age at encounter, disease duration and, for MMSE, years of education, as covariates (Supplementary Table 6). In these analyses, different measures of complications of levodopa therapy are associated with some of the genes described above: TOX3 and SULT1C2 are associated with the UPDRS-IVa-Dyskinesia subscore; SULT1C2, MCCC1, TOX3, and BAG3 are associated with the UPDRS-IVb-Fluctuations subscore; and STBD1 is associated with the UPDRS-IVc-Other subscore.
In summary, these results demonstrate that variants in PD-associated genes are differentially associated with the following phenotypic features: history of essential tremor, initial motor and non-motor symptoms, test scores, motor and non-motor symptoms at baseline study entry, family history of essential tremor and of dementia, and comorbidities including anxiety, sleep apnea, and traumatic brain injury (TBI). These associations raise the possibility of underlying links between PD, essential tremor, mood disorders, and TBI.

Protein-Protein Interaction Network Analysis
The protein products of the genes included in this analysis are involved in many different cellular processes implicated in neurodegeneration. To assess whether the significant associations between SNPs/genes and baseline clinical parameters identified here reflect functional interactions between the genes, we entered all of the genes identified as having significant associations with a phenotypic feature (i.e., all genes listed in Tables 1-3, Supplementary Tables 4-6) into the Search Tool for the Retrieval of Interacting Genes (STRING) and evaluated their participation in protein-protein interaction (PPI) networks. The network shown in Figure 1 was obtained using 0.4 as the minimum required interaction score (medium confidence) and allowing up to 20 second-shell interactions to reveal indirect interactions among these proteins. The network contains 69 nodes and 113 edges (cf. 46 expected) with an average node degree of 3.28 and an enrichment p-value of 1.10 × 10⁻¹⁶. Sixteen nodes are unconnected to the protein-interaction network.
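The reported average node degree follows directly from the node and edge counts, since each edge in an undirected network contributes to the degree of two nodes. A minimal Python check of this arithmetic is given below; the function name is illustrative, and the observed-to-expected edge ratio is a simple descriptive quantity, not the statistic STRING uses to compute its enrichment p-value.

```python
def average_node_degree(n_nodes, n_edges):
    """Mean degree of an undirected graph: each edge is counted at both of its ends."""
    return 2 * n_edges / n_nodes


n_nodes, n_edges, expected_edges = 69, 113, 46   # values quoted in the text

avg_degree = average_node_degree(n_nodes, n_edges)
edge_ratio = n_edges / expected_edges

print(round(avg_degree, 2))  # 3.28
print(round(edge_ratio, 2))  # 2.46
```

The network therefore contains roughly two and a half times as many edges as expected for a random protein set of the same size.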
The top 30 Gene Ontology (GO) processes in which these genes and their interactors are implicated are shown in Table 4, with the genes having significant SNP or gene-level associations highlighted in bold. It is interesting to note that this analysis reveals three interaction patterns: one in which proteins encoded by genes such as APOE, KPNA1, LRRK2, TMEM175, MCCC1, FAM49B, and SNCA are members of closely interacting networks, a second one in which genes such as PARD3 or NUCKS1 are members of more remotely interacting networks, and a third one in which genes such as ELOVL7, GPR19, LCORL, FAM184A, and BIN3 are not nodes in these protein-interaction networks.
Several genes occupy central nodes in the protein network. APOE occupies a central node and in our analysis is associated with a family history of dementia. APOE is a well-established AD risk factor (62) with an important role in normal brain function (63), and the APOE ε4 allele has been associated with cognitive decline in PD (10,64,65). LRRK2 also occupies a central node: LRRK2 has a dual role as a PD risk factor and a gene involved in PD pathogenesis (66,67) and encodes a protein kinase involved in autophagy. SNCA also occupies a central node in the PPI network and is a key player in PD pathogenesis (68).
Taken together with the results of the association analyses, these results are consistent with the hypothesis that genetic variation that affects the functioning of protein-protein interaction networks can contribute to the differential presentation of PD symptoms. In addition, it is important to note that a number of these genes are members of known networks and hubs, whereas others are not.

DISCUSSION
Here we present the results of association analyses of baseline clinical features in PD with genetic variants that have been shown to be significant in prior case-control GWAS to confer PD risk or have been identified as PD-associated genes in the MDSgene database. We analyzed discrete clinical phenotypic features and test scores in a two-pronged approach: in the first, we evaluated their association with individual common SNPs that have been demonstrated in case-control GWAS to confer PD-risk; in the second we used gene-level tests to evaluate the association of these phenotypic features with low frequency (1-5% MAF) and rare (<1% MAF) variants in both pathogenic PD genes and the genes conferring PD-risk identified by casecontrol GWAS. The rationale of this approach is based on the  hypothesis that individual discrete phenotypic characteristics may be differentially affected by the action of individual SNPs that tag a particular PD-risk haplotype, and/or multiple variants at a particular gene. Furthermore, the observed associations may reflect the effects of variants with different MAF. The alternative to this hypothesis is that the single SNPs and variants within a gene that confer PD-risk affect groups of clinical features or test scores more uniformly.
Our results support this hypothesis: individual common SNPs conferring PD risk are associated with phenotypic traits mostly in a non-overlapping manner, and gene-level tests reveal associations with individual clinical features and test scores that are often differentially affected, though at times have overlapping effects. This raises the intriguing possibility that an individual phenotypic characteristic of a neurodegenerative disease such as PD that is associated with a specific gene may be related to the same phenotypic characteristic in a different neurodegenerative disease/syndrome. This may allow for the development of a "polyphenic" risk score to complement the polygenic composite risk scores that have already been developed for Alzheimer's disease and other diseases (69).
It is interesting to point out certain associations that may hint at pathogenetic links between PD and other disorders. The relationship between PD and essential tremor (ET) has long been a matter of debate (70). In our cohort, gene-based tests reveal an association between LRRK2 and a history of ET. This finding suggests that genetic variation at LRRK2 may provide a link between long-standing ET and the development of PD, at least in some cohorts. The presence of neuropathy in our cohort is associated with variants in the ELOVL7 and NDUFAF2 genes, which are located in the same region on chromosome 5. Clinically, peripheral neuropathy has been reported in PD; however, its cause remains unclear, potentially reflecting medication adverse effects (71). Another striking association in our cohort is that of SNCA with cognitive impairment. The role of common variants at SNCA as PD risk factors, as well as that of rare gene variants as pathogenic mutations, has been clearly demonstrated over the last two decades. Our findings suggest that multiple, less common variants at SNCA, not necessarily pathogenic variants, may affect cognition in PD patients.
The reported prevalence and incidence estimates in PD show a 1.5:1 male to female ratio (72). Here we find that sex often differentially affects an association with a particular phenotypic trait, whether a symptom or a test score: some of the associations are significant only in males or only in females, whereas others are significant in both sexes. This suggests that sex may have a differential effect on the phenotypic manifestation of genetic PD risk.
As would be expected from our current understanding of the genetic mechanisms underlying PD, protein-protein interaction network analysis demonstrates that about two-thirds of the genes with significant associations are members of previously identified networks. However, about a third of the genes appear unconnected to these networks. This raises the interesting possibility that as yet unidentified gene networks and connections may be implicated in phenotypic manifestations, in either a deleterious or protective role.
It is important to stress that the analyses presented here are based on patient-reported initial symptoms and symptoms at the baseline encounter, as well as objective test scores determined at the baseline encounter. Longitudinal evaluation of this and other cohorts through a standardized assessment at annual intervals will enable the extension of this analysis to determine whether the impact of the genotypes on the clinical phenotype and test scores depends, among other factors, on disease subtype, severity and duration. It will also be informative to undertake additional analyses that cluster individual symptoms and analyze their associations with genetic risk factors.
One limitation of our study is the inclusion of both de novo and previously diagnosed patients. Therefore, our cohort is likely more heterogeneous than an exclusively de novo cohort such as the PPMI cohort. However, given that the study participants are drawn from a community-based cohort, it is likely more representative of the phenotypic spectrum that is typically observed in clinical practice. Furthermore, the PD diagnosis in our cohort is ascertained according to published diagnostic criteria (21) at the baseline visit and can also be reliably ascertained at annual intervals using the EMR-based SCDS, thus providing high clinical diagnostic accuracy. In addition, the use of SCDS allows for detailed and accurate clinical data collection in routine clinical practice, thus more accurately reflecting the clinical course.
A second limitation of this study is that the sample size of our cohort limits its power to detect associations. While none of the associations with common PD-risk SNPs reach genome-wide significance (∼5 × 10⁻⁸) (Tables 1, 2, Supplementary Tables 4, 5), gene-level tests using rare variants identify four associations with baseline clinical features that approach or reach significance for the number of mapped genes (2.81 × 10⁻⁶): TOX3 and SULT1C2 with UPDRS IV-total score, GPNMB with bradykinesia, and CATSPER3 (73) with anxiety (Table 3, Supplementary Table 6). It is important to point out in this context that the genes included in this analysis have previously been clearly associated with PD risk in case-control GWAS. Nevertheless, given the size of our cohort, it will be informative to evaluate the reproducibility of our findings in other cohorts.
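The two significance thresholds quoted above are consistent with Bonferroni-style corrections, and the arithmetic can be sketched as follows. The starting family-wise error rate of 0.05 is an assumption (it is not stated in the text), so the implied numbers of tests are illustrative only; the function names are hypothetical.

```python
ALPHA = 0.05  # assumed family-wise error rate; not stated in the text


def bonferroni_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Per-test significance threshold for n_tests independent tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def implied_tests(threshold: float, alpha: float = ALPHA) -> int:
    """Number of tests implied by a Bonferroni threshold, to the nearest integer."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return round(alpha / threshold)


# Thresholds quoted in the text.
snp_tests = implied_tests(5e-8)      # genome-wide SNP-level threshold
gene_tests = implied_tests(2.81e-6)  # gene-level threshold

print(f"SNP-level threshold implies about {snp_tests:,} tests")
print(f"Gene-level threshold implies about {gene_tests:,} mapped genes")
```

Under the 0.05 assumption, the conventional genome-wide threshold corresponds to roughly one million independent tests, and the gene-level threshold to roughly 17,800 mapped genes.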
In summary, our analysis shows that common SNPs conferring PD risk, as well as low-frequency and rare variants in genes implicated in PD/parkinsonism, are associated with distinct phenotypic characteristics at baseline presentation in our PD cohort, supporting the hypothesis that the genetic background significantly affects disease presentation and raising the possibility that it also affects disease course and severity. The associations observed are often, but not always, dependent on sex. It is conceivable that this is related to the observed PD prevalence and incidence estimates that point to PD-risk differences based on sex. Finally, this analysis identifies different patterns in protein interaction networks that may underlie disease phenotype and pathogenesis. Longitudinal studies of this and other PD cohorts using this approach can provide insights into the impact of genetic risk factors on disease severity and progression, and enhance our understanding of the underlying pathogenetic mechanisms contributing to PD.

DATA AVAILABILITY STATEMENT
The datasets presented in this article are not readily available because this would jeopardize patient confidentiality. Additional information can be made available to qualified researchers after completing a material transfer agreement with NorthShore University HealthSystem that maintains patient confidentiality.
Requests to access the datasets should be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the NorthShore University HealthSystem Institutional Review Board. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
KM designed the study and wrote the manuscript. BC performed data analysis and contributed to the writing. KM, DM, and RF designed clinical instruments used in the study. KM, DM, APP, BS, and NK provided the clinical assessment. AP, LG, and RV provided research assistance. JW, AE, and HY processed genomic and clinical data. KM, BC, RF, and DM edited the manuscript. All authors contributed to the article and approved the submitted version.