Whole Exome Sequencing Study of Parkinson Disease and Related Endophenotypes in the Italian Population

Parkinson Disease (PD) is a complex neurodegenerative disorder characterized by large genetic heterogeneity and missing heritability. Since the genetic background of PD can partly vary among ethnicities and neurological scales have been scarcely investigated in a PD setting, we performed an exploratory Whole Exome Sequencing (WES) analysis of 123 PD patients from mainland Italy, investigating scales assessing motor (UPDRS), cognitive (MoCA), and other non-motor symptoms (NMS). We performed variant prioritization, followed by targeted association testing of prioritized variants in 446 PD cases and 211 controls. Then we ran Exome-Wide Association Scans (EWAS) within sequenced PD cases (N = 113), testing both motor and non-motor PD endophenotypes, as well as their associations with Polygenic Risk Scores (PRS) influencing brain subcortical volumes. We identified a variant associated with PD, rs201330591 in GTF2H2 (5q13; alternative T allele: OR [CI] = 8.16[1.08; 61.52], FDR = 0.048), which was not replicated in an independent cohort of European ancestry (1,148 PD cases, 503 controls). In the EWAS, polygenic analyses revealed statistically significant multivariable associations of amygdala- [β(SE) = −0.039(0.013); FDR = 0.039] and caudate-PRS [0.043(0.013); 0.028] with motor symptoms. All subcortical PRSs in a multivariable model notably increased the variance explained in motor (adjusted-R2 = 38.6%), cognitive (32.2%) and other non-motor symptoms (28.9%), compared to baseline models (~20%). Although, the small sample size warrants further replications, these findings suggest shared genetic architecture between PD symptoms and subcortical structures, and provide interesting clues on PD genetic and neuroimaging features.

Parkinson Disease (PD) is a complex neurodegenerative disorder characterized by large genetic heterogeneity and missing heritability. Since the genetic background of PD can partly vary among ethnicities and neurological scales have been scarcely investigated in a PD setting, we performed an exploratory Whole Exome Sequencing (WES) analysis of 123 PD patients from mainland Italy, investigating scales assessing motor (UPDRS), cognitive (MoCA), and other non-motor symptoms (NMS). We performed variant prioritization, followed by targeted association testing of prioritized variants in 446 PD cases and 211 controls. Then we ran Exome-Wide Association Scans (EWAS) within sequenced PD cases (N = 113), testing both motor and non-motor PD endophenotypes, as well as their associations with Polygenic Risk Scores (PRS) influencing brain subcortical volumes. We identified a variant associated with PD, rs201330591 in GTF2H2 (5q13; alternative T allele: OR [CI] = 8.16 [1.08; 61.52], FDR = 0.048), which was not replicated in an independent cohort of European ancestry (1,148 PD cases, 503 controls). In the EWAS, polygenic analyses revealed statistically significant multivariable associations of amygdala-[β(SE) = −0.039(0.013); FDR = 0.039] and caudate-PRS [0.043(0.013); 0.028] with motor symptoms. All subcortical PRSs in a multivariable model notably increased the variance explained in motor (adjusted-R 2 = 38.6%), cognitive (32.2%) and other non-motor symptoms (28.9%), compared to baseline models (∼20%).
Although, the small sample size warrants further replications, these findings suggest shared genetic architecture between PD symptoms and subcortical structures, and provide interesting clues on PD genetic and neuroimaging features.
Large-scale genomic studies carried out so far have scarcely investigated inter-individual variation in PD endophenotypes like neurological scales (3,4,(17)(18)(19)(20)(21)(22)24). A GWAS study of age-at-onset in 25,568 PD cases reported two genome-wide significant associations within SNCA and TMEM175 (25), while other preliminary GWAS of cognitive performance and motor symptoms progression are ongoing (26,27). Other SNP-based genomic studies tested associations of Polygenic Risk Scores (PRS) for PD with alpha-synuclein levels in the cerebrospinal fluid, age-at-onset of the disease, motor/cognitive symptoms and PD status [as reviewed in (28)], detecting significant associations with PD risk (29), earlier PD onset (29,30), and faster motor and cognitive decline (31). In addition, the largest case-control GWAS on PD carried out so far-involving the analysis of ∼56,000 cases and 1.4 million controls-identified significant genetic correlations with structural neuroimaging measures like intracranial and putamen volume (24). However, PRS analyses of Parkinson neuroimaging correlates were never reported. Overall, no NGS study so far focused on identifying genetic variants associated with PD endophenotypes, and there is a paucity of genomic studies doing so, in particular with motor, cognitive and non-motor scales, as well as with neuroimaging traits related to PD risk and symptoms, like subcortical volumes (32)(33)(34)(35).
Here, we present the first Whole Exome Sequencing (WES) analysis investigating continuous PD endophenotypes, and PD genetic susceptibility in mainland Italy. Through an exploratory multi-stage approach, we first performed rare variant prioritization and case-control association testing, attempting replication of findings in an international cohort of PD cases and controls of European ancestry (22). Then, we carried out Exome-Wide Association Scans with continuous neurological scales related to PD, assessing both motor and non-motor symptoms, to identify common variants potentially affecting these domains. Finally, we performed PRS analyses to test associations between polygenic scores influencing subcortical volumes and the above mentioned scales. Our study provides a contribution to the research on the genetic basis of PD, focusing on motor, nonmotor, and neuroimaging measures related to the disease.

PD Cohorts
Inclusion criteria for the participants to the study were reported Italian ancestry and a clinical diagnosis of PD by a qualified neurologist, according to published diagnostic criteria (see Supplementary Methods and (36).
Four hundred and seventy-two PD patients [288 males; 196 familiar cases; mean (SD) age of 66.6 (8.8) years] were recruited at the Parkinson Center of the specialized clinics IRCCS Neuromed, Pozzilli, Italy, between June 2015 and December 2017. They underwent a detailed phenotypic assessment and diagnostic protocol, which included neurological examination and evaluation of non-motor domains (see Supplementary Methods for details). The mean (SD) age at diagnosis was 58.3 (10.0) years. Along with patients, 121 nonconsanguineous family members with no neurological signs or symptoms of PD at the time of recruitment were involved in the study, by donating blood samples for targeted genetic analyses [mean (SD) age 62.9 (9.1) years; 44 males].
An additional cohort from mainland Italy was involved in the study, recruited at the Parkinson Institute of Istituti Clinici di Perfezionamento in Milan (hereafter called ICP). This included 82 related FPD patients of Italian ancestry, coming from 42 families with two or more first-degree relatives affected by PD [mean age 66.7 (10.4) years; mean (SD) age at diagnosis 60.69 (10.62) years; 41 males]. Further details on these cohorts are reported in Table S1a.
The project was approved by the ethical committees of IRCCS Neuromed, Pozzilli, and of ICP, Milan, and written informed consent was obtained from all the participating subjects.

Variants Prioritization, Validation, and Genetic Association Analysis With PD Status
Among 334,671 variants passing QC, we attempted to detect rare variants potentially associated with PD status in our dataset (123 PD cases). To this purpose, we applied the following bioinformatic pipeline (resumed in Figure 1): 1. We selected variants with predicted high or moderate impact on protein function, based on VEP annotation (41). These included 2,334 variants assumed to have high (disruptive) impact on the protein, probably causing protein truncation, loss of function or triggering nonsense mediated decay (hereafter called HIGH variants), and 67,047 nondisruptive variants that might change protein effectiveness (hereafter called MODERATE variants; see Table S2 for a detailed classification). 2. We retained variants with an alternative allele frequency (AF) at least five times higher than in three WES databases representative of the European population, namely 1,000 Genomes EUR (European Samples of the 1,000 Genomes project, phase 3 v5; N = 503) (

Exome-Wide Association Study With PD Endophenotypes
We tested common variants detected through WES for association with three continuous scales which assessed different domains usually affected in PD (Tables S5a,b) (Table S5b), we carried out a multivariate genetic association analysis on all the three scales together, through TATES (53), to identify relational pleiotropic effects on the domains assessed. These analyses were adjusted for different covariates, including PD familiarity, sex, age, pharmacological treatment status (ON/OFF), years of disease, daily L-Dopa dosage, and 10 genetic ancestry components. The significance threshold of the multivariate analysis was corrected for the number of LD-independent SNPs tested (α = 0.05/56,588 = 8.84 × 10 −7 ), as computed by the Genetic Type I error calculator (GEC) (54), while for univariate tests we applied an additional Bonferroni correction for the number of scales tested (α = 2.95 × 10 −7 ).
To follow-up on the results of the exome-wide association study (EWAS), we performed a targeted genotyping and association testing of the top hit identified in the whole Neuromed cohort (472 PD cases), through linear regression models with adaptive permutations in PLINK, using the same covariates as above except for genetic ancestry (see Supplementary Methods). After single univariate association tests with PD endophenotypes, we then combined the results into a multivariate association analysis through TATES.

Polygenic Risk Score (PRS) Analyses of PD Endophenotypes
We used exome-wide genetic data to build Polygenic Risk Scores (PRSs) influencing brain subcortical volumes, which were trained using summary statistics of a previous large independent GWAS (N max = 30,717) (55). This analysis was motivated by previous literature reporting both structural and functional alterations of several subcortical structures in PD (32)(33)(34)(35), for which these are often considered useful neuroimaging correlates. First we computed standardized best-fit polygenic scores through PRSice-2 (56), over varying association significance thresholds in the training GWAS (ranging from 5 × 10 −8 to 1), for nucleus accumbens, caudate, putamen, pallidum, amygdala, hippocampus and thalamus volume. Then we tested the resulting scores for association with UPDRS, MoCA and NMS scales through generalized linear models [glm() function in R], in a random extraction of one individual per family in our cohort (four relatives removed). These models were adjusted for PD familiarity, sex, age, pharmacological treatment status (ON/OFF), years of disease, daily L-Dopa dosage, and 10 genetic ancestry components, as above. Since these subcortical structures are thought to be functionally connected in complex connectivity networks and the pattern of atrophy is often spread across these networks in PD pathology (32,57), we performed multivariable models, testing all the PRSs built simultaneously, for each PD endophenotype. First, we assessed multivariate associations of each subcortical PRS to detect evidence at the specific structure level, applying a Benjamini-Hochberg correction for three different PD scales and five independent latent subcortical traits tested (58), as revealed by Matrix Spectral Decomposition applied to the phenotypic correlation matrix of these measures (59). Then, to have a measure of the total variance of PD endophenotypes explained by the polygenic scores, we compared adjusted Nagelke's R 2 values of the multivariable models including all the subcortical PRSs and covariates (hereafter called full PRS models) with the baseline models including only covariates (see above).

RESULTS
Exome-wide, we prioritized three validated variants showing the highest alternative allele frequency among those with high and moderate impact on protein function, in our cohort of patients. These included rs772162369 in MFSD6L and rs56407180 in KALRN among HIGH variants (AF = 2.03%), and rs201330591 in GTF2H2 among MODERATE variants (AF = 4.66%) ( Table S4). These variants were further tested for association with PD in the whole Neuromed cohort, which revealed a significant associations with PD for rs201330591 [uc011crt.  Table S6).

DISCUSSION
In this paper, we report an exploratory WES analysis of 123 PD cases from Italy. Although a previous study analyzed PD cases from an Italian genetic isolate, Sardinia (17), this represents the first WES study focused on PD patients from mainland Italy, the largest ever carried out in the country, and the richest in terms of phenotypes assessed. Indeed, in spite of the relatively small sample size sequenced and of the availability of exome (rather than whole genome) data, which represent the main limitations of the present study, we exploited the wealth of neurological scales assessed to carry out an exome-wide association study of motor and non-motor PD endophenotypes. Moreover, we tested associations of the scales available-namely UPDRS, MoCA and NMS-with PRSs known to influence subcortical volumes, which have long been considered as neuroimaging correlates of PD and neurodegeneration (32)(33)(34)(35). To our knowledge, this study represents the first attempt to test genetic associations with neurological scales in PD at the exome-wide level.
Through a stepwise approach, we identified a genetic variant with a frequency notably higher than in published WES databases and a significant association with PD in an extended analysis of our cohort, namely rs201330591, encoding a Serine-to-Threonine change in GTF2H2 (General Transcription Factor IIH Subunit 2; 5q13). GTF2H2 has been previously implicated as a modifier gene in spinal muscular atrophy (SMA), an autosomal recessive neurodegenerative disorder characterized by progressive death of motor neurons, implying proximal muscle weakness, and wasting in the absence of sensory signs. This gene is located not far from the causative gene of SMA, SMN1 (Survival Motor Neuron 1), and deletions involving this gene have been detected in severe forms of the disease (60). It encodes a subunit of the TFIIH transcription factor, which has been also implicated in Cockayne syndrome, a rare disease characterized by progeria and nervous system abnormalities, among other signs (61). However, the association detected was not replicated in the IPDGC cohort (22), which suggests caution in the interpretation of this finding. This may be due to several reasons, including different recruitment and filtering criteria of the two studies, the extreme genetic heterogeneity of PD, the lack of power in our analyses or the possibility that false positives were detected in the discovery cohort, due to its small sample size. Further replication attempts are warranted to support this finding.
The exome-wide analysis of continuous PD endophenotypes revealed an association approaching exome-wide significance at rs3835072, both in a univariate setting with MoCA score (representing general cognitive performance) and at the multivariate level, including other motor (UPDRS) and  non-motor PD endophenotypes (NMS). rs3835072 is an intronic indel predicted to alter splicing in the CCT7 gene (chaperonin containing TCP1, subunit 7; 2p13.2). This gene encodes a member of the chaperonin containing TCP1 (CCT) complex, which is impaired in severe neuropathies and in neurodegenerative disorders like Alzheimer's Disease (AD), where it is thought to promote toxic protein aggregates and cell death (62). Interestingly, the leading association signal identified at rs3835072 was with cognitive performance, which is impaired both in AD and in PD (63). However, this association only approached significance in an extended follow-up analysis of our PD cohort, which does not support a significant influence of this gene on cognitive performance. The most interesting findings of the present work come from associations analyses of polygenic risk scores (PRSs) influencing brain subcortical volumes with the continuous PD endophenotypes available. Multivariable regression models analyzing all the subcortical polygenic scores together (full PRS models) revealed a notable increase in the proportion of variance explained for the PD scales tested, compared to the baseline models including only covariates (see Methods section). In particular, the fraction of variance explained almost doubled for motor symptoms (UPDRS), increasing from ∼20% in the baseline model to ∼39% in the full model. For non-motor symptoms scales (MoCA and NMS), the increase in variance explained by the full PRS model was less sharp (32 and 29%, respectively), but still evident. This suggests that the genetic underpinnings of brain subcortical structures may be important in influencing PD symptoms, especially for the motor domain.
Among the seven subcortical PRS tested in the multivariable model, we observed significant associations of amygdala-and caudate-PRS with UPDRS. These findings do not support the significant bivariate genetic correlation between putamen volume and PD risk recently reported (24), a discrepancy which may be explained by the different methodologies used to investigate genetic overlap and by the low power provided by our study. Moreover, while the inverse association observed between the amygdala-PRS and motor symptoms is in line with its reported atrophy in PD (64,65), the positive association observed for the caudate polygenic score is in contrast with previous neuroimaging observations of reduced caudate volumes in Parkinson patients (66,67), although these associations are often localized and not always consistent (33,67). A potential explanation for this discrepancy may be again type I error, due to the small sample size of our study. Alternatively, since caudate hypertrophy has been associated with vascular parkinsonism (68) and compensatory hypertrophy mechanisms have been reported for some subcortical structures in PD (64, 66), we may hypothesize that some PD patients may have a genetic predisposition to atrophy/hypertrophy in different subcortical structures, each representing a unique "mosaic" in terms of liability to motor and non-motor neurological symptoms. Although we are still far from a comprehensive view of structural brain changes in PD, multi-omic studies involving neuroimaging, clinical and genetic levels may help to verify this hypothesis.
Overall, the evidence reported here suggests that it is likely low power the current bottleneck in the research on the genetic bases of such a heterogeneous disorder like PD (20), and underlines the need of collaborative efforts to homogenize genetic analyses and increase sample size in WES studies of the disease. In addition, studies exploiting diverse phenotypic, pharmacological and clinical information can provide clues into the neurobiological basis of the disease. Overall, this paper represents an exploratory attempt in this sense, providing interesting insights into the shared genetic bases of PD symptoms and brain subcortical structures, in spite of the small sample size. This suggests further collaborative investigations in order to elucidate the genetic underpinnings of Parkinson Disease, its neurological endophenotypes and neuroimaging correlates.

DATA AVAILABILITY STATEMENT
Raw WES data which were analyzed in the present manuscript will be made available upon request to the corresponding author, in a way which does not affect privacy of the patients involved in the present study.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by IRCCS Neuromed. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
TE, AS, and MC designed and supervised the study. NM, SV, SP, and MR recruited the patients, carried out phenotypic assessment, and collected the data. MR and AT carried out database curation, and performed sample management and bio-banking, along with AL and CD. AG and TN performed quality control and analysis of WES and other genotype data, under the supervision of MC. The International Parkinson's Disease Genomic Consortium (IPDGC) provided the replication data set and relevant statistics. MR and AT performed wet lab experiments, under the supervision of TE. AG wrote the manuscript, with contributions and final approval by all the co-authors.