The Association Between Lysosomal Storage Disorder Genes and Parkinson’s Disease: A Large Cohort Study in Chinese Mainland Population

Background: Recent years have witnessed an increasing number of studies indicating an essential role of the lysosomal dysfunction in Parkinson’s disease (PD) at the genetic, biochemical, and cellular pathway levels. In this study, we investigated the association between rare variants in lysosomal storage disorder (LSD) genes and Chinese mainland PD. Methods: We explored the association between rare variants of 69 LSD genes and PD in 3,879 patients and 2,931 controls from Parkinson’s Disease & Movement Disorders Multicenter Database and Collaborative Network in China (PD-MDCNC) using next-generation sequencing, which were analyzed by using the optimized sequence kernel association test. Results: We identified the significant burden of rare putative LSD gene variants in Chinese mainland patients with PD. This association was robust in familial or sporadic early-onset patients after excluding the GBA variants but not in sporadic late-onset patients. The burden analysis of variant sets in genes of LSD subgroups revealed a suggestive significant association between variant sets in genes of sphingolipidosis deficiency disorders and familial or sporadic early-onset patients. In contrast, variant sets in genes of sphingolipidoses, mucopolysaccharidoses, and post-translational modification defect disorders were suggestively associated with sporadic late-onset patients. Then, SMPD1 and other four novel genes (i.e., GUSB, CLN6, PPT1, and SCARB2) were suggestively associated with sporadic early-onset or familial patients, whereas GALNS and NAGA were suggestively associated with late-onset patients. Conclusion: Our findings supported the association between LSD genes and PD and revealed several novel risk genes in Chinese mainland patients with PD, which confirmed the importance of lysosomal mechanisms in PD pathogenesis. Moreover, we identified the genetic heterogeneity in early-onset and late-onset of patients with PD, which may provide valuable suggestions for the treatment.


INTRODUCTION
As the second most common neurodegenerative disease, Parkinson's disease (PD) is characterized by the pathological changes of the misfolding and accumulation of α-synuclein, as well as the progressive loss of dopaminergic neurons in the substantia nigra (He et al., 2018). In addition, the other currently known pathogenic molecular mechanisms of PD include autophagy-lysosomal and ubiquitin-proteasome system defects, mitochondrial dysfunction, neuroinflammation, and oxidative stress (Nguyen et al., 2019;Jankovic and Tan, 2020). Studies in recent decades supported that genetic factors play essential roles in PD (Blauwendraat et al., 2020), including the common risk variants and rare damaging mutations. Although the mechanism involved in the development of PD is not yet clear, more and more studies have indicated that PD has a strong connection with lysosomes at the genetic, biochemical, and cellular pathway levels (Robak et al., 2017;Klein and Mazzulli, 2018;Hopfner et al., 2020). In addition, some studies suggested that the lysosomal enzyme activity in biological fluids could even be considered as a biomarker for the diagnosis of synucleinopathies, such as PD Parnetti et al., 2019).
Lysosomal storage disorders (LSDs) comprised more than 70 diseases characterized by lysosomal dysfunction (Platt et al., 2018), most of which are inherited in an autosomal recessive manner. Mutations in LSD pathogenic genes may affect the function of the encoded protein, which may lead to lysosomal dysfunction, accumulation of substrates inside the lysosome, and further cell dysfunction and cell death. Since many monogenic LSDs have been described, they could be further classified according to the biochemical types of stored substances (such as sphingolipids, mucopolysaccharides, and glycoproteins) (Platt et al., 2018).
The burden of cumulative rare variants in LSD genes was identified and validated in a large European population with PD (Robak et al., 2017), even excluding GBA, the established most common risk factor of PD (Sidransky et al., 2009). Heterozygous, homozygous, or compound heterozygous mutations in the GBA gene have been robustly confirmed as the most important genetic risk factor for PD, increasing the risk of PD by more than five times (Sidransky et al., 2009;Chahine et al., 2013). Furthermore, mutations in the SMPD1 gene have also been reported to be associated with an increased risk of PD. SMPD1 gene encodes the lysosomal enzyme acid sphingomyelinase (ASMase), which is a lysosomal gene involved in sphingolipid metabolism (Foo et al., 2013;Gan-Or et al., 2013;Wu et al., 2014;Li et al., 2015;Mao et al., 2017;Alcalay et al., 2019). In addition, the rare damaging variants of other LSD genes, including CTSD, SLC17A5, ASAH1, LAMP1, and TMEM175, were found to be associated with PD as well (Robak et al., 2017;Hopfner et al., 2020). Recent years have witnessed the findings of the link between LSD genes, including PSAP (Oji et al., 2020), ARSA (Lee et al., 2019), and the Mendelian inheritance form of PD. Specifically, common variants in GUSB, GRN, and NEU1 loci may alter the susceptibility to PD (Nalls et al., 2019). Nowadays, more and more genetic studies have shown that PD may have a more complex inheritance pattern, including oligogenic and polygenic inheritance of the same or related PD pathway genes, which further strengthens the synergistic relationship between them (Lubbe et al., 2016;Smolders and Van Broeckhoven, 2020).
Overall, these results indicated an important role of lysosomal dysfunction in PD, which has also been replicated and reviewed by more and more researchers (Klein and Mazzulli, 2018;Ysselstein et al., 2019;Hopfner et al., 2020). However, previous studies have found that there was significant genetic heterogeneity among different populations. For example, p.N370S exclusively increased PD risks in European and West Asians, as were p.R120W in Japanese populations, whereas p.L444P increased the risk of PD in mainland China, as well as the other non-Ashkenazi Jewish ethnicity (Sun et al., 2010;Zhang et al., 2018). In this study, we explored the association between rare variants in LSD genes and PD in a large Chinese mainland cohort.

Study Participants
The patients with PD were recruited from the Xiangya Hospital (Central South University) and other cooperating hospitals of Parkinson's Disease & Movement Disorders Multicenter Database and Collaborative Network in China (PD-MDCNC) 1  and were diagnosed by neurologists according to the UK brain bank criteria (Hughes et al., 1992) or Movement Disorders Society clinical diagnostic criteria (Postuma et al., 2015). Matched controls who were free from neurological diseases were also recruited. The protocol used in this study was approved by the Ethics of Committee of Xiangya Hospital, Central South University; written informed consent was obtained from all subjects.
We have mined the sequencing data from two cohorts. The first one is our previous whole-exome sequencing (WES) cohort , which includes familial patients or sporadic early-onset patients and ethnic-matched controls (this cohort has also expanded its sample size in the past year). The other one is our follow-up whole-genome sequencing (WGS) cohort, which included sporadic late-onset patients and sex/age/ethnicmatched controls. Early-onset is defined as the age-at-onset (AAO) ≤ 50 years old; late-onset is defined as the AAO > 50 years old. The primary demographic data of the two cohorts are listed in Supplementary Table 1.

Genotyping and Quality Control
Subjects from the WES cohort follow these settings: library preparation with an Agilent SureSelect Human All Exon V6 Kit, sequencing on the Illumina X10 platform in a pairedend 2 × 150 bp sequencing mode, with 123-fold of average sequencing depth; and subjects from the WGS cohort follow these settings: library preparation with an Illumina TruSeq Library Preparation Kit, sequencing on the Illumina Nova platform in a paired-end 2 × 150 bp sequencing mode, with 12-fold of average sequencing depth. Then, the sequencing data were analyzed using the BWA-GATK-ANNOVAR pipeline as follows: the sequencing data were aligned to the human reference genome (hg19 version) using the Burrows-Wheeler Aligner (BWA) (Li, 2014), then variant calling was performed using the Genome Analysis Toolkit (GATK) (Van Der Auwera et al., 2013), subsequent variants were annotated using ANNOVAR  and Varcards 2 (Li et al., 2018a), as described in our previous study . Similar to the quality control standards used in our earlier study , individuals with low genotype rate, ambiguous gender, or cryptic relatedness were excluded from this study.
Our analyses were included in 69 LSD genes, which were reviewed or reported in previous studies (30275469, 29140481, and 32036093) or OMIM database, 3 and then according to the biochemical types of stored substances of the corresponding genes, they are divided into 11 subtypes, as shown in Supplementary Table 2.
Specifically, our previous study suggested that REVEL (Ioannidis et al., 2016) and VEST3 (Carter et al., 2013) have the best overall performance among several prediction software, so that the ReVe (Li et al., 2018b) program was developed to predict destructive missense variants. Moreover, in this study, those likely damaging missense variants and LoF variants were defined as putative damaging variants. Within these categories, variants were filtered based on two minor allele frequency (MAF) thresholds of 0.01 and 0.03. Usually, only MAF < 0.01 was used for the analysis of rare variants. But in this study, we considered including more common LSD variants (MAF < 0.03), which could provide better sensitivity for detecting significant aggregate variant associations (Robak et al., 2017). Of note, patients with pathogenic/likely pathogenic variants of 23 PD-causing genes from the WES cohort, including carriers of ATP13A2 biallelic variants, were excluded from this study, as described in our previous study .

Statistical Analysis
Gene-based or gene-set-based burden analyses were analyzed using sequence kernel association test-optimal (SKAT-O) tests. The sex, age/AAO, and top five principal components were treated as covariates in the WGS cohort, whereas only the sex and the top five principal components were treated as covariates in the WES cohort (given the controls in the WES cohort were the healthy elderly people, we did not include the age/AAO as the covariates) (Lee et al., 2012;Pan et al., 2020). To adjust for multiple comparisons, we applied the Bonferroni correction to control the family-wise error rate based on a significance level, α of 0.05, and results with a P-value of < 0.05, but not surviving the Bonferroni correction, as "suggestive." To assess the potential oligogenic inheritance of the LSD genes in PD, we explored the proportion of PD cases and controls carrying two or more rare damaging variants of LSD genes.

RESULTS
For the familial or sporadic early-onset PD cohort, a total of 3,569 unrelated subjects were included, consisting of 1,917 patients with PD and 1,652 controls. To investigate whether there is an aggregate burden of LSD gene variant set on the risk of PD, we first conducted the SKAT-O analysis on the WES cohort. We identified suggestive significant associations among LSD gene different variant sets, especially in the rare or low-frequency putative damaging variant category (P [MAF < 1%] = 1.02E-09, P [MAF < 3%] = 1.07E-06) ( Table 1). To confirm the involvement of other LSD genes other than GBA, given the significant effects of GBA, a secondary burden analysis was performed after the exclusion of all variants in GBA. The results still showed the significant evidence of association between remaining genes and PD in the rare or low-frequency putative damaging variant category (P [MAF < 1%] = 0.009, P [MAF < 3%] = 0.033) ( Table 1). Then, to explore the variant set in which the LSD subgroup gene played an important role in the mechanism of PD, we performed SKAT-O analysis with 11 different subgroups of LSD gene sets on the 1 | Analysis of lysosomal storage disorder (LSD) associated gene rare variant burden in Parkinson's disease (PD).

WES cohort WGS cohort
Variants group a Number, total number of LSD variant (number of variants excluding GBA). The number of variants excluding those in GBA is shown in parentheses. b P-value was calculated by sequence kernel association test-optimal (SKAT-O). Similar with original article, primary analyses consider the variant burden among 69 LSD genes, and secondary analyses were performed excluding all variants in GBA (shown in parentheses). MAF, minor allele frequency; LoF, loss of function; Dmis, damaging missense (ReVe > 0.7). The estimated number of independent tests was 6 in the analysis of LSD gene sets, corresponding to a Bonferroni-corrected significance threshold of P < 0.0083. We considered those results with a P-value < 0.05, but not surviving the Bonferroni correction, as "suggestive".
WES cohort. Specifically, the subgroup of LSDs was classified according to the biochemical types of stored substances of the corresponding genes. We only identified significant association among sphingolipidosis deficiency disorder gene variant sets, especially in the rare or low-frequency putative damaging variant category (P [MAF < 1%] = 8.72E-16) ( Table 2). Similarly, a secondary burden analysis was conducted after excluding all GBA variants, which confirmed the involvement of other sphingolipidosis deficiency disorder genes. It showed suggestive significant evidence between remaining genes and PD only in the rare or low-frequency missense variant category (P [MAF < 1%] = 0.025) ( Table 2).
For the sporadic late-onset PD cohort, 1,962 patients with PD and 1,279 controls were included. The suggestive difference was still detected between LSD gene variant sets and PD in the rare putative damaging variant category (P [MAF < 1%] = 0.014, P [MAF < 3%] = 0.010) ( Table 1). Nevertheless, this suggestive difference was not persisted after excluding the GBA variants. Then, we also conducted SKAT-O analysis with different subgroups of LSD gene sets on the WGS cohort. We identified a significant association among variant sets in genes of sphingolipidosis deficiency disorders, especially in the rare or low-frequency missense variant category (P [MAF < 1%] = 0.00046). Nevertheless, this suggestive difference was not persisted after excluding the GBA variants, either. Furthermore, we also detected suggestive significant association among variant sets in genes of mucopolysaccharidosis deficiency disorders (missense variants category, P [MAF < 1%] = 0.002) and post-translational modification defect disorders (missense variant category, P [MAF < 1%] = 0.031).
For the gene-based analysis, we focused on the rare variants with MAF < 1%. The results showed the most robust association between the LSD gene set and PD. Among the WES cohort with familial PD or sporadic early-onset patients with PD, except for the established GBA (P = 1.58E-22), the rare putative damaging variant category of additional genes also shown suggestively significant difference between cases and controls, including the potential susceptibility gene SMPD1 (P = 0.015), as well as other four novel genes: GUSB (P = 0.003), CLN6 (P = 0.015), PPT1 (P = 0.022), and SCARB2 (P = 0.034) ( Table 3). With the similar analysis in the WGS cohort (sporadic late-onset PD cases), except for the established GBA (P = 5.80E-06), the rare putative damaging variant category of additional genes also shown a suggestive significant difference between cases and controls in our cohort, including GALNS (P = 0.01) and NAGA (P = 0.05) ( Table 3). Specifically, the variants for those 69 genes included in this study are shown in Supplementary Table 3.
Finally, the distribution of rare damaging variants of the LSD gene in our cohort was also analyzed (MAF < 1%): the average variant burden among cases (0.879 alleles per individual) was slightly higher than that seen in controls (0.751 alleles per individual) within our cohort (Figure 1). Of note, only 13 of 1,962 patients with PD carried homozygous putative damaging variants of LSD genes (Supplementary Table 4). Additionally, 56.3% of cases have at least rare likely damaging variants in one LSD gene, whereas 24.0% carry multiple alleles.

DISCUSSION
In this study, our results strengthen the vital connection between the LSD genes and PD in the Chinese mainland population, which was supported by the burden of non-synonymous alleles of 69 LSDs causing genes, especially putative damaging variants, in the association with PD. Specifically, the associations were statistically significant in the early-onset cohort of patients with PD, even excluding individuals carrying GBA variants. Then, the new potential specific subgroups of LSD and novel potential susceptibility genes identified in this study further implicate the importance of lysosomal mechanisms in PD pathogenesis. Moreover, the results also indicated certain genetic differences between early-onset PD and late-onset PD.
As described above, we identified suggestive associations between sphingolipidosis storage disease genes and PD in both cohorts. Specifically, among the sporadic late-onset PD cohort, we also identified the potential connection between post-translational modification defects, mucopolysaccharidosis storage disease genes, and PD. Considering the combined consideration of LSDs or their subgroup genes may improve statistical power over a single gene (Zuk et al., 2014),  MAF, minor allele frequency; LoF, loss of function; Dmis, damaging missense (ReVe > 0.7). The estimated number of independent tests was 6 in the analysis of LSD gene sets, corresponding to a Bonferroni-corrected significance threshold of P < 0.0083. We considered those results with a P-value < 0.05, but not surviving the Bonferroni correction, as "suggestive".
Bold text indicates a statistically significant assciation with a p-value < 0.05.
we should pay more attention to the relationship between the mechanisms of post-translational modification defects, mucopolysaccharidosis storage disease, and PD, especially in the lysosomal mechanism of late-onset patients with PD. Neurodegeneration is a remarkable phenotype of almost all LSDs (Platt et al., 2018), indicating the correlation between lysosomal degradation and the maintenance of neuronal health. Several studies have proved that activities of GBA and several other sphingolipid hydrolase enzymes are reduced in the brain or other body fluid of PD (Balducci et al., 2007;Alcalay et al., 2015;Chiasserini et al., 2015;Huebecker et al., 2019). Interestingly, it has been shown that phosphorylation, SUMOylation, and ubiquitination play a role in protein aggregation, exocytosis, and degradation (Junqueira et al., 2019). Specifically, for several mucopolysaccharidosis storage diseases, enzyme replacement therapy or hematopoietic stem cell therapy has been available (Platt et al., 2018), which may have reference value for the treatment of late-onset patients with PD. Lysosomal storage disorders were mainly childhood-onset Mendelian disorders, but PD was a neurodegenerative disease significantly associated with aging . As an essential pathway for α-synuclein degradation, lysosomal biology plays an important role in PD (Moors et al., 2016). This study supported shared genetic factors between the LSDs and PD in the Chinese mainland population, as the similar results from previous studies (Robak et al., 2017;Hopfner et al., 2020), which may provide design ideas for future experimental research on the pathogenesis of idiopathic PD. Understanding the pathophysiology of the endosome-lysosome-autophagy system is of great significance for the development of new therapeutic strategies not only for LSDs but also for PD .
Then, we verified the robust association of GBA and PD, as well as the suggestive link between different additional susceptible genes and PD in the familial or sporadic earlyonset PD cohort (SMPD1, GUSB, CLN6, PPT1, and SCARB2) and sporadic late-onset PD cohort (GALNS and NAGA). As we could see, GBA and SMPD1 were identified as the susceptibility genes in both European-ancestry populations and our population, which suggested that they might play a robust role in the diverse populations. Specifically, SMPD1 encodes for acid sphingomyelinase (ASM), and its deficiency may affect the hydrolysis of sphingomyelin to phosphocholine and ceramide in late endosomes and lysosomes . The evidence in our study and previous studies supported that SMPD1 variants increase the risk for PD, and then it may also suggest the potential role of the reduced ASMase activity in α-synuclein accumulation (Alcalay et al., 2019). However, there are still very few studies about it. Therefore, it needs more attention in the follow-up experimental functional studies in PD. Specifically, the association of variants of the SCARB2 gene and PD was previously reported (Nalls et al., 2014;Alcalay et al., 2016;Rudakou et al., 2021) in European cohorts. SCARB2 encodes the membrane protein required for correct targeting of glucocerebrosidase (GCase) to the lysosome (Zeigler et al., 2014). The association between the SCARB2 gene and PD was hypothesized to be due to differential trafficking of GCase to the lysosome (Dardis et al., 2009). Furthermore, the  Bold text indicates a statistically significant assciation with a p-value less than 0.05. differences in the additional susceptible genes identified in our cohort and European-ancestry populations might implicate the genetic heterogeneity of the different populations and different AAO ranges . The genetic heterogeneity of different subtypes of PD may also indicate that their underlying mechanisms could be different, and future genetic research may also need further analysis for different subtypes of PD. Interestingly, it suggestively supported that six genes were associated with the familial or sporadic early-onset cases, whereas only three were linked to the sporadic late-onset cases. We may speculate the possibility that familial or early-onset cases were more likely to be caused by putative damaging variants with statistically significant amounts in certain susceptibility genes. Nevertheless, only associated genes of one LSD subgroup were suggestively associated with the familial or sporadic earlyonset cases, whereas associated genes of three LSD subgroups were linked to the sporadic late-onset cases. We may infer that sporadic late-onset cases may be prone to be caused by the accumulation of a small amount of putative damaging variants in plenty of LSD genes. However, this idea still needs further replication in a larger multiethnic population. In addition, considering the cumulative effect of putative damaging variants of the LSD genes in PD, the role of oligogenic variants for missing heritability in PD (Pihlstrom and Toft, 2011) may need more attention.
Of note, for several potential novel PD risk genes identified in our study and specific genes implicated from other studies, it may also help provide valuable suggestions for the treatment of PD based on the novel treatment of LSDs. Mutations in GUSB, CLN6, and PPT1 cause Sly disease, neuronal ceroid lipofuscinosis (CLN6), and neuronal ceroid lipofuscinosis (CLN1), respectively, whereas mutations in GALNS and NAGA cause Morquio A disease and Schindler disease/Kanzaki disease, respectively. Among them, the proportion of neuronal ceroid lipofuscinosis (NCL) is relatively large. Moreover, previous studies have also implicated the association between NCL and PD, including the evidence of the analysis of the brain tissues of Atp13a2 conditional-knockout mice (Sato et al., 2016) and the identification of the reduced dopamine transporter density in the striatum in juvenile NCL patients with parkinsonism FIGURE 1 | Distribution of lysosomal storage disorder (LSD) variants in all subjects. The number of rare damaging LSD variants (MAF < 1%, ReVe score > 0.7) in each individual is shown as the proportion of cases (blue) or controls (orange). Many subjects harbor multiple LSD variants, and all variants of 69 LSD genes were considered. (Aberg et al., 2000). Overall, the potential mechanism between these LSD genes and PD needs further experimental verification. Furthermore, there was plenty of evidence supporting the relationship between autophagy-lysosomal pathway damage and misfolded α-synuclein aggregation and propagation, so compounds targeting autophagic-lysosomal pathway restoration may serve as a promising disease-modifying strategy for PD treatment (Bellomo et al., 2020).
However, there are still several limitations in this study. First, two different methods of sequencing (e.g., WES vs. WGS) were used for the studied cohorts, and the WES variants or the WGS variants were not validated by a second independent sequencing technology and may include certain false positives. Second, no further verification of the novel susceptibility genes discovered in another larger cohort and a larger PD sequencing data set with higher statistical power are needed to resolve specific LSD genes that cause PD risk. Third, there was no in-depth exploration of the new and known susceptibility genes in terms of functional mechanisms.

CONCLUSION
Our results supported the association between LSD genes and PD, which may help design experimental studies to elucidate the lysosomal mechanisms in the pathogenesis of PD. However, a larger sample size is needed to confirm these associations in multiethnic populations, and the risk of a single LSD gene should be cautiously interpreted, especially in different ethnicities. Moreover, we revealed several novel risk genes in patients with PD from mainland China. We identified the genetic heterogeneity in early-onset and late-onset patients with PD, which may provide valuable suggestions for treating different subtypes of PD.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Xiangya Hospital, Central South University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
B-ST: obtained funding, study supervision, had full access to all the data in this study, and takes responsibility for the integrity of the data and the accuracy of the data analysis. Y-WZ and B-ST: study concept and design and drafting of the manuscript. Y-WZ, H-XP, ZL, YW, QZ, Z-HF, KX, ZW, RH, BL, GZ, QX, Q-YS, X-XY, and J-FG: acquisition, analysis, or interpretation of data. J-CL, J-FG, and B-ST: administrative, technical, or material support. All authors: critical revision of the manuscript for important intellectual content.

ACKNOWLEDGMENTS
We are grateful to all subjects for their participation in our study.