Identification of Disease-Associated Variants by Targeted Gene Panel Resequencing in Parkinson's Disease

Background: Recent advanced technologies, such as high-throughput sequencing, have enabled the identification of a broad spectrum of variants. Using targeted-gene-panel resequencing for Parkinson's disease (PD)-associated genes, we have occasionally identified single-nucleotide variants (SNVs) that are thought to be disease-associated in PD patients. To confirm the significance of these potentially disease-associated variants, we performed genome association analyses, using next-generation targeted resequencing, to evaluate the associations between the identified SNVs and PD. Methods: We obtained genomic DNA from 766 patients who were clinically diagnosed with PD and 336 healthy controls, all of Japanese origin. All samples were analyzed using an Ion AmpliSeq panel that included 29 PD- or dementia-associated genes in a single panel. We excluded any variants that deviated from Hardy–Weinberg equilibrium in the control group. Variant frequencies in the PD and control groups were compared using PLINK. Variants were considered significant when the frequency difference yielded P < 0.05 by Fisher's exact test after applying the Benjamini–Hochberg procedure. The pathogenicity and prevalence of each variant were estimated based on a public gene database. Results: We identified three rare variants that were significantly associated with PD: rs201012663/rs150500694 in SYNJ1 and rs372754391 in DJ-1, which are intronic variants, and rs7412 in ApoE, which is an exonic variant. The variants in SYNJ1 and ApoE were more frequently identified in the control group, and rs201012663/rs150500694 in SYNJ1 may play a protective role against PD. The DJ-1 variant was more frequently identified in the PD group, with a high odds ratio of 2.2. Conclusion: The detected variants may represent genetic modifiers or disease-related variants in PD. Targeted-gene-panel resequencing may represent a useful method for detecting disease-causing variants and for conducting genetic association studies in PD.

Many genome-association studies, including GWAS, have been conducted for PD; however, part of the genetic background remains unexplained, which is referred to as "missing heritability" (13,14,16). Missing heritability is the difference between the heritability estimated from twin studies and that estimated from GWAS, as GWAS has only been able to detect some of the heritability estimated from twin studies (17). Many explanations for missing heritability have been proposed, including undiscovered variants with smaller effects, rarer variants that are poorly detected by the currently available genotyping arrays, copy number variants that cannot be detected by available arrays, and the low power to detect gene-gene interactions (16). Variants of GBA are known to be strong risk factors for sporadic PD but have not been detected by GWAS, likely due to a low minor allele frequency.
We have developed a targeted-gene-panel resequencing protocol to screen 29 PD-associated genes simultaneously. Panel resequencing has both advantages and disadvantages because it can identify multiple types of variants, including pathogenic variants, risk-associated variants, and rare variants of uncertain significance. Therefore, determining which variants are disease-associated can be difficult. A previous report describing Mendelian genes showed that rare functional variants occurred more frequently in sporadic PD cases than in control cases, indicating that Mendelian genes may be associated not only with familial PD but also with sporadic PD, which may be assessable using panel resequencing (18). In our analyses, through targeted-gene-panel resequencing, rare variants were identified in ∼40% of PD patients with a family history or early-onset PD (data not shown), and pathogenic variants were found in an even smaller percentage of patients. We also identified several putative disease-associated variants in PD patients. We hypothesized that these variants may play a role in PD onset and could account for some degree of missing heritability. Thus, we aimed to use targeted-panel resequencing to identify associations between SNVs and familial or early-onset PD. Our method contributes to expanding the understanding of missing heritability among familial and early-onset PD patients.

Participants
The present study was approved by the ethics committee of Juntendo University, Tokyo, Japan, and all participants provided written informed consent to participate in the genetic research. We collected DNA samples from the Juntendo PD DNA bank, which included 766 patients with PD, who were clinically diagnosed using standard criteria (1), and 336 healthy control subjects. Among these, 407 PD patients had a family history of PD (average age at onset: 54.6 ± 15.77 years, range 6-88), and the remaining 359 PD patients had no family history (average age at onset: 42.0 ± 11.22 years, range 9-83). We also collected data regarding the Hoehn and Yahr stages for each PD patient. The healthy controls were defined as individuals without any personal or family history of neurodegenerative disorders. An overview of the clinical characteristics of the included PD patients and healthy controls is shown in Table 1.

Processing Data Output From the Ion Torrent System
The sequencing analysis of the Ion AmpliSeq panel (Thermo Fisher Scientific, Waltham, MA, USA) was performed using the Ion Chef System (Thermo Fisher Scientific) and the Ion S5 Sequencer (Thermo Fisher Scientific), according to the manufacturer's instructions. Our Ion AmpliSeq panel (Thermo Fisher Scientific, IAD103177_182) included 29 PD- and dementia-related genes (Table 2), and its coverage was 98.34% (829 amplicons, missed: 1,646 bp) (manuscript in preparation). The output data were obtained as a variant call format (VCF) file from the Ion Torrent system. VCF files were processed using vcftools (19).
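As an illustration of how VCF output can be screened by read depth, the following minimal Python sketch parses VCF data lines and keeps the records whose depth exceeds a threshold. It is not the vcftools workflow used in this study. It assumes that the depth is stored under the standard `DP` key of the INFO column; the function names, the example records, and the use of 45 as the threshold are illustrative assumptions.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=120;AF=0.5' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flag entries carry no value
    return info


def parse_vcf(lines):
    """Yield one dict per data line; lines starting with '#' are headers."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        info = parse_info(fields[7])
        # Assumption: read depth is reported under the INFO key 'DP'.
        depth = int(info["DP"]) if "DP" in info else None
        yield {
            "chrom": fields[0],
            "pos": int(fields[1]),
            "id": fields[2],
            "ref": fields[3],
            "alt": fields[4],
            "depth": depth,
        }


def filter_by_depth(records, min_depth):
    """Keep records whose depth is strictly greater than min_depth."""
    return [r for r in records
            if r["depth"] is not None and r["depth"] > min_depth]


# Hypothetical records for illustration only, not data from this study.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t50\tPASS\tDP=120",
    "chr1\t2000\t.\tC\tT\t50\tPASS\tDP=30",
]

kept = filter_by_depth(parse_vcf(EXAMPLE_VCF), min_depth=45)
```

In this sketch, records without a `DP` entry are dropped rather than retained, which is a conservative design choice for a depth filter.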

Statistical Analysis to Compare the Frequencies of Non-rare Variants
We confirmed that all samples had a mean depth >100 and excluded amplicons with read depths below 10. The analyzed variants were confirmed to lie within the target sequences and to have a read depth >45. We also calculated the coverage percentage. During the variant-screening stage, we excluded all variants that deviated from Hardy-Weinberg equilibrium (HWE; P < 0.05) within the control group (Figure 1). We tested HWE only in the control group because including PD patients in the HWE analysis would introduce bias. During the analysis stage, the variant frequencies observed for the PD and healthy non-PD groups were compared using PLINK 1.9 (20). To verify this comparison, variants with a frequency difference of P < 0.05, based on Fisher's exact test followed by the Benjamini-Hochberg procedure, were checked against the genotyping data available in 4.7KJPN, from the Japanese Multi Omics Reference Panel (jMorp) (21), and the Genome Aggregation Database (gnomAD) (22). The scheme used for the analysis is presented in Figure 1. To confirm the presence of significant variants identified during the association study, we conducted Sanger sequencing on three cases in which panel resequencing detected the variant and three cases in which it did not.
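The three statistical steps described above (HWE screening in controls, Fisher's exact test on case and control counts, and Benjamini-Hochberg adjustment) can be sketched in plain Python as follows. This is a simplified stand-in for illustration, not the PLINK implementation used in this study: the HWE check here is a one-degree-of-freedom chi-square test, which may differ from the test applied by the software, and all function names and counts are hypothetical.

```python
from math import comb, erfc, sqrt


def hwe_chi2_p(n_hom_ref, n_het, n_hom_alt):
    """Chi-square (1 df) p-value for deviation from Hardy-Weinberg equilibrium."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 1.0
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2.0))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x):
        # Unnormalized hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Sum every table that is no more probable than the observed one.
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / comb(row1 + row2, col1)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical allele counts (alt, ref) in cases and controls per variant.
tables = [(30, 1502, 5, 667), (12, 1520, 14, 658), (40, 1492, 18, 654)]
raw_p = [fisher_exact_two_sided(*t) for t in tables]
adj_p = benjamini_hochberg(raw_p)
significant = [t for t, p in zip(tables, adj_p) if p < 0.05]
```

Integer binomial coefficients are compared directly in the Fisher step, which avoids floating-point ties when deciding which tables count as at least as extreme as the observed one.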

RESULTS
The percentage of coverage was calculated, showing that 99.7% of the total dataset was read at a depth of at least 1×, 98.9% at 20×, 97.9% at 100×, and 84% at 500×. During the variant-screening stage, we identified 796 variants in our healthy controls, of which 749 were retained after excluding variants that deviated from HWE (P < 0.05) and were included in the association analysis performed using PLINK. We conducted Sanger sequencing on nine significant variants with p-values below 0.05 after performing the Benjamini-Hochberg procedure for Fisher's exact test, and five of them (chr1:65830299 T>G, chr1:65830300 T>G, chr3:184033555, chr2:233620927-233620929, and chr1:205743943) were not validated and were excluded from the analysis. All of the false-positive variants were located around mononucleotide tandem repeats, which were considered to be the cause of the false positives. Table 3 shows the 15 variants with the lowest p-values based on Fisher's exact test. Four variants were significantly associated with PD: rs201012663 and rs150500694 in SYNJ1, rs372754391 in DJ-1, and rs7412 in ApoE (Table 4). The two SYNJ1 variants, rs201012663 and rs150500694, were considered to represent a single variant because they are located four bases apart and demonstrated the same frequency in our subjects and public gene databases, which suggests that these variants are strongly linked (Tables 3, 4). The SYNJ1 variants are both located in an intron, with an odds ratio of 0.37. The DJ-1 variant (rs372754391) was also intronic and was more frequently identified in the PD cohort than in controls, with an odds ratio of 2.2. However, its frequency in the public database was considerably higher than the frequency in our data. The ApoE variant was exonic and was more frequently observed in the control group than in the PD group, with an odds ratio of 0.39. The ApoE variant was one of the single-nucleotide polymorphisms (SNPs) that determine the ApoE genotype. The E2 ApoE genotype was more frequently observed in the control group, whereas the E4 genotype was more frequently observed in the PD group (Table 5). No significant differences in age, age at onset, or Hoehn and Yahr scores were observed between patients with and without detected variants (Table 6). We do not have data for four variants (rs16856139, rs11931532, rs11931074, and rs1994090) that were previously identified in a GWAS performed in Japanese PD patients because these variants were absent from our target panel (13). LRRK2 G2385R (rs34778348), which is a risk factor for PD in East Asian individuals, was the 21st most significant variant identified among our cohort (23). Except for rs34778348, none of the currently known risk variants for PD were detected.
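For readers who wish to see how an odds ratio such as those reported above is derived from a 2×2 table, the following Python sketch computes the odds ratio and an approximate 95% confidence interval using Woolf's logit method. The counts are hypothetical and are not taken from Table 4, and the choice of Woolf's method for the interval is an assumption made for illustration; it is not stated to be the method used in this study.

```python
from math import exp, log, sqrt


def odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Odds ratio with a Woolf (logit) confidence interval.

    Arguments are counts of the alternate and reference alleles in cases
    and controls; all four counts must be non-zero for this formula.
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) under Woolf's approximation.
    se = sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
or_value, ci_lower, ci_upper = odds_ratio_ci(20, 80, 10, 90)
```

An odds ratio above 1 indicates that the alternate allele is more frequent in patients, whereas a value below 1 indicates that it is more frequent in controls, which is the direction described for the SYNJ1 and ApoE variants.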

DISCUSSION
We performed a genetic case-control analysis, using NGS data from our Ion AmpliSeq panel. We identified three variants in three different genes: the combination of rs201012663 and rs150500694 in SYNJ1, rs372754391 in DJ-1, and rs7412 in ApoE. None of these three variants were reported as PD-related variants when we searched a GWAS catalog on June 8, 2020 (24). Our identified variants might account for part of the missing heritability in PD. Targeted resequencing can read selected phenotype-associated genes at greater depth than the microarrays that are normally used in GWAS. Thus, targeted resequencing-based association studies may be able to identify risk variants that have not been previously identified by GWAS (17).
The three identified variants have never previously been reported as variants associated with PD. In our study, variants in SYNJ1 (rs201012663 and rs150500694) showed a higher frequency in the control group than in the PD group. SYNJ1 is known to be a causative gene for early-onset Parkinsonism with atypical characteristics, such as seizures, dystonia, and dementia, and an autosomal-recessive inheritance pattern (25,26). This gene encodes the protein Synaptojanin 1, a polyphosphoinositide phosphatase that is concentrated at synapses (27,28). Synaptojanin 1 is associated with synaptic vesicle endocytosis. The variants identified in SYNJ1 (rs201012663/rs150500694) in this study have not previously been reported to be pathogenic variants. Synaptojanin 1 is also known to play a role in the pathogenesis of Alzheimer's disease (AD), associated with a PI(4,5)P2 imbalance. The haploinsufficiency of SYNJ1 protects cells from the neurotoxic actions of Aβ42 (29). The variants rs201012663/rs150500694 might play a similarly protective role against alpha-synuclein-mediated neurotoxicity.
The identified variant in DJ-1 is of interest because of its high odds ratio of 2.2. However, this variant may be specific to ethnicity because the frequency of this variant among our healthy controls was lower than that observed in public databases. This variant was not recorded in jMorp, one of the largest genomic databases in Japan, suggesting its rarity in the Japanese population. DJ-1 was initially identified as an oncogene and was later found to cause familial PD (30). DJ-1 has also been associated with other disorders, including stroke, familial amyloidotic polyneuropathy, and type 2 diabetes (30-33). DJ-1 has several functions, including transcriptional regulation, antioxidative stress reactions, chaperone activity, protease activity, and mitochondrial regulation (30). DJ-1 is expressed in almost all cells, including neurons and glial cells. DJ-1 protein contains three cysteine residues, C46, C56, and C106. C106 is likely to be influenced by oxidative stress and oxidized into SOH, SO2H, and SO3H (34-36). DJ-1 containing a C106 residue that has been oxidized to SO3H is thought to represent an inactive form (37). In the brains of PD patients, excessively oxidized forms of DJ-1 have been observed (38). The identified mutation might facilitate oxidation, inactivating DJ-1.
The variant in SYNJ1 is recorded separately as rs201012663 and rs150500694 in dbSNP153, in gnomAD, and in jMorp. The variant in DJ-1 is recorded as rs372754391 in dbSNP153 and registered separately as two single variants in gnomAD.
APOE genotypes have previously been associated with an increased risk of AD (39,40). rs7412 is one of two SNVs that define the common allelic APOE variants. APOE4 is known to represent a strong risk factor for AD. The variant (rs7412) identified in our study is included in APOE1 or APOE2, which are known to decrease the risk of AD. rs7412 was significantly less frequent in the PD group in our study.
In our study, APOE2 was significantly less frequent in the PD group, whereas APOE4 was significantly more frequent in the PD group. Larger research studies have concluded that APOE ε genotypes had no association with PD onset (41). Differences between our study and past studies may be due to the smaller sample size included in our study and differences in the ethnicities of the participants.
In our study, SNVs detected in previous GWAS were not identified in our cohort because most of the reported risk-associated SNVs have been identified in non-coding regions, which were not included in our targeted panel (42). Targeted resequencing can cover more SNVs within the targeted exons than DNA microarrays, which are commonly used in GWAS. Our method might enable the detection of SNVs in exons or near exons that are not included in the SNP chips used for GWAS. Our target panel was designed to include all exons and the 25 bp up- and downstream of the exon-intron boundaries. Therefore, our method allowed the discovery of PD-related variants that were not detected by GWAS. The inclusion of patients with a family history or early-onset PD in our cohort might facilitate the detection of susceptibility-associated variants with strong genetic backgrounds. For example, mutations in GBA are more frequently identified in familial PD patients than in sporadic PD patients (43). However, our panel resequencing approach also has several disadvantages. This approach cannot be used to identify novel genes associated with PD and does not cover the majority of introns and transcriptional regulatory regions. The variants detected in this study may also be associated with sporadic PD, just as GWASs of sporadic PD have identified genes that were previously reported to be causative genes for familial PD (SNCA, MAPT, and LRRK2) (44).
Our study has the following limitations: (i) the sample size is too small to achieve genome-wide significance, (ii) there was no second cohort to confirm our results, (iii) sampling bias in the control group is possible because the allele frequencies of variants in the public database were different from those identified in our healthy control group, (iv) no functional analysis was performed to support our results, and (v) copy number variants were not evaluated.
We developed a new approach for surveying susceptibility-associated variants by using targeted resequencing, which may represent an effective method for revealing hidden disease-associated variants. Further studies that include additional patients remain necessary to confirm the suitability of this approach for the identification of disease-associated variants.

DATA AVAILABILITY STATEMENT
The DNA sequence data of the 1,102 participants used in this study were obtained with informed consent for genetic testing from all participants, according to the formal procedure approved by the Juntendo University School of Medicine Ethics Committee. However, some participants declined to have their DNA sequence data published in public databases. Therefore, readers who wish to use the raw data used in this paper should send a request directly to the corresponding author, Manabu Funayama, funayama@juntendo.ac.jp.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Juntendo University School of Medicine Institutional Review Board (No. 2019227).
The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
KD designed and performed the experiments, analyzed the data, wrote the first manuscript, and revised the manuscript. MF designed the study, wrote the first manuscript, and revised the manuscript. YL, HY, AH, AI, KO, and KN performed the experiments, analyzed the data, and revised the manuscript. NH directed the research project and revised the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by JSPS KAKENHI (Grant Numbers 19K08003 to MF, 18K07536 to HY, 20K16504 to AI, 19K17047 to KO, 20K07893 to KN, and 18H04043 to NH), by the Brain/MINDS Beyond program from AMED (JP20dm0307024 to KO), and by the Advanced Genome Research and Bioinformatics Study to Facilitate Medical Innovation (GRIFIN) from AMED (20km0405206h0005 to NH).