Whole Exome Sequencing Study in Isolated South-Eastern Moravia (Czechia) Population Indicates Heterogenous Genetic Background for Parkinsonism Development

Parkinsonism is among the most common neurodegenerative diseases. Genetic predisposition may be one of the significant risk factors for disease development. A higher prevalence of parkinsonism has been described in a large pedigree from the southeastern Moravia region. The aims of the study were to select accessible subfamily trios from the pedigree that were suitable for segregation genetic analyses, to perform whole exome sequencing (WES) in the trio individuals, and to evaluate genetic variants in each trio. We used the Ion Torrent platform for WES of five subfamily trios (1–5). Each trio included two affected individuals and one healthy individual (as a control). The variants found were filtered by minor allele frequency (MAF < 1%), variant effect (based on prediction tools) and a disease filter (genes responsible for parkinsonism). Finally, the variants from each trio were assessed with respect to their presence in the patients. No single founder mutation was found in the subfamilies from the pedigree. Trio 1 shares two variants with trio 2: MC1R:c.322G > A (p.A108T) and MTCL1:c.1445C > T (p.A482V); trio 3 shares two variants with trio 5: DNAJC6:c.1817A > C (p.H606P) and HIVEP3:c.3856C > A (p.R1286W). In trios 4 and 5, two variants were found in the CSMD1 gene: c.3335A > G (p.E1112G) and c.4071C > G (p.I1357M), respectively. We evaluated the non-shared variant SLC18A2:c.583G > A (p.G195S) as the most potentially damaging. This variant could affect dopamine transport in dopaminergic neurons. The study of the genetic background of parkinsonism in the isolated Moravian population suggests that there could be a significant accumulation of many genetic risk factors. To verify the influence of the variants, a more extensive population study and suitable functional analyses would be appropriate.

Next-generation sequencing methods can reveal further genes that could be potential risk factors for parkinsonism.
Despite the many causal genes and mutations already described, the genetic predisposition remains unclear in most patients. It is therefore appropriate to shift the methodological strategy from targeted sequencing to whole exome sequencing (WES) or whole genome sequencing, which opens the way to finding novel causal or risk genes and variants. NOTCH4 (Yemni et al., 2019), TNK2, TNR (Farlow et al., 2016), NUS1 (Guo et al., 2018), and SORL1 (Xiromerisiou et al., 2021) are among the potential candidate genes recently identified by WES. A combination of WES data and linkage analysis can be used to identify novel candidate genes in many diseases (Gazal et al., 2016; Toma et al., 2020).
In our previous epidemiological study, we described a higher prevalence of parkinsonism in the southeastern Moravia region (Hornacko) compared with the general population. This region includes 10 villages, where the local people have their own specific traditions (such as dances, folk art, local dialect and religion) and migration out of the region was rare. Because of many years of territorial and social isolation, it was hypothesized that an accumulation of genetic factors may contribute to the higher prevalence of Parkinson's disease (PD) in the region (Mensikova et al., 2013).
Thanks to our detailed study, an 11-generation pedigree from Hornacko was compiled with the help of witnesses, registry offices and local general practitioners (Figure 1; Mensikova et al., 2013). Based on this pedigree, we searched for patients who could provide material for genetic analysis. It was possible to select 5 family trios (two affected individuals and one healthy individual each) in subfamilies from the large pedigree.
The aims of the study were to choose accessible trios from the large pedigree suitable for segregation genetic analyses, to perform WES, and to call and evaluate variants using two software tools (Ion Reporter and Ingenuity Variant Analysis). Variants were filtered by the association of genes with the disease (parkinsonism and other neurodegenerative diseases) and by the co-occurrence of variants in the patients within a particular trio and across the whole pedigree.

MATERIALS AND METHODS
The study was approved by the Ethics Committee of the Palacký University and University Hospital Olomouc, Czechia. The patients were informed in detail about the study and all signed informed consent. Our study included 10 patients (8 females and 2 males) and 5 unaffected individuals (3 females and 2 males). The average age of female patients was 67 ± 12.2 years and the average age of male patients was 71 ± 14.1 years. The youngest patient was 56 years old and the oldest was 88 years old. The average age of controls was 71.8 ± 14.9 years. The youngest control was 51 years old and the oldest was 87 years old. Each trio was composed of 3 family members: two patients and one healthy individual (case assessment is described in Mensikova et al., 2014). We assume autosomal dominant inheritance with reduced penetrance and variable expressivity. The relationships in the individual trios were as follows: trio 1 (patient number 1 is the mother; number 2 is her daughter; the control is the mother's brother), trio 2 (patient number 3 is the father; number 4 is his daughter; the control is the father's brother), trio 3 (patients number 5 and 6 are siblings; the control is their healthy mother), trio 4 (patient number 7 is the mother; number 8 is her daughter; the control is the mother's brother), trio 5 (patient number 9 is the daughter; number 10 is her mother; the control is the healthy son). The detailed demographic data are given in Table 1. The patients' clinical data are described in Table 2. DNA was isolated from the peripheral blood of all patients and controls using the salting-out method (Miller et al., 1988). WES was performed by a commercial company (SEQme, s.r.o., Dobris, Czechia) on the Ion Torrent platform. Libraries were prepared using the Ion AmpliSeq Exome kit (Ion AmpliSeq 2.0 Library, according to the manual).
FIGURE 1 | Pedigree of the family from Hornacko. A clear circle/square denotes a living unaffected female/male; a black circle/square denotes a living affected female/male; a symbol with a diagonal line denotes a deceased individual. Highlighted individuals were accessible for WES.
Emulsion PCR was performed with the Ion PI Hi-Q OT2 200 template kit. Samples were barcoded so that 3 samples could be loaded on one Ion PI Chip. The Ion PI Hi-Q Sequencing 200 kit was used for sequencing. GRCh37 was used as the reference genome. Sequencing data processing consists of two parts. In the first part, raw data from the sequencer are loaded onto the Torrent Suite server. The raw data (acquired from pH changes) are converted to a single number per well per flow. In the next step, the base caller translates the converted data into a base sequence stored in an unaligned BAM file. In the alignment step, the Torrent Mapping Alignment Program maps the reads against the reference sequence and creates aligned BAM files. In the second part, the BAM files are uploaded to Ion Reporter, where variant calling, annotation and variant filtering are performed (Vodicka et al., 2020).
The first step of data analysis was the selection of variants common only to the affected individuals in each trio. All variants found were evaluated and filtered by two independent software tools: 1. Ion Reporter - minor allele frequency (
Data on the age of patients and controls in individual trios; the patient's number in our study and sex are given in brackets.
The most important (potentially risk or pathogenic) variants were confirmed by Sanger sequencing.
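The trio-level selection described in this section (variants carried by both affected individuals, absent from the healthy relative, and rare in the population) can be illustrated with a minimal Python sketch. This is not the Ion Reporter or Ingenuity Variant Analysis workflow used in the study; the function name, the variant identifiers and the frequency values are hypothetical toy data, and treating variants without a frequency record as rare is an assumption of the sketch.

```python
from typing import Dict, Set


def segregating_rare_variants(
    affected_a: Set[str],
    affected_b: Set[str],
    control: Set[str],
    maf: Dict[str, float],
    maf_cutoff: float = 0.01,
) -> Set[str]:
    """Keep variants present in both patients, absent from the control, with MAF < cutoff."""
    shared_by_patients = affected_a & affected_b
    absent_from_control = shared_by_patients - control
    # Assumption: a variant with no population frequency record is treated as rare.
    return {v for v in absent_from_control if maf.get(v, 0.0) < maf_cutoff}


# Hypothetical toy data (not taken from the study).
patient_1 = {"GENE_A:c.10A>G", "GENE_B:c.20C>T", "GENE_C:c.30G>A"}
patient_2 = {"GENE_A:c.10A>G", "GENE_B:c.20C>T", "GENE_D:c.40T>C"}
healthy_relative = {"GENE_B:c.20C>T"}
population_maf = {"GENE_A:c.10A>G": 0.002, "GENE_B:c.20C>T": 0.0005}

candidates = segregating_rare_variants(
    patient_1, patient_2, healthy_relative, population_maf
)
print(sorted(candidates))
```

Because reduced penetrance is assumed in this pedigree, excluding every variant seen in the healthy relative is a strict choice; a more permissive variant of the sketch could flag such variants instead of discarding them.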

RESULTS
In all 15 samples, 99% of targets were covered 1-20× and 90% of targets were covered more than 20×. The average number of analyzed variants was about 70,000, with a mean depth of about 75 (the number of variants in each trio is given in Table 3). The variants potentially associated with neurodegenerative disorders (rare, undescribed, evolutionarily conserved and assessed as damaging by at least one prediction tool) are described in Table 4. Moreover, we found that some variants were shared between individual trios across the pedigree. In trio 1 and trio 2, two variants were found in gene

DISCUSSION
In our whole exome study, we did not find any founder mutation across the large pedigree. The gene regions known to be associated with parkinsonism were analyzed.
Further, we found some rare variants that were present in more than one trio and that could contribute to the development of neurodegenerative disorders.
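The cross-trio comparison can be sketched in a few lines of Python. The trio assignments below are only those stated explicitly in the text (the trio carrying the SLC18A2 variant is not specified and is therefore omitted); the function name and the string encoding of variants are assumptions made for illustration.

```python
from collections import defaultdict

# Variants per trio as stated in the text; SLC18A2 is left out because
# its trio is not specified.
trio_variants = {
    1: {"MC1R:c.322G>A", "MTCL1:c.1445C>T"},
    2: {"MC1R:c.322G>A", "MTCL1:c.1445C>T"},
    3: {"DNAJC6:c.1817A>C", "HIVEP3:c.3856C>A"},
    4: {"CSMD1:c.3335A>G"},
    5: {"DNAJC6:c.1817A>C", "HIVEP3:c.3856C>A", "CSMD1:c.4071C>G"},
}


def shared_across_trios(variants_by_trio, key=lambda variant: variant):
    """Map each variant (or any key derived from it) to the trios carrying it,
    keeping only keys seen in at least two trios."""
    carriers = defaultdict(set)
    for trio, variants in variants_by_trio.items():
        for variant in variants:
            carriers[key(variant)].add(trio)
    return {k: sorted(trios) for k, trios in carriers.items() if len(trios) >= 2}


# Sharing at the level of the exact variant.
shared_variants = shared_across_trios(trio_variants)
# Sharing at the level of the gene (text before the first colon).
shared_genes = shared_across_trios(trio_variants, key=lambda v: v.split(":")[0])
# A founder variant would have to appear in every trio.
founder_variants = [
    v for v, trios in shared_variants.items() if len(trios) == len(trio_variants)
]

print(shared_variants)
print(shared_genes)
print(founder_variants)
```

Grouping by gene rather than by exact variant is what brings out CSMD1, which carries different variants in trios 4 and 5.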
The MC1R gene (OMIM 155555) encodes the melanocortin 1 receptor and is important for pigmentation. Marti et al. (2015) described the MC1R gene as a risk factor for PD and reported an association between the variant p.R160W and PD in the Spanish population. However, this finding was not confirmed (Gan-Or et al., 2016). Individuals with light hair and homozygotes for the variant p.R151C have a higher risk of PD compared with the wild type (Gao et al., 2009). The variant found, c.322G > A, was evaluated as benign by the prediction tools used, and the genomic site is not phylogenetically conserved. This missense variant leads to the exchange of a hydrophobic amino acid for a polar one. The variant has not yet been described.
The MTCL1 gene (OMIM 615766) encodes a protein that is important for microtubule bundles and interacts with MARK2 (Sato et al., 2013). MARK2 kinase affects the affinity of tau protein for microtubules through its phosphorylation (Schwalbe et al., 2013).
According to the prediction tools, the variant found, c.1445C > T, is benign, but it is very rare in the population. The genomic site is weakly phylogenetically conserved. No publication has reported an association of this variant with neurodegenerative disorders.
The HIVEP3 gene (OMIM 606649) encodes a protein belonging to the HIV enhancer-binding protein family. It can alter transcription via the κB enhancer motif (Allen et al., 2002). Decreased levels of cerebrospinal fluid dopamine have been described in patients with HIV (Lopez et al., 1999).
The variant found, c.3856C > A, was evaluated as benign by the prediction tools LRT and Mutation Assessor, but SIFT evaluated it as damaging. The variant leads to the exchange of a positively charged amino acid for a hydrophobic one. The genomic site is phylogenetically conserved. Thus, the variant c.3856C > A could affect the protein function.
The DNAJC6 variant found, c.1817A > C, is located in a phylogenetically conserved genomic site. The variant leads to the exchange of a basic polar amino acid for a non-polar one. The prediction tools SIFT and Provean evaluated it as benign. The variant is very rare in the population.
The CSMD1 gene (OMIM 608397) is associated with schizophrenia risk (Håvik et al., 2011). It is synthesized in the developing central nervous system (CNS) and epithelial tissues. The protein is important for complement activation and inflammation in the developing CNS (Kraus et al., 2006). Based on WES, the CSMD1 gene has been described in association with familial PD (Ruiz-Martínez et al., 2017).
The undescribed variant found, c.3335A > G, is located in a highly phylogenetically conserved site and leads to the exchange of an acidic amino acid for a neutral one. All of the prediction tools used evaluated the variant as damaging. Based on this, the variant could affect the protein function.
The other variant found, c.4071C > G, leads to an exchange between hydrophobic amino acids in a weakly conserved genomic site. The prediction tool Provean evaluated the variant as benign, but LRT and SIFT evaluated it as damaging.
Individual variants found in trios 1-5. Coordinate is the particular location in the genome (relative to the reference genome GRCh37); MAF 1000GP/ExAC/gnomAD is the minor allele frequency according to the 1000 Genomes Project, ExAC (the Exome Aggregation Consortium) and gnomAD (the Genome Aggregation Database). AA change is the amino acid change; rs ID is the identifier of the described variant; the PhyloP score evaluates the phylogenetic conservation of the particular site in the genome. The colored variants are shared between individual trios. LRT, SIFT, Provean and Mutation Assessor are prediction tools for variant evaluation. Scores in bold indicate that the variant was evaluated as pathogenic. The symbol "-" is used for missing data.
Frontiers in Neuroscience | www.frontiersin.org
The SLC18A2 gene (OMIM 193001, also known as VMAT2) encodes an ATP-dependent antiporter that transports the monoamines serotonin, dopamine and norepinephrine into vesicles for release from the cell (Liu and Edwards, 1997). Increased cytosolic dopamine and its metabolites are neurotoxic for neuronal cells, and their reduction leads to neuroprotection (Mosharov et al., 2009). Brighina et al. (2013) described two SNPs in the VMAT2 promoter region in connection with a reduced risk of PD. It is assumed that increased levels of VMAT2 contribute to protection against the disease (Brighina et al., 2013).
The variant c.583G > A is located in a highly phylogenetically conserved genomic site and is very rare in the population. It leads to the exchange of a hydrophobic amino acid for a polar one. All of the prediction tools used evaluated it as damaging. In the gnomAD exomes dataset, no homozygous carriers were found, which could indicate a likely pathogenic effect (according to the ACMG classification, criteria for classifying pathogenic variants, PM2 rule). The variant is located in the disulfide bond domain, which is important for efficient monoamine transport (Thiriot et al., 2002). We assume that the variant could affect dopamine transport from dopaminergic cells and expose the cells to dopamine cytotoxicity.

CONCLUSION
WES can contribute to the discovery of variants responsible for the development of many diseases. In general, the evaluations of WES variants by different prediction tools are often inconsistent, and the final assessment should be made with caution and in combination with functional assays or segregation analysis. However, even a single prediction tool, strong evolutionary conservation or population rarity can highlight a potential risk variant.
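The lack of uniformity between prediction tools can be made explicit by tallying their calls per variant. The following Python sketch uses only the tool calls named in the Discussion; the labels "concordant" and "discordant" and the function name are illustrative assumptions, not part of the study's workflow, and the tally is not a pathogenicity classification.

```python
# Tool calls as reported in the Discussion; only variants whose tools
# are named individually are included.
tool_calls = {
    "HIVEP3:c.3856C>A": {"LRT": "benign", "Mutation Assessor": "benign", "SIFT": "damaging"},
    "DNAJC6:c.1817A>C": {"SIFT": "benign", "Provean": "benign"},
    "CSMD1:c.4071C>G": {"Provean": "benign", "LRT": "damaging", "SIFT": "damaging"},
}


def summarize_calls(calls):
    """Return (number of damaging calls, number of tools, agreement label)."""
    damaging = sum(1 for call in calls.values() if call == "damaging")
    total = len(calls)
    if damaging == 0:
        label = "concordant benign"
    elif damaging == total:
        label = "concordant damaging"
    else:
        label = "discordant"
    return damaging, total, label


summary = {variant: summarize_calls(calls) for variant, calls in tool_calls.items()}
for variant, (damaging, total, label) in summary.items():
    print(f"{variant}: {damaging}/{total} damaging ({label})")
```

A discordant tally marks a variant for follow-up by segregation or functional analysis rather than settling its effect.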
Based on the study results, we suppose a heterogenous genetic background in the development of parkinsonism in the Hornacko region. According to the prediction tools, the most interesting variant seems to be SLC18A2:NM_003054.4 c.583G > A (p.G195S, rs148458078).
Our study is limited by the number of samples, and it is not possible to exclude the effect of other genetic causes that were not detected by the method and filter settings used. WES cannot capture intronic variants or large genomic rearrangements. A larger population study is necessary to verify our results.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm.nih.gov/bioproject/785725.