Regulatory SVA retrotransposons and classical HLA genotyped-transcripts associated with Parkinson’s disease

Introduction Parkinson’s disease (PD) is a neurodegenerative and polygenic disorder characterised by the progressive loss of neural dopamine and onset of movement disorders. We previously described eight SINE-VNTR-Alu (SVA) retrotransposon-insertion-polymorphisms (RIPs) located and expressed within the Human Leucocyte Antigen (HLA) genomic region of chromosome 6 that modulate the differential co-expression of 71 different genes including the HLA classical class I and class II genes in a Parkinson’s Progression Markers Initiative (PPMI) cohort. Aims and methods In the present study, we (1) reanalysed the PPMI genomic and transcriptomic sequencing data obtained from whole blood of 1521 individuals (867 cases and 654 controls) to infer the genotypes of the transcripts expressed by eight classical HLA class I and class II genes as well as DRA and the DRB3/4/5 haplotypes, and (2) examined the statistical differences between three different PD subgroups (cases) and healthy controls (HC) for the HLA and SVA transcribed genotypes and inferred haplotypes. Results Significant differences for 57 expressed HLA alleles (21 HLA class I and 36 HLA class II alleles) up to the three-field resolution and four of eight expressed SVA were detected at p<0.05 by the Fisher’s exact test within one or other of three different PD subgroups (750 individuals with PD, 57 prodromes, 60 individuals who had scans without evidence of dopamine deficits [SWEDD]), when compared against a group of 654 HCs within the PPMI cohort and when not corrected by the Bonferroni test for multiple comparisons. Fourteen of 20 significant alleles were unique to the PD-HC comparison, whereas 31 of the 57 alleles overlapped between two or more different subgroup comparisons. 
Only the expressed HLA-DRA*01:01:01 and -DQA1*03:01:01 protective alleles (PD v HC), the -DQA1*03:03:01 risk (HC v Prodrome) or protective allele (PD v Prodrome), the -DRA*01:01:02 and -DRB4*01:03:02 risk alleles (SWEDD v HC), and the NR_SVA_381 present genotype (PD v HC) at a 5% homozygous insertion frequency near HLA-DPA1, were significant (Pc<0.1) after Bonferroni corrections. The homozygous NR_SVA_381 insertion significantly decreased the transcription levels of HLA-DPA1 and HLA-DPB1 in the PPMI cohort and its presence as a homozygous genotype is a risk factor (Pc=0.012) for PD. The most frequent NR_SVA_381 insertion haplotype in the PPMI cohort was NR_SVA_381/DPA1*02/DPB1*01 (3.7%). Although HLA C*07/B*07/DRB5*01/DRB1*15/DQB1*06 was the most frequent HLA 5-loci phased-haplotype (n, 76) in the PPMI cohort, the NR_SVA_381 insertion was present in only six of them (8%). Conclusions These data suggest that expressed SVA and HLA gene alleles in circulating white blood cells are coordinated differentially in the regulation of immune responses and the long-term onset and progression of PD, the mechanisms of which have yet to be elucidated.


Introduction
Parkinson's disease (PD), familial and sporadic, is the second most common human neurodegenerative disease after Alzheimer's disease with almost 90,000 people in the USA diagnosed each year, and a 2019 world-wide prevalence rate of 8.5 million individuals that is increasing (1). PD pathology is age-related and characterised by progressive degeneration of dopaminergic neurons in the substantia nigra and other brainstem nuclei, with accumulation of tau and alpha-synuclein deposits (Lewy body inclusions) throughout the peripheral and central nervous systems (2-5). Essential differential observations accompanying PD subtypes include loss of dopamine, bradykinesia (movement disorders), rigidity, tremor and a range of non-motor symptoms such as cognitive impairment and sleep disturbance (6, 7). The primary and secondary causes of PD may involve genetic, environmental, metabolic and immunological factors with various non-neurological features and varying overlap with age-related autoimmune diseases such as multiple sclerosis, amyotrophic lateral sclerosis, thyroid diseases and rheumatoid myalgia (3, 8-11). In regard to the effect of the environment and immunogenetics, Braak et al. (12) postulated that an unknown viral or bacterial infection in the neurons of the gut and/or nasal cavity initiated the onset of sporadic PD with specific alpha-synuclein spreading and eventual Lewy body formation and glial neuroinflammatory activation. Considerable preclinical, clinical and laboratory evidence supports Braak's hypothesis of PD progression, although the specific mechanisms, stages and pathways still have to be elucidated (13, 14). Recent animal in vitro studies and human neuropathological examinations suggest that neuronal antigen presentation may have a role in PD and other neurodegenerative disorders (15).
Although the aetiology of sporadic PD remains unknown, the immune system has an important role in this disease (3, 8-11). The protective effect of nonsteroidal anti-inflammatory drugs in animal models and epidemiological studies underscores the role of neuroinflammation in PD (16). Large numbers of microglia expressing human leucocyte antigen (HLA)-DR have been detected in the brain of PD patients, particularly in areas of maximal neurodegeneration (15, 17). Leucine-rich repeat kinase 2 (LRRK2), a risk gene of PD, is highly expressed in microglia, monocytes and other immune cells (18), and has been reported to be associated with an increasing risk of Crohn's disease, an inflammatory bowel disease and other autoimmune diseases (19-21). Alpha-synuclein specific T cell reactivity is associated with HLA-DRB1*15:01 and -DRB5*01:01 (22, 23), and with preclinical and early PD (24, 25), and the infiltration of CD4+ lymphocytes into the brain contributes to neurodegeneration in a mouse model of PD (15, 26). At least 90 genetic loci have been associated with PD risk in genome-wide association studies (GWAS), including the HLA-DRA, -DRB, and -DQ genes within the Major Histocompatibility Complex (MHC) class II region on the short arm of chromosome 6 at 6p21.3 (27, 28).
HLA class I and class II molecules are polymorphic cell-membrane-bound glycoproteins that present antigens to circulating CD8+ and CD4+ T-lymphocytes, and regulate the innate and adaptive immune responses including autoimmunity, infectious diseases and transplantation outcomes (29-31). The MHC or HLA genomic region encodes at least 160 genes within ~3 to 4 Mb including three distinct structural regions designated as class I, class II and class III. Of the 32 HLA genes, the classical HLA class I genes, HLA-A, -B and -C, and the classical HLA class II genes, HLA-DR, -DQ and -DP, are characterised by an extraordinarily large number of polymorphisms, whereas the nonclassical HLA class I genes, such as HLA-E, -F and -G, are differentiated by their tissue-specific expression and limited polymorphism (32, 33). Several GWASs have shown an association between the HLA locus and the risk of PD, especially involving SNPs of the HLA class II genes HLA-DQA1, -DQA2, -DQB1, -DRB1, and -DRB5 (27, 34-36).
Most studies of PD association with HLA class I and class II alleles are limited in scope and power mainly because of small sample numbers and limited resolution of HLA typing methods. Studies with more than 500 PD cases suggest that HLA genes have a role in risk or protection in PD progression. The study by Saiki et al. (37) [...] (39). More studies of the association between HLA genotypes and PD are needed to understand the role of HLA in the disease processes of PD and how HLA genes and alleles might be interlinked with accompanying autoimmune diseases, especially those that show non-neurological symptoms associated with PD such as sleep disorder and a decrease in HLA-DR expression (40).
Apart from protein-coding genes, numerous repeat elements (REs) within the human genome have been associated with PD, including SINE-R-VNTR-Alu (SVA) retrotransposon insertion polymorphisms (RIPs), such as an SVA inserted in the TAF1 gene that has been associated with the disease X-linked dystonia-parkinsonism, and at least five other SVAs inserted within the PD (PARK) gene loci of different chromosomes (41, 42). Recently, expression quantitative trait loci (eQTL) of different SVAs and their effect on the regulation of gene expression were identified and described for a Parkinson's Progression Markers Initiative (PPMI) cohort using whole genome sequence and transcriptome data obtained from the blood of more than a thousand individuals (43). Also, there are SVAs within the MHC genomic region that are expressed and can regulate the expression of HLA genes (44). At least eighteen SVA polymorphic insertions were mapped previously within the MHC class I, II and III regions, and some were found to be haplotypic or haplospecific for particular HLA gene alleles that varied in frequency between European, Japanese and African American populations (45). For example, the SVA-HF, SVA-HA, and SVA-HC were inserted at a relatively low frequency (<0.2) in European populations and strongly associated with the HLA 7.1 ancestral haplotype, but not with the 8.1 haplotype (46, 47).
A PPMI clinical protocol was established in 2010 to acquire comprehensive longitudinal within-participant clinical, imaging, genomic, transcriptomic and biomarker data for three main cohorts, (1) PD with and without genetic risk variants, (2) prodromes (nonmotor features) at risk of PD, and (3) healthy controls with no neurological disorder and no first degree relative, currently aimed at enrolling 4000 participants at about 50 sites worldwide (48). We associated the regulatory properties of 8 SVA RIPs located within the class I and class II MHC regions of the PPMI cohort with the differential co-expression of 71 genes within and 75 genes outside of the MHC region, including all the classical class I and class II genes (44). A limitation of this SVA-HLA eQTL study was the absence of HLA allelic data to associate with the SVA genotypes and for stratifying the statistical differences between PD, prodromes and healthy controls within the PPMI cohort.
The purpose of our current study was to undertake an analysis of the expression of ten classical class I (HLA-A, -B, -C) and class II (HLA-DRA, -DRB3/4/5, -DRB1, -DQB1, -DQA1, -DPA1, -DPB1) gene alleles in the context of the eight regulatory SVA RIPs expressed within the MHC genomic region (Figure 1) that we had previously studied (44). The main aims of this study were to determine:
(a) the prevalence of the expressed HLA classical alleles and inferred haplotypes for the entire PPMI cohort, cases and controls;
(b) the HLA allelic and haplotypic statistical differences between PD, healthy controls (HC), prodromal PD and scans without evidence of dopamine deficits (SWEDD);
(c) the inferred SVA haplotypes and their association with HLA gene alleles.
Our RNA data analysis confirms that SVAs are eQTLs for classical HLA class I and class II alleles, and suggests that coordinated SVA and HLA gene expression might influence PD onset or progression via the adaptive immune system.

Parkinson's progression markers initiative datasets
The PPMI is an ongoing longitudinal, observational, multicentre study of PD, with an associated database, whose overall goal is to identify biological and genetic markers of disease progression, accelerate therapeutic trials and reduce progression of PD.

SVA and HLA genotypes
Regulatory effects of SVA on HLA transcription levels were inferred statistically by eQTL analysis using the Matrix eQTL software (49), as described previously (44). Fastq files of whole-blood RNAseq were downloaded from the PPMI database, and the referenced SVA (R_SVA) and non-referenced SVA (NR_SVA) (Figure 1) were located, genotyped and identified within or outside the MHC genomic region with the assistance of the software tools Delly2 (structural variant caller) and the transcript counters Salmon and DESeq2, as previously described (41, 44). The transcripts of all 1521 individuals, downloaded as PPMI blood RNAseq .bam files, were used to identify the genotypes of ten classical class I and class II HLA genes using the arcasHLA software tool described by Orenbuch et al. (50). DRB3, DRB4 and DRB5 were counted as a single locus or gene, including the designated 'DRB3DRB4DRB5 absent', which is the haplotype with no DRB3, DRB4 or DRB5 locus. The HLA transcripts were 'genotyped' at least to the three-field resolution (e.g., A*02:01:01), whereby the first field represents the ancestral allele group (e.g., A*02), the second field represents the protein type and the third field represents synonymous changes in coding regions.
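The field structure of the allele names can be illustrated with a short Python sketch that splits a name into its gene and colon-separated fields and truncates it to a chosen resolution. This is only an illustration of the naming scheme described above; the class `HlaAllele` and the function `parse_hla_allele` are hypothetical helpers and are not part of arcasHLA or of the pipeline used in this study.

```python
from typing import NamedTuple, Tuple


class HlaAllele(NamedTuple):
    """Hypothetical container for a parsed HLA allele name."""
    gene: str                # e.g. "A" or "DQA1"
    parts: Tuple[str, ...]   # e.g. ("02", "01", "01")

    def at_resolution(self, n_fields: int) -> str:
        """Return the allele name truncated to its first n_fields fields."""
        return f"{self.gene}*{':'.join(self.parts[:n_fields])}"


def parse_hla_allele(name: str) -> HlaAllele:
    """Parse names such as 'A*02:01:01' or 'HLA-DQA1*03:01:01'."""
    name = name.strip()
    if name.upper().startswith("HLA-"):
        name = name[4:]
    gene, sep, rest = name.partition("*")
    if not sep or not gene or not rest:
        raise ValueError(f"not an HLA allele name: {name!r}")
    return HlaAllele(gene, tuple(rest.split(":")))


allele = parse_hla_allele("HLA-DQA1*03:01:01")
print(allele.at_resolution(1))  # first field: allele group
print(allele.at_resolution(2))  # second field: protein type
print(allele.at_resolution(3))  # third field: synonymous coding changes
```

Truncating to the first field in this way gives the allele group (lineage) level at which alleles such as DPA1*02 or DPB1*01 are reported elsewhere in the text.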

Statistical analysis
The p-values, odds ratios (OR), and 95% confidence intervals (CI) were calculated by Fisher's exact test in R (version 4.1.3). For multiple testing, the Bonferroni correction was applied: the observed p-values were multiplied by the number of alleles at each HLA locus to obtain Bonferroni-corrected Pc-values. Haplotypes were estimated using the PHASE program v2.1.1 (51) and are referred to in this study as phased-haplotypes.
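As an illustration of this procedure, the following Python sketch computes a two-sided Fisher's exact p-value for a 2x2 table, a simple cross-product odds ratio, and a Bonferroni-corrected Pc-value. It is a minimal sketch and not the R code used in the study: the function names are hypothetical, the counts in the example are invented, and the cross-product odds ratio shown here is not the conditional maximum-likelihood estimate that R's `fisher.test` reports, so the two can differ slightly. Confidence intervals are omitted.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio (a*d)/(b*c); infinite if b*c is zero."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


def bonferroni(p: float, n_tests: int) -> float:
    """Pc-value: p multiplied by the number of tests, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical counts: allele present/absent on PD and HC chromosomes,
# at a locus where 12 alleles were tested (all numbers are invented).
pd_with, pd_without, hc_with, hc_without = 30, 1470, 50, 1258
p = fisher_exact_two_sided(pd_with, pd_without, hc_with, hc_without)
print(f"OR={odds_ratio(pd_with, pd_without, hc_with, hc_without):.2f}")
print(f"p={p:.4g}, Pc={bonferroni(p, 12):.4g}")
```

An OR below 1 in such a table would be read as a protective association and an OR above 1 as a risk association.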

Common medical disorders associated between PD and HC
The aetiology of PD appears to be multifactorial, involving aging, genetics and environmental factors (9), and is reflected by other inflammation-related disorders or autoimmune diseases (20, 52, 53). Table 1 shows a list of common diseases or disorders in 318 PD patients and 264 healthy controls (free of PD) in a subset of the PPMI cohort. The most significant risk factors (Pc<0.1) associated with PD in this subset of PD patients (average age of 61 years) are scoliosis (n, 9 v 0) and sleep disturbances (n, 72 v 33). Thyroid disease including hyperthyroidism is a risk factor in 75 of the PD patients by the Fisher's exact test with a p-value of 0.012.

HLA genotyped transcripts and statistical associations within case-control comparisons
The HLA genotypes of ten classical class I and class II genes of 1521 individuals within the PPMI cohort inferred from the transcription data are presented in Supplementary Table 1, and their overall frequencies are shown in Supplementary Table 2. The top six HLA-A, -B, -C, -DRB1, -DQB1, -DQA1, -DPA1, -DPB1 allele frequencies in the PPMI cohort are shown in Table 2, confirming that the PPMI cohort consists mostly of individuals of white European or North American ancestry (33).
The significant differences at p<0.05 detected by the Fisher's exact test for 20 different HLA alleles (7 HLA class I and 13 HLA class II alleles) up to the three-field resolution in the statistical comparison between HC (n, 654) and PD (n, 750) are shown in Table 3 [...] after Bonferroni correction for multiple testing. Some notable allelic differences between PD and HC at the p<0.[...]
The HLA allele frequencies in the PD group (n, 750) compared statistically against those in the Prodrome (n, 57) and SWEDD (n, 60) groups are shown in Table 4. There are 11 and 17 allelic differences at p<0.05 in the PD-Prodrome and PD-SWEDD comparisons, respectively, but only the protective HLA-DQA1*03:03:01 in the PD-Prodrome comparison is significant (Pc=0.066) after the Bonferroni correction. Although the expressed HLA-DRA*01:01:01 and -DQA1*03:01:01 are protective alleles in the PD-HC comparison at Pc<0.1 (Table 3), HLA-DQA1*03:01:01 is significant only at the p<0.05 level and HLA-DRA*01:01:01 is not significant (p>0.05) in the SWEDD-HC comparison (Table 4). Neither HLA-DRA*01:01:01 nor -DQA1*03:01:01 is significant (p>0.05) in the PD-Prodrome comparison (Table 4).
The significant differences at p<0.05 for 43 HLA alleles (25 HLA class I and 38 HLA class II alleles) up to the three-field resolution in two comparisons between HC and the two PD subgroups, Prodrome (A) and SWEDD (B), are shown in Table 5. There are 20 different alleles (18 risk and 2 protective) in the Prodrome-HC comparison, and 23 (18 risk and 5 protective) in the SWEDD-HC comparison [...]
The statistical analyses of the HLA allele frequency differences at the p<0.05 level of significance show that the PD, Prodrome and SWEDD subgroups are markedly different from each other within the PPMI cohort (Tables 3-5). There are 57 significantly (p<0.05) different HLA alleles (21 class I and 36 class II) in the five statistical comparisons between the different PD subgroups (Tables 3-5). Twenty-six (9 class I and 17 class II) of the 57 different alleles are limited to a single subgroup comparison, mainly in the HC-PD (14 of 20 alleles), HC-Prodrome (6 of 20 alleles), HC-SWEDD (4 of 23 alleles) and PD-Prodrome (2 of 11 alleles) comparisons, whereas thirty-one (12 class I and 19 class II) of the 57 alleles overlap between two or more different subgroup comparisons (Supplementary Table 3). In addition, there are more protective alleles in the HC group than risk alleles in the PD group at a ratio of 12 to 8 (60%) in the PD-HC comparison (Table 3), whereas the Prodrome and SWEDD comparisons with HC have more risk alleles than protective alleles at ratios of 18 to 2 (90%), and 18 to 5 (78%), respectively (Table 5). In a statistical comparison between the HC group (n, 654) and the combined PD subgroups (PD, Prodrome and SWEDD, [n, 867]), presented in Supplementary Table 4, there are 17 risk and 15 protective HLA alleles (12 class I and 20 class II) with 8 of the 32 significant alleles (p<0.05) present only in this analysis, whereas the other 24 alleles are present in at least one of the other statistical comparisons (Tables 3-5). In this analysis, the HLA-DRA*01:01:01 protective allele is significant (P=0.0223) after a Bonferroni correction.
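In summary, only five of the expressed HLA alleles shown in Tables 3-5 (HLA-DRA*01:01:01, -DQA1*03:01:01, -DQA1*03:03:01, -DRA*01:01:02 and -DRB4*01:03:02) were significant (Pc<0.1) after Bonferroni correction.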

SVA genotyped transcripts and phased-haplotypes within case-control comparisons
The eQTL SVA transcripts expressed at eight MHC loci (NR_SVA_377, R_SVA_24, R_SVA_25, R_SVA_26, NR_SVA_380, R_SVA_27, R_SVA_85, NR_SVA_381) that are shown in Figure 1 [...]
Significant differences for SVA genotypes were detected at p<0.05 by the Fisher's exact test between different subgroups (PD, Prodrome, SWEDD and HC) within the PPMI cohort for only four (R_SVA_25, NR_SVA_380, R_SVA_85, and NR_SVA_381) of the eight SVAs (Table 6). When R_SVA_25 is absent (A) on both chromosomes, the homozygous genotype is referred to as the R_SVA_25 AA genotype. In the PD-SWEDD comparison, the R_SVA_25 AA genotype is a PD risk, whereas the R_SVA_25 PA genotype is protective. The homozygous presence (PP) of the NR_SVA_381 insertion is a significant risk in the PD-HC and HC-Combination comparisons both at the p<0.05 and Pc<0.1 levels, but only at the p<0.05 level in the HC-Prodrome comparison.

SVA and HLA phased-haplotypes within case-control comparisons
Phased haplotypes and statistical analysis (Fisher's exact test, p-value; and Bonferroni correction, Pc-value) of HLA genotypes at 10 loci and SVA genotypes at 8 loci are listed in Supplementary Table 7. In this haplotype analysis of 18 loci, we used the genotype data of only 1165 (77%) of the 1521 individuals because of missing or uncertain data at one or more loci in the excluded 356 individuals. Of the 1540 different phased haplotypes (66%) from a total of 2330 haplotypes in this analysis, only six are significantly different between PD and HC at p<0.05 (Figure 2). However, none of these p-values are significant when corrected by Bonferroni for multiple testing. The two most frequent HLA/SVA haplotypes shown in Figure 2 and listed in Supplementary Table 7 are phased-haplotype ID-797 (n, 37, 2.4%) and phased-haplotype ID-105 (n, 35, 2.3%). Moreover, there is only one risk haplotype (ID-100, n, 23, 1.5%) detected by the Fisher's exact test at p=0.025 that is more frequent in PD than HC. In this haplotype, the high frequency R_SVA_85 is absent (A) and the low frequency SVA_381 is present (P). The high-risk HLA haplotype reported by Wissemann et al. (35) with the B*07:02/C*07:02/DRB5*01/DRB1*15:01/DQA1*01:02/DQB1*06:02 alleles is split in our study between 30 different haplotypes by including the HLA-A, -DPA, -DPB alleles, and SVA genotypes, and therefore was not significant (Supplementary Table 8). In our study, the protective HLA-DRB1*04:04 allele reported by Wissemann et al. (35) is part of the 'protective' HLA/SVA phased-haplotype ID-1258 (Supplementary Table 7).
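Because each individual contributes two phased haplotypes, the 1165 individuals in this analysis yield 2330 haplotypes. The tallying of distinct haplotypes can be sketched in Python as below. This is an illustrative sketch only: `haplotype_summary` is a hypothetical helper, the haplotype strings are invented toy data rather than entries from Supplementary Table 7, and the actual phasing was done with PHASE.

```python
from collections import Counter


def haplotype_summary(phased):
    """phased: list of (hap1, hap2) tuples, one tuple per individual.

    Returns (total haplotypes, number of distinct haplotypes, Counter).
    """
    counts = Counter(h for pair in phased for h in pair)
    return sum(counts.values()), len(counts), counts


# Invented toy input: three individuals, each haplotype written as
# "HLA alleles | SVA genotype" purely for illustration.
toy = [
    ("A*02~B*07~DRB1*15|SVA_381:A", "A*01~B*08~DRB1*03|SVA_381:A"),
    ("A*02~B*07~DRB1*15|SVA_381:A", "A*03~B*07~DRB1*15|SVA_381:P"),
    ("A*01~B*08~DRB1*03|SVA_381:A", "A*02~B*07~DRB1*15|SVA_381:A"),
]
total, distinct, counts = haplotype_summary(toy)
print(f"{total} haplotypes, {distinct} distinct")
for hap, n in counts.most_common():
    print(f"{hap}\t{n}\t{100 * n / total:.1f}%")
```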
The SVAs that associated with the HLA allele groups at >73% are listed in Table 7. For example, in the MHC class I region, the low frequency NR_SVA_377 (6.9%) is associated almost exclusively with the HLA-A*11 allele group, whereas the moderately low frequency R_SVA_24 (26.5%) is associated mostly with three [...]
The high frequency R_SVA_85 insertion (82.2%) is associated strongly with three DPA1 allele groups at 99.9% or 100% and with at least 13 DPB1 allelic groups at >96%. The relatively low frequency NR_SVA_381 (19.8%) that is significantly more prevalent in PD than healthy controls (Table 6) is strongly associated with the HLA-DPA1*02 and -DPA1*04 allele lineages, and with at least 9 HLA-DPB1 allele lineages (Table 7). Of these HLA-DPB1 allele groups, DPB1*01, DPB1*10, and DPB1*14 appear to imply a disease risk based on p<0.05 and high OR values >1 (Tables 3-5).
The [...] (Tables 3-5) and NR_SVA_380 (Table 6) are significant (p<0.05) in the PPMI cohort subgroup comparisons. In this regard, on the basis of the phased-haplotype inferences, we constructed twelve haplotypes of NR_SVA_380, and HLA-DRB1, -DQA1 and -DQB1 alleles to estimate their frequency and overall pattern of distribution (Table 8). There are 206 NR_SVA_380 insertions associated with 395 DRB1/DQA1/DQB1 haplotypes at 52.2%.
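The association percentages in this section can be reproduced from haplotype counts. The Python sketch below assumes, as the 206 of 395 (52.2%) figure suggests, that each percentage is the share of haplotypes carrying a given HLA allele group (or allele combination) that also carry the SVA insertion; this reading is an assumption, and `sva_association` and its toy input are hypothetical.

```python
from collections import defaultdict


def sva_association(haplotypes):
    """haplotypes: iterable of (allele_group, sva_present) pairs.

    Returns {allele_group: percent of its haplotypes carrying the SVA}.
    """
    carried = defaultdict(int)
    total = defaultdict(int)
    for group, present in haplotypes:
        total[group] += 1
        carried[group] += bool(present)
    return {g: 100.0 * carried[g] / total[g] for g in total}


# Toy input built to match the counts quoted in the text:
# 206 insertions among 395 DRB1/DQA1/DQB1 haplotypes.
toy = [("DRB1/DQA1/DQB1", True)] * 206 + [("DRB1/DQA1/DQB1", False)] * 189
for group, pct in sva_association(toy).items():
    print(f"{group}: {pct:.1f}%")
```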

Modulation of HLA-DPA1 and -DPB1 by two MHC SVA RIPs in the PPMI cohort
Figure 4 shows box plots of the possible effects of R_SVA_85 and NR_SVA_381 on the transcription of HLA-DPA1 and -DPB1. Homozygous R_SVA_85 insertion (PP) significantly increases (p=0.023) the transcription of HLA-DPB1, but has no significant effect (p=0.35) on the transcription of HLA-DPA1. The absence of R_SVA_85 appears to be a risk factor for the Prodrome cohort (Table 6). In contrast, homozygous NR_SVA_381 insertion (PP) (Table 6) significantly decreases the transcription levels of HLA-DPA1 (p=0.037) and HLA-DPB1 (p=0.001) in the PPMI cohort, and its presence as a homozygous genotype (PP) is a risk factor (Pc=0.012) for PD (Table 6A; Figure 4).
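The direction of such a genotype effect can be illustrated with an additive linear model in which expression is regressed on insertion dosage (AA=0, PA=1, PP=2). The Python sketch below is a generic least-squares illustration under that assumption and is not the Matrix eQTL analysis reported here; the function `eqtl_slope`, the dosage coding and the expression values are all hypothetical.

```python
# Assumed additive coding of the SVA genotype (number of insertion copies).
DOSAGE = {"AA": 0, "PA": 1, "PP": 2}


def eqtl_slope(dosages, expression):
    """Least-squares (slope, intercept) of expression on insertion dosage."""
    n = len(dosages)
    if n != len(expression) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype shows no variation")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(dosages, expression))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented expression values (arbitrary units) for six individuals.
genotypes = ["AA", "AA", "PA", "PA", "PP", "PP"]
expression = [10.2, 9.8, 9.1, 8.9, 8.1, 7.9]
slope, intercept = eqtl_slope([DOSAGE[g] for g in genotypes], expression)
print(f"slope per insertion copy: {slope:.2f}, intercept: {intercept:.2f}")
```

A negative slope corresponds to lower expression with each additional insertion copy, and a positive slope to higher expression.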

Discussion
The regulatory effects of eight transcribed SVA RIPs on the differential co-expression of 71 genes within the MHC genomic region including all the classical class I and class II genes of a PPMI cohort were previously identified by eQTL statistical analysis (41, 43, 44). In this study, the same PPMI RNAseq database was reused to genotype the transcripts encoded by classical class I and class II HLA genes in order to determine their frequency and estimate their haplotypic associations with each other and with the eight regulatory MHC SVAs. The arcasHLA software tool (50) was used to impute the genotypes of the transcripts expressed by the class I and class II HLA genes to at least the three-field resolution that included the ancestral allele group, the protein type and the synonymous changes in the coding regions. In a recent comparison of the seven best of 22 genotyping computation tools, arcasHLA was the fastest and among the top three most accurate (99.1% for MHC-I and 98.1% for MHC-II) for genotyping Caucasian American RNA data (56). The PPMI cohort HLA allele frequency and haplotype data confirmed that our cohort was mostly (>95%) Caucasian American or Caucasian European as expected (33, 48). Consequently, we have accepted the high accuracy and reliability of the arcasHLA imputations without resorting to the use of other genotyping tools.
Significant differences were detected at p<0.05 by the Fisher's exact test for 21 HLA class I alleles and 36 HLA class II alleles transcribed by 10 HLA genes that were different up to the three-field resolution within four subgroups (PD, Prodrome, SWEDD and HC) of the PPMI cohort when not corrected for multiple testing [...] 2). The 52 alleles that did not survive the Bonferroni statistical challenge, but had significant differences (p<0.05) between the different cases and controls by the Fisher's exact test, were placed within a statistically marginal zone of 'possible' rather than 'strong' or 'definite' risk or protective effects. This lower level of statistical significance might have been confounded by various factors such as lack of statistical power due to insufficient sample numbers, unreliable disease and aetiological factors, or various comorbidities and other issues not accounted for in our analysis. However, many of the 57 possible protective or susceptibility HLA alleles (p<0.05 or Pc<0.1) were reported previously by others to be statistically significant in PD and various autoimmune disease studies. In our study, there was an overall greater number of possible protective alleles than risk alleles at a ratio of 12 to 8 (60%) in the PD group compared to healthy controls, whereas the Prodrome and SWEDD comparisons with HC had more possible risk alleles than protective alleles at ratios of 18 to 2 (90%), and 18 to 5 (78%), respectively. This greater ratio of HLA risk to protective alleles in Prodrome and SWEDD compared to HC might in part explain the gradual or variable progression to PD.
Idiopathic, spasmodic and prodromal PD groups have a mixed population of different HLA haplotypes and HLA alleles that carry and present various peptides and antigens to T lymphocytes, which in turn are activated to regulate a diversity of immune responses including inappropriate and harmful autoimmune responses that can cause extensive tissue damage. In this study, we could not easily discern which are the [...]
The statistical results for the HLA-DRB1*04 alleles in previous studies of PD suggested both susceptibility (36) and protective associations (34, 35, 37-39). Our results revealed that the HLA-DRB1*04 alleles were significant statistically at the 'possible' level (uncorrected Fisher's exact test, p<0.05) with HLA-DRB1*04:02 and -DRB1*04:04 protective in PD and PPMI (Table 3), whereas HLA-DRB1*04:01 was a possible risk allele in the Prodrome and SWEDD groups, and the PPMI cohort (Table 5). This difference between the PD, Prodrome, and SWEDD groups might reflect that neither the Prodrome nor the SWEDD group had established PD with, as yet, degenerated dopaminergic neurons and large aggregates of alpha-synuclein or tau proteins (48). A recent study by Mignon et al. (61), reported in a preprint, suggested that HLA-DRB1*04 alleles strongly bound to an epitope sequence of tau in neurofibrillary tangles and mediated an adaptive immune response against tau to decrease PD risk. This protective effect of the HLA-DRB1*04 antigen was intermediary with HLA-DRB1*04:01 and HLA-DRB1*04:03, and absent for HLA-DRB1*04:05 (61), which might explain in part the differentiated statistical results that we obtained for the HLA-DRB1*04 alleles in the PPMI cohort (Tables 3-5). [...] that, along with DRB1*01:01, has the 'shared epitope' (SE) with the amino acid motif Q/RK/RRAA at positions 70-74 in combination with valine at position 11 (11-V) that are highly protective in PD. In our study, HLA-DRB1*01:02 was a possible protective allele within the combined cohort-healthy controls comparison.
Four possible HLA risk alleles, HLA-DRB5*01:01, -DRB1*04:01, -DRB1*15:01, and -DQB1*03:01, are of particular interest because they have been associated with alpha-synuclein specific T cell reactivity in patients with PD (23-25). In this regard, Ozono et al. (62) showed experimentally that HLA class II molecules with the DRB5*01:01 allele captured and transported conformationally abnormal alpha-synuclein extracellularly, whereas HLA-DRB1*04:01 transported normal alpha-synuclein to the cell surface to present to circulating CD4-positive T cells, but did not translocate structurally abnormal alpha-synuclein. Moreover, alpha-synuclein 32-46 peptide immunisation of mice that expressed HLA-DRB1*15:01 triggered intestinal inflammation, enteric neurodegeneration, constipation, and weight loss (22), suggesting a critical role for alpha-synuclein autoimmunity in HLA-DRB1*15:01 carriers in the combined PPMI cohort (Supplementary Table 4). The findings by Garretti et al. (22) are consistent with the hypothesis that alpha-synuclein-mediated pathology can originate in the enteric neural system and proceed into the brain via the vagus nerve (12, 52). In this context, Braak's hypothesis (12) connects the onset of PD to the alleles HLA-DRB1*15:01, -DRB1*04:02:01, -DQA1*03, and -DQB1*03:02:01 that are associated with Crohn's disease, colitis or celiac disease (63-65), and that we found were significant (p<0.05) in the PPMI cohort (Tables 3-5; Supplementary Table 4). The question remains whether the CD4+ T lymphocytes that recognise and interact with the presented HLA class II bound alpha-synuclein antigens might in turn trigger cytotoxic CD8+ T lymphocytes and antibody producing B-lymphocytes to attack and destroy neurons that display HLA-bound alpha-synuclein antigen at the cell surface in the peripheral and central nervous systems. A dysfunctional blood brain barrier in PD patients can lead to increased levels of alpha-synuclein, autoantibodies against alpha-synuclein, and infiltrating T cells in the CSF and plasma (66). Consequently, more information is required about what subgroups of autoreactive T and B lymphocytes and other self-antigens beside alpha-synuclein might be generated by the adaptive immune system in PD pathogenesis.
Eight SVA eQTLs expressed within the MHC region were inferred to differentially modulate the transcription levels of classical class I and class II HLA genes within the PPMI cohort (43, 44). In the present study, four of the eight regulatory SVA-RIPs, R_SVA_25, NR_SVA_380, R_SVA_85 and NR_SVA_381, are significant (p<0.05) by the Fisher's exact test within the different PPMI subgroups, but only the SVA_381 PP genotype is significant (Pc<0.1) after Bonferroni corrections for multiple testing (Table 6). SVA_381 PP is a significant (p<0.05) risk in the PD-HC, Prodrome-HC and combination-HC comparisons, but not significant (p>0.05) in the SWEDD-HC comparison. This result might be related to the observation that the homozygous NR_SVA_381 insertion (PP) is associated significantly with a decrease in the transcription levels of HLA-DPA1 (p=0.037) and HLA-DPB1 (p=0.001) in the PPMI cohort (Figure 4). The suppressed transcription rate might result in a reduced level of HLA-DPA1 and -DPB1 antigen presentation to the circulating CD4+ helper cells. Previously, NR_SVA_381 was inferred to modulate only the allelic expression of the HLA-DPA1, -DPB1 and -B genes, whereas R_SVA_85 only modulates the HLA-DPA1 and -DPB1 genes (44). Thus, R_SVA_85 and NR_SVA_381 might have opposing regulatory effects on the HLA-DPA1 and -DPB1 gene expression that together could have a small, but significant effect in some PD and Prodrome cases (Figures 3, 4).
The total absence of R_SVA_85 (genotype AA) is a minor risk factor for the Prodrome cohort (Table 6), which suggests that its presence (genotype PP) might be protective. A possible protective role is supported by its presence in four significant protective haplotypes (Figure 2). Although R_SVA_85 significantly increased (p=0.023) the transcription of HLA-DPB1, its presence (PP) had no statistically significant effect (p>0.05) on the levels of HLA-DPA1 transcription (Figure 4). Therefore, the protective effect of the presence of R_SVA_85 in PD or in the Prodrome cohort might be diluted out in a statistical analysis because of its overall high frequency (82.9%) in the PPMI cohort and strong association with many different HLA-DPA1 alleles (Table 7; Figure 3). Although the R_SVA_85 and NR_SVA_381 loci are separated from each other by only 1.7 kb in an intergenic region between the HLA-DOA and HLA-DPA1 genes (Figure 1), they occur together at a low frequency of only 1.6% (Figure 3). The effects of R_SVA_85 and NR_SVA_381 on the expression of the HLA-DPA1 and -DPB1 genes (Figure 4), either separately or together, are also of interest for unrelated hematopoietic cell transplantation because the level of expression of HLA-DP in the recipient is an important prognostic indicator of donor-anti-host recognition and for evaluating the risk of graft-versus-host disease (67, 68).
Our previous study was unable to discern whether SVA_24, SVA_380 or SVA_27 modulated the expression of HLA-DQA1 protective or risk alleles (44). The present study revealed that of the 110 HLA-DQA1*03:03:01 risk alleles in the PPMI cohort, 31 (28.2%) were associated with SVA_24, none with SVA_380, and none with SVA_27. Similarly, of 147 DQA1*03:01:01 protective alleles, 35 (23.7%) were associated with SVA_24, none with SVA_380, and none with SVA_27. Although SVA_24 is located near HLA-A within the alpha block of the MHC class I region and 2.7 Mb from the HLA-DQA1 locus, it appears to regulate the expression levels of the two most statistically significant (Pc<0.1) HLA class II alleles, HLA-DQA1*03:01:01 and -DQA1*03:03:01, detected in our study (Tables 3, 5, respectively). If this is the case, then the statistical significance of SVA_24 transcription and regulation of HLA alleles in PD was probably not detected because it is associated more strongly with transcription modulation of the HLA-A3, -A11 and -A30 neutral alleles (Table 7). None of the other transcribed SVA RIPs were associated with the significant HLA-DQA1*03 alleles.
This study confirms and extends our previous reports that transcribed SVA elements inserted within the MHC genomic region can modulate certain HLA genes at the transcription level (41, 43, 44), and therefore, might regulate the expression of particular HLA risk and protective alleles, which in turn influence the onset and progression of PD via the immune response. For example, the upregulated or downregulated HLA transcription levels modulated by SVA transcripts could change the levels of foreign or autoreactive self-peptide presentation to CD4+ T helper lymphocytes or cytotoxic CD8+ T lymphocytes and influence the onset, development or progression of PD. In this regard, the SVA and HLA PD risk variants are likely additive causes with a complicated polygenic structure. Stronger statistical and molecular significance might be found in future studies with better stratification and compartmentalisation of the PD co-morbidities associated with autoimmune diseases and other age-related neurological diseases. The coordination of the adaptive and innate immunity by the HLA system in PD is highly complex and still poorly understood (3, 8, 66). The infiltration of peripheral CD4+ and CD8+ lymphocytes and monocytes into the brain across a dysfunctional blood brain barrier, however, suggests that the adaptive immune system contributes to neurodegeneration at different stages of PD pathogenesis (10, 14, 26, 66). While we limited our analysis to eight SVA within the MHC genomic region, it is noteworthy that there are many SVAs in other genomic regions that are strongly linked to PD (42, 43), including the SVA insertion within the TAF1 gene that is associated with X-linked dystonia parkinsonism (75).
In conclusion, our study of the expressed SVA and HLA genes in circulating white blood cells confirms that the MHC genomic region has an important role in the coordinated regulation of immune responses possibly associated with the long-term onset and progression of PD, the mechanisms of which have yet to be elucidated. MHC SVA RIPs, by down- or up-regulating the antigen-presenting HLA alleles at the proteomic level, might change the amount of risk or protective antigens presented to the CD4+ T helper or CD8+ cytotoxic T lymphocytes. Thus, co-expression of regulatory SVA RIPs and HLA class I and class II alleles adds another layer of biomolecular complication to the understanding of immune responses associated with PD.
of a UK study group (528 PD cases and 3430 controls) revealed that HLA-DRB1*03 and -DQB1*05 allele groups were possible PD risk alleles whereas HLA-DRB1*04 and -DQB1*03 might be protective. Wissemann et al. (35), in an analysis of 2843 European PD cases from two separate cohorts including healthy controls, found that the HLA class II risk alleles were HLA-DRB1*15:01, -DQA1*01:02 and -DQB1*06:02, and the protective alleles were -DRB1*04:04, -DQA1*03:01, and -DQA1*03:02. They also suggested that HLA-B*07:02 and -C*07:02 are part of an HLA risk haplotype, whereas HLA-B*40:01 and -C*03:04 are protective alleles. Hollenbach et al. (38), in a sequencing and typing analysis of 11 classical HLA loci using 1597 PD cases and 1606 controls, found strong protective effects of HLA-DRB1*04:01 and HLA-DQB1*03:02, but no significant differences between cases and controls for alleles of any class I locus (HLA-A, -B, and -C) or the class II loci HLA-DPA1, -DPB1, -DRB3, -DRB4, and -DRB5. They also proposed that HLA susceptibility to PD can be explained by a specific combination of amino acids at positions 70-74 on the HLA-DRB1 molecule referred to as the 'shared epitope' (SE), and that the SE in combination with valine at position 11 (11-V) is highly protective in PD, but is a risk in the absence of 11-V. More recently, Yu et al. (34) used 13,770 European PD patients in a meta-analysis of multiple cohorts from eight independent sources to confirm that HLA-DRB1*04:01, -DRB1*04:04, -DQA1*03:01 and -DQB1*03:02 were protective. They concluded that the effect of the HLA-DRB1 gene on susceptibility to PD is small and does not merit routine HLA typing in PD. An earlier study of Chinese Han (567 PD cases and 746 controls) indicated that HLA-DRB1*03:01 was a risk allele, whereas HLA-DRB1*04:06 was a protective allele in their study of only HLA-DRB1 alleles

FIGURE 1 Location map of the SVA and classical HLA class I and class II genes on chromosome 6 that were transcribed in blood cells in this study. (A) is the HLA class I region showing the relative SVA and the three classical class I HLA gene loci above the horizontal line investigated in this study. The relative positions of the HLA nonclassical gene (capital letters) and pseudogene (lower case letters) loci are shown below the horizontal line. (B) is the classical class II region showing the relative SVA and five classical class II HLA gene loci above the horizontal line investigated in this study. The relative positions of the HLA nonclassical class II gene and pseudogene loci are shown below the horizontal line. The region between HLA-DRB9 and -DRB1 that harbours the structural variants for the HLA-DRB3, -DRB4, -DRB5, and -DRB6 genes and deletion is indicated by the dashed lines as an added horizontal extension. The class III region that is located between the class I and class II regions is not shown in the figure. The locations of the genes, pseudogenes and SVA are not shown to exact genomic scale.

FIGURE 2 Phased haplotypes of HLA genotypes at 10 loci and the absence or presence of the SVA insertion at 8 loci are presented as line diagrams for eight examples of 1540 different haplotypes listed in Supplementary Table 7. The top horizontal line with 18 boxes represents the reference loci (REF loci) of a hypothetical haplotype with the ten labelled HLA genes (open boxes) and all the labelled SVA present at 8 loci. The R and NR designations were omitted from the labelled SVA loci. The next two horizontal lines from the top represent the two most frequent HLA/SVA haplotypes (ID.797 [n, 37] and ID.101 [n, 35], respectively) with ten yellow boxes representing the allelic groups of the HLA genes and the presence of only one SVA (closed orange box) represented by SVA_85. No other SVA was present in these two haplotypes that were not significantly different (p>0.05) between cases and healthy controls. The next six horizontal lines with ID numbers and n values beside them on their right side represent the haplotypes that were significantly different between PD and HC at p<0.05. The ten open boxes on each horizontal line represent the allelic groups of the HLA genes labelled on REF loci at the top. The SVA present in one or other of the particular haplotypes are represented by the closed orange boxes. For example, the bottom horizontal line represents the ID.1281 haplotype (n, 4) listed in Supplementary Table 7 and the orange closed boxes represent the presence of 5 SVA insertions that are SVA_24, SVA_25, SVA_26, SVA_377, and SVA_85. This haplotype does not have a DRB3, 4 or 5 gene, hence there is no open box in the DRB345 column. Also, there is no SVA_377 or SVA_27 insertion in any of these eight phased haplotype examples.


FIGURE 3 Frequency of R_SVA_85, SVA_381, DPA1, DPB1 haplotypes in the PPMI cohort based on phased haplotypes in Supplementary Table 7. (A) Relative position of SVA RIPs and HLA-DP genes at the centromeric end of the MHC class II gene cluster (Figure 1). Horizontal arrows show the 5' to 3' direction of the DPA1 and DPB1 gene coding. (B) Four main R_SVA_85, SVA_381 phased haplotype structures, PP, PA, AP and AA, of the 2330 haplotypes listing their percentage frequency association with the DPA1 and DPB1 haplotype allelic lineages.

possible high risk HLA haplotypes that lead to a faster rate of disease onset, and which are protective or low risk HLA alleles that slow down the disease rate. Wissemann et al. (35), on the basis of their study, suggested that the 7.1 ancestral haplotype (AH) that consists of the linked HLA alleles B*07:02/C*07:02/DRB5*01/DRB1*15:01/DQA1*01:02/DQB1*06:02 is a PD high risk haplotype and that the C*03:04, DRB1*04:04 and DQA1*03:01 alleles are part of low-risk haplotypes. We found that five of the alleles in the possible high risk 7.1AH were present in the SWEDD-HC comparison, but not in the PD-HC or Prodrome-HC comparisons. DRB1*15:01 was a possible risk allele only in the Combination (all subgroups)-HC comparison. Also, these high-risk alleles were present in the SWEDD group at a frequency of between 14.2% and 18.3% relative to a frequency of between 7% and 10.7% in the HC group. The low-risk alleles DRB1*04:04 and DQA1*03:01 were distributed as protective alleles in our PD-HC comparison. In contrast, C*03:04 was a risk allele in the Prodrome-HC and the SWEDD-HC comparisons. Furthermore, none of the HLA alleles of the frequent Caucasian 8.1AH haplotype, HLA-A*0101/C*0701/B*0801/DRB1*0301/DQA1*0501/DQB1*0201, except for C*0701, were significant in our study groups.

FIGURE 4 Box plots of regulation of the expression of HLA-DPA1 and HLA-DPB1 transcripts by NR_SVA_381 genotypes (A, B, respectively), and HLA-DPA1 and HLA-DPB1 transcripts by R_SVA_85 genotypes (C, D, respectively) in the PPMI cohort. The genotypes are absent-absent (AA), absent-present (AP), and present-present (PP). The number of genotypes (n) for NR_SVA_381 in each of (A, B) is 805 for AA, 404 for AP, and 57 for PP in 1266 individuals. The number of genotypes for R_SVA_85 in each of (C, D) is 24 for AA, 371 for AP and 826 for PP in 1221 individuals; data were not available for 45 individuals. The statistical p-values are shown above the box plots on horizontal lines between the genotypes.

TABLE 1
Common medical diagnoses in 318 PD patients and 264 healthy controls (free of PD).

TABLE 3
Significantly expressed HLA genotypes in healthy controls (HC) versus Parkinson Disease (PD).

TABLE 4
Significant expressed HLA genotypes in Parkinson Disease (PD) compared to (A) prodrome and (B) scans without evidence of dopamine deficits (SWEDD).

statistically inferred regulatory effects on classical class I and class II gene transcription levels and their different isoforms (44). The MHC SVA genotype frequencies and their influence on classical class I and class II HLA genes and transcripts based on a previous study (44) are shown in Supplementary Table 5. The number and percentage frequency of the 64 SVA-phased haplotypes with the eight MHC genotyped SVA as present or absent insertions in the present study are shown in Supplementary had

TABLE 5
Significantly expressed HLA genotypes in healthy controls (HC) versus (A) prodrome, and (B) scans without evidence of dopamine deficits (SWEDD).