Next-Generation Sequencing Identifies Extended HLA Class I and II Haplotypes Associated With Early-Onset and Late-Onset Myasthenia Gravis in Italian, Norwegian, and Swedish Populations

Genetic susceptibility to myasthenia gravis (MG) associates with specific HLA alleles and haplotypes at the class I and II regions in various populations. Previous studies have only examined alleles at a limited number of HLA loci that defined only broad serotypes or alleles defined at the protein sequence level. Consequently, genetic variants in noncoding and untranslated HLA gene segments have not been fully explored but could also be important determinants for MG. To gain further insight into the role of HLA in MG, we applied next-generation sequencing to analyze sequence variation at eleven HLA genes in early-onset (EO) and late-onset (LO) non-thymomatous MG patients positive for the acetylcholine receptor (AChR) antibodies and ethnically matched controls from Italy, Norway, and Sweden. For all three populations, alleles and haplotype blocks present on the ancestral haplotype AH8.1 were associated with risk in AChR-EOMG patients. HLA-B*08:01:01:01 was the dominant risk allele in Italians (OR = 3.28, P = 1.83E−05), Norwegians (OR = 3.52, P = 4.41E−16), and in Swedes HLA-B*08:01 was the primary risk allele (OR = 4.24, P <2.2E-16). Protective alleles and haplotype blocks were identified on the HLA-DRB7, and HLA-DRB13.1 class II haplotypes in Italians and Norwegians, whereas in Swedes HLA-DRB7 exhibited the main protective effect. For AChR-LOMG patients, the HLA-DRB15.1 haplotype and associated alleles were significantly associated with susceptibility in all groups. The HLA-DR13–HLA-DR–HLA-DQ haplotype was associated with protection in all AChR-LOMG groups. This study has confirmed and extended previous findings that the immunogenetic predisposition profiles for EOMG and LOMG are distinct. In addition, the results are consistent with a role for non-coding HLA genetic variants in the pathogenesis of MG.

Genetic susceptibility to myasthenia gravis (MG) associates with specific HLA alleles and haplotypes at the class I and II regions in various populations. Previous studies have only examined alleles at a limited number of HLA loci that defined only broad serotypes or alleles defined at the protein sequence level. Consequently, genetic variants in noncoding and untranslated HLA gene segments have not been fully explored but could also be important determinants for MG. To gain further insight into the role of HLA in MG, we applied next-generation sequencing to analyze sequence variation at eleven HLA genes in early-onset (EO) and late-onset (LO) non-thymomatous MG patients positive for the acetylcholine receptor (AChR) antibodies and ethnically matched controls from Italy, Norway, and Sweden. For all three populations, alleles and haplotype blocks present on the ancestral haplotype AH8.1 were associated with risk in AChR-EOMG patients. HLA-B*08:01:01:01 was the dominant risk allele in Italians (OR = 3.28, P = 1.83E−05), Norwegians (OR = 3.52, P = 4.41E− 16), and in Swedes HLA-B*08:01 was the primary risk allele (OR = 4.24, P <2. 2E-16). Protective alleles and haplotype blocks were identified on the HLA-DRB7, and HLA-DRB13.1 class II haplotypes in Italians and Norwegians, whereas in Swedes HLA-DRB7 exhibited the main protective effect. For AChR-LOMG

INTRODUCTION
Myasthenia gravis (MG) is a rare antibody-mediated, T-cell dependent autoimmune disorder of the neuromuscular junction (NMJ) characterized by fluctuating muscle weakness and abnormal fatigability (1,2). The global prevalence of MG ranges from 40 to 180 per million with an estimated annual incidence of 1.74 to 12 cases per million persons (3,4). MG occurs across all ancestral groups and affects individuals of all ages and both sexes, although several studies have shown that MG exhibits a binomial distribution based on age and sex; MG is frequently observed in younger women aged less than 40 years old and older men over 60 years old (4).
In the majority of patients,~85-90%, MG is a result of pathogenic autoantibodies targeting the nicotinic acetylcholine receptors (AChRs) located at the postsynaptic membrane of the NMJ (5-7). Anti-AChR autoantibodies predominantly consist of immunoglobulin subclasses IgG1 and IgG3, which are potent activators of complement resulting in membrane attack, impaired synaptic signal transmission, and muscle damage that manifests as skeletal muscle weakness (8). In addition, CD4+ T helper cells and the cytokines they secrete play important roles in the pathogenesis of MG. Rare forms of MG are due to autoantibodies targeting other postsynaptic membrane components such as the muscle-specific kinase (MuSK) protein, which is essential for NMJ formation and clustering of AChRs, and the lipoprotein receptor-related protein 4 (LRP4) a receptor that binds to neuronal agrin to activate MuSK (5,9). Double-seronegative MG cases (meaning negative for AChR and MuSK antibodies) may have other antibodies targeting NMJ proteins (such as LRP4, Agrin, and Cortactin), but further studies are needed to determine their relevance for disease mechanism (5,9,10). The MG autoantibodies are important diagnostic factors and are used along with the clinical presentation (ocular vs. generalized), thymus pathology, and age of onset for classifying MG patients and to help guide treatment (3). The demarcation between early-onset MG (EOMG) and late-onset MG (LOMG) is not well-defined, but most studies agree than EOMG occurs at <50 years, and LOMG is >50 years old (3,11). These two subtypes of MG differ in terms of sex affected and the associated susceptibility genes; EOMG is common in women with a hyperplastic thymus, whereas LOMG mostly afflicts men who have normal or atrophic thymus. The different MG subgroups clearly indicate that MG is a heterogeneous disease, suggesting that many predisposing factors are involved in determining disease pathogenesis. The disease etiology in MG is multifactorial involving a complex interplay of environmental and genetics factors. Familial cases of MG have been reported over the years, implicating the role of genetic factors in MG risk (12)(13)(14)(15)(16). Furthermore, a twin study examining MG heritability showed that the concordance rate between monozygotic twins (35%) was greater than dizygotic twins (3-4%), further underscoring the notion of a genetic predisposition (17). However, the incomplete concordance in monozygotic twins also suggests the contribution of a large environmental component to MG risk. The strongest genetic predictors of MG risk are the human leukocyte antigen (HLA) genes (18) located on chromosome 6p21. 31. The associated HLA alleles vary according to the different subgroups of MG. Specifically, the HLA-A1-B8-DR3-DQ2 ancestral haplotype, also referred to as AH8.1 (19), or specific alleles which lie on this haplotype have been found to be strongly associated with AChR-MG and EOMG in several European populations (20)(21)(22)(23)(24). The HLA-B7-DR2 haplotype and the HLA-DRB1*15:01 allele were found to be associated with LOMG in two cohorts (25,26). In a recent retrospective study performed in Italian MG patients HLA-DQB1*05:01 was associated with thymoma, and the HLA-DRB1*16~HLA-DQB1*05:02 haplotype with non-thymomatous AChR-MG with >60 years onset (27). These studies highlight the genetic heterogeneity in HLA alleles predisposing to various groups of MG among different populations. However, previous studies were limited by the small number of HLA genes interrogated in a single study, and restricted coverage of genes. Limited sequence coverage of HLA genes means that allele and phase ambiguities are likely to be present, thereby compromising the accuracy of the alleles and haplotypes defined. Therefore, reexamination of HLA associations with MG using extended coverage and advanced molecular typing methods is warranted. Next-generation sequencing (NGS) is an attractive option for variant characterization of HLA genes due to its comparatively low cost, high-accuracy and unbiased variant discovery. NGS of extended HLA gene segments has been applied successfully in population-based and trait-association studies to identify allelic variants in non-coding regions, and to achieve mostly unambiguous HLA genotyping (28)(29)(30)(31). In this study, we report the analyses of HLA alleles and haplotypes, defined using high-resolution NGS, associated with susceptibility and protection to EOMG and LOMG in Italian, Norwegian, and Swedish cohorts.

Study Populations
The study populations consisted of MG patients and healthy controls (HC) recruited from Italy (MG = 354, 219 females; HC = 250, 90 females), Norway (MG = 412, 252 females; HC = 500, 319 females), and Sweden (MG = 339, 202 females; HC = 2,120, sex unknown). All participants provided written informed consent at the site of study recruitment and arrived de-identified to the genotyping and analysis laboratories (Stanford University and the University of California San Francisco). All subjects included in the analyses were self-described white European. There was not a requirement for ancestors living in Italy for a certain number of generations, however, patients and controls were reported to be of Italian descent. White Norwegians was classified according to the birth country and ethnicity of the parent. The Swedish cohort is a single-center consecutively collected cohort of MG patients either treated or reviewed for second opinion at the Karolinska University Hospital in Stockholm, Sweden. Self-reported information on ancestry was collected at inclusion. There was no requirement for ancestors having lived in Sweden for a certain number of generations.

HLA Genotyping
HLA typing was performed on genomic DNA extracted from blood samples using local protocols such as the salting-out method or the standard phenol/chloroform method. For all DNA samples with the exception of Swedish HC samples which were typed previously by mid-resolution Sanger sequencing, were retrospectively typed at high-resolution for class I (HLA-A, HLA-C, HLA-B) and class II (HLA-DRB1, HLA-DRB3, HLA-DRB4, HLA-DRB5, HLA-DQA1, HLA-DQB1, HLA-DPA1, HLA-DPB1) HLA loci using the MIA FORA NGS high-throughput (HT) semi-automated typing protocol (Immucor, Inc., Norcross, GA, USA) and Illumina MiniSeq instruments (Illumina, Inc., San Diego, CA, USA). All sample testing including sequence data analysis and HLA genotype assignment using the MIA FORA FLEX v3.0 alignment software (Immucor, Norcross, GA, USA), were performed according to previously published methods (28).

Standardization of STR Ambiguities
The MIA FORA software permits detailed examination of all sequence segments, as well as unambiguous allele assignment, with the exception of short tandem repeat (STR) enriched regions located within introns of some class II genes. These low complexity regions consist of~1-6 bp nucleotide units that are repeated numerous times and cannot be enumerated accurately by the sequencing methodology. In order to standardize alleles that are indistinguishable due to STRs, alleles were assigned to groups and were given the suffix SG (denoting STR Group) to the lowest numbered allele in that group (28,30

Statistical Analyses
Allele carrier frequencies were determined by direct counting from the sequencing results and were calculated by dividing the number of times a specific allele (either at the homozygous or heterozygous state) is observed in a population by the total number of copies of all alleles at that particular genetic locus in the population. The Bridging Immuno Genomic Data Analysis Workflow Gaps (BIGDAWG) software package v1.8 (32) was used to perform case-control phenotype association analyses. Association analyses were performed using chi-square (c2) testing. Haplotypes frequencies were estimated using the R 'haplo.stats' package to run the expectation maximization (EM) algorithm in the BIGDAWG package. A tilde symbol "~" is used to show alleles located or shared on a specific haplotype block. Calculation of haplotype/allele confidence interval, odds ratio, and P-values were performed using the R epicalc package run in BIGDAWG. The effect sizes of HLA alleles and haplotypes on MG risk were measured by odds ratio (OR) with 95% confidence intervals (CI), and associated probability (P) values were derived from a two-tailed Fischer's exact test or a Pearson's chi-squared test when appropriate. Correction of nominal P-values for multiple testing was applied as follows. First the Bonferroni correction method was used to correct P-values generated from tests of overall (locus-level) heterogeneity at the locus. P-values less than 0.05 (a) were considered statistically significant. If the adjusted locus-level heterogeneity P-value at a particular locus was statistically significant allele-level association P-values were not corrected for multiple comparison, However, if the adjusted locus-level heterogeneity P-value at a locus was not statistically significant allele-level association P-values were corrected for multiple testing (i.e. multiplying the P-value by the number of identified alleles at each locus) using the Bonferroni method. Stratification analysis was applied to the primary risk alleles (i.e. the allele(s) with the strongest effect) to detect secondary alleles and haplotypes associated with the disease. Due to the small number of ocular patients from each population (ranging from 13 to 19 in the EOMG group, and three to 30 in the LOMG group), ocular data from all three groups were pooled together then analyzed. All calculated P-values were two-sided and Pvalues less than 0.05 (a) were considered statistically significant. Standardized linkage disequilibrium (LD) measures were computed using D' (33) and the W n statistic (34) at the locuslevel and D' ij for alleles at two loci implemented in the PyPop software (35). In control subjects allele frequencies at all loci were tested for deviations from Hardy-Weinberg equilibrium (HWE) proportions using the exact method of Guo and Thompson (36). Analyses were performed using the R statistical program version 3.5.0 and SPSS version 21 software package (IBM SPSS Inc., Chicago, IL, USA).

Demographics of Cohorts
MG datasets were categorized according to age of onset; earlyonset MG (EOMG) age of <50 years, and late-onset MG (LOMG) age of >50. In addition, the cohorts were summarized according to ocular and generalized MG subtypes, autoantibody profile, and thymoma cases ( Table 1). We first sought to determine whether gender influenced any of the clinical features described. We found that the distribution of male and females were significantly different among EOMG and LOMG patients in all three groups (P <0.001). There was a predominance of female cases in the EOMG group, ranging from 77.6% in the Italian cohort to 79.0% in Norwegians and 80.4% in Swedish patients. In contrast, LOMG was more common in men, occurring at 66.7% in Italians, 62.7% in Norwegians, and 61.1% in Swedes. These finding are in keeping with previous reports of sex differences in MG by age of onset. Similarly, females were prominent in the anti-AChR-Ab seropositive group in all three cohorts, ranging from 57 to 60%. Thymoma was slightly more common in women although the numbers are small. We restricted the analyses to patients that were non-thymomatous and antibody positive for the AChR since the dataset of the rarer antibody positive subsets were too small to analyze and make any inferences.

Locus-Level Heterogeneity in AChR-EOMG Patients
We tested eleven HLA loci characterized at maximum resolution in the Italian and Norwegian case-control cohorts and six loci in the Swedish group for allelic and haplotype heterogeneity and association with MG risk and protection. The Swedish data was examined at 2-field allelic resolution for six HLA genes (HLA-A, HLA-C, HLA-B, HLA-DRB1, HLA-DQB1, and HLA-DPB1) because the control sample used was previously genotyped for only these loci using mid-resolution molecular methods. For all three populations, HLA heterogeneity between AChR-EOMG nonthymomatous cases and controls were analyzed at the overall locuslevel, the results of which are shown in Supplementary Table 2.
In the Italian cohort following adjustment for multiple testing, the majority of loci and haplotypes were significant with the exception of HLA-A, HLA-C, HLA-DPA1, HLA-DPB1, and HLA-DPA1~HLA-DPB1. Similarly, in the Norwegian group most of the loci survived multiple testing and only the HLA-DP loci and haplotype results were not significant. In the Swedish cohort all loci with the exception of HLA-DPB1 were statistically significant. The significant results indicate differences in the distribution of HLA alleles between cases and controls.
Likewise HLA-B*08:01 was the strongest risk factor in the Swedish cohort ( Table 4; OR = 4.24, CI = 3.12-5.74, P <2.22E−16), followed by HLA-C*07:01 (OR = 3.51, CI = 2.60-4.72, P <2.22E−16); HLA-DQA1 loci were not typed in Swedes. Analyses of alleles reduced to 2-field resolution in the Italian and Norwegian cohorts revealed a similar pattern of significant allele frequency differences between cases and controls (data not shown). In the Italian and Norwegian cohorts it is noteworthy that the effect size of the HLA-DQA1*05:01 allele at the protein level is considerably less than observed at the intronic level: Italians OR = 2.45, CI = 1.37-4.34, P = 8.41E−04; Norwegians    Table 2). In Norwegians, HLA-DRB*13:02 class I and II haplotype blocks were also strong risk factors; HLA-    Figure 2.
To determine whether sex influenced the associations observed we performed both case-control analyses stratified by sex and logistic regression adjusting for sex in Italian and Norwegian cohorts; sex data was not available for Swedish controls. Both sets of analyses for Italians and Norwegians produced very similar results to the original/unadjusted results (data not shown). Since the majority of AChR-EOMG cases in each three cohorts are comprised of females the association signals could be attributed to females. In logistic regression male sex had a negative coefficient estimate of~−1.5 and the P-values were statistically significant indicating a negative correlation between males and AChR-EOMG.

Locus-Level Heterogeneity in AChR-LOMG Patients
Locus-level heterogeneity results for the AChR-LOMG subtype for three cohorts are shown in Supplementary Table 3

HLA-DRB*15.1 and HLA-DRB1*07 Are Risk Factors for AChR-LOMG Patients
In all three cohorts' alleles and haplotype blocks of the HLA-DR15~HLA-DR51~HLA-DQA1~HLA-DQB1 extended haplotype were positively associated with AChR-LOMG (Tables 5-7). However the effect sizes were significantly greater in the Italians with OR's ranging from 3.26 to 3.56, compared to Norwegians; ORs ranging from 1.57 for HLA-C*07:02:01:03 to 3.23 for the extended haplotype encompassing both class I and II regions, and Swedes; OR = 1.54 for HLA-B*07:02 and largest OR of 2.30 for the extended haplotype.
The association of HLA-DRB1*15:01 with LOMG risk has been previously reported (26). However, in this study we find These results indicate that both alleles are the strongest determinants of AChR-LOMG risk. In Norwegians a similar pattern was observed ( Table 6).

Generalized and Ocular MG Subsets
We examined whether the allele distribution varied with generalized and ocular subsets within the aforementioned EOMG and LOMG non-thymomatous AChR positive antibody datasets. In the generalized subset similar associations were observed as described above. For both EOMG and LOMG ocular tests, no significant associations were found following adjustment for multiple testing.

DISCUSSION
We provide a detailed MG association analyses data derived from high-depth sequencing of complete HLA genes (HLA-A, HLA-C, HLA-B, HLA-DQA1, HLA-DPA1) and wide coverage of genomic regions (HLA-DRB1, HLA-DRB3/4/5, HLA-DQB1, HLA-DPB1). In contrast to previous reports of HLA associations with MG, which have often utilized serological and low to mid-resolution molecular typing methods, we evaluated individuals from three European populations in a single study and utilized next-generation sequencing to capture variation in intronic and untranslated regions.
Our results have extended and refined previous observations of HLA alleles with MG risk. Consistent with previous HLA MG association studies conducted in European populations (20,22,26,38), we showed that the ancestral haplotype 8.1 most common in Northern European populations is associated with AChR-EOMG risk in Italian, Norwegian, and Swedish cohorts. However, in the Italian and Norwegian cohorts, who were examined at maximum HLA allelic resolution, we successfully sequenced the intronic and UTR variants for most alleles composing the AH8.1 haplotype. This is clearly illustrated when we consider the dominant AChR-EOMG risk allele HLA-B*08:01 which according to the IMGT/ HLA Database release 3.250 has five 3-field variants (alleles that differ by synonymous mutations and are translated into the same protein), and two 4-field variants named HLA-B*08:01:01:01 and HLA-B*08:01:01:02 that are distinguishable due to a single SNP T > C at 2,802 bp in the 3' UTR of the HLA-B gene. In addition, we showed conclusively that the frequency of HLA-B*01:01:01:01 was over represented in Italian and Norwegian AChR-EOMG patients compared to controls; in fact HLA-B*08:01:01:02 occurs at much lower frequencies (AF~0.005, n = 1 or 2) in the Italian group regardless of disease status, and was not observed at all in Norwegians. Although not shown, HLA-B*01:01:01:02 was not observed in the Swedish AChR-EOMG patient cohort therefore HLA-B*08:01:01:01 is also the risk allele in this group. Furthermore, we show that HLA-B*08:01:01:01 is almost exclusively linked to HLA-C*07:01:01:01 (D' >0.95) of which there are five other 4-field variants that differ due to various SNPs located within introns. These observations also extend to the class II region of the AH8.1 haplotype; HLA-DQA1*05:01:01:02 was identified as the risk allele whereas HLA-DQA1*05:01:01:01 and HLA-DQA1*05:01:01:03 are neutral. Due to the presence of a prominent low-complexity sequence region near the 5' end of intron 2 in the HLA-DRB loci and unsequenced regions it was not possible to define the 4-field variant of HLA-DRB1*03:01:01 and HLA-DRB3*01:01:02. We note that increased allelic resolution has decreased the statistical power of the association analyses by increasing the number of categories. We observed this event when we compare the 4-field association results to the 2-field data where overall the effect size is greater and P-values smaller. The downside is that lower resolution typing pools a large number of HLA alleles together and therefore masks the effects of individual yet pertinent alleles. One of the main take home messages from this study is that specific intronic and UTR variants of HLA alleles contribute to MG pathogenesis. Since the AH8.1 haplotype is most frequent and highly conserved in Europeans it is not too surprising that alleles of this haplotype and extended regions have been found to be associated with several immune system dysfunctions including chronic inflammation and autoimmune diseases such as Type I diabetes (39) and systemic lupus erythematosus (40). Autoimmunity is thought, in part, to result from the downregulation of type II cytokine responses, which is influenced by the interaction of alleles on AH8.1, leading to enhanced humoral responses and increased production of autoantibodies (41). On the other hand, the stability of the AH8.1 haplotype, and relatively low recombination rate across the haplotype suggests that this haplotype is positively selected and has a survival advantage. The AH8.1 contributes to EOMG in European populations (26,38,42) but not populations from East Asia where HLA-DRB1*09:01 is the main risk allele for EOMG (43,44). The numerous studies conducted in Europeans have shown that HLA-B*08:01 appears to be the dominant risk allele for EOMG and the other associated alleles located on AH8.1 are simply a consequence of strong LD between them and HLA*08:01. In this present study, this notion appears to be the case when we only consider the reduced 2-field data for all three groups; HLA-B*08:01 has the strongest effect. However, at the 4-field resolution HLA-DQA1*05:01:01:02 has a similar effect size to HLA-B*08:01:01:01 for risk in Italian patients. This finding is of interest as to the best of our knowledge the HLA-DQA1*05:01:01 allele has never been investigated in previous MG association studies.
We identified additional risk factors for EOMG that appeared to be ethnic specific such as the HLA-DRB1*16:01:01 class II haplotype-blocks in Italians, and the HLA-DRB1*13:02:01 haplotype-blocks encompassing class I and II alleles in Norwegians. These findings are novel and should be confirmed and further examined to refine the SNPs associated with EOMG in these European populations.
We also confirmed previous findings of a protective effect conferred by the HLA-DRB1*13:01:01:01SG class II allele and haplotype in both AChR-EOMG and AChR-LOMG patients from Italy, Norway, and Sweden. HLA-DRB1*13:01:01:01SG has a frequency of~0.05 in several European populations (unpublished observations). The protective effect of the HLA-DRB1*13:01 allele is not exclusive to MG but has also been found to be negatively associated with a plethora of autoimmune diseases in Europeans, and Asians (37,45). It has been hypothesized that the protective effect of HLA-DRB1*13:01 is due to the high binding affinity/avidity of T-cell receptors on autoreactive regulatory T-cells (Tregs) for HLA-DRB1*13:01 molecules. Consequently, thymic negative selection and autoreactive Tregs development are enhanced resulting in decreased pathogenicity (41).
In conclusion, this study provides additional evidence that specific HLA alleles and haplotypes play essential roles in the pathogenesis of different subtypes of MG. Distinctive HLA factors for susceptibility and resistance were found in EOMG and LOMG, and open avenues for investigating the mechanisms for these distinct diseases. The extensive LD within the HLA region requires that populations of non-European background should be investigated to distinguish between primary and secondary effects. Due to the location of many of the causative polymorphisms in non-coding regions that could affect expression functional assays of candidate genes are optimal to further understand MG pathogenesis. A deeper understanding of the biological mechanisms underlying MG and could offer new insights for diagnostic approaches and the development of novel therapies.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: (www.immport.org/ shared/home), ImmPort/.

AUTHOR CONTRIBUTIONS
LC was involved in the conception and design of the study, performed the statistical analyses, interpreted the results, and wrote the manuscript. MF-V and JO were involved in the conception and design of the study, contributed to the interpretation of the results, and contributed to the final version of the manuscript. JH contributed to the conception of the study and interpretation of the results. PC, RF, BL, HH, SB,