High Resolution Haplotype Analyses of Classical HLA Genes in Families With Multiple Sclerosis Highlights the Role of HLA-DP Alleles in Disease Susceptibility

Multiple sclerosis (MS) susceptibility shows strong genetic associations with HLA alleles and haplotypes. We genotyped 11 HLA genes in 477 non-Hispanic European MS patients and their 954 unaffected parents using a validated next-generation sequencing (NGS) methodology. HLA haplotypes were assigned unequivocally by tracing HLA allele transmissions. We explored HLA haplotype/allele associations with MS using the genotypic transmission disequilibrium test (gTDT) and multiallelic TDT (mTDT). We also conducted a case-control (CC) study with all patients and 2029 healthy unrelated ethnically matched controls. We performed separate analyses of 54 extended multi-case families by reviewing transmission of haplotype blocks. The haplotype fragment including DRB5*01:01:01~DRB1*15:01:01:01 was significantly associated with predisposition (gTDT: p < 2.20e-16; mTDT: p =1.61e-07; CC: p < 2.22e-16) as reported previously. A second risk allele, DPB1*104:01 (gTDT: p = 3.69e-03; mTDT: p = 2.99e-03; CC: p = 1.00e-02), independent from the haplotype bearing DRB1*15:01 was newly identified. The allele DRB1*01:01:01 showed significant protection (gTDT: p = 8.68e-06; mTDT: p = 4.50e-03; CC: p = 1.96e-06). Two DQB1 alleles, DQB1*03:01 (gTDT: p = 2.86e-03; mTDT: p = 5.56e-02; CC: p = 4.08e-05) and DQB1*03:03 (gTDT: p = 1.17e-02; mTDT: p = 1.16e-02; CC: p = 1.21e-02), defined at two-field level also showed protective effects. The HLA class I block, A*02:01:01:01~C*03:04:01:01~B*40:01:02 (gTDT: p = 5.86e-03; mTDT: p = 3.65e-02; CC: p = 9.69e-03) and the alleles B*27:05 (gTDT: p = 6.28e-04; mTDT: p = 2.15e-03; CC: p = 1.47e-02) and B*38:01 (gTDT: p = 3.20e-03; mTDT: p = 6.14e-03; CC: p = 1.70e-02) showed moderately protective effects independently from each other and from the class II associated factors. By comparing statistical significance of 11 HLA loci and 19 haplotype segments with both untruncated and two-field allele names, we precisely mapped MS candidate alleles/haplotypes while eliminating false signals resulting from ‘hitchhiking’ alleles. We assessed genetic burden for the HLA allele/haplotype identified in this study. This family-based study including the highest-resolution of HLA alleles proved to be powerful and efficient for precise identification of HLA genotypes associated with both, susceptibility and protection to development of MS.


INTRODUCTION
Multiple sclerosis (MS) is a chronic debilitating neurological disorder associated with central nervous system inflammation, demyelination, and axonal degeneration (1). MS is considered a multifactorial disease with abundant evidence linking multiple common alleles across the genome to disease risk and progression (2,3). The most notable susceptibility genomic region for MS maps to the Major Histocompatibility Complex (MHC) in chromosome 6p21.3 (4)(5)(6)(7)(8)(9)(10), where distinct HLA alleles have been consistently found associated with both, susceptibility and protection.
In addition to bidirectional influences on risk, HLA allelic and haplotypic heterogeneity has been reported (11), together with population effects (12), epistasis (13), and sex dimorphism (14,15), reflecting the biological complexity of the mechanisms underlying the statistical associations. Different study designs and the evolution of genotyping methods also contributed to the identification of primary and secondary associations, enriching the working models that describe the role of HLA gene products in disease predisposition.
The recent advent of cost-effective next generation sequencing (NGS) methods and customized algorithms allow to sequence nearly complete HLA genes including UTRs and introns, to uncover the nucleotide variation on study groups including novel variants, and to assign the least ambiguous HLA genotypes and untruncated haplotypes (16).
As part of the disease association and family haplotype projects for the 17 th International HLA and Immunogenetics Workshop (IHIW), we generated high-resolution data for 11 HLA genes in 477 MS trio families. We took advantage of recently developed algorithms to track the segregation of NGS HLA genotypes and performed a classical transmission disequilibrium test (TDT) analysis (17) to further improve or understanding of the HLA allelic and haplotypic diseaseassociated landscape.

Study Subjects
In total 477 trio MS families, totaling 1431 subjects with non-Hispanic European ancestry are included. A trio includes one proband diagnosed with MS and two unaffected parents. All children met established diagnostic criteria of MS (18).
Transmission and non-transmission of parental alleles and haplotypes was evaluated to assess associations with MS. In addition, associations were examined in a case-control analysis by comparing the distribution of HLA alleles in these 477 children with those in a control group including 2029 previously characterized unrelated healthy subjects with non-Hispanic European ancestry as control group (19).
There were 54 additional families with multiple MS cases (n = 147) found in multiple generations that included samples with European ancestry. Table 1 shows clinical demographic information for the MS cases for trio and extended families. We did not perform disease association statistical analyses for the extended families but reviewed the HLA allele/haplotype inheritance patterns within the families. Supplemental Table 1 includes summary of the extended families.
This study was approved by the University of California, San Francisco Institutional Review Board. The analyses of HLA genotype data with the double-blinded sample IDs were conducted at the Stanford Blood Center and Stanford University under the Stanford University Institutional Review Board (IRB) eProtocol titled, "17th International HLA and Immunogenetics Workshop" (#: 38899). Table contains "Disease course", "Gender" and "Age of disease onset" information. "Trio" column shows demographic information about MS cases (children) from 477 trio families.
"Extended families" column shows demographic information about MS cases from 54 extended families in which the numbers of MS patients ranged from 2 to 7.

Building HLA Haplotypes
The family ID, subject ID, familial relationship and HLA genotypes organized in Genotype List String (GL string) format (26) were downloaded from the 17 th IHIW database (22) and used as input format to build HLA haplotypes from trios using HaplObserve software (24). HaplObserve generates an output file containing "transmitted" and "non-transmitted" haplotypes/alleles counts in comma separated value (CSV) file that allows keeping track of which haplotypes were transmitted or not transmitted to the offspring. The haplotypes were separated into the small haplotypes/alleles as described previously (24). Supplemental Table 2 contains "transmitted" and "non-transmitted" haplotype counts for 11-loci, Class I, HLA-DRB3/4/5~HLA-DRB1~HLA-DQA1~HLA-DQB1, HLA-DPA1~HLA-DPB1 haplotypes with both untruncated and twofield allele determinations.

Genotypic Transmission Disequilibrium Test (gTDT) and Multiallelic TDT (mTDT)
The HLA genotypes represent a combination of HLA alleles or HLA haplotypes for target HLA loci. First, we performed TDT with HLA genotypes from trio families using "Genotypic TDT " function in trio version 3.18.0 R package (27)(28)(29). We use gTDT as an abbreviation for Genotypic TDT in this manuscript. We tested both untruncated (e.g. HLA-A*02:01:01:01) and two-field (e.g. HLA-A*02:01) allele names for 11 HLA genes (see section Genotyping), and 19 HLA haplotype blocks ( . We chose these blocks to eliminate false signals resulting from 'hitchhiking' alleles. In addition, we performed a second TDT analysis with HLA genotypes from the trio families using the Transmission/ disequilibrium test of a multiallelic marker by Bradley-Terry model (mtdt2) function in gap version 1.2.2 R package (30)(31)(32). We use mTDT as an abbreviation for multiallelic marker TDT. Similar to TDT, we tested both untruncated and two-field allele names for 11 HLA genes, and 19 HLA haplotype blocks as described above.
We performed various stratification gTDT and mTDT analyses, which excluded families and individuals that carried target allele/haplotype. For example, to eliminate the risk effect of haplotype bearing HLA-DRB1*15:01, we removed all the families that contained at least one family member with the risk haplotype bearing HLA-DRB1*15:01 in the data set. Finally, we performed conditional logistic regression TDT analyses using colGxE function in trio R package in the presence or absence of the haplotype bearing HLA-DRB1*15:01 to determine the effect and its interaction with the other HLA alleles/haplotypes (29,33). To identify the genetic model underlying the association between HLA haplotypes/alleles and MS, we also performed the MAX gTDT to compute the maximum over the TDT statistics for an additive, dominant, and recessive model, and to compute permutation-based p-values (28).

Case-Control (CC)
For case-control analyses, we used Bridging ImmunoGenomic Data-Analysis Workflow Gaps (BIGDAWG) (34). We tested the same 11 HLA genes, and 19 haplotypes for CC analyses using BIGDAWG. Similar to gTDT and mTDT, we also performed various stratification CC analyses.

Summarizing gTDT, mTDT, and CC Results
Sixty [(11 loci + 19 haplotypes) x 2 (untruncated and two-field)] different tests were performed for gTDT, mTDT and CC. Each software package for gTDT, mTDT and CC generates different format of output files. To circumvent manual manipulation of the output files, and capture essential information from the output files, we developed a script to summarize the results in a single file using Practical Extraction and Report Language (Perl) programming. We compared gTDT, mTDT and CC results for all observed alleles and haplotypes (Supplemental Tables 3-12). We used a threshold of significance of 0.05 for gTDT, mTDT and CC.

Deviations From Expected Hardy-Weinberg Equilibrium (HWE)
Python for Population Genomics (PyPop) version 0.7.0 (35) was used to investigate Hardy-Weinberg Equilibrium (HWE) via the Guo and Thompson test (36), assessing genotyping proportions for both individual loci. We identified individual genotypes deviating significantly from HWE expectations using Chen's method (37), using a threshold of significance of 0.05.

Measuring MS Risk-Protective Effects
We assessed the risk-protective effects of the identified risk and protective HLA alleles/haplotypes in two approaches. First, we recorded the presence of a risk allele/haplotype to be +1 and a protective allele/haplotype to be -1, and calculated the sum of the scores for each case-control study subject. We classified each subject into five categories for both cases and controls based on the final net scores: 1) Risk (positive score); 2) Neutral Zero (no risk and protective factor present); 3) Neutral Risk Protective (equivalent numbers of risk and protective factors); 4) Protective Risk (negative score, but at least one copy of risk allele present); 5) Protective (protective allele only). Supplemental Table 13A shows the summary for each category. We calculated 2 x 2 odds ratio for Risk vs. Neutral Zero, Protective vs. Neutral Zero and Neutral Risk Protective vs. Neutral Zero to measure the risk and protective effects (Supplemental Tables 13B-D) (38).
Second, we calculated HLA genetic burden (HLAGB) for each study subject as the sum of the burden of each MS associated HLA allele as described (39,40). Each allele burden was calculated as the allele dose multiplied by the allele effect size obtained in this study. We generated box plots for HLAGB scores using "ggplot2" package in R programming language.

Assessing Risk and Protective Cumulative Effects
Supplemental Table 13A shows the risk and protective HLA allele/haplotype count summary in the case-control study data set. As expected, statistically significant more risk effects were observed in cases compared to no risk/protective factors present (Supplemental Table 13B: OR = 3.22, CI = 2.37-4.39, Yates p < 0.0001). The result confirms that subjects who carry the positive scores of the risk HLA alleles/haplotypes are more susceptible to MS than the subjects who do not carry risk and/or protective HLA alleles/haplotypes. Conversely, we observed statistically significant more protective effects in controls compared to no risk/protective factors present (Supplemental Table 13C: OR = 0.58, CI = 0.43-0.79, Yates p = 0.00060). In addition, aligned with the trio family results described above, the presence of risk factors predisposes to disease even in the presence of equal number of protective factors (Supplemental Table 13D Figure 2A). We also observed statistically significant differences in HLAGB scores between the control group and parents (Supplemental Figure 2A). The control group shows lower HLAGB driven by higher burden of protective alleles compared to parents who carry the neutralized risk and protective factors. The over-transmission of risk alleles resulted in elevated HLA genetic burden predisposing to MS in their children. We did not find a statistically significant difference of HLAGB between parents (Supplemental Figure 2A), in gender of cases or controls (Supplemental Figure 2BC). Using the classification shown in Supplemental Table 13A, we observed statistically significant difference in both "Risk" and "Protective" groups of MS compared to controls (Supplemental Figure 2D).

DISCUSSION
In the present study, multiple HLA MS-risk and protective alleles were precisely mapped through the analysis of both, coding and non-coding variants. The HLA-DRB5*01:01:01~HLA-DRB1*15:01:01:01 block was observed in 52.8% of the affected children in the trio dataset, confirming the role of this block conferring susceptibility to MS at the four-field resolution (16). We were unable, however, to resolve the effect of individual alleles at HLA-DRB1 and HLA-DRB5 because of the exceptionally tight LD. The haplotype carrying HLA-DRB1*15:01 is the strongest genetic susceptibility factor for MS in Europeans (50), and has been reported as risk in Japanese (9) and African Americans (51). The etiologic role of HLA-DRB5*01:01 in MS was also demonstrated as a risk and disease modifier factor (52)(53)(54).
A novel independent association with susceptibility was identified for HLA-DPB1*104:01, observed in 6. promoter is located in the complementary strand of genomic DNA sequence that overlaps with intron 1 and exon 2 of HLA-DPB1 ( Figure 1) (58). Polymorphisms in promoters in HLA-DPB1 have been associated with autoimmune disease such as systemic sclerosis (59). We identified four SNPs (rs3135020, rs3097670, rs3130169 and rs3128959) that are associated with expression of brain cerebellum for HLA-DPA1, and brain nucleus accumbens (basal ganglia) for HLA-DPB1. The later three SNPs are located within the HLA-DPA1 promoter region or HLA-DPB1 intron 1 (Figure 1) (58). Seven different HLA-DPB1*104:01 alleles that includes intron 1 sequences are recognized as of March 2021, and all alleles share the same nucleotide at three eQTL associated SNPs (rs3097670, rs3130169 and rs3128959) (Supplemental Figure 3). If these eQTLs combined with the structural features for antigen presentation by HLA-DPB1*104:01 were to confer susceptibility, it can be speculated that any of these variants could be involved in MS pathogenesis.
In the class II region, we found one HLA-DRB1 and two HLA-DQB1 protein variants conferring protection to developing MS. The allele HLA-DRB1*01:01:01 appeared to be the principal protective determinant in haplotypes bearing this allele. The examination of interactions between protective and susceptibility alleles, indicate that HLA-DRB1*01:01:01 confers dominant protection. Our analysis is also consistent with protective effects by HLA-DQB1*03:01 and HLA-DQB1*03:03. Furthermore, the HLA-DRB1*09:01 allele reported to be protective in Asian populations (8,9) may have derived from the primary association with HLA-DQB1*03:03 (60, 61). We performed pairwise comparisons of the amino acid sequences of the alleles HLA-DQB1*03:01, HLA-DQB1*03:02 and HLA-DQB1*03:03. The allele HLA-DQB1*03:01 differs from HLA-DQB1*03:02 and HLA-DQB1*03:03 by replacements in 7 and 6 amino acids with 4 and 3 substitutions at residues located at the peptide binding domain (PDB), respectively. In contrast, HLA-DQB1*03:02 and HLA-DQB1*03:03 differ only by the replacement of one amino acid at residue 57 located at the PDB. The alleles associated with protection present the same charged amino acid (Aspartic acid) at this residue while the nonprotective allele carries the hydrophobic Alanine. The amino acid substitutions at residue 57 of DQB1 have been postulated to play important roles in susceptibility and resistance in insulindependent diabetes mellitus (IDDM) (62).
The allele HLA-B*44:02 is frequently associated with the protective allele HLA-DQB1*03:01, thus protection associated with these alleles was difficult to resolve. The neutral transmission of haplotypes bearing both HLA-B*44:02 and the highly susceptibility allele HLA-DRB1*15:01:01:01 is suggestive that the observed susceptibility conferred by HLA-DRB1*15:01:01:01 over-transmitted in haplotypes bearing multiple alleles of HLA-B is specifically neutralized by the presence of HLA-B*44:02. We conclude that the HLA-B*44:02 protein effect in protection is likely to be an independent factor. Conditional logistic regression TDT analysis suggested a statistically significant interaction between the haplotype carrying HLA-DRB1*15:01 and HLA-B*44:02 (RR = 0.55, CI = 0.32-0.94, 2 degree of freedom Wald test p = 3.05e-02).
The HLA allele transmission analysis in family 2965 provides an interesting insight of a protective factor offsetting the effect of a susceptibility allele (Supplemental Figure 1). The observation of protection conferred by HLA-B*27:05:02 and HLA-B*27:02:01 suggests that antigen presenting features of this allele and closely related alleles may confer protection. The HLA-B*27:05 and HLA-B*27:02 are both serologically recognized as B27, and carry the Bw4 ligand for a killer immunoglobulin-like receptor KIR3DL1. The KIR3DL1 in combination with Bw4 epitope was shown to be protective against MS in African Americans (63). We performed gTDT with the first-field allele name (serological equivalent) for the 477 trios and observed the statistical significance of HLA-B*27 (gTDT: RR = 0.37, CI = 0.21-0.63, p = 0.00028) (17).
In the case of protection conferred by the class I haplotype HLA-A*02:01:01:01~HLA-B*40:01:02~HLA-C*03:04:01:01, we hypothesize that these three HLA alleles are not the protective factors per se, and other elements exert protection within the 1. 45 Mb genomic region that is specific to this class I haplotype block.
In summary, HLA-DRB5*01:01:01~HLA-DRB1*15:01:01:01 was significantly associated with predisposition as expected. A second independent risk allele, HLA-DPB1*104:01 was newly identified. HLA-DRB1*01:01:01, HLA-DQB1*03:01 and HLA-DQB1*03:03, showed protective effects. The HLA class I block, HLA-A*02:01:01:01~HLA-C*03:04:01:01~HLA-B*40:01:02 and the alleles HLA-B*44:02, HLA-B*27:05 and HLA-B*38:01 showed moderately protective effects independently from each other and from the class II associated factors. Altogether, the present study demonstrates the effectiveness of high resolution extended coverage typing for dissecting HLA alleles/haplotypes associated with disease susceptibility using family-based segregation analyses. This is the first MS-HLA family study using NGS. In this study both, susceptible and protective candidate HLA alleles/haplotypes were mapped with more precision by eliminating at the same time false signals resulting from 'hitchhiking' alleles (64). The mapping to specific HLA allele structures may allow to design research focused in their functional features facilitating the understanding of the mechanisms involved in disease susceptibility and protection. In addition, the HLA genetic burden (HLAGB) defined by HLA genotype evaluated in this study appears to generate a highly informative risk score (RS) that could be further evaluated in the outcomes of the disease in future studies.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The University of California, San Francisco Institutional Review Board & the Stanford University Institutional Review Board. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
This study was conceived and designed by KO, MF, and JO. KO conducted data quality control and performed the statistical analysis. KO, MF, and JO were also involved in data interpretation and drafting the manuscript. LC, GM-M, KM, and SG were responsible for the NGS genotyping assays. NI developed the HLGB scores. JH contributed to the analysis plan. SH was responsible for the collection of samples and clinical data. SC and AS were responsible for sample and data management at the UCSF-DNA bank. All authors contributed to the article and approved the submitted version.

FUNDING
This study was supported by a grant from the National Institutes of Health U19NS095774 (JO and MV). The UCSF DNA biorepository is supported by RG-1611-26299 from the National Multiple Sclerosis Society. The content is solely the responsibility of the authors and does not necessarily reflect the official views of the NIAID, NINDS, NIH or United States Government.

ACKNOWLEDGMENTS
We thank the Stanford Blood Center for the support and promotion of the 17 th IHIW endeavor, Tamara Vayntrub and Susan Twietmeyer for their tremendous administrative support of 17th IHIW efforts. We are grateful to the individuals who participate in this study. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The data used for the analyses described in this manuscript were obtained from the GTEx Portal (https://gtexportal.org/home/) on 04/04/2019 and/or dbGaP accession number phs000424.v2.p1 on 04/04/2019.