Genome-Wide Association Study and Gene-Based Analysis of Participants With Hemophilia A and Inhibitors in the My Life, Our Future Research Repository

Introduction Up to 30% of individuals with hemophilia A develop inhibitors to replacement factor VIII (FVIII), rendering the treatment ineffective. The underlying mechanism of inhibitor development remains poorly understood. The My Life, Our Future Research Repository (MLOF RR) has gathered F8 and F9 mutational information, phenotypic data, and biological material from over 11,000 participants with hemophilia A (HA) and B as well as carriers enrolled across US hemophilia treatment centers, including over 5,000 whole-genome sequences. Identifying genes associated with inhibitors may contribute to our understanding of why certain patients develop those neutralizing antibodies. Aim and Methods Here, we performed a genome-wide association study and gene-based analyses to identify genes associated with inhibitors in participants with HA from the MLOF RR. Results We identify a genome-wide significant association within the human leukocyte antigen (HLA) locus in participants with HA with F8 intronic inversions. HLA typing revealed independent associations with the HLA alleles major histocompatibility complex, class II, DR beta 1 (HLA DRB1*15:01) and major histocompatibility complex, class II, DQ beta 1 (DQB1*03:03). Variant aggregation tests further identified low-frequency variants within GRID2IP (glutamate receptor, ionotropic, delta 2 [GRID2] interacting protein 1) significantly associated with inhibitors. Conclusion Overall, our study confirms the association of DRB1*15:01 with FVIII inhibitors and identifies a novel association of DQB1*03:03 in individuals with HA carrying intronic inversions of F8. In addition, our results implicate GRID2IP, encoding GRID2-interacting protein, with the development of inhibitors, and suggest an unrecognized role of this gene in autoimmunity.


INTRODUCTION
Inhibitor development against factor VIII (FVIII) is the most serious complication of replacement factor therapy. Thirty percent of individuals with severe hemophilia A (HA) receiving factor replacement therapy develop inhibitors, rendering the treatment ineffective (1). The underlying basis for why some individuals develop inhibitors while others do not remains poorly understood, but risk factors include ethnic background, family history, and F8 variant, suggesting that genetics is an important contributor (2- 6). Individuals with HA with intronic inversions, large structural variants, or nonsense variants are 7-10 times more likely to develop inhibitors compared with those with milder mutations (4). Individuals with missense variants have the lowest incidence of inhibitors (<10%) presumably because they synthesize FVIII polypeptides that can induce tolerance (5). The intron 22 inversion accounts for up to 50% of mutations among individuals with severe HA and results in a truncated F8 with inverted intron 1 through 22 (4,7). Individuals with inhibitors are classified as low or high responders if their inhibitor titers are below or higher than 5 Bethesda units per mL (BU/mL), respectively. Individuals with high-titer inhibitors are less likely to respond to immune tolerance induction (ITI) (8). Therefore, the genetic risk factors contributing to low or high inhibitors may differ and may be dependent on the underlying pathogenic F8 variant.
Genome-wide association studies (GWAS) have become more widely used as an approach to uncover the etiology of diseases. They have some limitations, however, as they require a large sample size and may miss rare variants. In cases where multiple rare variants with small effects are likely to contribute to the disease, whole exome sequencing and gene burden analysis have been used to provide insights into clinical phenotype (26).
The My Life, Our Future Research Repository (MLOF RR) has gathered F8 and F9 mutational information, phenotypic data, and biological material from over 11,000 participants with hemophilia A and B as well as carriers enrolled across US hemophilia treatment centers, and whole-genome sequencing of over 5,000 genomes (27,28).
Here, GWAS and gene-based analyses were performed in participants with HA from the MLOF RR to identify genes associated with inhibitor development.

My Life, Our Future Study
The MLOF program consists of a collaboration between the American Thrombosis and Hemostasis Network (ATHN), National Hemophilia Foundation, and Bloodworks Northwest, with funding from Bioverativ, a Sanofi company (27). Informed consent was obtained for inclusion of data and biological samples from a subset of participants who were chosen for whole-genome sequencing (WGS). The MLOF study included participants with hemophilia of all types and severities, and whether a causative F8 or F9 variant was identified or not. In this study, only the subset of participants with HA were considered. Phenotypic and demographic data, including self-reported race and ethnicity, were provided by ATHN and were collected across hemophilia treatment centers around the United States.

Whole-Genome Sequencing
WGS of samples was conducted as part of TOPMed (Trans-Omics for Precision Medicine), 1 and data collection, processing, and quality control are described elsewhere (27,29). WGS data were obtained through the database of Genotypes and Phenotypes (dbGaP, phs001515.v1.p1) as a processed Variant Call Format file as well as mapped reads (compressed alignment files). Variants that did not pass quality control or those that were monomorphic were filtered out within the MLOF RR cohort, and 119,015,152 variants remained. Samples duplicated or with sex mismatch were removed (N = 10). We produced a genetic relationship matrix using the PLINK 2.0-make-king option (30) and estimated principal components (PCs) using PC-AiR (31). We generated a final genetic relationship matrix using PC-Relate and the PCs from PC-AiR to account for population structure (32).

Inhibitor Definition
We defined participants with HA as having a history of inhibitors (cases) if they had active inhibitors and currently prescribed bypassing agents or ITI, or if they had a previous history of inhibitors and were not currently prescribed bypassing agents or ITI. We defined controls (participants with HA without inhibitors) as having no history of usage of bypassing agents or ITI and status reported as "no history of inhibitors, " or if presence of inhibitors was ruled out. Participants with high-titer inhibitors had a peak inhibitor titer of >5 BU/mL as measured by the Bethesda inhibitor assay. The low-titer inhibitors group was not defined because we could not discriminate between low-titer participants or participants with missing or nonexhaustive BU measures. Participants with unknown inhibitor status were excluded.

Cohort Definitions
Given the difference in inhibitor prevalence between ethnic groups (3,6), and to reduce confounding due to population structure, we separated the MLOF RR cohort by genetic ancestry.
For each subgroup, we excluded participants if any of their first 2 genetic PCs were >3 standard deviations away from the group mean. The largest subgroup was individuals who reported "White" (European) ethnicity and not "Hispanic, Latino/a, or Spanish origin" (N = 2,246), and we performed discovery analyses on this subset of individuals. We attempted replication of significant associations in the 2 other largest subgroups from this dataset, namely "Black or African American" (N = 310) and "Hispanic, Latino/a, or Spanish origin" (N = 488), excluding participants with mild hemophilia A in the replication cohorts due to small sample size. We restricted all analyses to male participants with HA because only 1 female participant with genetic data had a history of inhibitors. Total numbers of cases and controls included are reported in Table 1.

Single-Variant Analysis
We performed single-variant association tests on inhibitor status as a binary phenotype using logistic mixed-model score tests implemented in the R (3.6.1) package generalized linear mixedmodel association tests (GMMAT) (v1.2.0) (33), adjusting for age, F8 variant, hemophilia A severity, sequencing center, and the first 3 PCs. Pathogenic F8 variant types were defined as large inversions, large structural variants [>50 base pairs (bp)], splice-site, nonsense, frameshift, and other [small (<50 bp) inframe, synonymous, missense, and untranslated region variants]. Disease severity was defined as mild (>5 to <50% FVIII), moderate (1 to <5% FVIII), or severe (<1% FVIII) hemophilia A. We also performed association tests restricted to the subset of participants with intronic inversions using GMMAT, adjusting for age, sequencing center, and the first 3 PCs. The genetic relationship matrix estimated from PC-related was used as input for GMMAT. Analyses were restricted to variants with a minor allele frequency (MAF) >1% and ≥5 expected variant counts in cases. Variants with a P value <5 × 10 −8 were considered statistically significant. We performed association tests for three different phenotype definitions: (1) all-titer inhibitors, (2) high-titer inhibitors (BU > 5), and (3) all-titer inhibitors restricted to participants with intronic inversions. All three case definitions were compared to non-inhibitor controls, restricting to participants with HA with intronic inversion in analysis 3. We generated locus plots using LocusZoom (34).

Replication and Meta-Analysis (Single Variant)
As the score test does not provide effect size estimates, we reanalyzed all variants With P < 1 × 10 −7 in the Europeanancestry discovery analysis using Wald tests implemented in GMMAT, which provides effect size estimates and standard errors. Logistic mixed models were adjusted for covariates. Then, we attempted to replicate those associations in the replication (i.e., non-European ancestry) cohorts using Wald tests, adjusting for F8 mutation type, sequencing center, age, and the first 3 PCs. Given the smaller sample sizes of non-European ancestry cohorts, mutation type was more broadly defined as presence or absence of large F8 structural variants or intronic inversions in those cohorts.
We used inverse variance meta-analysis as implemented in the R package meta to combine results for the discovery and replication analyses (35).

Variant Aggregation Analysis
We conducted gene-based analyses using variant-set mixedmodel association tests (SMMAT) (36) and restricted the analysis to variants with an MAF < 5% and annotated as missense or predicted to have a high impact on protein-coding genes as defined using SnpEff 4.1 (37). We only considered genes with ≥ 3 polymorphic variants and an aggregated expected minor allele count in cases ≥ 5. Gene-based tests were adjusted for covariates as in the single-variant analysis.

Replication and Meta-Analysis (Variant Aggregation Analysis)
Replication tests of genes significantly associated with inhibitors in the discovery analysis were attempted in the replication cohorts using SMMAT and meta analyzed each cohort using smmat.meta included in the GMMAT package, including covariates, as described above.

Human Leukocyte Antigen Typing
We typed HLA alleles based on the WGS data using HLA * LA (bioconda v1.0) (38) and performed 4-digit HLA allele association analysis in R by logistic regression, adjusting for covariates as in the single-variant analysis. We performed HLA allele association for all participants with inhibitors and a subgroup of participants with intronic inversion. Each HLA allele was tested separately based on the presence or absence of the allele. HLA alleles with a Bonferroni-corrected P value < 0.05 were significant. We attempted to replicate significant association in the non-European cohorts. For participants with an intronic inversion, we further tested each HLA allele after conditioning on the presence or absence of DRB1 * 15:01.

Expression Quantitative Trait Loci (eQTL) and Gene Expression
We retrieved expression quantitative trait loci (eQTL) for IGLV (immunoglobulin lambda variable) cluster genes from the Genotype-Tissue Expression portal (version 8). Gene expression of GRID2IP was assessed in the European Bioinformatic Institute (EBI) Expression Atlas through the Open Targets portal (39). We also assessed colocalization between GWAS and eQTL studies through the Open Targets Genetics portal. 2 In single-variant association analyses, only variants with an MAF >1% and ≥5 expected minor allele counts in cases were included, i.e., MAF ≥ 5/(2*N cases ). The column "MAF threshold" reports the approximate minimum allele frequency threshold that satisfies both conditions for each of the 3 main analyses. A similar approach was used for variant aggregation tests, where only genes were included if the aggregated expected minor allele count was ≥5. The number of cases and controls represent the number included in the association analyses after filtering (methods).

Association With Clinical Phenotypes
We tested the association between significant variants identified in the association analyses and the following clinical phenotypes: age at inhibitor diagnosis, peak inhibitor titer, number of bleeds in the last 3 years, successful ITI, and length of ITI treatment. We tested the association between quantitative and binary phenotypes using linear and logistic regression respectively, adjusting for age, genetic ancestry, hemophilia A severity, and F8 variant type. Quantitative phenotypes were log-transformed prior to association testing.

Genome-Wide Association Study of Inhibitors in Participants With HA
We first performed single-variant GWAS of participants with inhibitors in the MLOF RR using logistic mixed models implemented in GMMAT, focusing on participants of European ancestry, the largest subgroup (Supplementary Table 1  To reduce heterogeneity that may be due to differing genetic risk factors between participants with low-(≤5 BU) or high-titer inhibitors (>5 BU), we performed a GWAS including only high-titer inhibitor participants ( Figure 1B and Supplementary Figure 2), and no loci reached genomewide significance (P < 5 × 10 −8 ). However, we detected an association trend for a cluster of variants in linkage disequilibrium (cut-off: r 2 > 0.8) near the chromosome 22 IGLV cluster (Supplementary Figures 3, 4 and Supplementary Table 2). The top variant, rs5756720 (P = 3.6 × 10 −7 ), has low frequency in European ancestry participants (MAF = 2.3%) but reached a frequency of 8% in high-titer inhibitor participants.

Low-Frequency Variants in GRID2IP Are Associated With High-Titer Inhibitors
We performed variant aggregation tests using the efficient hybrid burden and Sequence Kernel Association Test (SKAT) implemented in SMMAT (36) to assess whether aggregating lowfrequency and rare variants (MAF < 0.05) could reveal additional associations. We first tested whether we could capture the effect of F8 mutations on inhibitor status by gene burden tests by omitting F8 mutation type or hemophilia A severity in the regression model. F8 mutations were associated with inhibitors (burden P = 1.6 × 10 −6 , efficient test P = 1.0 × 10 −5 ) (Supplementary Table 6); small F8 mutations were protective against inhibitors, reflecting that the participants carrying those variants are less likely to develop inhibitors compared with participants with large structural variants (Supplementary Table 7). We next assessed whether other genes were associated with inhibitors after controlling for F8 mutation type and severity. We identified a significant association for ZPBP2 with all-titer inhibitors, and ITGB4, GRID2IP, and RGS16 with high-titer inhibitors ( Table 3

Association With Clinical Phenotypes
We investigated whether DRB1 * 15:01, DQB1 * 03:03, and the GRID2IP rs61732374 missense variant were associated with additional clinical phenotypes in MLOF participants, including number of bleeds, peak inhibitor titers, and age at inhibitor diagnosis. We did not observe a significant association between the 2 HLA alleles and number of bleeds, inhibitor titers, or age at inhibitor diagnosis (P > 0.05). However, inhibitor-positive participants carrying the rs61732374-A allele displayed increased inhibitor titers (P = 0.017, Figure 4). In addition, 162 participants underwent ITI. We tested whether these genotypes could influence ITI success or length of treatment but did not identify significant associations (P > 0.05).

DISCUSSION
We performed a large-scale GWAS of FVIII inhibitors in the MLOF RR cohort in participants who underwent WGS. Despite comprehensive genotyping, no variant reached genomewide significance when including all participants with HA, which suggests that our study may be underpowered to detect associations due to the relatively low number of inhibitor cases and/or disease heterogeneity. The latter is supported by FIGURE 3 | Gene-based associations with inhibitors and high-titer inhibitors in the European discovery cohort and around GRID2IP in participants with HA. (A) Quantile-quantile (QQ) plots for inhibitor status, (B) high-titer inhibitor status, and (C) inhibitor status in patients with intronic F8 inversions in the European discovery cohort for gene-based variant aggregation tests. Association P values were calculated using logistic mixed-model score tests using variant-set mixed-model association tests (SMMAT), adjusting for age, sequencing center, F8 mutation type, hemophilia A severity, and the first 3 principal components. (D) LocusZoom plot reporting single-variant generalized linear mixed-model association tests (GMMAT) score test P values of high-titer inhibitors in participants with hemophilia A around GRID2IP. Logistic mixed models were adjusted for age, sequencing center, F8 mutation type, hemophilia A severity, and the first three principal components. In contrast to the main single-variant analysis with high-titer inhibitor, all variants with an allele frequency >1% were included to show the association with the top coding GRID2IP variant included in variant aggregation tests (rs61732374, P = 4 × 10 −6 ). GRID2IP, glutamate receptor, ionotropic, delta 2 [GRID2] interacting protein 1; GWAS, genome-wide association study; LD Ref Var, linkage disequilibrium reference variant; HA, hemophilia A. the detection of significant association of a variant within the HLA region with inhibitors in participants carrying intronic inversions.
To understand which HLA alleles may contribute to inhibitor risk, we typed and performed an association analysis of HLA alleles at 4-digit resolution. We identified an association between the presence of DRB1 * 15:01 and increased risk of inhibitors. This association was stronger in participants with intronic inversions, in whom we identified a secondary independent association with DQB1 * 03:03. DRB1 * 15:01 has been previously identified as associated with inhibitors, particularly in participants with intron 22 inversions (23). To our knowledge, we report for the first time an association with DQB1 * 03:03, which was only detectable after conditioning on DRB1 * 15:01. In addition to increased sample size, this may partly explain why association with this allele was not detected in previous studies (23,24,41,42). Nonetheless, this result will require replication in independent studies.
HLA genes identified in this study encode major histocompatibility complex class II molecules, which may FIGURE 4 | The GRID2IP rs61732374-A allele is associated with increased peak inhibitor titer. Significance was evaluated using a linear regression model adjusting for age, hemophilia A severity, F8 mutation type, and genetic ancestry group. Inhibitor titers were log transformed prior to association testing. The analysis included black or African American as well as European ancestry participants with history of inhibitors (current or past) and available peak inhibitor titer information (N = 310). The analysis could not be performed in the Hispanic, Latino/a, or Spanish group as the A-allele was not observed in this population. BU, Bethesda Units. * P < 0.05. contribute to inhibitor formation through presentation of F8 peptides to CD4 T-helper cells by antigen-presenting cells. In addition, our results suggest that HLA alleles may modulate inhibitor risk (5) differently depending on the underlying F8 mutation. Different F8 missense variants, such as p.Arg612Cys and p.Arg2169His, are more likely to lead to inhibitor development in part because they lead to higher levels of novel binding between treatment FVIII peptides and HLA-DRB1 (43). Similar mechanisms may be at play for FVIII inversions, although individuals with HA with the intron-22 inversion may be partly tolerized to FVIII, as the full FVIII sequence remains expressed intracellularly as 2 non-secreted polypeptides (44).
The HLA variant DPB1 * 02:02 has been previously identified as a risk factor for inhibitor development in the MLOF study (45). The variant was not included in our analysis as its frequency was under our inclusion threshold (<1%). Outside the HLA region, we also detected a potential association with variants within the chromosome 22 IGLV cluster. The IGLV genes encode the variable region of immunoglobulins, which defines the antigenbinding site. Therefore, it is plausible that variants that modulate IGLV genes outside their coding sequence may influence the risk of developing neutralizing autoantibodies, although we only detected colocalizing eQTLs for IGLV pseudogenes (Supplementary Figure 4).
Finally, variant aggregation tests identified an association of rare variants in ZPBP2 with inhibitors, and ITGB4, GRID2IP, and RGS16 with high-titer inhibitors. Both ZPBP2 and GRID2IP are part of loci that have been previously associated with autoimmune disease, including asthma and inflammatory bowel diseases (IBDs) (46)(47)(48)(49)(50)(51)(52). This suggests that inhibitor risk may be modulated by common loci associated with inhibitors. It is possible that the associations detected in this study are capturing the effect of common nearby variants. However, the strongest association for inhibitors around GRID2IP was with the low-frequency coding variant rs61732374, in weak linkage disequilibrium with nearby variants (r 2 < 0.2) (Supplementary Figure 3). In addition, all variants included in ZPBP2 were rare (MAF < 1%).
We observed some inflation at the tails of gene-based test quantile-quantile plots, particularly for the high-titer inhibitor analysis, although this was not reflected in the genomic control (λ = 0.96). This may be due to the imbalance between the number of case and control participants (6% high-titer inhibitor cases). Nevertheless, we were able to replicate the GRID2IP association in Black or African American participants. The association between inhibitors and GRID2IP variants was only observed in the high-titer inhibitor group. Consistently, we observed a significant association between the low-frequency GRID2IP missense variant, rs61732374 (Arg504Cys), and increased inhibitor titers at peak in participants with a history of inhibitors. Although our results suggest that variants in GRID2IP may increase the risk of developing high levels of inhibitors, we did not identify a significant impact on ITI outcomes in the subset of participants who underwent this treatment. GRID2IP encodes the glutamate receptor ionotropic delta 2 (GRID2) interacting protein, which is predominantly expressed in the brain and functions as a linker between GRID2 and actin at the parallel fiber-Purkinje cell synapse (53). GRID2IP expression is strongly upregulated in peripheral blood mononuclear cells of patients with vitiligo (54). Interestingly, another GRID2IP missense variant, rs184043502, was strongly associated with IBDs in the FinnGen biobank (P = 2 × 10 −10 ) (Supplementary Table 12). In addition, significant association at the GRID2IP locus for irritable bowel disease and white blood cell counts strongly colocalized with GRID2IP blood eQTLs (posterior colocalization probability > 0.97) (39,49,55). Together, this suggests that GRID2IP may have an unrecognized role in autoimmunity.

CONCLUSION
In conclusion, GWAS and HLA typing revealed a significant association of variants in the HLA region, in particular HLA DRB1 * 15:01 and DQB1 * 03:03, with inhibitors in participants with F8 intronic inversions. Our results suggest that the underlying cause of FVIII inhibitors is likely heterogeneous and dependent on the underlying F8 pathogenic variant. Finally, we identified and replicated an association between low-frequency coding variants in GRID2IP and FVIII inhibitors, highlighting a potential role of the gene in anti-drug antibody development.

DATA AVAILABILITY STATEMENT
The original contributions presented in this study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS) approved the FinnGen study protocol Nr HUS/990/2017. MLOF phenotypic data were provided by ATHN under the WIRB Protocol #20122010 on a deidentified basis. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
SL designed the study, carried out analyses, interpreted data, performed statistical analyses, and wrote the manuscript. CH analyzed data and performed statistical analyses. KK, DR, and JD designed the study, collected and interpreted data, and wrote the manuscript. JD, TH, and CL designed the original research plan. All authors provided critical revision of the manuscript and had final approval of the manuscript for publication.

ACKNOWLEDGMENTS
We would like to thank Till Andlauer, Glenn Perce, Joe Salas, Robert Peters, Alexandra Hicks, Clément Chatelain, and Len Valentino for the productive discussions and feedback. We also thank Graham Neill for providing feedback and helping with the preparation of the manuscript. We would also like to thank all participants in the MLOF study, their families, and the hemophilia treatment center staff. The MLOF Research Repository is a collection of genetic samples and data developed through the collaboration of the American Thrombosis and Hemostasis Network, Bloodworks Northwest, and the National Hemophilia Foundation with support from Bioverativ (Sanofi) and under the leadership of the project's Principal Investigator, Barbara Konkle. Phenotypic data is provided through the ATHNdataset, under the stewardship of the American Thrombosis and Hemostasis Network. The partners acknowledge the dedicated efforts of the hemophilia treatment centers and their patients for contributing to this Research Repository. Molecular data for the TOPMed program were supported by the National Heart, Lung, and Blood Institute (NHLBI). Genome Sequencing for "NHLBI TOPMed: MLOF" (phs001515) was performed at Baylor (HHSN268201600033I) and New York Genome Center (NYGC) Genomics (HHSN268201500016C). Core support including centralized genomic read mapping and genotype calling, along with variant quality metrics and filtering, was provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Core support including phenotype harmonization, data management, sample-identity quality control, and general program coordination were provided by the TOPMed Data Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. We acknowledge the FinnGen consortium for access to release 6 datasets used to perform phenome-wide association analyses. The FinnGen project is funded by 2 grants from Business Finland