Genome-Wide Association Analyses Identify Variants in IRF4 Associated With Acute Myeloid Leukemia and Myelodysplastic Syndrome Susceptibility

The role of common genetic variation in susceptibility to acute myeloid leukemia (AML) and myelodysplastic syndrome (MDS), a group of rare clonal hematologic disorders characterized by dysplastic hematopoiesis and high mortality, remains unclear. We performed AML and MDS genome-wide association studies (GWAS) in the DISCOVeRY-BMT cohorts (2,309 cases and 2,814 controls). Association analysis based on subsets (ASSET) was used to conduct a summary statistics SNP-based analysis of MDS and AML subtypes. For each AML and MDS case and control we used PrediXcan to estimate the component of gene expression determined by their genetic profile and correlated this imputed gene expression level with risk of developing disease in a transcriptome-wide association study (TWAS). ASSET identified an increased risk for de novo AML and MDS (OR = 1.38, 95% CI, 1.26-1.51, Pmeta = 2.8 × 10⁻¹²) in patients carrying the T allele at rs12203592 in Interferon Regulatory Factor 4 (IRF4), a transcription factor which regulates myeloid and lymphoid hematopoietic differentiation. Our TWAS analyses showed that increased IRF4 gene expression is associated with increased risk of de novo AML and MDS (OR = 3.90, 95% CI, 2.36-6.44, Pmeta = 1.0 × 10⁻⁷). The identification of IRF4 by both GWAS and TWAS contributes valuable insight into the role of genetic variation in AML and MDS susceptibility.

Given the evidence of a shared genetic basis across B-cell malignancies (BCM) and the underlying genetic predisposition for AML and myelodysplastic syndromes (MDS) observed in family, epidemiological, and genetic association studies (Goldin et al., 2012; Gao et al., 2014; Churpek, 2017; Walker et al., 2019), we hypothesized that germline variants may contribute to both AML and MDS development. Using the DISCOVeRY-BMT (Determining the Influence of Susceptibility COnveying Variants Related to one-Year mortality after BMT) study population (2,309 cases and 2,814 controls), we performed AML and MDS genome-wide association studies in European Americans and used these data sets to inform our hypothesis. To address the disease heterogeneity within and across our data we used a validated meta-analytic association test based on subsets (ASSET; Bhattacharjee et al., 2012). ASSET tests the association of SNPs with all possible AML and MDS subtypes and identifies the strongest genetic association signal. To systematically test the association of genetically predicted gene expression with disease risk, we performed a transcriptome-wide association study (TWAS; Gamazon et al., 2015; Gusev et al., 2016). This allows a preliminary investigation into the role of non-coding risk loci, which might be regulatory in nature, that impact expression of nearby genes. The TWAS statistical approach, PrediXcan (Gamazon et al., 2015), was used to impute tissue-specific gene expression from a publicly available whole blood transcriptome panel into our AML and MDS cases and controls. The predicted gene expression levels were then tested for association with AML and MDS. The use of both a GWAS and a TWAS in the DISCOVeRY-BMT study population allowed us to identify AML and MDS associations with IRF4, a transcription factor which regulates myeloid and lymphoid hematopoietic differentiation and has been previously identified in GWAS of BCM (Law et al., 2017).

Study Design and Population
Our study was a nested case-control design derived from the parent study DISCOVeRY-BMT (Determining the Influence of Susceptibility COnveying Variants Related to 1-Year Mortality after unrelated donor Blood and Marrow Transplant) (Hahn et al., 2015). The DISCOVeRY-BMT cohort was compiled from 151 centers around the world through the Center for International Blood and Marrow Transplant Research (CIBMTR). Briefly, the parent study was designed to find common and rare germline genetic variation associated with survival after an URD-BMT. DISCOVeRY-BMT consists of two cohorts of ALL, AML and MDS patients and their 10/10 human leukocyte antigen (HLA)-matched unrelated healthy donors. Cohort 1 was collected between 2000 and 2008, Cohort 2 was collected from 2009 to 2011.
Acute myeloid leukemia and MDS patients were selected from the DISCOVeRY-BMT patient cohorts and used as cases, and all the unrelated donors from both cohorts were used as controls. AML subtypes included de novo AML with normal cytogenetics, de novo AML with abnormal cytogenetics and therapy-related AML (t-AML). De novo AML patients did not have antecedent MDS, or chemotherapy or radiation for prior cancers. MDS subtypes included de novo MDS, defined as patients without antecedent chemotherapy or radiation for prior cancers, and therapy-related MDS (t-MDS). Patient cytogenetic subtypes were available; however, due to limited sample sizes for each cytogenetic risk group, we consider here only broad categories. Controls were unrelated, healthy donors aged 18-61 years who passed a comprehensive medical exam and were disease-free at the time of donation. All patients and donors provided written informed consent for their clinical data to be used for research purposes and were not compensated for their participation.

Genotyping, Imputation, and Quality Control
Genotyping and quality control in the DISCOVeRY-BMT cohort have previously been described in detail (Hahn et al., 2015; Clay-Gilmour et al., 2017; Karaesmen et al., 2017; Zhu et al., 2018). Briefly, samples were assigned to plates to ensure an even distribution of patient characteristics, and genotyping was performed at the University of Southern California Genomics Facility using the Illumina OmniExpress BeadChip containing approximately 733,000 single nucleotide polymorphisms (SNPs; Yan et al., 2012). SNPs were removed if the missing rate was >2.0%, if the minor allele frequency (MAF) was <1%, or if they violated Hardy-Weinberg equilibrium proportions (P < 1.0 × 10⁻⁴).
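The SNP-level filter described above can be sketched as follows. This is a minimal illustration, assuming that per-SNP summary statistics (missing rate, MAF, Hardy-Weinberg P-value) have already been computed by a tool such as Plink; the `SnpStats` record and function names are hypothetical and are not part of the study pipeline.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the text: remove a SNP if missing rate > 2.0%,
# MAF < 1%, or Hardy-Weinberg P < 1.0e-4.
MAX_MISSING_RATE = 0.02
MIN_MAF = 0.01
MIN_HWE_P = 1.0e-4


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary record (not from the study code)."""
    snp_id: str
    missing_rate: float  # fraction of samples without a genotype call
    maf: float           # minor allele frequency
    hwe_p: float         # P-value of the Hardy-Weinberg equilibrium test


def passes_snp_qc(s: SnpStats) -> bool:
    """Return True if the SNP survives all three pre-imputation filters."""
    return (
        s.missing_rate <= MAX_MISSING_RATE
        and s.maf >= MIN_MAF
        and s.hwe_p >= MIN_HWE_P
    )


def filter_snps(stats: List[SnpStats]) -> List[SnpStats]:
    """Keep only SNPs that pass quality control."""
    return [s for s in stats if passes_snp_qc(s)]


if __name__ == "__main__":
    # Illustrative values only; these are not study data.
    example = [
        SnpStats("snp_ok", 0.001, 0.15, 0.40),
        SnpStats("snp_missing", 0.05, 0.15, 0.40),
        SnpStats("snp_rare", 0.001, 0.004, 0.40),
        SnpStats("snp_hwe", 0.001, 0.15, 1.0e-6),
    ]
    print([s.snp_id for s in filter_snps(example)])
```

Keeping the thresholds as named constants makes it straightforward to apply a second, post-imputation pass with different cutoffs.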
Problematic samples were removed based on the SNP missing rate, reported-genotyped sex mismatch, abnormal heterozygosity, cryptic relatedness, and population outliers. Population stratification was assessed via principal components analysis using Eigenstrat software (Price et al., 2006) and a genomic inflation factor (λ) was calculated for each cohort. Following SNP quality control, 637,655 and 632,823 SNPs from the OmniExpress BeadChip in Cohorts 1 and 2, respectively, were available for imputation. SNP imputation was performed using the Haplotype Reference Consortium panel, hg19/build 37, via the Michigan Imputation Server (McCarthy et al., 2016). Variants with imputation quality scores <0.8 or minor allele frequency (MAF) <0.005 were removed, yielding almost 9 million high quality SNPs available for analysis in each cohort.
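As an illustration of the genomic inflation factor mentioned above, the sketch below uses the conventional definition: the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null. The source does not state which implementation was used, so the function names and the conversion from P-values are assumptions.

```python
from statistics import NormalDist, median
from typing import Iterable

_STD_NORMAL = NormalDist()

# Expected median of a chi-square distribution with 1 degree of freedom,
# obtained as the square of the 75th percentile of a standard normal (~0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2_1df(p: float) -> float:
    """Convert a two-sided P-value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)  # lower-tail quantile; sign is irrelevant
    return z * z


def genomic_inflation(p_values: Iterable[float]) -> float:
    """Genomic inflation factor: median observed chi-square / expected median."""
    chi2 = [p_to_chi2_1df(p) for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Evenly spaced P-values mimic a null distribution, so lambda is ~1.
    null_like = [i / 1000.0 for i in range(1, 1000)]
    print(round(genomic_inflation(null_like), 3))
```

Values close to 1 indicate little inflation of test statistics from population stratification or other systematic bias.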

Genome-Wide SNP Associations With AML and MDS
Quality control and statistical analyses were implemented using QCTOOL-v2, R 3.5.2 (Eggshell Igloo), Plink-v1.9, and SNPTEST-v2.5.4-beta3. Logistic regression models adjusted for age, sex, and three principal components were used to perform single SNP tests of association with de novo MDS, t-MDS, AML by subtype (de novo AML with normal cytogenetics, de novo AML with abnormal cytogenetics and t-AML) in each cohort. European American healthy donors were used as controls. SNP meta-analyses of cohorts 1 and 2 were performed by fitting random effects models (Lee et al., 2017). To identify the strongest association signal with AML and MDS we conducted a summary statistic SNP-based association analysis (ASSET) implemented in R statistical software (Bhattacharjee et al., 2012). ASSET tests each SNP for association with outcome using an exhaustive search across non-overlapping AML and MDS case groups while accounting for the multiple tests required by the subset search, as well as any shared controls between groups (Bhattacharjee et al., 2012).
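To make the two-cohort meta-analysis step concrete, the sketch below combines per-cohort log odds ratios under a random effects model. The source does not specify the estimator, so the DerSimonian-Laird estimate of between-cohort variance is an assumption made here for illustration; it is not the ASSET procedure and may differ from the method of Lee et al. (2017).

```python
import math
from typing import Sequence, Tuple


def random_effects_meta(
    betas: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float, float]:
    """DerSimonian-Laird random effects meta-analysis of log odds ratios.

    Returns (pooled beta, pooled SE, two-sided P-value, tau^2).
    """
    w = [1.0 / se ** 2 for se in ses]                 # inverse-variance weights
    sw = sum(w)
    beta_fixed = sum(wi * b for wi, b in zip(w, betas)) / sw

    # Cochran's Q and the method-of-moments estimate of between-study variance.
    q = sum(wi * (b - beta_fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Re-weight with tau^2 added to each study's variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    sws = sum(w_star)
    beta = sum(wi * b for wi, b in zip(w_star, betas)) / sws
    se = math.sqrt(1.0 / sws)

    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided normal P-value
    return beta, se, p, tau2


if __name__ == "__main__":
    # Illustrative per-cohort estimates only; these are not study results.
    beta, se, p, tau2 = random_effects_meta([0.30, 0.34], [0.06, 0.07])
    print(f"OR = {math.exp(beta):.2f}, SE = {se:.3f}, P = {p:.2e}, tau2 = {tau2:.4f}")
```

When the cohorts agree (tau² = 0) the result coincides with a fixed effects inverse-variance estimate.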

Heritability Estimation of AML and MDS
We calculated heritability of AML and MDS combined and by independent subtypes as the proportion of phenotypic variance explained by all common genotyped SNPs, using the genome-based restricted maximum likelihood method performed with the Genome-wide Complex Trait Analysis (GCTA) software (Yang et al., 2011; Deary et al., 2012; Lee et al., 2012). We report heritability on the observed scale due to genome-wide genotyped variants as well as heritability on the liability scale assuming AML and MDS disease prevalence of 0.0001 (Lee et al., 2013; Lu et al., 2014; Mitchell et al., 2015).
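The conversion from the observed scale to the liability scale can be illustrated with the standard transformation for ascertained case-control data, h²(liability) = h²(observed) × K²(1 − K)² / [P(1 − P) z²], where K is the population prevalence, P is the proportion of cases in the sample, and z is the standard normal density at the liability threshold. We assume this is the form applied by GCTA; the exact inputs used in the analysis are not given in the text, so the function below is a sketch and will not necessarily reproduce the reported estimates.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs: float, prevalence: float, case_fraction: float) -> float:
    """Transform observed-scale SNP heritability to the liability scale.

    h2_obs        -- heritability estimated on the observed (0/1) scale
    prevalence    -- assumed population prevalence K of the disease
    case_fraction -- proportion P of cases in the analyzed sample
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)  # liability threshold
    z = std_normal.pdf(threshold)                     # normal density at threshold
    k, p = prevalence, case_fraction
    return h2_obs * (k * (1.0 - k)) ** 2 / (p * (1.0 - p) * z ** 2)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the call.
    print(round(observed_to_liability(h2_obs=0.5, prevalence=0.01, case_fraction=0.5), 3))
```

Because cases are heavily over-represented relative to a prevalence of 0.0001, the liability-scale estimate is expected to be smaller than the observed-scale estimate.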

Transcriptome-Wide Association Study of AML and MDS
To prioritize GWAS findings and identify expression quantitative trait loci (eQTL)-linked genes, we carried out gene expression tests of association with de novo AML and MDS using PrediXcan (Gamazon et al., 2015). This method leverages the well-described functional regulatory enrichment in genetic variants relatively close to the gene body (i.e., cis-regulatory variation) to inform models relating SNPs to gene expression levels in data with both gene expression and SNP genotypes available. Robust prediction models are then used to estimate the effect of cis-regulatory variation on gene expression levels. Using imputation, the cis-regulatory effects on gene expression from these models can be predicted in any study with genotype measurements, even if measured gene expression is not available. Thus, we imputed the cis-regulatory component of gene expression into our data for each individual using models trained on the whole blood transcriptome panel (n = 922) from the Depression Genes and Networks (DGN; Battle et al., 2014), yielding expression levels of 11,200 genes for each case and control. The resulting estimated gene expression levels were then used to perform gene-based tests of differential expression between AML and MDS cases and controls, adjusted for age and sex. A fixed effects model with inverse variance weighting, implemented in the R package Metafor, was used for meta-analysis of cohorts 1 and 2. A Bonferroni-corrected transcriptome-wide significance threshold was set at P < 4.5 × 10⁻⁶.
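Two steps in this paragraph lend themselves to a short illustration: imputing the genetically regulated component of expression as a weighted sum of allele dosages, and deriving the Bonferroni threshold from the number of genes tested. The sketch below is not the PrediXcan software; the SNP identifiers, weights and dosages are hypothetical, and real prediction weights come from the trained DGN models.

```python
from typing import Dict


def predict_expression(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Predicted expression of one gene for one individual.

    dosages -- effect-allele dosage (0 to 2) per SNP for the individual
    weights -- per-SNP prediction weights for the gene (hypothetical here)
    SNPs in the model that were not genotyped or imputed are skipped.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold controlling family-wise error at alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    gene_weights = {"snp_a": 0.5, "snp_b": -0.2, "snp_missing": 0.1}  # hypothetical
    person = {"snp_a": 2.0, "snp_b": 1.0, "snp_c": 0.0}               # hypothetical
    print(predict_expression(person, gene_weights))

    # 11,200 genes tested at alpha = 0.05 gives roughly 4.5e-6, as in the text.
    print(f"{bonferroni_threshold(0.05, 11200):.2e}")
```

The predicted values for all individuals are then used as the exposure in the case-control regression for each gene.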

Functional Annotation of Genetic Variation Associated With AML and MDS
To better understand the potential function of the variants identified by GWAS and ASSET analyses we annotated significant SNPs using publicly available data. eQTLGen, a consortium analysis of the relationship of SNPs to gene expression in 30,912 whole blood samples, was used to determine if significant and suggestive SNPs (P < 5 × 10⁻⁶) were whole blood cis-eQTL, defined as allele-specific association with gene expression (Võsa et al., 2018). The Genotype-Tissue Expression (GTEx) project was used to test for significant eQTLs in >70 additional tissues (Carithers et al., 2015). We also tested for a difference in the log fold tissue expression of IRF4 between an independent set of AML samples from The Cancer Genome Atlas (N = 170) and GTEx samples (N = 70). AML and MDS SNP associations were also placed in the context of previous GWAS using Phenoscanner, a comprehensive variant-phenotype database of large GWAS, which includes results from the NHGRI-EBI GWAS catalog, the UK Biobank, the NIH Genome-Wide Repository of Associations between SNPs and Phenotypes and publicly available summary statistics from more than 150 published genome association studies. Results were filtered at P < 5 × 10⁻⁸ and the R statistical software package phenoscanner was used to download all data for our significant variants (Staley et al., 2016). Chromatin state data based on the 25-state Imputation Based Chromatin State Model across 24 Blood, T-cell, HSC and B-cell lines were downloaded from the Roadmap Epigenomics project (Roadmap Epigenomics Consortium et al., 2015). Figures including chromatin state information and results from previous GWAS were constructed using the R Bioconductor package Gviz (Mifsud et al., 2015; Cairns et al., 2016; Spurrell et al., 2016). Lastly, we sought to identify promoter interaction regions (PIR), defined as significant interactions between gene promoters and distal genomic regions.
Variants in PIRs can be connected to potential gene targets and thus can impact gene function (Spurrell et al., 2016). Briefly, Hi-C libraries enriched for promoter sequences are generated with biotinylated RNA baits complementary to the ends of promoter-containing restriction fragments. Promoter fragments become bait for the pieces of the genome with which they frequently interact, allowing regulatory elements and enhancers to be pulled down and sequenced. Statistical tests of bait-target pairs are done to define significant PIRs and their targets (Cairns et al., 2016; Schofield et al., 2016; Schoenfelder et al., 2018). To identify the genomic features with which our significant SNPs might be interacting via chromatin looping we used publicly available Promoter Capture Hi-C (PCHi-C) data on a lymphoblastoid cell line (LCL), GM12878, and two ex vivo CD34+ hematopoietic progenitor cell lines (primary hematopoietic G-CSF mobilized stem cells and hematopoietic stem cells) (Schofield et al., 2016). We integrated our SNP data with the PCHi-C cell line data and visualized these interactions using circos plots (Yu et al., 2018).
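Integrating SNP positions with PCHi-C interactions amounts to an interval lookup: for each significant bait-target pair, check whether the SNP falls inside the target fragment. The sketch below illustrates this under assumed inputs; the `Interaction` record, the gene label and all coordinates are hypothetical placeholders rather than values from the PCHi-C data used in the study.

```python
from typing import List, NamedTuple


class Interaction(NamedTuple):
    """One significant promoter (bait) to target-fragment interaction."""
    bait_gene: str     # gene whose promoter was captured
    chrom: str         # chromosome of the target fragment
    target_start: int  # target fragment start (1-based, inclusive)
    target_end: int    # target fragment end (inclusive)


def interactions_for_snp(
    chrom: str, pos: int, interactions: List[Interaction]
) -> List[Interaction]:
    """Return interactions whose target fragment contains the SNP position."""
    return [
        i for i in interactions
        if i.chrom == chrom and i.target_start <= pos <= i.target_end
    ]


if __name__ == "__main__":
    # Hypothetical fragments and SNP position, for illustration only.
    pirs = [
        Interaction("GENE_A", "chr6", 100_000, 105_000),
        Interaction("GENE_B", "chr6", 200_000, 204_000),
        Interaction("GENE_A", "chr7", 100_000, 105_000),
    ]
    hits = interactions_for_snp("chr6", 102_500, pirs)
    print([h.bait_gene for h in hits])
```

For genome-wide use an interval tree or sorted index would be preferable to this linear scan, but the logic is the same.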

DISCOVeRY-BMT Cases and Controls
Results of quality control have been described elsewhere (Karaesmen et al., 2017). Following quality control, the DISCOVeRY-BMT cohorts include 1,769 AML and 540 MDS patients who received URD-BMT as treatment and 2,814 unrelated donors as controls (Supplementary Table 1; Table 1). Overall, the distribution of antecedent cancers differed significantly between t-MDS and t-AML, with almost 2/3 of t-MDS and 1/3 of t-AML patients diagnosed with a prior hematologic cancer.

SNP Associations With AML and MDS
Genome-wide association studies of AML by subtype (de novo abnormal cytogenetics, de novo normal cytogenetics and t-AML) and MDS (de novo and t-MDS) are shown in Supplementary Figure 1. No population stratification was observed in PCA analysis, and the genomic inflation factors for cohorts 1 and 2 were 1.04 and 1.03, respectively. Quantile-quantile plots of SNPs after post-imputation quality control (MAF > 0.005, imputation quality scores > 0.8) are shown in Supplementary Figure 3. To identify loci that show association with AML and MDS we used ASSET. For SNPs to be considered, we used previously defined criteria, which required ASSET SNP associations at P ≤ 5.0 × 10⁻⁸ with significant individual one-sided subset tests (P < 0.01); in addition, the variant association could not be driven by a single disease, nor could it be both positively and negatively associated in different cohorts of the same disease (Law et al., 2017). In the ASSET GWAS analyses we identified a novel typed SNP associated with AML and MDS on chromosome 6 (Figure 1). The T allele at rs12203592, a variant in intron 4 of Interferon Regulatory Factor 4 (IRF4), conferred increased risk of de novo abnormal cytogenetic AML, de novo normal cytogenetic AML, MDS and t-MDS (OR = 1.38; 95% CI, 1.26-1.51, Pmeta = 2.8 × 10⁻¹²). T-AML showed no association with rs12203592. The effect allele frequency was 19% in de novo AML, MDS and t-MDS cases versus 14% in controls. ASSET analyses also identified another variant in modest linkage disequilibrium (LD), r² = 0.7, with rs12203592 in the regulatory region of IRF4; the A allele at rs62389423 showed a putative association with de novo AML and MDS (OR = 1.36; 95% CI, 1.21-1.52, Pmeta = 1.2 × 10⁻⁷) (Figure 2A). We identified one significant association in the subtype GWAS which was disease specific.
The C allele in rs78898975 in TATA-box binding protein associated factor 2 (TAF2) was associated with an increased risk of t-MDS (ORmeta = 5.87, 95% CI = 3.20, 10.76, Pmeta = 9.9 × 10⁻⁹) but not de novo MDS (OR = 1.8, 95% CI = 0.81, 1.45, Pmeta = 0.20) (Supplementary Figure 1). The effect allele frequency was 7% in t-MDS, 2% in de novo MDS and 1.5% in controls.
FIGURE 1 | ASSET analysis and associations by AML and MDS subgroup. Forest plot of the odds ratios (OR) for the association between rs12203592 in IRF4 and MDS and AML subtypes. The variant resides on chromosome 6 outside the major histocompatibility complex region. Studies were weighted by the inverse of the variance of the log (OR). The solid gray vertical line is positioned at the null value (OR = 1); values to the right represent risk-increasing odds ratios. Horizontal lines show the 95% CI and the box is the OR point estimate for each case-control subset, with its area proportional to the weight of the patient group. The diamond is the overall effect estimated by ASSET, with the 95% CI given by its width.
Frontiers in Genetics | www.frontiersin.org
E029: Primary monocytes from peripheral blood; E030: Primary neutrophils from peripheral blood; E031: Primary B cells from cord blood; E032: Primary B cells from peripheral blood; E033: Primary T cells from cord blood; E034: Primary T cells from blood; E035: Primary hematopoietic stem cells; E036: Primary hematopoietic stem cells short term culture; E037: Primary T helper memory cells from peripheral blood 2; E038: Primary T helper naïve cells from peripheral blood; E039: Primary T helper naïve cells from peripheral blood; E040: Primary T helper memory cells from peripheral blood 1; E041: Primary T helper cells PMA-Ionomycin stimulated; E042: Primary T helper 17 cells PMA-Ionomycin stimulated; E043: Primary T helper cells from peripheral blood; E044: Primary T regulatory cells from peripheral blood; E045: Primary T cells effector/memory enriched from peripheral blood; E046: Primary Natural Killer cells from peripheral blood; E047: Primary T CD8 naïve cells from peripheral blood; E048: Primary T CD8 memory cells from peripheral blood; E050: Primary hematopoietic stem cells G-CSF mobilized Female; E051: Primary hematopoietic stem cells G-CSF mobilized Male; E062: Primary Mononuclear Cells from Peripheral Blood; E116: Lymphoblastic Cell Line. The colors indicate chromatin states imputed by ChromHMM and shown in the key titled "Roadmap Chromatin State."
A previous genome-wide association study of AML done in European American cases and controls reported a susceptibility variant in BICRA (rs75797233) (Walker et al., 2019). The variant was not significantly associated with AML risk in our meta-analyses (OR = 1.08, 95% CI = 0.78-1.37). However, their cohort did not include patients who received an allogeneic transplant as curative therapy, and the distribution of AML subtypes differed between the studies. DISCOVeRY-BMT AML cases consisted of more unfavorable cytogenetic cases (trisomy, monosomy) than the AML cases from Walker et al. (2019). The BICRA (rs75797233) SNP may be more likely to be associated with cytogenetic subtypes that comprise a prognostically less severe AML. In addition, the low frequency (MAF = 0.02) of this imputed variant (info score > 0.8 in both cohorts) possibly reduced power to detect an effect.
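The odds ratios, confidence intervals and allele frequencies reported in this section can be related to one another with two simple calculations: recovering an approximate Wald standard error and P-value from an OR and its 95% CI, and computing a crude allelic OR from case and control allele frequencies. The sketch below is a consistency check only; the reported estimates come from covariate-adjusted models and ASSET, so exact agreement is not expected, and the function names are our own.

```python
import math
from typing import Tuple

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float) -> Tuple[float, float, float]:
    """Approximate (SE of log OR, z statistic, two-sided P) from an OR and 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_975)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Unadjusted allelic OR from effect allele frequencies in cases and controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


if __name__ == "__main__":
    # rs12203592: OR = 1.38 (95% CI, 1.26-1.51) as reported in the text.
    se, z, p = wald_from_or_ci(1.38, 1.26, 1.51)
    print(f"SE = {se:.3f}, z = {z:.2f}, P = {p:.1e}")

    # Effect allele frequency of 19% in cases versus 14% in controls.
    print(f"crude allelic OR = {allelic_odds_ratio(0.19, 0.14):.2f}")
```

For rs12203592 the Wald approximation gives a P-value on the order of 10⁻¹², and the crude allelic OR is about 1.44, both broadly in line with the adjusted estimates reported above.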

Functional Annotation of SNP Associations With AML and MDS
Multiple GWAS of healthy individuals have shown associations between the T allele at rs12203592 and higher eosinophil counts, lighter skin color, lighter hair, less tanning ability, and increased freckling (Astle et al., 2016; Staley et al., 2016). GWAS have also identified associations between this allele and increased risk of childhood acute lymphoblastic leukemia in males, non-melanoma skin cancer, squamous cell carcinoma, cutaneous squamous cell carcinoma, basal cell carcinoma, actinic keratosis, and progressive supranuclear palsy (Figure 2B; Staley et al., 2016). Furthermore, analyses of multiple B-cell malignancies recently identified rs9392017, adjacent to IRF4, as a pleiotropic susceptibility variant associated with both CLL and HL (Di Bernardo et al., 2008; Mifsud et al., 2015; Schofield et al., 2016; Law et al., 2017). This SNP is approximately 40 kb away from rs12203592, although not in LD (r² = 0.01).
The rs12203592 risk allele was associated with increased expression of IRF4 in whole blood (P = 1.48 × 10⁻²⁹; Võsa et al., 2018). IRF4 is a key transcription factor for lymphoid and myeloid hematopoiesis (Shaffer et al., 2008; Pratt et al., 2010; Havelange et al., 2011; Salaverria et al., 2011) and rs12203592 resides in a regulatory region across Blood, HSC, B-Cell and T-Cell lines (Figure 2C). The RegulomeDB score indicates how likely a variant is to be a regulatory element, from 1a (most likely) to 7 (no data); the variant's score of 2b indicates that it is likely to affect transcription factor binding (Boyle et al., 2012). While the HL and CLL pleiotropic variant rs9392017 was not a significant eQTL for IRF4 in whole blood, PCHi-C cell line data from both GM12878 and the ex vivo CD34+ hematopoietic progenitor cell lines show chromatin looping between rs9392017 and the regulatory region containing rs12203592 (Supplementary Figure 2).

Heritability Estimates of AML and MDS
The heritability of AML and MDS on the observed scale due to genotyped variants was 0.46 with standard error (SE) = 0.07. Transforming this to the liability scale and assuming a disease prevalence of 0.0001 resulted in a heritability of 0.10 (SE = 0.02), which differed significantly from a heritability of zero (P = 2.0 × 10⁻¹⁶). De novo AML with normal cytogenetics and de novo MDS had similar heritability on the liability scale, at 9% (SE = 0.03, P = 1.9 × 10⁻³) and 14% (SE = 0.04, P = 1.4 × 10⁻⁴), respectively. Therapy-related AML and MDS were tested independently and the estimated proportion of variance explained by all SNPs was 7% for t-AML and 4% for t-MDS; however, the SEs were high and the heritability did not significantly differ from zero.

Transcriptome-Wide Association Study-PrediXcan
Using PrediXcan (Gamazon et al., 2015) gene expression imputation models trained on the DGN data set, we identified one transcriptome-wide significant gene associated with de novo AML and MDS. Increased expression of IRF4 was associated with an increased risk for the development of de novo AML and MDS (OR = 3.90; 95% CI, 2.36-6.44, Pmeta = 1.0 × 10⁻⁷), consistent with our SNP-level findings (Figure 3). This association is also consistent with gene expression analyses of TCGA AML samples compared to GTEx whole blood, which show that IRF4 expression is 1.75-fold greater in AML samples than in GTEx whole blood (P < 0.01). Whole blood transcriptome models also identified two additional genes with suggestive associations with de novo AML and MDS. Increased expression of AKT Serine/Threonine Kinase 1, AKT1, at 14q32.33 was associated with risk for the development of de novo AML and MDS (OR = 1.56; 95% CI, 1.25-1.95, Pmeta = 1.0 × 10⁻⁴) (Figure 4). Likewise, increased expression of Ras guanyl nucleotide-releasing protein 2, RASGRP2, was associated with an increased risk for development of de novo AML and MDS (OR = 4.05; 95% CI, 1.84-8.91, Pmeta = 5 × 10⁻⁴) (Figure 4). Other suggestive gene associations (Pmeta < 5 × 10⁻⁴) were identified with limited or no evidence of biological plausibility to AML/MDS etiology (Supplementary Table 2).

DISCUSSION
We performed the first large scale AML and MDS GWAS in a URD-BMT population, providing evidence of novel pleiotropic risk loci associated with increased susceptibility to AML and MDS. The DISCOVeRY-BMT cohorts from the Center for International Blood and Marrow Transplant Research (CIBMTR) allow us to capture ∼99% of all AML and MDS patients who received an URD-BMT performed in the United States within the given time-frame (i.e., Cohort 1: 2000-2008 and Cohort 2: 2009-2011). We identified an association between the T allele at rs12203592 (typed) in IRF4 and an increased risk for the development of de novo AML, de novo MDS and t-MDS in patients who had undergone URD-BMT compared to healthy donor controls. While therapy-related myeloid neoplasms have been shown to be genetically and etiologically similar to other high-risk myeloid neoplasms (McNerney et al., 2017), in our transplant population t-AML did not associate with this variant, while t-MDS did show evidence of association with rs12203592. We also identified a genome-wide significant t-MDS variant which was an eQTL for both TAF2 and DEPTOR genes. Differences in associations identified in t-MDS compared to t-AML could be due to factors related to underlying susceptibility. For example, the primary diagnosis was a hematologic malignancy in 61% of t-MDS cases but in only 31% of t-AML cases (Supplementary Table 1). However, this merits further exploration in larger cohorts. We also provide the first estimates of the heritability of AML and MDS, at between 9 and 14%, which are in line with other GWAS of cancer heritability on the liability scale, indicating that genetic variation contributes to AML and MDS susceptibility (Sampson et al., 2015).
FIGURE 4 | Regional plots of PrediXcan-TWAS and SNP associations with AML and MDS. Each box represents PrediXcan-TWAS significant genes AKT1, IRF4 and RASGRP2 ± 0.5 megabases. The grey shaded bars represent the gene, where height is gene expression association and width is gene region in base pairs, and the purple dots represent SNP associations with AML and MDS; -log10 (P-values) are shown on the y-axis. Green and red lines denote the transcriptome-wide and genome-wide significant P-values, respectively. Results are filtered for imputation quality (rsq > 0.8) and heterogeneity of effect between cohorts.
The rs12203592 SNP has been shown to regulate IRF4 transcription by physical interaction with the IRF4 promoter through a chromatin loop (Visser et al., 2015). This SNP resides in an important position within NFkB motifs in multiple blood and immune cell lines, supporting the hypothesis that this SNP may modulate NFkB repression of IRF4 expression (Ward and Kellis, 2012; Kheradpour and Kellis, 2014). Furthermore, this SNP resides in a hematopoietic transcription factor that has been previously identified to harbor a hematological cancer susceptibility locus, rs9392017, which we show interacts with the region containing our susceptibility variant. These data add to the mounting evidence that there could be pleiotropic genes across multiple hematologic cancers (Slager et al., 2012; Mitchell et al., 2016; Law et al., 2017; Went et al., 2018; Vijayakrishnan et al., 2019). Imputed gene expression logistic regression models showed a significant association between higher predicted levels of IRF4 expression and the risk for development of de novo AML or MDS (Gamazon et al., 2015). We also see this in the TCGA data, where gene expression analyses show significantly higher expression of IRF4 in AML samples versus GTEx samples. Although IRF4 functions as a tumor suppressor gene in early B-cell development (Acquaviva et al., 2008), in multiple myeloma IRF4 is a well-established oncogene (Shaffer et al., 2008), with oncogenic implications extending to adult leukemias (De Silva et al., 2012) and lymphomas (Bisig et al., 2012), as well as pediatric leukemia. IRF4 overexpression is a hallmark of the activated B-cell-like type of diffuse large B-cell lymphoma and is associated with classical Hodgkin lymphoma (cHL), plasma cell myeloma and primary effusion lymphoma (Carbone et al., 2002).
In a case-control study of childhood leukemia, IRF4 expression was higher in immature B-common acute lymphoblastic leukemia and T-cell leukemia than in controls, with the highest expression levels in pediatric AML patients (Adamaki et al., 2013). In addition to the CLL genetic susceptibility loci identified in IRF4, high expression levels of the gene have been shown to correlate with poor clinical prognosis (Allan et al., 2010).
Transcriptome-wide association studies can be a powerful tool to help prioritize potentially causal genes. It is, however, imperative to investigate the SNP and gene-expression associations in the context of the surrounding variants and genes to reduce the possibility of a false signal from co-regulation. Co-regulation can occur when there are multiple GWAS and TWAS hits due to linkage disequilibrium, and thus it becomes difficult to determine which locus is driving the phenotypic association. In our study, the SNP rs12203592 is a significant eQTL for only IRF4; this implies that the SNP and imputed gene expression signal we identified is not being driven by co-regulation of neighboring SNPs and/or genes. When considering non-imputed gene expression sets, eQTLGen (Võsa et al., 2018) corroborates this finding; rs12203592 is significantly associated with only increased expression of IRF4. In addition, the relationship of rs12203592 to IRF4 expression in blood seems tissue-specific, as GTEx data across over 70 tissues show association with only lung tissue at P = 9.1 × 10⁻⁹. The specificity of rs12203592 to IRF4 expression in blood and the lack of correlation between IRF4 expression and other genes in DISCOVeRY-BMT give confidence that the observed ASSET association is the potential susceptibility locus in the region. The functional significance of variants in this gene in hematopoiesis and its previous recognition as a locus associated with the risk for development of other hematological malignancies further strengthen the evidence of an association of IRF4 with development of AML and MDS. A limitation of the TWAS methodology is the highly tissue-specific nature of gene expression and regulation. Our use of whole blood from GTEx to create relevant genotype weights may not be representative of gene expression in AML and MDS cases but more so controls.
In addition to IRF4, we identified an association between the risk for development of de novo AML or MDS and higher expression of AKT1. AKT1 is an oncogene which plays a critical role in the PI3K/AKT pathway. AML patients frequently show increased AKT1 activity, providing leukemic cells with growth and survival promoting signals (Tang et al., 2015). Enhanced AKT activation has also been implicated in the transformation from MDS to AML, and overexpression of AKT has been shown to induce leukemia in mice (Kharas et al., 2010).
We also identified AML and MDS gene expression associations with RASGRP2, which is expressed in various blood cell lineages and platelets, acts on the Ras-related protein Rap and functions in platelet adhesion. GWAS have identified significant variants in this gene associated with immature dendritic cells (% CD32+) and immature fraction of reticulocytes, a blood cell measurement shown to be elevated in patients with MDS versus controls (Astle et al., 2016). RASGRP2 expression has not been studied in relation to AML or MDS; however, RASGRP2/Rap1 signaling was recently shown to be functionally linked to the CD38-associated increase in CLL cell migration. The migration of CLL cells into lymphoid tissues because of proliferation induced by B-cell receptor activation is thought to be an important component of CLL pathogenesis (Mele et al., 2018). This finding has implications for the design of novel treatments for CD38+ hematological diseases (Mele et al., 2018). These data imply that replication of these gene expression associations with the development of AML and MDS is warranted. This is the largest genome-wide AML and MDS susceptibility study to date. Despite our relatively large sample size, the complexity of cytogenetic risk groups in these diseases limits our analysis, particularly with respect to therapy-related AML and MDS. The DISCOVeRY-BMT study population is composed of mostly European American non-Hispanics, and thus validation of these associations in a non-white cohort of patients is imperative. Lastly, the use of TWAS is a powerful way to start to prioritize causal genes for follow-up after GWAS; however, there are limitations. TWAS tests for association with genetically predicted gene expression and not total gene expression, which includes environmental, technical and genetic components (Wainberg et al., 2019).
Our results provide evidence for the impact of common variants on AML and MDS susceptibility, and further characterization of the 6p25.3 locus might provide a more mechanistic basis for the pleiotropic role of IRF4 in AML and MDS susceptibility. The co-identification of variants in IRF4 associated with the risk for both myeloid and lymphoid malignancy supports the importance of broader studies that span the spectrum of hematologic malignancies.

DATA AVAILABILITY STATEMENT
The datasets presented in this article are not readily available because de-identified individual participant data that underlie the reported results are not available on dbGaP, as informed consent is not compliant with the NIH Genomic Data Sharing Policy. Requests to access the datasets should be directed to the Center for International Blood and Marrow Transplant Research (www.cibmtr.org) by emailing contactus@cibmtr.org, or to Dr. Alyssa Clay-Gilmour (claygila@mailbox.sc.edu).

AUTHOR CONTRIBUTIONS
JW, AC-G, LS-C, and TH designed the research, performed research and analysis, and wrote the manuscript. CH, DV, XS, and LPo performed the genotyping. SL, LPr, AW, and GB performed the quality control of genomic data. All authors reviewed and approved the manuscript.

FUNDING
This work was supported by grants from the National Institutes of Health. LS-C and TH were supported by 1R01HL102278 and 1R03CA188733 to perform this work. EK is supported by the Pelotonia Foundation Graduate Student Fellowship. Any opinions, findings, and conclusions expressed in this material are those of the author(s) and do not necessarily reflect those of the Pelotonia Fellowship Program or The Ohio State University. AC-G was supported by CA9204, Mayo Clinic R25 Training Grant, when she performed a majority of this work. The CIBMTR is supported by Public Health Service Grant/Cooperative Agreement 5U24-CA076518 from the National Cancer Institute (NCI), the National Heart, Lung and Blood Institute (NHLBI) and the National Institute of Allergy and Infectious Diseases