Translational Analysis of Moderate to Severe Asthma GWAS Signals Into Candidate Causal Genes and Their Functional, Tissue-Dependent and Disease-Related Associations

Asthma affects more than 300 million people globally and is both under diagnosed and under treated. The most recent and largest genome-wide association study investigating moderate to severe asthma to date was carried out in 2019 and identified 25 independent signals. However, as new and in-depth downstream databases become available, the translational analysis of these signals into target genes and pathways is timely. In this study, unique (U-BIOPRED) and publicly available datasets (HaploReg, Open Target Genetics and GTEx) were investigated for the 25 GWAS signals to identify 37 candidate causal genes. Additional traits associated with these signals were identified through PheWAS using the UK Biobank resource, with asthma and eosinophilic traits amongst the strongest associated. Gene expression omnibus dataset examination identified 13 candidate genes with altered expression profiles in the airways and blood of asthmatic subjects, including MUC5AC and STAT6. Gene expression analysis through publicly available datasets highlighted lung tissue cell specific expression, with both MUC5AC and SLC22A4 genes showing enriched expression in ciliated cells. Gene enrichment pathway and interaction analysis highlighted the dominance of the HLA-DQA1/A2/B1/B2 gene cluster across many immunological diseases including asthma, type I diabetes, and rheumatoid arthritis. Interaction and prediction analyses found IL33 and IL18R1 to be key co-localization partners for other genes, predicted that CD274 forms co-expression relationships with 13 other genes, including the HLA-DQA1/A2/B1/B2 gene cluster and that MUC5AC and IL37 are co-expressed. Drug interaction analysis revealed that 11 of the candidate genes have an interaction with available therapeutics. This study provides significant insight into these GWAS signals in the context of cell expression, function, and disease relationship with the view of informing future research and drug development efforts for moderate-severe asthma.


INTRODUCTION
Asthma is one the of most predominant non-communicable diseases throughout the world. It accounts for over 400,000 deaths per year and by World Health Organization estimates, 24.8 million disability adjusted life years (DALYS) were attributable to Asthma in 2016 (1). Although there is no cure for asthma, most symptoms can be managed well with medication. However, patients with severe asthma, which represent ∼4% of all patients, suffer from uncontrolled symptoms and frequent exacerbations despite medication (2).
Over the years, many genome and phenome wide association studies (GWAS, PheWAS) have been completed resulting in a large number of signals mainly via single nucleotide polymorphisms (SNPs) associated with asthma relevant traits (3)(4)(5)(6)(7), which have been reviewed in detail (8,9). GWAS and PheWAS (10)(11)(12) have greatly advanced asthma research and translating these signals into candidate causal genes is the next step in moving to greater mechanistic understanding of asthma, therapeutic targets and pathways for investigation. However, many single SNPs are largely defined and generally mapped to the closest gene, regardless of whether the SNP has any effect on the function of that gene e.g., expression. This ignores the complexity of the 3D architecture of DNA which may result in SNPs linearly far away from a gene being closer than thought in the 3D structure and having a functional role (13). Indeed, it has been shown that over 90% of disease-linked variants are located in non-coding regions of the genome (14). Therefore, accurately determining causal genes is important for understanding the biology underlying GWAS and PheWAS signals, especially within respiratory relevant tissues and compartments, including the lung. An additional layer of complexity is the possibility of specific SNP-tissue interactions, where SNP-gene regulation may occur differently in different tissues and environments (15,16).
In this study, we investigated the 25 signals (including four novel signals; rs11603634, rs1090584, rs10905284, and rs61816761) identified in the recent Moderate to Severe Asthma GWAS (7). We aimed to identify candidate causal genes from these signals and understand their association with asthma, highlighting potential targets for downstream investigation utilizing unique [Unbiased biomarkers for the prediction of respiratory disease outcomes (U-BIOPRED)] and publicly available datasets. We then investigated these genes via multiple gene enrichment and interaction analyses and gene expression analysis in the lung and in whole blood. This study represents a significant advance in our understanding of the mechanistic underpinnings of these signals and provides candidate genes and pathways for future drug development in moderate-severe asthma.

Signal to Gene Association Analysis
To determine likely signal to gene associations in lung tissue, T-cells and whole blood, we interrogated publicly available databases (GTEx V8, Open Target Genetics (OTG), HaploReg V4.1, LD Matrix 5.0), a unique dataset presenting expression quantitative trait loci (eQTLs) for bronchial epithelial brushes (n = 117), bronchial biopsies (n = 84), and nasal epithelial brushes (n = 75) (U-BIOPRED), as well as a general literature search for previously published associations. Using these datasets, we noted eQTLs [GTEx, OTG (19), HaploReg (20), U-BIOPRED], sQTLs (GTEx), variant 2 gene scores (OTG), Gene to eQTL localization (OTG), and tagged functional variation [LDLink (LDProxy) (21)] for the sentinel SNP reported for these signals in the original study or, in cases where the sentinel SNP was not covered, a proxy SNP. We used a corrected significance threshold of P < 0.05, as presented in each dataset, to identify eQTLs across all datasets, and a H4 > 0.8 threshold for the Gene to eQTL co-localization. Proxy SNPs were selected based on linkage disequilibrium R 2 scores from the publicly available reference haplotypes for the EUR super population (CEU+GBR+TSI+FIN+IBS) from 1000_Phase 3 (Version 5) of the 1,000 Genomes Project accessed through LDLink, where the selected proxy was the SNP with the highest R 2 value (minimum R 2 = 0.2) that was present in the interrogated database.
Each signal to gene association identified was given a score point, with greater weighting given to U-BIOPRED eQTLs due to the specificity of the eQTLs to the respiratory related samples. This generated a total score, where genes scoring s ≥ 3 were selected for further downstream analysis. In cases where no genes achieved s ≥ 3 for a particular signal, the gene with the highest available score was selected. In the case of signal rs61816761, where all candidate genes scored below 3, FLG was selected as the most likely candidate causal gene due to existing literature (22,23).

Gene Function and Disease Association Analysis
Gene function was determined using the Gene 2 Function (24) and UniProt (25) databases, and gene function and biological processes annotations were listed. FIGURE 1 | Translational analysis pipeline. Twenty-five signals from the Moderate to Severe Asthma GWAS were taken through signal to trait and signal to gene association analyses. Signals were searched for in publicly available databases such as GeneATLAS, PheWAS, Open Target Genetics, GTex, HaploReg and LDLink as well as our U-BIOPRED (UB) datasets of Bronchial Biopsies, Bronchial Brushes, and Nasal Brushes. Gene association was scored as per Table 3 and 37 genes were identified. Candidate causal genes were then searched for in gene expression datasets from subjects with asthma vs. controls and single-cell-RNA lung tissue datasets. Asthma related pathways and interactions between genes were identified using the DAVID Functional Annotation Tool and GeneMANIA prediction server and drug gene interactions were identified using the Drug Gene Interaction Database.

Total and Single Cell RNA Expression Analysis in Lung Tissue
To determine the candidate gene expression profile in lung tissue, the Human Protein Atlas (HPA, http://www.proteinatlas.org) resource was used (26). For total lung expression, normalized expression (NX) values were used. The HPA provides NX values through normalizing the average transcript per million (TPM) value of all individual lung samples from three transcriptomics datasets (HPA, GTEx, and FANTOM5). For single cell (sc) lung tissue gene expression, transcripts per million protein coding genes (pTPM) values were used. The HPA provides pTPM values through normalization of read counts from four lung tissue datasets (Single Cell Expression Atlas, the Human Cell Atlas, the Gene Expression Omnibus (GEO) and the European Genome-phenome Archive). Both NX and pTPM values were obtained for 36 candidate genes (FLG had no data available). log 10 (pTPM) values were used for the heatmap. Genes which showed enriched/enhanced expression for lung epithelial (blue bars) or blood/immune cells (red bars) were highlighted in the bar chart. Enriched/enhanced expression was determined by using the same scoring system as the HPA. Enriched/enhanced expression is defined by the HPA as having NX values at least four times of any other tissue/cell type from their full transcriptomics data of 37 tissues and 43 single cell types.
Co-localization Analysis of Tissue Gene Expression and the Reported Trait "Asthma" To identify evidence linking gene expression across tissue/cell types with the asthma trait, the OTG portal was used. H4 scores for studies where asthma was the exclusive trait were obtained for the 37 candidate genes focusing on lung tissue and/or blood/immune cell types. Median H4 scores of all variants (signals) were plotted for each gene. Genes AAGAB, CD247, DEXI, FLG, GATA3, HLA-DQA1/A2, B2, IL33, KIF1A, KIAA1109, MSL1, MUC5AC, PGAP3, and ZBTB10 had no association data for lung/blood/immune cells/tissue and therefore have not been included in the analysis.

Differential mRNA Expression Analyses in Airway Epithelial Cells Isolated From Subjects With Asthma or Control Subjects
Genes which showed enriched epithelial expression in the HPA database and/or colocalization in lung tissue (median H4 > 0.8) were selected for further investigation to explore mRNA expression in bronchial epithelial cells isolated from subjects with asthma or control subjects. The freely available U-BIOPRED gene expression dataset, GSE43696, was used from the GEO resource (27), as it was the largest adult dataset available with a control group. The GSE43696 dataset comprises of Agilent Human GE 4×44K V2 Gene Expression data of bronchial epithelial cells for 20 control, 50 mild-moderate, and 38 severe asthma subjects (28,29). In this dataset, mild-moderate asthma was defined as subjects having an FEV 1 of >60% predicted, with/without low-moderate dose inhaled corticosteroids. Severe asthma was defined as subjects having severe refractory asthma, including the continuous use of high-dose inhaled corticosteroids and/or frequent use of oral corticosteroids with continuing symptoms and/or chronic airflow limitation. Control subjects had normal lung function, no history of respiratory disease or recent respiratory infection and no evidence of bronchial hyperresponsiveness. Expression values were plotted for each gene stratified by subject group. Data was statistically analyzed using a Kruskal-Wallis test with a two-stage linear step-up procedure of Benjamini, Krieger, and Yekutieli to control the FDR at 5%. P < 0.05 was considered significant.

Differential mRNA Expression Analyses in Blood Cells Isolated From Subjects With Asthma or Control Subjects
Candidate genes which showed enriched blood cell expression in the HPA database or colocalization in blood (median H4 > 0.8) were selected for further investigation to explore mRNA expression in blood cells between asthma and control subjects. The freely available gene expression dataset with adult subjects which also included a control group, GSE69683, was used from the GEO resource (27). The GSE69683 dataset comprises of Affymetrix HT HG-U133+ PM Array gene expression data of blood cell samples with 87 control, 77 mild-moderate, and 246 severe asthma subjects from the U-BIOPRED cohort (30). Samples were selected based on their non-smoker status (<5 pack-year smoking history). Mild-moderate asthma was defined as having controlled or partially controlled asthma symptoms, as defined by the Global Initiative for Asthma (GINA), whilst receiving a dose of <500µg fluticasone propionate/day or equivalent. Severe asthma was defined as having uncontrolled symptoms defined according to GINA guidelines and/or frequent exacerbations (more than two per year) despite high-dose inhaled corticosteroids (ICS) (ICS ≥ 1,000 µg fluticasone propionate/day or equivalent dose). Control subjects had no history of asthma or wheeze, had no other chronic respiratory disease, had never smoked and their prebronchodilator FEV1 was ≥ 80% predicted. Expression values were plotted for each gene stratified by subject group. Data was analyzed using the Shapiro-Wilk test to test for normality before using the Kruskal-Wallis test (nonnormal distribution) or Welch's ANOVA (normal distribution) to analyze differences between groups. Only genes GSDMB and STAT6 had normally distributed data. Both statistical tests were combined with a two-stage linear step-up procedure of Benjamini, Krieger, and Yekutieli to control the FDR at 5%. P < 0.05 was considered significant.

Gene Functional Annotation Analysis
To identify gene or gene clusters in functional and disease related associations, the DAVID Bioinformatics Resource (31, 32) was used. The 37 candidate genes underwent enrichment analysis in the Genetic Association Database of complex diseases and disorders (GAD), Gene Ontology (GO) Term and KEGG and REACTOME pathway databases. The 37 genes were used as a set and clusters below 5% FDR were considered significant. The top 10 GAD terms were listed and fold enrichment (FE) scores were plotted against -log 10 (P-value) with the area of points representing the number of genes in the cluster for GO Terms and pathway analysis.

Gene Interaction Analysis
To identify and predict candidate causal gene interactions, the GeneMANIA prediction server was used which normalizes and weights interaction networks from various sources of data to build an interaction map (33). The 37 candidate genes were inputted as a set to produce a network map depicting physical interactions, co-localization, co-expression and predicted genes and interactions. Scores for predicted genes and the number of gene interactions were used to identify the strongest predicted genes.

Gene Drug Interaction Analysis
We used The Drug Gene Interaction Database (DGIdb) (34) to identify known drug interactions with the 37 genes highlighted by our signal to gene analysis and 5 predicted genes (CD274, IL37, IRAK4, PDCD1LG2, and ZAP70) which were found to have the strongest interactions through the gene interaction analysis. Drugs showing an interaction score, i.e., the numeric representation of publication count and source count, the ratio of average known gene partners for all drugs to the known partners for the given drug, and the ratio of average known drug partners for all genes to the known partners for the given gene, of >1.0 were selected. In addition we queried each gene at clinicaltrials.gov in order to identify drugs currently in Phase II, III, and IV trials for our identified genes as targets.

PheWAS Analysis Indicates Signals Are Associated With Blood Cell Counts, Asthma, and Other Inflammatory Disorders
The GeneATLAS database identified traits associated with the risk allele of the 25 signals or proxies from the Moderate to Severe Asthma GWAS (Tables 1, 2). As expected, all signals showed association with asthma with odds ratios (OR) >1 (Figure 2). All signals, apart from rs61816761 (via proxy rs61816766), showed association with a blood/immune cell traits particularly eosinophil levels but also lymphocyte and neutrophil counts. Signal rs11603634 was relatively specific in showing only asthma, eosinophil percentage and platelet count associations. All signals apart from rs61816761 and rs9273410 showed association with eosinophil number and/or percentage highlighting the role of this cell type in asthma. All signals apart from rs61816761, rs3749833, rs10905284, rs11603634, rs112502960, and rs776111176 showed an association with allergy. Of these, signals rs3749833, rs10905284, rs11603634, rs112502960, and rs776111176 were associated with eosinophils but not allergy. Conversely rs9273410 was the only signal which showed an association with allergy but not eosinophils. These nuances in associations highlight the different asthma phenotypes.
Signals, rs1986009, rs9273410, rs7936312, rs776111176, and rs2941522 showed broad associations across many trait and trait groups. Signals rs9273410, rs776111176, and rs72743461 were the only ones associated with insulin-dependent diabetes. Signal rs776111176 was the only one associated with rheumatoid arthritis.
The strongest eQTLs, as defined by the most significant combined P and B-values for the risk allele, were observed in whole blood for rs2941522 (GSDMB & ORMDL3) and rs7305461 (RPS26 & STAT6) (Figure 3). Weaker associations (P-value) presenting with the greatest observed effect (B-value) were observed for rs776111176 in whole blood, lung (HLA-DQA2) and for signal rs9273410 in bronchial brushes (HLA-DQB1) and in whole blood and lung (HLA-DQB2) (Figure 3).

Gene Function and Biological Processes Analysis Identifies Distinct Gene Groups
The largest gene function clusters were observed for genes involved in DNA/RNA binding (BACH2, GATA3, IRF1, RORA, RPS26, SMAD3, STAT6, and   Only selected PheWAS traits have been displayed and organized into asthma (blue), blood/immune cell (red), allergy (yellow), other respiratory (green), inflammatory (purple), and auto-immune (gray) groups. The trait "basophil percentage" did not meet the Bonferroni corrected P-value for any of the signals and therefore was not included in the Figure. A full list of PheWAS terms and Beta and P-values are available in Supplementary Table 1. For disease traits (i.e., all except blood cell traits) a black dot represents an odds ratio of <1 with respect to the moderate to severe asthma risk allele. Proxies were used for the following signals which were not present in the database: rs61816761 (rs61816766, WDR36) or signal transduction (CD247, GNGT2, HLA-DQA1/A2/B1/B2, IL18R1, IL1RL1, IL33, LRRC32, PDCD1, and TSLP). Both FLG and MUC5AC are structural components involved in barrier formation and GSDMB is involved in cell death. Some genes such as DEXI, KIF1A, KIAA1109, ZBTB10, and ZNF652 had very limited or no information and as yet their function or involvement in biological processes is unknown. The gene function of implicated genes is described in Table 5.
Frontiers in Allergy | www.frontiersin.org Signals were analyzed using the translational pipeline including publicly available datasets such as Open Target Genetics, GTex, HaploReg and LDLink and our UBIOPRED (UB) datasets of Bronchial Biopsies, Bronchial Brushes, and Nasal Brushes. Each gene was scored based on supporting evidence as laid out in Table 3, with double weighting being given to eQTL associations observed in UB, based on the datasets localized data. The genes highlighted in green were taken forward in downstream analyses.

Single Cell RNA Expression in Lung Tissue Indicates Cell Type Specificity in Healthy Individuals
The HPA resource was used to look at overall and single cell type gene expression profile in lung tissue from healthy donors for the 37 candidate causal genes. Genes which showed enriched/enhanced expression (4-fold higher expression than in other cells/tissues) in epithelial or blood/immune cells were also identified. All genes except FLG showed expression in lung tissue and cells (Figure 4). KIF1A showed lower levels of lung expression and was undetectable in the cell types analyzed potentially due to the sensitivity of this technique. MUC5AC showed overall highest expression in lung tissue and along with IL33 and SLC22A4 was enriched in club and ciliated cells.
IL33 also showed high expression in endothelial cells. Both IRF1 and RPS26 showed high lung cell type expression but no cell type specificity. CD247, GNGT2, HLA-DQB2, and PDCD1 were expressed at low levels in the datasets and were specific for T-cells and macrophages. HLA-DQA1/A2, HLA-DQB1 were highly expressed overall and specifically in macrophages, whereas IL18R1 and IL1RL1 showed greater expression in granulocytes.
A full list of enriched tissues and cell types is available in Supplementary Table 2.

Co-localization Analysis Reveals Genes With High Asthma Trait Associations in Lung Tissues and Blood/Immune Cells
The OTG platform was used to investigate whether gene expression in lung tissue and/or blood immune/cells was associated with the asthma trait ( Figure 4B). BACH2, IRF1, LRRC32, PDCD1, RORA, STAT6, and TSLP showed a strong link (H4 score>0.8) to blood/immune cells only, whereas ZNF652 was exclusively linked to lung tissue expression. Genes GSDMB, IL1RL1, IL18R1, ORMDL3, RPS26, and SUOX on the other hand showed strong association to both lung tissue and blood/immune cells.

Genes Show Differential Gene Expression in Bronchial Epithelial Cells Isolated From Subjects With Severe Asthma
From the above analyses, 10 genes (GSDMB, IL18R1, IL1RL1, IL33, MUC5AC, ORMDL3, RPS26, SLC22A4, SUOX, and ZNF652) showed enriched epithelial expression and/or colocalization of the asthma trait in lung tissue. These specific genes were further investigated in the bronchial epithelium of mildmoderate or severe asthma subjects compared to control subjects ( Figure 5). Six of the 10 studied genes showed differential expression in asthma vs. control subject cells. Expression levels of IL18RL1 (P = 0.019, MA and 0.0008, SA), IL1RL1 (P = 0.0034, MA and 0.010, SA), and ORMDL3 (P = 0.040, MA and 0.0029, SA) were higher in both mild-moderate and severe asthma subjects compared to controls (Figures 5A,B,D) whereas expression levels of MUC5AC (P = 0.022) and RPS26 (P = 0.0004) were higher in severe asthma subjects only compared to controls (Figures 5C,E). Expression levels of SLC22A4 (P = 0.0051, MA and <0.0001, SA) was lower in both mildmoderate and severe asthma subjects compared to controls ( Figure 5F). Expression levels of SLC22A4 (P = 0.0028) was decreased in severe asthma ( Figure 5F) compared to mildmoderate asthma. Genes GSDMB, IL33, SUOX, and ZNF652 did not show any significant differences in expression across groups (Supplementary Figure 1).

Genetic Association Database of Complex Diseases and Disorders Identifies Genes Associated With Asthma
The DAVID functional annotation tool was used to perform enrichment analysis for the 37 candidate genes in the GAD. The top 10 gene clusters (5% FDR) are listed in  FLG, IL33, GSDMB, SLC22A5, SMAD3, TSLP, GATA3, SUOX,  MUC5AC, IL1RL1, IRF1, ORMDL3, STAT6, HLA-DQA2, IL18R1, HLA-DQA1, and HLA-DQB1 (P = 1.78 × 10 −15 ). Genes were also associated with other, predominantly autoimmune, diseases including Type 1 diabetes, Celiac disease, Crohn's disease, and rheumatoid arthritis which complements the findings of the PheWAS. There is considerable overlap between genes associated with asthma and other diseases with HLA-DQA1/B1 genes present in ∼74% of significant terms (Supplementary Table 4).

Pathway and Gene Ontology Term Enrichment Analysis Highlights Genes
Involved in Asthma and the Strong Presence of a HLA-DQA1/A2/B1/B2 Cluster GO term enrichment analysis ( Figure 7A) identified a predominant gene cluster of HLA-DQA1/A2/B1/B2 genes. These genes were enriched mainly in processes involving MHC class II molecules and vesicle-membrane transport interactions. KEGG and REACTOME pathway analyses also identified this gene cluster as being involved in asthma, MHC class II antigen presentation, autoimmune processes, pathogenic infections, and diabetes ( Figure 7B). PD-1 signaling was the most significant pathway identified which was enriched for CD247, PDCD1, HLA-DQA1/A2, and HLA-DQB1/2. Other significant pathways, including phosphorylation, translocation and generation of second messenger molecules, involved the CD247, HLA-DQA1/A2/B1/B2 genes. Full result tables are available in Supplementary Tables 5, 6.

Gene Interactions Analysis Highlights Key Co-localization and Co-expression Relationships and Predicts Additional Related Genes
Gene interaction analysis of the 37 candidate genes identified that the strongest physical interactions were observed between HLA-DQA1 and B1, SMAD3 and SMAD4 (predicted gene), CD247 and ZAP70 (predicted gene), and FLG and CASP14 (predicted gene). IL33, LRRC32, IL18R1 and CD274 (predicted gene) were observed to have the greatest number of co-localized partners.
GATA3 and IL18R1 were observed to have the greatest number of co-expressional partners. AAGAB, KIF1A, and KIAA1109 showed no interaction between any other gene (Figure 8). The predicted genes with the strongest interactions were CD274, IL37, IRAK4, PDCD1LG2, and ZAP70 (Supplementary

DISCUSSION
Asthma is a heterogenous disease in which both genetic and environmental factors contribute to susceptibility and progression (35). Severe asthma, characterized by uncontrolled symptoms, a burden of medication and frequent exacerbations, remains inadequately managed in many patients (2). The current study aimed to provide significant translation of 25 genetic signals identified as risk factors for the development of moderate to severe asthma to new biological insight using a broad range of approaches and datasets. The main findings are (i) moderate-severe asthma genetic signals drive a large number of inflammatory cell traits particularly eosinophil levels and are risk factors for related traits such as atopic dermatitis and other inflammatory diseases, (ii) our signal to gene pipeline identified 37 candidate causal genes, (iii) genes show enrichment in lung tissue and inflammatory cells illustrating both the role in  Table 3. EQTLs listed are those that achieved a P-value of P < 0.05 in one of the interrogated datasets.
Frontiers in Allergy | www.frontiersin.org   (1) Humoral immune response mediated by circulating immunoglobulin (2) immunoglobulin production involved in immunoglobulin mediated immune response (3) antigen processing and presentation of exogenous peptide antigen via MHC class II (4) T cell receptor signaling pathway  inflammation and structural changes in the airways, (iv) 32 of the 37 genes had additional support for a role in asthma including some with differential expression in the airways and blood cells of severe patients, (v) genes show enrichment for pathways relevant to T cell function, interferon signaling, endoplasmic reticulum (ER) biology and others, (vi) gene interaction analyses identified predicted genes already known to be involved in asthma and finally, (vii) these genes and pathways show potential as targets for novel drug development or repurposing.

PheWAS Identified Features of Asthma and Other Traits Driven by Moderate-Severe Asthma Signals
Our initial analysis focused on understanding the uniqueness of the moderate-severe asthma signals by testing for association with a large number other traits via PheWAS. This approach can identify novel associations with quantitative traits e.g., lung function and provide significant inference regarding the mechanisms underlying the original association. All signals showed associations with the asthma trait including the MUC5AC signal that was identified as a potential moderatesevere asthma signal originally (7) and blood/immune cell trait associations were very prominent. Recently, this MUC5AC signal has been identified as potentially more relevant to adult onset asthma (36), a phenotype that shares less overlap with atopic comorbidities than childhood onset asthma. This may explain, at least in part, the differential profile compared to many of the other signals which showed associations with e.g., atopic dermatitis. Interestingly, some signals showed trait specificity whilst others had broader trait associations. Of particular note are signals rs9273410 and rs776111176 both corresponding to HLA-DQA1/A2/B1/B2 candidate genes, which had the widest disease associations including Hay fever, type 1 diabetes, inflammatory bowel disease, ulcerative colitis and psoriasis. The gene association for these results were confirmed in DAVID pathway and gene enrichment analysis where the HLA-DQA1/A2/B1/B2 gene cluster was present in ∼74% of diseases and pathways. Interestingly, PheWAS analysis for these signals showed a protective effect for some auto-immune diseases. Previous studies of signals in this region have confirmed inverse disease dependence risk between asthma and autoimmune disease, for example signal rs2395185, the asthma risk allele T was shown to be protective for ulcerative colitis (37,38). These differences in trait profiles across diseases may offer insight into disease pathophysiology for this gene cluster.
Another notable association with the majority of signals was the association with blood eosinophils levels, with eosinophils known to be an effector cell in asthma and linked to severe asthma pathology (39). Indeed, current treatments such as Mepolizumab, Benralizumab, and Reslizumab reduce exacerbations through blocking IL5/IL5 signaling and reducing eosinophil numbers (40)(41)(42)(43).
Interestingly, signal rs2941522 corresponding to candidate genes ORMDL3, GSDMB, PGAP3, and MSL1 had the strongest association with neutrophil counts. This observation may point to a particular pathology for carriers of this variant and a role in non-eosinophilic or neutrophilic asthma (44,45). Multiple GWAS and functional studies relate asthma to ORMDL3 and GSDMB (46,47) however to our knowledge no studies have been published investigating this variant or linked genes with regards to neutrophils in asthma.
Signals, rs12479210, rs144829310, and rs1986009 had the strongest eosinophilic associations, corresponding to IL1RL1, IL18R1, SLC22A4/5, and IL33 candidate causal genes. IL18R1 and IL1RL1 have been shown in many studies to have variants associated with asthma as well as having altered expression profiles in asthmatic subjects (3,28). Predictive interaction analysis revealed strong physical interactions between IL1RL1 and TMED1 which is unsurprising as both are related to IL33 signaling (48) which has been extensively researched in the context of asthma (49).

Signal to Gene Analyses Identifies 37 Plausible Candidate Causal Genes
The complexity of signals identified through GWAS highlight the importance of considering relevant tissue compartments when determining signal to gene associations. For many of the signals, we identified multiple plausible candidate causal genes to a single signal, with different genes acting as eQTLs in different cells and tissue types. This is potentially as anticipated as it is feasible that a signal may influence multiple genes and pathways. For example,  Table 2. No data was available for FLG and therefore has not been shown. (B) Co-localization analysis (H4 scores) data was taken from the Open Target Genetics (OTG) platform for lung (blue squares) or blood/immune cells (red circles) in studies with the asthma trait (exclusive). Scores represent evidence of association between candidate gene, specific tissue and asthma trait. A higher score indicates greater association. Dotted line shows cut off value of 0.8. Candidate genes AAGAB, CD247, DEXI, FLG, GATA3, HLA-DQA1/A2, B2, IL33, KIF1A, KIAA1109, MSL1, MUC5AC, PGAP3, and ZBTB10 had no association data for lung/blood/immune cells/tissue in OTG and therefore have not been shown.
considering signal rs12479210, we observe an eQTL for IL1RL1 in lung tissue but not in a T-cell population, where IL18R1 is the observed eQTL. This strongly suggests that a single signal may drive the differential expression of multiple genes and contribute to asthma mechanisms in multiple ways, the IL1RL1 signal remains one of the most reproducible asthma signals potentially for this reason (50). Additionally, we highlight that when eQTLs are common across different tissue types, these may present . Expression values were taken from the dataset and a Kruskal-Wallis test with a two-stage linear step-up procedure of Benjamini, Krieger and Yekutieli used to control the FDR at 5% was performed. *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001. dramatically different effects on gene expression. For example, when considering the above locus of association, we observe a positive eQTL for IL1RL1 in whole blood (B = 0.1), but a negative eQTL effect in whole lung (B = −0.372). This emphasizes the need for further analysis on these identified eQTLs in order to unpick the relationship between the eQTLs and asthma relevant phenotypes (51). The 37 identified candidate causal genes broadly mapped to; barrier formation/defense, cell death, DNA/RNA binding, membrane binding and transport, metabolic processes, and signal transduction. Importantly, independent biological studies of asthma have identified a potential role for several of these pathways e.g., epithelial barrier/defense (52) and DNA/RNA binding (53), however more novel mechanisms identified require further investigation.

Gene to Asthma Analyses Provide Significant Support for a Role of Candidate Causal Genes in the Structural Changes in the Airways and Inflammation in Asthma
Structural remodeling is a major component of asthma pathophysiology resulting in the narrowing of the airways. Changes comprise of loss of ciliated cells, goblet cell metaplasia, increased sputum production, basement membrane thickening and smooth muscle cell hypertrophy, and hyperplasia leading to an overall decline in lung function (54)(55)(56). Airway remodeling is thought to occur in response predominantly to chronic inflammation, which is supported by studies showing that steroid treatment in asthmatic patients both reduces airway inflammation and has a positive effect on airway remodeling (57,58). The strong association of these signals with inflammatory traits and a candidate causal genes list with genes predominantly involved in signal transduction indicates these signals may contribute to airway remodeling.
Immune cell infiltration of the lung epithelium is a magnifier of asthmatic inflammation (59). STAT6 expression was increased in the blood cells of asthmatic patients compared to controls in the dataset analyzed in this study. It is a transcription activator and has been linked to goblet cell metaplasia and an increase in MUC5AC production (60). The MUC5AC protein is a secreted gel forming mucin and its protein levels have also been shown to be increased in asthmatic airway mucus (61). Furthermore, in the airway epithelial dataset analyzed, MUC5AC showed a significant increase in gene expression in patients with severe asthma. Indeed, IL-13, an inflammatory cytokine highly associated with asthma induced activation of STAT6 has been shown to increase mucin expression and mucus metaplasia in both airway epithelial cells and submucosal glands in mice and has been linked to goblet cell hyper/metaplasia in humans (62). Both BACH2 and RORA are also DNA binding proteins, however in blood cells, their expression values were significantly lower in asthmatics. RORA has been shown to play a role in lung . Expression values were taken from the dataset and either a Kruskal-Wallis test or Welch's ANOVA (STAT6), both with a two-stage linear step-up procedure of Benjamini, Krieger and Yekutieli used to control the FDR at 5%, was performed. *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001. development (63) and childhood asthma (64) and studies in mice have shown BACH2 to repress T-cell cytokine production (65,66) indicating lower levels of this protein in asthma may contribute to unregulated inflammation. Together, these findings suggest that imbalances in transcriptional regulators in inflammatory cells may, in part, contribute to asthma pathology possibly via downstream processes which affect multiple cells types including the airway epithelium. However, some caution is required as these expression datasets were from mixed cell populations, so we cannot exclude the influence of different cell populations between groups driving some mRNA level differences. Furthermore, the signal for the MUC5AC candidate gene, rs11603634, associated with the asthma, eosinophils and platelets traits only, indicating a role specific to severe/eosinophilic asthma that is potentially not driven by allergic mechanisms due to the lack of association with related traits in PheWAS. 6 | DAVID functional annotation tool analysis of the genetic association database of complex diseases and disorders (GAD).

Term
Genes Count (%) FE* p-value FDR   Asthma  FLG, IL33, GSDMB, SLC22A5, SMAD3, TSLP, GATA3, SUOX, MUC5AC,  IL1RL1, IRF1, ORMDL3, STAT6, HLA-DQA2, IL18R1, HLA-DQA1,  HLA- Eosinophils, mast cells and T-helper (Th) cells are the key producers of inflammatory cytokines, IL-4, IL-5, and IL-13, associated with Th2 asthma (59,67). From the signal association with eosinophils in the PheWAS analysis, it suggests inflammation plays a key role in asthma pathogenesis. Indeed, more than half of the 37 candidate causal genes were found, through gene function and pathway analyses, to be involved in signaling pathways related to inflammatory cytokines. Some genes such as TSLP, IL-33 and its receptor IL1RL1 have well-established asthma inflammatory roles (51,68). IL1RL1, showed increased expression in both airway epithelium and blood of asthmatic patients compared to controls. Furthermore, inflammatory pathology is supported by the presence of GATA3 and STAT6 in asthma and inflammatory gene clusters in the pathway analysis. Both of these transcription factors have been shown to be vital for Th2 cell activation in asthma (69). Expression of IL18R1, a receptor for IL-18, was also increased in asthmatic airway epithelium. IL-18 has been shown to regulate both Th1 and Th2 immune responses (70,71) through INFγ and IL-12 (72) and induce IgE production via IL-4 and STAT6 (73). Additionally, for IL1RL1 and IL18R1 colocalization analysis revealed strong association with both lung tissue and blood/immune cells indicating infiltration into the lung.

Gene Functional Annotation, Pathway, and Gene Interaction Analysis Provides Insight Into Asthma Mechanisms Including Prediction of Other Asthma Related Genes
Functional and pathway analysis has revealed that the 37 candidate genes are involved in predominantly transcriptional regulation, membrane interaction and cytokine signaling, indicating that these processes may be disrupted or imbalanced in patients with asthma. A strong HLA-DQA1/A2/B1/B2 gene cluster has emerged involving in a broad range of processes including auto-immune disease, viral and bacterial infection, antigen presentation and ER/Golgi/vesicle membrane interaction. If the gene cluster is broadened to include CD247, then combined these five genes are present in over 75% pathway and GO terms and processes. The HLA genes are part of a class of major histocompatibility complex (MHC) molecules which present exogenous allergen on airway dendritic cells and interact with CD247, part of the T-cell antigen receptor complex, to achieve T-cell activation (74,75). From the blood dataset utilized in this study, both CD247 and HLA-DQA1/A2 gene expression were found to be decreased in asthmatic patients. Although studies in asthma are lacking, type 2 (non-insulin dependent) diabetics have shown chronic inflammation resulted in immunosuppression and CD247 downregulation (76). Seeing as chronic inflammation is also present in severe asthmatics, it is possible that a similar process is involved. Indeed, the signals, rs9273410 and rs776111176, for the HLA-DQ genes also had a strong risk association with diabetes albeit, type 1 (insulin-dependent) diabetes. Genetic variants in the HLA region have long been associated with asthma (77), however deconvolution of this region has been difficult.
Another feature which has appeared is the interaction/involvement of the candidate genes with the ER apparatus. The role of the ER, ER stress and the unfolded protein response (UPR) in asthma remains unclear. An increase in ER stress has been reported in asthma, and ER stress has also been shown in turn to increase mucin production, including MUC5AC, and airway remodeling (78,79). In some studies, ORMDL3, an ER transmembrane protein, has been heavily implicated in asthma related ER stress (78,79). ORMDL3 gene expression was shown to be increased in asthmatic airway epithelium but decreased in blood compared to controls in the datasets used in this study. In FIGURE 8 | Gene interaction analysis. GeneMANIA was used to explore and predict gene interactions. Black circles represent inputted candidate genes and gray circles are predicted genes (selected to show top 20 with strongest predicted association). The area of the gray circle represents prediction scores and thickness of line represent interaction scores, with a larger area/thicker line representing stronger prediction/interaction. mice, ORMDL3 expression has been shown to be induced by allergen, IL-4 and IL-13 via STAT6 and in bronchial epithelial cells, overexpression of ORMDL3 has been shown to trigger activating transcription factor 6 (ATF6), which has also been implicated in airway remodeling in asthma (80). However, the role of ORMDL3 in ER driven asthma pathology remains to be resolved (81,82).

Analyses of 42 Genes Identified in the Current Study Provide Significant Opportunities for Drug Development and Or/Repurposing
Of the 37 candidate genes and five strongest predicted genes, drugs for GATA3, IL33, IL1RL1, SMAD3, and TSLP are either already in use or in clinical trials for asthma, clearly validating these genes from our pipeline as therapeutic targets. Drugs supported by genetic evidence are twice as likely to go through the drug development pipeline, be successful in Phases II and III and ultimately go into clinic (83). Drugs that target MUC5AC (ENSITUXIMAB) and STAT6 (CHEMBL363332 and CHEMBL1374370) have emerged as potential avenues of future research via repurposing for severe asthma and show these two proteins are viable therapeutic targets. The PDCD1 and CD274 gene had the largest number of drug associations, albeit predominantly chemotherapy drugs. Although not a prominent gene to come out of this analysis, PDCD1 has been shown in a small study to be increased in asthma patients after whole lung allergen challenge (84). However, caution must be taken when interpreting drug-gene interactions. For    example, there are many drugs available, which may offer a starting point for targeting of the HLA-DQA1/B1 genes such as azathioprine, which is an immunosuppressant already in use to treat rheumatoid arthritis, Crohn's disease and ulcerative colitis. However, in the blood cell dataset analysis, HLA-DQA1 was decreased in asthmatics indicating a lack of gene expression is linked to risk and therefore inhibitory drugs at best would be ineffective and at worst intensify the asthma phenotype.

A Move to Personalized Medicine
For any therapy to be truly effective in a disease such as asthma which is heterogenous, medicines need to be stratified based on evidence that that drug target and/or pathway is driving disease and genetic variants may help identify these individuals. The combination of variants may lead to a particular type of asthma such as IL33 high, which then may be amenable to therapy targeting IL33. Therefore, asthma studies which measure gene expression and also determine genotype offer the most comprehensive basis for understanding the effects of these signals and genes, as then risk allele, gene expression profile, and asthma phenotype can be bridged. For example, the rs11603634 signal, with candidate gene MUC5AC, is inherited in ∼50% of the European population. The airway epithelium datasets analyzed in this study showed increased expression of MUC5AC and a brief analysis in the original GWAS paper has shown the individuals homozygous for the risk allele, A, show increased MUC5AC expression compared to homozygous non-risk allele, C, individuals (7). Together these results indicate MUC5AC is a strong candidate for inhibitor therapy particularly for individuals that carry the risk allele.
On the other hand, signal rs61816761 (candidate gene FLG) is inherited in ∼2% of the European population and was highly specific in only associating with eczema/dermatitis in addition to asthma. This signal includes a loss of function variant and results from this study are consistent with the literature which indicate loss of function variants in the gene are associated with asthma in children with eczema (23). Therefore, therapy for these individuals would require restoring the loss of either FLG itself or of another molecule in its pathway but would probably only be effective in individuals with this variant. Prediction analysis in this study revealed novel physical interactions with CASP14 and PCSK6 and co-expressional interactions with PRL, CASP14, and LRRC32. These interactions may be compromised in subjects with this variant and are options for future exploration.
Therefore, any therapy, including those being investigated for repurposing in asthma, should be stratified and targeted to take into account the individual's genotype. There is previous proof of concept for this e.g., the prevention of asthma exacerbation by the IL4R antagonist, pitrakinra was shown to be effective in carriers of specific IL4R genotypes, although the specific mechanism is not known at this time (85).

CONCLUSIONS
The results from this study replicate and importantly extend previous GWAS translational study findings. A study by Dong et al. which integrated eQTL data and GWAS summary data, using Sherlock Bayesian analysis, identified 11 candidate genes implicated in severe asthma of which four genes (HLA-DQA1, GNGT2, STAT6, and SLC22A5) overlap with the genes identified through our pipeline (86). Furthermore, a study by El-Husseini et al. which focused on druggable candidate genes identified through GWAS and eQTL analysis also showed overlap with the genes identified in this study notably SLC22A4 and SUOX (87). Collectively, these results provide direct support for the role of 37 candidate causal genes in moderate to severe asthma pathological mechanisms and indirect evidence through interaction for an additional five genes. Overall, the findings demonstrate the contribution of altered airway structural cell and inflammatory cell mechanisms underlying asthma including novel inferences regarding genes not previously identified as potentially contributing to asthma. These genes and pathways can form the basis of novel drug development and/or drug repurposing with the aim to treat moderate to severe asthma where there is an unmet clinical need potentially in a stratified approach using genetics to guide therapy to the individual patient.

AUTHOR CONTRIBUTIONS
IS conceived the study, contributed to the design, and contributed to writing the manuscript. SH, YG, and IA provided U-BIOPRED datasets and contributed to drafting the manuscript. MP and KR carried out the analysis and wrote the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
The IS laboratory receives grant support from Asthma UK, British Lung Foundation, NIHR, MRC, GSK, NC3Rs, BBSRC, and Boehringer Ingelheim. The U-BIOPRED initiative was funded by Innovative Medicines Initiative (EU-IMI 115010). The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The data used for the analyses described in this manuscript were obtained from the GTEx Portal on 16/06/2020.