HIF-1α Pulmonary Phenotype Wide Association Study Unveils a Link to Inflammatory Airway Conditions

Despite experimental data linking HIF-1α dysfunction to inflammatory airway conditions, the effect of single nucleotide polymorphisms within the HIF1A gene on these conditions remains poorly understood. In the current study, we complete a phenotype wide association study to assess the link between SNPs with known disease associations and respiratory phenotypes. We report two SNPs of the HIF1A gene, the intronic rs79865957 and the missense rs41508050. In these positions the A and the T allele are significantly associated with allergic rhinitis and acute bronchitis and bronchiolitis, respectively. These findings further support the role of HIF-1α in inflammatory pulmonary conditions and may serve as a basis to refine our understanding of other HIF-1α associated phenotypes.


INTRODUCTION
Hypoxia inducible factor (HIF) is central in the mammalian response to hypoxia. (Majmundar et al., 2010) HIF-1 is a nuclear factor that consists of a Hypoxia inducible factor 1α (HIF-1α) and a Hypoxia inducible factor 1β (HIF-1β) subunit. (Wang and Semenza, 1993;Gladek et al., 2017) While HIF-1β is stable regardless of the oxygen concentration, HIF-1α is rapidly degraded under normoxic conditions. (Yu et al., 1998;Gladek et al., 2017) Under hypoxic conditions however, HIF-1α is stabilized, leading to formation of HIF-1. (Ke and Costa, 2006;Slemc and Kunej, 2016) HIF-1 then in turn acts as a transcription factor, affecting over 98 target genes associated with up to 20 biological pathways. (Ke and Costa, 2006;Slemc and Kunej, 2016) Given this central role, it comes as no surprise that variations within the highly conserved HIF1A gene have been associated with a wide array of pathologic conditions. (Majmundar et al., 2010) Apart from playing an important role in normal lung development, HIFs have been shown to play a central role in the development of multiple pulmonary conditions, including pulmonary hypertension, Chronic obstructive pulmonary disease (COPD) and lung cancer angiogenesis. (Shimoda and Semenza, 2011) Despite this, within pulmonology, to date, variations within the HIF1A gene have only been associated with COPD and lung cancer. (Chan et al., 2017;Gladek et al., 2017;Paradowska-Gorycka et al., 2018;Wang et al., 2018;Hoang et al., 2019;Huang et al., 2020) Our current study sets out to examine the association between single nucleotide polymorphisms (SNPs) in the HIF1A gene and respiratory phenotypes. By starting with SNPs of interest, the Phenotype Wide Association Study (PheWAS) design flips the direction of inference commonly used in genome-wide association studies (GWAS). (Bush et al., 2016) To do so, it integrates data captured from patient's electronic health records (EHRs) with their genetic information. The major benefit of this approach is that it allows us to focus our efforts specifically on SNPs with known disease associations within this master regulator gene, improving the likelihood that found associations are based on molecular mechanisms that are relevant to the disease phenotypes uncovered.

Single Nucleotide Polymorphisms Selection
SNPs were selected from enriched literature review, including recently completed review by Gladek et al. (Gladek et al., 2017) All studies identified with keywords "HIF1a" and "variant" published after their literature review was completed, were reviewed. SNPs significantly associated with human disease were included in the current study (Table 1).

Population
Subjects were drawn from The Children's Hospital of Philadelphia (CHOP) biorepository at the Center for Applied Genomics (CAG). The pediatric samples included in this biorepository are linked to subjects' EMRs. All subjects have consented to both genomic analysis and EMR mining (Gottesman et al., 2013).

Ancestry Identification
Subjects in the PheWAS cohort were separated by ancestry based on the results of principal component analysis (PCA). PCA was performed using flashpca on approximately 2.4 million imputed SNPs with MAF >0.01 that had been pruned for linkage disequilibrium using PLINK v1.9 (Abraham and Inouye, 2014;Chang et al., 2015) The first three principle components were plotted, and ancestry designation was performed by comparison to the reference genotypes from the HapMap consortium. (Altshuler et al., 2010) The complete dataset contained 71,600 individuals: 34,410 Caucasians, 31,507 African Americans, 2644 Hispanics, and 3039 East Asians.

PheWAS
A PheWAS was conducting using the published PheWAS R package from . (Carroll et al., 2014) International Classification of Diseases 9 (ICD-9) codes were obtained from an anonymized extraction of the Children's Hospital of Philadelphia diagnosis database that contained subjects that had been recruited into the patient collection of the Center for Applied Genomics. Counts of the occurrence of each ICD-9 code for each subject were generated, and the resulting table was converted into the PheWAS phenotype table by a function in the R package. Subjects were included in the case group for each PheWAS phenotype if they possessed two or more occurrences of any of the ICD-9 codes that composed the phenotype in question. Subjects were listed as controls for the PheWAS phenotype if they lacked the case- defining ICD-9 codes, as well as ICD-9 codes corresponding to closely related phenotypes. Conversion from ICD-9 codes to PheWAS phenotypes was performed using the default translation table included in the R package. Phenotypes were analyzed in the PheWAS if they were represented by 20 or more cases in the cohort. The subject's sex and age were included as covariates in the analysis, as were the 10 flashpca generated principle components and a variable representing the group in which genotyping array had been imputed. Genotypes were extracted from the imputed data as allele dose information to preserve some information regarding genotype probability, and the allele doses were used as the genotype inputs to the PheWAS. The PheWAS analysis was performed individually on each PCAdefined ancestry, and then a meta-analysis was performed combining all four ancestries using the PheWAS-meta function provided in the PheWAS R package. For the association test, a logistic regression model, adjusted for age and sex was used. For defining significance in this study, we set a FDR threshold of 0.05. As a total of 2146 traits were analyzed, the over-conservative significance threshold based on Bonferroni correction was p 2.3 × 10 -5 .

In Silico Validation
SNP's significantly associated with respiratory disease were validated in an independent cohort by querying the publicly available Open Target Genetics database. (Ghoussaini et al., 2021) The Ensembl VEP was then used to assess the likely effect of these variants. (McLaren et al., 2016) To assess chromatin state and regulatory potential associated with the locations of the SNPs, other publicly available databases including Haploreg and Encode were queried.

RESULTS
We found 42 SNPs that have been previously associated with different medical conditions, including various cancers, cardiovascular diseases, metabolic disorders and (auto) immune diseases. This includes the 34 SNPs identified by Gladek et al. (Gladek et al., 2017) In addition, eight more SNPs were identified in studies published after their literature review was completed (Chan et al., 2017;Paradowska-Gorycka et al., 2018;Wang et al., 2018;Hoang et al., 2019;Huang et al., 2020).
Of the 42 SNPs included in our PheWAS, nine were significantly associated with at least one disorder. Table 2 summarizes the data for all the SNP-phenotype associations passing False Discovery Rate (FDR) or Bonferroni test. Most of the detected associations were from cohorts with less than 500 cases. However, the A allele of SNP rs79865957 was found to be significantly associated with allergic rhinitis (Figure 1) in a European cohort of 4,348 cases and 18,794 controls with an allele frequency of 0.08%. The OR was 2.86, Beta 1.05, SE 0.25 and p-value 3.48E−05. The second, rs41508050, the T allele was significantly associated with acute bronchitis and bronchiolitis (Figure 2) in an African American cohort of 2,234 cases and 21,463 controls with an allele frequency of 0.18%. The OR was 0.32, Beta 1.21, SE 3.36 and p-value 0.0001. Using the Open Target Genetics database rs79865957, the A allele was found to have been previously positively associated with both chronic airway obstruction (OR 1.94, p-value 0.0019, Beta 0.663) and asthma (OR 1.34, p-value 0.033, Beta 0.292). It has also been negatively associated with paternal chronic bronchitis/ emphysema (OR 0.75, p-value 0.0069, Beta −0.293). Using  Ensemble Variable Effect Predictor (VEP), it was found to be a likely intron variant for HIF1A. For rs41508050, the T allele was previously negatively correlated with "Bring up phlegm/sputum/ mucus on most days" (OR 0.72, p-value 0.0026, Beta −0.328) and is a missense variant for HIF1A.
The publicly available HaploReg tool was queried for both SNPs. SNP rs79865957 has four SNPs in linkage disequilibrium (r2 ≥ 0.8), two of which (rs76269977 and rs142660658) are intronic in the HIF1A gene. It is located in a regulatory region but not in a constrained sequence. It has histone H3K4me1_Enh enhancer marks in a lung carcinoma line and both H3K4me1_Enh and H3K27ac in a fetal lung fibroblast line. It is also a DNAse hypersensitivity site in a fetal lung fibroblast line. SNP rs41508050 has no other SNPs in linkage disequilibrium, is in a regulatory region and in a constrained sequence both by Genomic Evolutionary Rate Profiling and SiPhycons. It has histone H3K27ac_Enh marks in both lung fibroblast and lung carcinoma lines and is a DNase hypersensitivity site in a lung carcinoma cell line. Looking at Encode it had RFX5 bound in the GM12878 lymphoblastoid cell line. (Table 3)

DISCUSSION
We present the results of a HIF-1α PheWAS analysis focused on association with respiratory phenotypes. We identified two SNPs that are significantly associated with respiratory disease. Given the allele rarity in our patient population, the Open Target Genetics database was queried in further support. This resource integrates knowledge derived from the UK Biobank with published data from other sources and provides an independent cohort to validate our findings. (Baumann and Cabassa, 2020) The prior associations with allergic airway disease in the form of asthma for rs79865957 and association with bringing up phlegm/sputum/mucus for rs41508050 are consistent with the respective associations with allergic rhinitis and acute bronchitis and bronchiolitis in our cohort, suggesting the association may be driven by the underlying biological "inflammation" process which is the central driver across all these phenotypes involving different organs. To address the likely impact of these variants we used the Ensembl VEP and the publicly available HaploReg tool (Ward and Kellis, 2012;McLaren et al., 2016), both of which underscore the possible significance of both variants. Adding to the evidence supporting a functional impact are the previously published associations between rs79865957 and diabetic kidney disease and between rs41508050 and angina versus myocardial infarction as initial presentation of coronary disease (Hlatky et al., 2007;Huang et al., 2020).
Previously, variations within the HIF1A gene have been associated with COPD, lung cancer and a host of nonpulmonary conditions. (Gladek et al., 2017) Both the SNPs reported here had prior significant disease associations. First, rs79865957 was previously associated with diabetic kidney disease in a Han Chinese population. (Nava-Salazar et al., 2011) While to our knowledge the functional consequences of this SNP have not been eluded, the authors hypothesized that in a high glucose environment HIF1A transcription may be stimulated. Additionally, rs41508050 has a known association with the development of stable angina as opposed to myocardial infarction as initial presentation of coronary artery disease. (Hlatky et al., 2007) In vitro studies have previously linked this variant with a higher transcriptional activity. (Nava-Salazar et al., 2011) However, to our knowledge, the current study is the first to report on the association between SNPs of the HIF1A gene and allergic rhinitis, acute bronchitis and bronchiolitis. The reported association with allergic rhinitis is consistent with previously published experimental data highlighting the role of HIF-1α in allergic airway pathology. In an allergic airway disease model, HIF-1α inhibition decreased Th2 inflammation as measured by reduced IL-4, IL-5 and IL-13. (Kim et al., 2010) Beyond this, in a mouse model downregulation of HIF-1 or blockade of HIF-1α reduced cellular infiltrate in peribronchial lung tissues, thickness of smooth muscle and eosinophil infiltration. (Huerta-Yepez et al., 2008) Likewise, the role of HIF-1α in bronchiolitis is supported by experimental data on the consequences of HIF-1α stabilization by the Respiratory Syncytium Virus. (Kilani et al., 2004) Traditionally, GWAS identify SNPs significantly associated with human disease. These findings are then used to guide animal studies aiming to prove a causal link between the SNP and the disease. As briefly discussed above, the PheWAS design flips this process. It allowed us to look specifically at a highly conserved gene known to play a central role in the diseases of interest. In doing so, we were able to narrow down the list of SNPs within the HIF1A gene that play a potential role in respiratory pathology. Beyond this, we were able to detect significant effects of rare allelic variants. Conversely, this study design by definition excludes variants on other genes. While this is a limitation of the current study, given the hypoxemia dependent stabilization of HIF-1α and the experimental data supporting a role of HIF-1α in pulmonary conditions as outlined above it seemed reasonable to focus on HIF1A. Future studies may expand on the current work by including other members of the HIF family. Furthermoree, knowing that SNPs within the HIF1A gene are associated with respiratory diseases future studies can now refine our understanding of the associated phenotypes by looking at differences between patients with and without these SNPs.

DATA AVAILABILITY STATEMENT
The data analyzed in this study is subject to the following licenses/ restrictions: Some of the data used are available in deidentified format in dbGaP. Requests to access these datasets should be directed to https://www.ncbi.nlm.nih.gov/gap/.