Phenome-wide association studies demonstrating pleiotropy of genetic variants within FTO with and without adjustment for body mass index

Phenome-wide association studies (PheWAS) have demonstrated utility in validating genetic associations derived from traditional genetic studies as well as identifying novel genetic associations. Here we used an electronic health record (EHR)-based PheWAS to explore pleiotropy of genetic variants in the fat mass and obesity associated gene (FTO), some of which have been previously associated with obesity and type 2 diabetes (T2D). We used a population of 10,487 individuals of European ancestry with genome-wide genotyping from the Electronic Medical Records and Genomics (eMERGE) Network and another population of 13,711 individuals of European ancestry from the BioVU DNA biobank at Vanderbilt genotyped using Illumina HumanExome BeadChip. A meta-analysis of the two study populations replicated the well-described associations between FTO variants and obesity (odds ratio [OR] = 1.25, 95% Confidence Interval = 1.11–1.24, p = 2.10 × 10−9) and FTO variants and T2D (OR = 1.14, 95% CI = 1.08–1.21, p = 2.34 × 10−6). The meta-analysis also demonstrated that FTO variant rs8050136 was significantly associated with sleep apnea (OR = 1.14, 95% CI = 1.07–1.22, p = 3.33 × 10−5); however, the association was attenuated after adjustment for body mass index (BMI). Novel phenotype associations with obesity-associated FTO variants included fibrocystic breast disease (rs9941349, OR = 0.81, 95% CI = 0.74–0.91, p = 5.41 × 10−5) and trends toward associations with non-alcoholic liver disease and gram-positive bacterial infections. FTO variants not associated with obesity demonstrated other potential disease associations including non-inflammatory disorders of the cervix and chronic periodontitis. These results suggest that genetic variants in FTO may have pleiotropic associations, some of which are not mediated by obesity.


INTRODUCTION
Pleiotropy, the phenomenon in which a single gene or genetic variant is associated with multiple phenotypes, is essential to the functionality of the human genome (Crespi, 2010;Wagner and Zhang, 2011;Pavlicev and Wagner, 2012). Through comparing multiple genome-wide association studies (GWAS) and candidate gene studies, pleiotropy has been noted in many single nucleotide polymorphisms (SNPs) and genes, potentially providing greater insight into putative biological mechanisms (Sivakumaran et al., 2011;Stranger et al., 2011;Solovieff et al., 2013). The increasing prevalence of DNA biobanks linked to rich phenotype resources and large epidemiological databases have enabled the development of phenome-wide association study (PheWAS) method as an additional tool to investigate pleiotropy (Denny et al., 2010a;Pendergrass et al., 2011). As a complement to GWAS, PheWAS enables both the validation of genotypephenotype associations identified through traditional GWAS and the generation of new hypotheses, identifying potentially novel associations as well as putative instances of genetic pleiotropy Pendergrass et al., 2013). A recent application of PheWAS to 3144 GWAS-identified variants, replicated 210 known associations and noted 63 new, pleiotropic associations .
The Electronic Medical Records and Genomics (eMERGE) Network was formed in 2007 to use phenotypes derived from electronic health record (EHR) data to perform GWAS and other genomic investigations Pathak et al., 2011;Crosslin et al., 2013;Ding et al., 2013). eMERGE investigators have also used EHR-based PheWAS methods to evaluate multiple phenotypes associated with specific genetic variants (Denny et al., 2010a;Pathak et al., 2012;Hebbring et al., 2013). PheWAS has been used to enhance our understanding of the genetic determinants of complex traits discovered through GWAS. For example, a PheWAS of variants associated with longer cardiac conduction  demonstrated an association with atrial fibrillation, and a PheWAS of variants affecting platelet count and size identified associations with autoimmune diseases (Shameer et al., 2013).
Variants in the fat mass and obesity associated gene (FTO) have been studied since 2007, when it was discovered that some were associated with body mass index (BMI) and obesity (Frayling et al., 2007). Multiple GWAS have demonstrated further associations between variants in FTO and obesity (Jacobsson et al., 2012). Some of these variants have also been noted to be associated with both obesity and type 2 diabetes (T2D) (Hertel et al., 2011;Rees et al., 2011;Li et al., 2012) including SNPs rs9939609 and rs8050136, which are in high linkage disequilibrium (LD) with each other in people of European ancestry (r 2 = 1.00; using 1000 Genomes Pilot 1 reference in the CEU population). The SNP rs8050136 is located in an intronic region where the transcription factor cut-like homeobox (CUTL1) protein (Li et al., 2000) is predicted to bind (Stratigopoulos et al., 2008). This variant has been associated with T2D and obesity in Han Chinese and European populations (Hubacek et al., 2008;Liu et al., 2010;Hotta et al., 2011) but other studies found no association between this variant and T2D or obesity in the Chinese Han population (Li et al., 2008;Xi and Mi, 2009). These differences in associations of SNPs with phenotypes have been further analyzed through fine mapping of BMI loci (Gong et al., 2013). This study reported that GWAS studies primarily performed in European populations of numerous loci associated with BMI are not generalizable to other ethnic groups, for example African Americans. Another study demonstrated that rs8050136 was associated with increased energy intake from fat with similar total energy intake (Park et al., 2013). A more recent study noted that the mechanism of action for common variants in FTO may be through regulation of IRX3 expression, which is highly expressed in the brain (Smemo et al., 2014).
There is also evidence of other putative disease associations with FTO variants that have not achieved genome-wide significance, such as pancreatic cancer, Alzheimer's disease, attention deficit hyperactivity disorder, alcoholism, and osteoarthritis (Keller et al., 2011;Lurie et al., 2011;Sobczyk-Kopciol et al., 2011;arcOGEN Consortium et al., 2012;Corella et al., 2012;Reitz et al., 2012;Velders et al., 2012). These varied disease-SNP associations suggest that SNPs in FTO may have pleiotropic effects. Utilizing the population and diagnostic diversity contained within the realworld clinical environment for variants within FTO, our goal was to determine whether an EHR-based PheWAS could identify genetic pleiotropy that might otherwise remain undetected in traditional cohort study designs. In the present study, we utilized PheWAS method and data sets from the eMERGE network Gottesman et al., 2013) to evaluate pleiotropy of variants in FTO.

PARTICIPATION OF eMERGE SITES
The eMERGE Network data used in this study consists of seven institutions (Group Health Cooperative and University of Washington, Marshfield Clinic, Mayo Clinic, Northwestern University, Mount Sinai, Geisinger Health System, and Vanderbilt University Medical Center), each with DNA biorepositories linked to their EHRs. Each site pulled demographic, vital sign, and billing data from their EHR research data repositories for this study. All projects were either approved by local IRBs or classified as IRB exempt as non-human subjects research.

GENOTYPING OF eMERGE SUBJECTS
Variants for eMERGE subjects were selected from extant genome-wide genotypes with either the Human660W-Quadv1_A or Illumina OmniExpress chips. The Human660W-Quadv1_A BeadChip was completed at the Center for Genotyping and Analysis at the Broad Institute, and the Center for Inherited Disease Research at Johns Hopkins University. Genotyping for Illumina OmniExpress BeadChips was performed at the University of Pittsburgh Genomics and Proteomics Core Laboratories. These genotyping data comprised 10,487 individuals of European ancestry, as designated in the EHRs.
Quality-control (QC) of the genotype data was performed using a pipeline developed by the eMERGE Genomics Working Group (Turner et al., 2011). This process included call rate restrictions listed below, identification of sex mismatch and anomalies, checking duplicate and HapMap concordance, as well as identifying batch effects, sample relatedness, and minor allele frequency (MAF). Population stratification was evaluated using STRUCTURE (Pritchard et al., 2000) and EIGENSTRAT (Price et al., 2006). Only SNPs with call rates >99% and MAF >0.01 in unrelated samples were included for further study. Relatedness was determined on the basis of identity by descent (IBD) estimates generated from the genome-wide genotype data in PLINK (Purcell et al., 2007). All study sites had pairs of individuals with an IBD estimate greater than 0.25; only one of the individuals in each related pair was randomly selected and used in the analysis. Additional genotypes were imputed using IMPUTE2 (Marchini et al., 2007) and 1000 Genomes Project as the reference (1000Genomes Project Consortium et al., 2010. We used imputed SNPs with a minimum info score of 0.7 and called genotypes based on the maximum posterior probability. We selected 54 SNPs, of which 51 were imputed in at least one site, located in FTO that met the QC criteria above and were previously associated with obesity (Jacobsson et al., 2012). QC and subsequent association tests were performed using PLINK (Purcell et al., 2007) and the R statistical package (R Core Team, 2013).

GENOTYPING OF VANDERBILT SUBJECTS USING HumanExome BeadChips
We selected 13,711 individuals of European ancestry from the BioVU DNA databank with BMI data who were genotyped using the Illumina Infinium HumanExome BeadChip, which includes >240,000 markers, mostly within exonic regions, as well as SNPs from the GWAS catalog (Welter et al., 2014) including rs8050136 in FTO. Genotyping was performed at the Vanderbilt Technologies for Advanced Genomics (VANTAGE) Core, and genomic data were processed by the Vanderbilt Technologies for Advanced Genomics Analysis and Research Design (VANGARD) Core. Clustering was performed using GenomeStudio's GenTrain clustering algorithm followed by manual review and reclustering; genotype calling was performed using GenomeStudio's GenCall algorithm. Genotyping quality was evaluated using SNP call rates and concordance rates with HapMap controls; SNPs with <99.8% call rate or <98% concordance were excluded. In the first analysis, we focused on rs8050136, which had a call rate of >99.9%. In the subsequent analyses, we further analyzed eight FTO SNPs on the Exome chip, which had call rates greater than 99.8% and MAFs >0.01. Similar to the eMERGE set, for individuals with an IBD estimate greater than 0.25; only one of the individuals in each related group was selected randomly and used in our analyses.

PheWAS ANALYSES
We first tested the 54 eMERGE SNPs for association with BMI using linear regression. We calculated LD with our reference SNP rs8050136, chosen as the reference because of its GWAS associations with BMI and T2D in the literature and since it was directly genotyped on all of the platforms. To evaluate phenotype associations and potential pleiotropy among different FTO SNPs, we grouped SNPs into three groups for convenience based on their LD with rs805136: high LD (r 2 > 0.80), moderate LD (0.80 ≥ r 2 > 0.60) and low LD (r 2 ≤ 0.60) with rs8050136. Our hypothesis was that SNPs in high LD would show similar patterns of phenotype associations with rs8050136, and that different patterns may be observed in SNPs with lower LD.
Analyses for the eMERGE and the BioVU datasets were conducted separately and then meta-analyzed. The eMERGE population had 54 SNPs and the BioVU population had nine SNPs for analysis, which were also present in the eMERGE dataset. We used logistic regression adjusted for age, sex, eMERGE site, and the first three principal components as calculated for each dataset by EIGENSTRAT, using an additive genetic model. We performed PheWAS using each SNP using methods and code groupings described previously  using the R PheWAS package (Carroll et al., 2014), briefly, calculating comprehensive associations between SNPs and a total of 1645 clinical phenotypes derived from the International Classification of Disease, 9th CM (ICD-9) edition codes from each site's EHR. The ICD-9 codes that are associated with each phenotype can be found at the PheWAS catalog located online at http://phewas.mc.vanderbilt.edu/. Cases for a given disease were defined as having at least two relevant ICD-9 codes on different days. The PheWAS method also defines control groups for each disease, which ensures that related diseases do not serve as controls for the current disease being analyzed. We performed association testing for all PheWAS phenotypes occurring in at least 20 individuals (effectively 20 "cases").
We then compared our results to performing PheWAS for each FTO SNP adjusting for BMI. The BMI, obtained from each site's EHR, was estimated using the average BMI from individuals within our dataset. To minimize erroneous data, we only used BMI measurements between 15 and 70, a range that we have used in prior studies and has good precision (Denny et al., 2010b). Plotting was performed in R using the PheWAS and ggplot2 packages.
Meta-analysis was performed using the inverse-variance method (Hunter et al., 1982) for the nine shared SNPs. There were 1010 phenotypes that were in common across both datasets and met our minimum case criteria of at least 20 cases. This yields a Bonferroni corrected p-value of 4.95 × 10 −5 , (p = 0.05/1010 = 4.95 × 10 −5 ), for a single SNP. We chose a single SNP, phenomewide correction threshold since most of the SNPs in this analysis were in high LD with each other and thus do not represent truly independent tests. A false discovery rate (FDR) of q = 0.05, calculated with the Benjamin and Hochberg method using the R p.adjust method, yields a p-value of 2.48 × 10 −4 (Benjamini and Hochberg, 1995). For our latter analyses, we considered a total of 54 SNPs. Since many phenotypes are correlated with each other and many of the SNPs are in LD, we also used simpleM (Gao et al., 2010) to estimate the number of unique tests performed, leading to an adjustment of p = 2.36 × 10 −6 . All analyses assumed a two-tailed distribution.

OVERVIEW
A total of 24,198 individuals were used in our analyses ( Table 1). Both the eMERGE and BioVU datasets were similar in median age, sex, and BMI. Our analysis of the association of the FTO SNPs with BMI ( Table 2) showed that most SNPs in high linkage disequilibrium with rs8050136 (r 2 > 0.80) have highly significant p-values (< 3 × 10 −9 ) and betas for BMI ( Table 2). SNPs with lower correlations with rs8050136 have highly variable associations with BMI.

PheWAS OF FTO rs8050136 ADJUSTED FOR BMI
After adjusting for average BMI, some of the associations were greatly attenuated, while others remained relatively unchanged (

DISCUSSION
We studied the pleiotropic patterns for FTO variants with and without adjustment for BMI using phenome-wide associations in two large EHR cohorts. Consistent with other studies, we identified statistically significant associations with obesity, morbid obesity, and T2D among SNPs known to be associated with BMI; these associations were attenuated by adjustment for BMI. We also identified an association with OSA and trends toward association with NAFLD, fibrocystic breast disease, and infections, primarily gram-positive, with obesity-related SNPs. Some of these potential associations seem independent of BMI adjustment. Fibrocystic breast changes are a common benign breast disease and traditionally not thought related to obesity, including several epidemiological studies (Friedenreich et al., 2000;Baer et al., 2005;Li et al., 2005). Gram-positive infections could be explained in part by higher incidence of T2D in genetic variants of FTO. By analyzing other SNPs not significantly associated with BMI in our analysis, we also identified a few other potential associations with less common traits not associated with obesity (periodontitis, non-inflammatory diseases of the cervix); neither of these SNPs is in high LD with obesity-related SNPs. The most common ICD-9 code for "non-inflammatory disorders of the cervix" is cervical stenosis or stricture not related to congenital abnormalities or labor, which can result from surgical procedures, radiation, trauma, repeated vaginal infections, or menopause-related atrophy. These results, along with the recent association of FTO variants with IRX3 regulation (Smemo et al., 2014), suggest a broader role for FTO beyond that of regulating fat mass. The question of whether the association of FTO variants and T2D is influenced by obesity or both obesity and FTO has been studied previously. A UK study of 9103 individuals demonstrated the loss of association after adjustment for BMI, as the T2D-FTO association prior to adjustment for BMI showed an OR = 1.15, p = 9 × 10 −6 and after adjustment showed an OR = 1.03, p = 0.44 (Frayling et al., 2007). However, other studies suggest that T2D's association with FTO remains after adjustment for BMI (Hertel et al., 2011;Li et al., 2012). Li et al. studied 96,551 East and South Asians and demonstrated an association with T2D (OR = 1.15, p = 5.5 × 10 −8 ) that was only partially attenuated after adjustment for BMI (OR = 1.10, p = 6.6 × 10 −5 ) (Li et al., 2012). Similarly, Hertel et al. observed a significant T2D-FTO association even after adjustment for BMI in 41,504 Scandinavians, with the OR prior to adjustment of 1.13, p = 4.5 × 10 −8 and after adjustment, OR = 1.09, p = 1.2 × 10 −4 (Hertel et al., 2011). Finally a meta-analysis of 24,198 individuals demonstrated FTO rs9939609 (in high LD with rs8050136 with r 2 > 0.8) was highly significantly associated with T2D before and after adjustment for BMI (before adjustment OR = 1.14, 95% CI = 1.12-1.16, p = 1.00 × 10 −41 ; after adjustment OR = 1.07, 95% CI = 1.05-1.09, p = 6.42 × 10 −41 ) (Xi et al., 2014). However, among individuals of European ancestry, the association was markedly attenuated after adjustment for BMI (before adjustment OR = 1.14, 95% CI = 1.11-1.16, p = 1.36 × 10 −36 ; after adjustment OR = 1.06, 95% CI = 1.04-1.09, p = 3.51 × 10 −8 ). In our study, the association between FTO and T2D did not decrease after adjustment for BMI as markedly as phenotypes such as obesity or sleep apnea. The effect sizes of these associations with T2D in our study closely parallels these larger studies (before BMI adjustment: OR = 1.14, 95% CI = 1.08-1.21, p = 2.11 × 10 −6 ; after adjustment: OR = 1.09, 95% CI = 1.03-1.15, p = 2.62 × 10 −3 ). Although these results show an association of FTO with T2D, a mediation analysis first demonstrating the associations of FTO SNPs with BMI and pre-diagnostic BMI with T2D, and subsequently modeling both FTO SNPs and prediagnostic BMI on T2D would help determine the direct and indirect effects of FTO on T2D. Many of our findings, while having strong signals, were not significant after Bonferroni correction. The significant associations using Bonferroni correction included obesity, T2D, and OSA prior to BMI adjustment. After adjustment for average BMI, no associations retained statistical significance, but multiple phenotypes approached significance including T2D, NAFLD, and the protective effect on fibrocystic breast disease.
There is still much debate and uncertainty about both phenotypic association and protein functionality of FTO. Human FTO protein expression studies fail to replicate FTO's association with obesity observed in mouse models (Klöting et al., 2008;Wåhlén et al., 2008;Grunnet et al., 2009). Recent studies have shown that the SNPs in FTO that are associated with obesity regulate IRX3 expression, which is highly expressed in the brain (Smemo et al., 2014). Studies have described the association between FTO and obesity, while the association between T2D and FTO is debated (Hubacek et al., 2008;Li et al., 2008;Xi and Mi, 2009;Liu et al., 2010;Hotta et al., 2011). More studies with larger populations are required to assess the validity of many of these associations. The results of these associations show the power of the PheWAS method to efficiently detect known and novel pleiotropic associations of genetic variants.
BMI is an inexact surrogate for adiposity. Indeed, individuals with a high BMI do not necessarily have a high body fat percentage, thus BMI may not be the optimal definition of the phenotype (Müller et al., 2010). However, BMI has been shown to be as good a surrogate for obesity and diabetes as other central obesity indicators in multiple studies and meta-analyses (Vazquez et al., 2007;Nyamdorj et al., 2008Nyamdorj et al., , 2009. Prior studies have suggested several other phenotypes that may be associated with FTO variants, including pancreatic cancer, Alzheimer's disease, attention deficit hyperactivity disorder, and alcoholism (Keller et al., 2011;Lurie et al., 2011;Sobczyk-Kopciol et al., 2011;arcOGEN Consortium et al., 2012;Corella et al., 2012;Reitz et al., 2012;Velders et al., 2012). We did not find evidence for these associations in our data set (p > 0.05) ( Table 4), but in these cases we may be underpowered to find an association, with case sizes of 76 (attention deficit hyperactivity disorder), 183 (pancreatic cancer), 192 (Alzheimer's disease), and 267 (alcoholism) in our population. A trend toward association between FTO rs8044769 and osteoarthritis was observed in a previous GWAS study (rs8044769, r 2 = 0.647 with rs8050136, p = 4 × 10 −6 ) (arcOGEN Consortium et al., 2012). Our observation of a trend toward associations with joint effusions, which may be caused by osteoarthritis, lends some support to this inflammatory association.
Further analysis of multiple SNPs associated with obesity in FTO yielded some interesting results. First, the SNPs that are FIGURE 2 | PheWAS plots for other obesity associated SNPs in high LD with rs8050136. These plots show unadjusted values and the average BMI adjusted values on the same axis. These SNPs are associated with BMI and have different correlations with rs8050136. These SNPs are present in both datasets and are presented as meta-analyses below. The pink horizontal line represents p = 4.95 × 10 −5 , which is the Bonferroni correction, and the blue horizontal line represents an FDR q = 0.05 (p = 2.48 × 10 −4 ). (A) rs9939609 is reported widely in the literature and has a nearly identical pattern of associations to rs8050136 (r 2 = 0.96). (B) rs9941349 also has a similar pattern to rs8050136 but cystic mastopathy is marginally more associated (p = 5.41 × 10 −5 , OR = 0.81 before BMI adjustment) than in rs8050136 (r 2 = 0.88).

FIGURE 3 | Continued
Frontiers in Genetics | Applied Genetic Epidemiology August 2014 | Volume 5 | Article 250 | 10 FIGURE 3 | PheWAS plots for other obesity associated SNPs in low LD with rs8050136. These plots show values without adjustment for BMI (shown as triangles) and with adjustment for average BMI (shown as dots) plotted on the same axis. (A) rs6499640 is in both datasets with a lower LD with rs8050136 (r 2 = 0.06) and has a different phenotype pattern than rs8050136 (B) rs16952520 is only present in the eMERGE population and has low LD with rs8050136 (r 2 = 0.03) and while not strongly associated with obesity or diabetes does show significant association with non-inflammatory disorders of the cervix (OR = 6.76, p = 1.92 × 10 −6 ), unaffected by adjustment for BMI (OR = 6.66, p = 2.36 × 10 −6 ) (C) rs7199182 is only present in the eMERGE population and has a low LD with rs8050136 (r 2 = 0.04) and is associated with chronic periodontitis before and after BMI adjustment (no adjustment: p = 5.40 × 10 −5 ; BMI adjustment: p = 5.20 × 10 −5 ). in high correlation with rs8050136 (r 2 > 0.8) have very similar results to rs8050136, which is what we would expect. There are also SNPs that were associated with fibrocystic breast disease prior to adjustment for BMI. rs7199182, is in low LD with rs8050136 (r 2 < 0.01), showed significant associations with chronic periodontitis before and after adjustment for BMI. Further analysis of this SNP and its association with chronic periodontitis will need to be investigated to validate this finding.
One important consideration of this analysis is the small overlap of genotyped SNPs between the BioVU and eMERGE population. There are multiple SNPs that are present in both datasets and are highly correlated with rs8050136, but only rs6499640, which is in weak LD with rs8050136 (r 2 = 0.06), was genotyped in both datasets. We are unable to impute the BioVU. The lack of overlapping SNPs limits our sample size to evaluate more of the potentially novel findings. Limitations caution interpretation of this study. Some of the case sizes were small and will require larger populations to validate. PheWAS analyses require robust EHR systems that can query patient cohorts efficiently. We used ICD-9 codes for the determination of phenotypes, codes which can be unreliable, inaccurate, and incomplete (Kern et al., 2006;Campbell et al., 2011); however, this could tend to result in missed, rather than false, associations. In addition to the caveats of ICD-9 codes, there are limitations of multiple hypothesis testing that come with comparisons of over 1000 phenotypes. Significance corrections like Bonferroni may be too strict; some of the near-significant pleiotropic associations may, in fact, represent genuine associations. Further testing with larger populations and more carefully defined phenotypes are needed to determine whether these associations are real.
Here we demonstrate the use of the PheWAS method to illustrate pleiotropic effects of variation in the gene FTO. When examining this gene with known pleiotropy, we were able to reproduce previously-discovered associations and identify potential new associations, some of which appear independent of obesity.

ACKNOWLEDGMENTS
This work was supported by the eMERGE Network. The eMERGE Network was initiated and funded by NHGRI through the following grants: U01HG006389 (Essentia Institute of Rural Health); U01HG006382 (Geisinger Clinic); U01HG006375, UO1AG06781 (Group Health Cooperative and University of Washington); U01HG06379 (Mayo Clinic).); U01HG006380 (Mount Sinai School of Medicine); U01HG006388 (Northwestern University); U01HG006378 (Vanderbilt University); and U01HG006385 (Vanderbilt University serving as the Coordinating Center); and by National Center for Advancing Translational Sciences (NCATS) grant UL1TR000427 (Marshfield Clinic). Development of the PheWAS method is also supported by R01-LM010685 from the National Library of Medicine. BioVU received and continues to receive support through the NCATS grant 2 UL1 TR000445.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene. 2014.00250/abstract