Evaluation of Genome-Wide Association Study-Associated Type 2 Diabetes Susceptibility Loci in Sub-Saharan Africans

Genome-wide association studies (GWAS) for type 2 diabetes (T2D) undertaken in European and Asian ancestry populations have yielded dozens of robustly associated loci. However, the genomics of T2D remains largely understudied in sub-Saharan Africa (SSA), where rates of T2D are increasing dramatically and where the environmental background differs considerably from that of these previous studies. Here, we evaluate 106 reported T2D GWAS loci in continental Africans. We tested each reported index SNP, and SNPs in linkage disequilibrium (LD) with it, for association with T2D in order to assess transferability and to fine map the loci by leveraging the generally reduced LD of African genomes. The study included 1775 unrelated Africans (1035 T2D cases, 740 controls; mean age 54 years; 59% female) enrolled in Nigeria, Ghana, and Kenya as part of the Africa America Diabetes Mellitus (AADM) study. All samples were genotyped on the Affymetrix Axiom PanAFR SNP array. Forty-one of the tested loci showed transferability to this African sample (p < 0.05, same direction of effect): 11 at the exact reported SNP and 30 at SNPs in LD with the reported SNP (after adjustment for the number of SNPs tested). TCF7L2 SNP rs7903146 was the most significant locus in this study (p = 1.61 × 10⁻⁸). Most of the loci that showed transferability were successfully fine mapped, i.e., localized to smaller haplotypes than in the original reports. The findings indicate that the genetic architecture of T2D in SSA is characterized by several risk loci shared with non-African ancestral populations and that data from African populations may facilitate fine mapping of risk loci. The study provides an important resource for meta-analysis of African ancestry populations and for testing the transferability of novel loci.


INTRODUCTION
Sub-Saharan Africa (SSA) is one of the regions with the fastest growth in type 2 diabetes (T2D) worldwide (Wild et al., 2004). An estimated 19.8 million people in SSA had type 2 diabetes in 2013, and this number is projected to increase to 41.5 million by 2035 (IDF, 2013). Genome-wide association studies (GWAS) have been particularly productive for understanding the genetic basis of T2D, with over 100 associated susceptibility loci reported, including 65 loci from a recent large meta-analysis (n ∼150,000) of European ancestry populations alone (Morris et al., 2012). However, most of these successes have come from European and Asian ancestry populations. A few GWAS of T2D have been conducted in African Americans, including a meta-analysis (Ng et al., 2014), but there is currently no comparable study of indigenous Africans. To date, only one genome-wide linkage study of T2D in an African population has been published (Rotimi et al., 2004), and no GWAS of T2D in an SSA population has yet been done. Here, we report a replication and fine-mapping analysis of T2D in SSA in 1775 subjects (1035 cases, 740 controls) genotyped on the Affymetrix Axiom® PanAFR array and imputed to the 1000 Genomes Phase 1 v3 reference panel. Given the relatively modest sample size and limited power for novel discovery, we focus on evaluating previously reported T2D GWAS loci in indigenous Africans: looking for evidence of replication or transferability, and conducting fine mapping to test whether the relatively weaker linkage disequilibrium (LD) and shorter haplotypes in this African sample could improve the resolution of previously reported loci.

Ethics Statement
Ethical approval for the study was obtained from the Institutional Review Board (IRB) of each participating institution. All subjects provided written informed consent for the collection of samples and subsequent analysis. This study was conducted in accordance with the principles expressed in the Declaration of Helsinki.

Study Participants
The initial study sample consisted of 1822 unrelated subjects from the Africa America Diabetes Mellitus (AADM) study (Rotimi et al., 2001, 2004), a genetic epidemiology study of T2D in SSA. All subjects were sub-Saharan Africans enrolled from university medical centers in Nigeria, Ghana, and Kenya. Patients attending medical clinics at these centers, or referred for clinical suspicion of diabetes, were evaluated for potential inclusion in the study as described below. After providing informed consent, all participants underwent a clinical examination that included a medical history, clinical anthropometry, blood pressure measurements, and blood sampling. Weight was measured in light clothing on an electronic scale to the nearest 0.1 kg, and height was measured with a stadiometer to the nearest 0.1 cm. Body mass index (BMI) was computed as weight in kilograms divided by the square of height in meters. The other clinical measurements have been described elsewhere (Rotimi et al., 2001, 2004). T2D was defined according to American Diabetes Association (ADA) criteria: a fasting plasma glucose (FPG) concentration ≥ 126 mg/dl (7.0 mmol/l) or a 2-h post-load value ≥ 200 mg/dl (11.1 mmol/l) in an oral glucose tolerance test, on more than one occasion. Alternatively, a diagnosis of T2D was accepted if an individual was on pharmacological treatment for T2D and review of clinical records indicated adequate justification for that therapy. Autoantibodies to glutamic acid decarboxylase (GAD) and/or a fasting C-peptide ≤ 0.03 nmol/l were used to exclude probable cases of type 1 diabetes. Controls were required to have FPG < 110 mg/dl or a 2-h post-load value < 140 mg/dl and no symptoms suggestive of diabetes (the classical symptoms being polyuria, polydipsia, and unexplained weight loss).
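The case and control definitions above can be expressed as a small classification function. This is a minimal sketch under stated assumptions, not the AADM study's actual phenotyping code: the `Participant` fields are hypothetical, glucose values are assumed to have already been confirmed on repeat testing, the type 1 diabetes screen is assumed to apply only to participants who would otherwise be cases, and the mg/dl to mmol/l conversion uses the glucose factor of about 18.016 mg/dl per mmol/l.

```python
from dataclasses import dataclass
from typing import Optional

# Glucose conversion: 1 mmol/l is about 18.016 mg/dl (molar mass of glucose ~180.16 g/mol).
MG_DL_PER_MMOL_L = 18.016


def mgdl_to_mmoll(glucose_mg_dl: float) -> float:
    """Convert a plasma glucose concentration from mg/dl to mmol/l."""
    return glucose_mg_dl / MG_DL_PER_MMOL_L


@dataclass
class Participant:
    # Hypothetical record layout; field names are illustrative, not from the AADM study.
    fpg_mg_dl: Optional[float]                 # fasting plasma glucose (assumed confirmed on repeat)
    ogtt_2h_mg_dl: Optional[float]             # 2-h post-load glucose in an OGTT
    on_t2d_treatment: bool                     # treated, with justification in clinical records
    gad_antibody_positive: bool                # autoantibodies to GAD detected
    fasting_c_peptide_nmol_l: Optional[float]  # fasting C-peptide
    diabetes_symptoms: bool                    # polyuria, polydipsia, unexplained weight loss


def classify(p: Participant) -> str:
    """Return 'case', 'excluded', 'control', or 'unclassified' using the thresholds in the text."""
    fpg, ogtt = p.fpg_mg_dl, p.ogtt_2h_mg_dl
    meets_t2d = (
        (fpg is not None and fpg >= 126)
        or (ogtt is not None and ogtt >= 200)
        or p.on_t2d_treatment
    )
    if meets_t2d:
        # Assumption: the type 1 diabetes screen is applied to would-be cases only.
        low_c_peptide = (
            p.fasting_c_peptide_nmol_l is not None and p.fasting_c_peptide_nmol_l <= 0.03
        )
        return "excluded" if (p.gad_antibody_positive or low_c_peptide) else "case"
    # Controls: FPG < 110 mg/dl or 2-h value < 140 mg/dl, and no diabetes symptoms.
    normal_glucose = (fpg is not None and fpg < 110) or (ogtt is not None and ogtt < 140)
    if normal_glucose and not p.diabetes_symptoms:
        return "control"
    return "unclassified"
```

Participants who meet neither definition (for example, FPG 115 mg/dl with a 2-h value of 160 mg/dl) fall into an intermediate group that the sketch labels `unclassified`; the text does not say how such individuals were handled.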

Genotyping
Samples were genotyped on the Affymetrix Axiom® PanAFR SNP array. This array of ∼2.1 million SNPs is one of Affymetrix's Axiom® Genome-Wide Population-Optimized Human Arrays and is optimized for African ancestry populations. It offers pan-African genomic coverage, with ≥90% coverage of common and rare variants (MAF > 2%) in the Yoruba (West African) genome and >85% coverage of common and rare variants (MAF > 2%) in the Luhya and Maasai (East African) genomes. Of the 1822 subjects, 14 samples (one duplicate and 13 with sex discordance) were excluded at initial quality control, and 33 subjects were excluded because of cryptic relatedness with other subjects (identity-by-descent PI_HAT > 0.125, the value expected for third-degree relatives). The remaining 1775 subjects (1035 T2D cases, 740 controls) formed the basis of this analysis. The sample-level genotype call rate was at least 0.95 for all subjects. The 1775 subjects included 1598 (90%) West Africans enrolled from Nigeria and Ghana (Rotimi et al., 2001, 2004) and 177 (10%) East Africans enrolled from Kenya. The most common ethnic groups represented were Yoruba (31.2%), Igbo (23.5%), Akan (20.5%), Gaa-Adangbe (8.6%), and Kalenjin (5.6%).
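The relatedness filter can be illustrated with a greedy pruning sketch. PI_HAT is the estimated proportion of the genome shared identical by descent, as reported by, for example, PLINK's `--genome` option; the text does not say which tool was used or which member of each related pair was dropped, so the rule below (repeatedly remove the sample involved in the most remaining related pairs) is an assumption, and the function and its inputs are hypothetical.

```python
from typing import Dict, List, Set, Tuple

PI_HAT_THRESHOLD = 0.125  # threshold stated in the text


def prune_related(
    samples: List[str],
    pi_hat: Dict[Tuple[str, str], float],
    threshold: float = PI_HAT_THRESHOLD,
) -> Tuple[List[str], Set[str]]:
    """Greedily remove samples until no retained pair has PI_HAT above the threshold.

    Hypothetical helper: `pi_hat` maps sample-ID pairs to pairwise PI_HAT estimates.
    """
    related_pairs = [pair for pair, value in pi_hat.items() if value > threshold]
    removed: Set[str] = set()
    while True:
        # Count the unresolved related pairs that each retained sample belongs to.
        counts: Dict[str, int] = {}
        for a, b in related_pairs:
            if a in removed or b in removed:
                continue
            counts[a] = counts.get(a, 0) + 1
            counts[b] = counts.get(b, 0) + 1
        if not counts:
            break
        # Drop the sample in the most pairs; sorting first breaks ties by smallest ID.
        worst = max(sorted(counts), key=lambda s: counts[s])
        removed.add(worst)
    kept = [s for s in samples if s not in removed]
    return kept, removed
```

Removing the most-connected sample first tends to resolve clusters of relatives with fewer exclusions than dropping one member of each pair independently. The sample counts in the text are internally consistent: 1822 - 14 - 33 = 1775.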
The imputed dosage data were filtered to remove SNPs with an imputed allele dosage frequency < 0.01 or an imputation quality r² < 0.3, yielding ∼15M SNPs for analysis.
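The post-imputation filter can be sketched as follows, assuming that the "imputed allelic dosage frequency" is the allele frequency estimated from dosages (mean dosage divided by 2), folded to the minor allele, and that r² is the per-SNP imputation quality reported by the imputation software. The record type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImputedSNP:
    # Hypothetical record type for one imputed SNP.
    snp_id: str
    dosages: List[float]  # expected allele count per subject, between 0 and 2
    r2: float             # per-SNP imputation quality reported by the imputation software


def dosage_allele_frequency(dosages: List[float]) -> float:
    """Allele frequency estimated from dosages: mean dosage divided by 2."""
    return sum(dosages) / (2 * len(dosages))


def passes_filters(snp: ImputedSNP, min_freq: float = 0.01, min_r2: float = 0.3) -> bool:
    """Keep SNPs with minor-allele dosage frequency >= min_freq and r^2 >= min_r2."""
    freq = dosage_allele_frequency(snp.dosages)
    maf = min(freq, 1.0 - freq)  # assumption: fold to the minor allele
    return maf >= min_freq and snp.r2 >= min_r2


def filter_snps(snps: List[ImputedSNP]) -> List[ImputedSNP]:
    """Return only the SNPs that pass both filters, preserving input order."""
    return [s for s in snps if passes_filters(s)]
```

Estimating the frequency from dosages rather than from best-guess genotypes keeps the filter consistent with the dosage-based association test.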

Statistical Analysis
Association analysis was performed with mach2dat using the imputed SNP dosage data in a logistic regression framework. Using allelic dosages is preferable to using best-guess genotypes because dosages account for the uncertainty in imputation. Covariates were age, sex, BMI, and the first three principal components (PCs) of the genotypes. Residual population stratification was low after adjustment for the first three PCs (genomic inflation factor λ = 1.016; Supplementary Figure 3). Top association hits with p ≤ 5 × 10⁻⁷ are listed in the Supplementary Tables.
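The genomic inflation factor is conventionally computed as the median of the observed 1-df association chi-square statistics divided by its expected value under the null, about 0.4549. The sketch below recovers the chi-square statistics from two-sided p-values using only the Python standard library; it is a generic illustration, not the exact procedure used to obtain λ = 1.016.

```python
from statistics import NormalDist, median
from typing import Sequence


def chi2_1df_from_p(p: float) -> float:
    """Convert a two-sided p-value (0 < p <= 1) to the equivalent 1-df chi-square (z squared)."""
    z = NormalDist().inv_cdf(p / 2)
    return z * z


def genomic_inflation(p_values: Sequence[float]) -> float:
    """lambda_GC: median observed 1-df chi-square divided by its null expectation."""
    expected_median = NormalDist().inv_cdf(0.75) ** 2  # about 0.4549
    observed = [chi2_1df_from_p(p) for p in p_values]
    return median(observed) / expected_median
```

Values close to 1 indicate that test statistics are not systematically inflated across the genome, which is why λ is used as a check on residual stratification after PC adjustment.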
We looked for evidence of transferability of established T2D susceptibility loci reported in the literature from GWAS and meta-analyses of GWAS (n = 106 loci; Supplementary Table 2). More than half of these susceptibility loci (n = 65) were reported in the largest meta-analysis of T2D in populations of European ancestry (DIAGRAM+) (Morris et al., 2012). We first examined the p-value at each reported SNP in our study (exact transferability or replication) and considered p < 0.05 with a consistent direction of effect for the same allele as evidence of significant transferability. Next, we examined all SNPs in the LD block (defined by the method of Gabriel et al., 2002) containing the index SNP in the 1000 Genomes EUR or CHB reference population (as appropriate for the discovery population) for evidence of local transferability. P-values were adjusted for the number of SNPs tested around each index SNP. The number of independent SNPs was estimated, and multiple testing was corrected for, using the effective degrees of freedom of the spectrally decomposed covariance matrix for the block of SNPs (Bretherton et al., 1999; Ramos et al., 2011). Briefly, we estimated the covariance matrix for the block of SNPs from the genotype data. The covariance matrix was then spectrally decomposed, and the effective degrees of freedom were estimated as $N_{\mathrm{eff}} = \left(\sum_{k=1}^{K} \lambda_k\right)^2 \big/ \sum_{k=1}^{K} \lambda_k^2$, where $\lambda_k$ is the kth eigenvalue of the $K \times K$ covariance matrix for the K SNPs. Finally, the nominal significance threshold α = 0.05 was divided by $N_{\mathrm{eff}}$. We defined the "best SNP" in the haplotype block as the SNP with the smallest p-value that is in LD with the reported SNP. Using data from our study sample and from the 1000 Genomes YRI population, haplotype blocks were constructed around each locus that showed transferability to determine whether the African ancestry samples helped to fine map the locus.
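The effective-number-of-tests correction, N_eff = (Σλ_k)² / Σλ_k² following Bretherton et al. (1999), can be computed without an explicit eigendecomposition: for a symmetric matrix C, the sum of the eigenvalues equals trace(C) and the sum of the squared eigenvalues equals the sum of the squared entries of C. The sketch below uses that identity; the function names and the dosage-matrix input layout are hypothetical, and it illustrates the formula rather than reproducing the authors' code.

```python
from typing import List, Sequence


def covariance_matrix(genotypes: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance of K SNPs; genotypes[i][k] is subject i's dosage at SNP k."""
    n = len(genotypes)
    k = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(k)]
    cov = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a, k):
            s = sum((row[a] - means[a]) * (row[b] - means[b]) for row in genotypes)
            cov[a][b] = cov[b][a] = s / (n - 1)
    return cov


def effective_number_of_tests(cov: Sequence[Sequence[float]]) -> float:
    """N_eff = (sum of eigenvalues)^2 / (sum of squared eigenvalues).

    For symmetric C: sum of eigenvalues = trace(C), and
    sum of squared eigenvalues = trace(C @ C) = sum of squared entries of C.
    """
    trace = sum(cov[i][i] for i in range(len(cov)))
    sum_sq = sum(x * x for row in cov for x in row)
    return trace * trace / sum_sq


def block_threshold(genotypes: Sequence[Sequence[float]], alpha: float = 0.05) -> float:
    """Per-block significance threshold: alpha divided by N_eff."""
    return alpha / effective_number_of_tests(covariance_matrix(genotypes))
```

Perfectly linked SNPs give N_eff = 1 and uncorrelated SNPs with equal variance give N_eff = K. Because a covariance (rather than correlation) matrix is used, as the text describes, SNPs with larger genotype variance carry more weight; some implementations standardize the genotypes first so that the correlation matrix is used instead.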
In most of the original reports, effect sizes for the loci studied ranged from an OR of 1.01 to 1.6, and the minor allele frequencies at the risk loci ranged from 0.02 to 0.49 in our dataset. We estimated power for replication of a reported SNP at a one-sided α of 0.05 (i.e., same direction of effect) for ORs ranging from 1.10 to 1.50 and for a range of allele frequencies in our dataset (Supplementary Figure 4). For example, power for replication was 83% for a locus with OR 1.2 at a risk allele
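Power for replication can be approximated with a per-allele 2 × 2 allele-count model. This is a minimal sketch, not the method behind Supplementary Figure 4: it assumes the given frequency is the risk-allele frequency in controls, derives the case frequency from the odds ratio, uses Woolf's standard error for log(OR), ignores the covariates (age, sex, BMI, PCs) in the actual model, and uses the study's 1035 cases and 740 controls as defaults. Its output need not match the power values reported in the text.

```python
from math import log, sqrt
from statistics import NormalDist


def case_allele_frequency(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by a per-allele odds ratio."""
    odds_in_cases = odds_ratio * p_control / (1.0 - p_control)
    return odds_in_cases / (1.0 + odds_in_cases)


def replication_power(
    p_control: float,
    odds_ratio: float,
    n_cases: int = 1035,
    n_controls: int = 740,
    alpha: float = 0.05,
) -> float:
    """Approximate one-sided power of a per-allele (2 x 2 allele count) test.

    Assumptions: no covariates, Hardy-Weinberg proportions so that each subject
    contributes two independent alleles, and a normal approximation for log(OR).
    """
    p_case = case_allele_frequency(p_control, odds_ratio)
    # Expected allele counts in the 2 x 2 table (two alleles per subject).
    a = 2 * n_cases * p_case
    b = 2 * n_cases * (1.0 - p_case)
    c = 2 * n_controls * p_control
    d = 2 * n_controls * (1.0 - p_control)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's standard error
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)        # one-sided critical value
    return NormalDist().cdf(log(odds_ratio) / se_log_or - z_alpha)
```

Scanning `odds_ratio` over 1.10 to 1.50 and `p_control` over the observed frequency range gives a power curve analogous in spirit to a power plot; at OR = 1 the function returns α, as a one-sided test should.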

RESULTS
The characteristics of the study participants are shown in

Transferability of Reported GWAS Type 2 Diabetes Susceptibility Loci
From our association tests, we looked for evidence of transferability of 106 established T2D SNPs, 103 of which were available in our dataset. We found exact replication at the index SNP (same allele, consistent direction of effect, p < 0.05) for 11 loci (Table 2). Using a local replication strategy in which we examined SNPs in LD with the reported index SNP, we found an additional 30 loci with SNPs significantly associated with T2D in this dataset (Table 3). In total, we found significant association with T2D for 41 of the 103 GWAS-established T2D loci examined in this study. Overall, 76 of the 103 tested SNPs had a direction of effect consistent with the initial report (p = 1 × 10⁻⁶, binomial test). The TCF7L2 SNP rs7903146 showed the strongest association with T2D in this study (p = 1.61 × 10⁻⁸, OR 1.50, 95% CI 1.26-2.15; Supplementary Table 5). This SNP also shows the strongest evidence of association with T2D in most GWAS and remains the most consistently associated locus in the populations studied so far. Two of the 106 loci, INS-IGF2 rs3842770 and HLA-B rs2244020, were reported by the only GWAS meta-analysis in an African ancestry population [the MEta-analysis of T2D in African Americans (MEDIA) Consortium; Ng et al., 2014]. In our SSA sample, we found suggestive evidence of association for INS-IGF2 rs3842770 (p = 0.067) and no significant association for HLA-B rs2244020 (merged into rs74995800, p = 0.878).
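The directional-consistency result can be checked with an exact binomial sign test under the null hypothesis that each SNP is equally likely to show either direction of effect (probability 0.5). The text does not state whether the reported test was one- or two-sided, so the sketch computes both.

```python
from math import comb


def binomial_upper_tail(k: int, n: int) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, 0.5), computed with integer arithmetic."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n


# 76 of the 103 tested SNPs had the same direction of effect as in the original report.
n_tested, n_consistent = 103, 76
p_one_sided = binomial_upper_tail(n_consistent, n_tested)
p_two_sided = min(1.0, 2 * p_one_sided)
print(f"one-sided p = {p_one_sided:.2e}, two-sided p = {p_two_sided:.2e}")
```

Both values are on the order of 10⁻⁶, consistent with the reported p = 1 × 10⁻⁶. Summing exact binomial coefficients avoids the normal approximation, which is less reliable this far into the tail.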

Fine Mapping
For the 11 loci that showed exact transferability, we examined the LD structure around the index SNP to determine whether the locus could be fine mapped. For 9 of the 11 loci, haplotype blocks around the lead SNPs were smaller in this study than in the original discovery population (Figure 1). Two examples, SLC30A8 and CDKAL1, are shown in Figure 2. In addition, for 9 of the 11 exactly transferable loci, another SNP in LD with the index SNP showed stronger evidence of association (i.e., a smaller p-value) than the reported index SNP (Supplementary Figure 5). The two exceptions, in which the reported SNP also had the smallest p-value in the haplotype block, were TCF7L2 and ZBED3.
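The "best SNP" comparison used in fine mapping can be sketched as a filter-then-minimum over the SNPs in a block. The LD cutoff `min_r2` is a hypothetical parameter: the text says only that the best SNP must be in LD with the reported SNP, without giving a threshold. The SNP identifiers in the example are placeholders, not real rsIDs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockSNP:
    snp_id: str
    p_value: float
    r2_with_index: float  # LD (r^2) between this SNP and the reported index SNP


def best_snp(block: List[BlockSNP], index_id: str, min_r2: float = 0.2) -> BlockSNP:
    """Smallest-p SNP among the index SNP and block SNPs in LD with it (hypothetical r^2 cutoff)."""
    candidates = [s for s in block if s.snp_id == index_id or s.r2_with_index >= min_r2]
    return min(candidates, key=lambda s: s.p_value)


def index_is_best(block: List[BlockSNP], index_id: str, min_r2: float = 0.2) -> bool:
    """True when no SNP in LD with the index SNP has a smaller p-value (as for TCF7L2 and ZBED3)."""
    return best_snp(block, index_id, min_r2).snp_id == index_id


# Placeholder identifiers and values, for illustration only.
example_block = [
    BlockSNP("snp_index", 0.01, 1.0),
    BlockSNP("snp_a", 0.001, 0.6),     # in LD with the index SNP, stronger signal
    BlockSNP("snp_b", 0.00001, 0.05),  # smallest p-value but not in LD, so ignored
]
```

Restricting candidates to SNPs in LD with the index SNP keeps the comparison about the same signal; an unrelated nearby association, like `snp_b` here, would otherwise be mistaken for a refined lead SNP.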

DISCUSSION
The field of T2D genetics has been remarkably successful in identifying risk loci using the GWAS approach, especially when multiple studies are combined in meta-analyses. Such studies of T2D (and other cardiovascular and metabolic diseases) remain rare in SSA. This study, the first to evaluate a large number of reported T2D loci in individuals of African ancestry living on the continent, provides insight into the genetic architecture of T2D in SSA and promises to be a valuable resource for replication and meta-analysis as more GWAS are conducted in Africans. We focused on the transferability of GWAS-established T2D loci rather than on discovery, given our limited sample size and the modest effect sizes of known risk variants. We also conducted fine-mapping analyses, capitalizing on the lower LD and shorter haplotypes in populations of African ancestry, which should make association studies in these populations a good way to fine map risk loci reported from large studies in European and Asian populations.
In addition, differences in diet, physical activity, and other environmental factors could affect association results, potentially increasing the utility of African ancestry populations in genetic association studies. We found evidence of transferability for 41 of the 103 reported T2D loci tested in this study using both exact and local replication strategies. Our transferability rate for exact replication (11/103, or 11%) is somewhat lower than in earlier studies of African Americans. For example, Long et al. (2012) replicated 7 of 29 (24%) T2D-associated SNPs, while Ng et al. (2013) replicated 7 of 40 (18%) loci in their study of African Americans in the Candidate Gene Association Resources Plus Study. It is also lower than the 18% (19/104) transferability reported by a meta-analysis of African Americans (Ng et al., 2014). This probably reflects differences in sample size between the studies, since larger samples have greater power to detect associations of a given effect size. Another factor that could account for these differences is that this study analyzed sub-Saharan Africans living in Africa, whereas the other studies were of African Americans: despite similar genetic ancestry, the environmental background is dramatically different, especially in terms of diet, physical activity, and obesity, all of which are relevant to T2D risk.
TCF7L2 rs7903146 showed the strongest association with T2D in this study. This locus is one of the most consistently replicated T2D susceptibility loci across multiple populations. Notably, an African sample from the AADM study was instrumental in refining the TCF7L2 locus after its initial discovery (Grant et al., 2006; Helgason et al., 2007). Since then, a number of candidate gene studies in Africans have confirmed its association with T2D in Ghana (Danquah et al., 2013), Cameroon (Guewo-Fokeng et al., 2015; Nanfa et al., 2015), and various North African groups (Bouhaha et al., 2010; Kifagi et al., 2011; Mtiraoui et al., 2012; Ben-Salem et al., 2014; Turki et al., 2014). Most of these studies genotyped only a few SNPs. A notable exception is an evaluation of 37 GWAS-associated T2D loci in North African Arabs (Cauchi et al., 2012), which found nominal evidence of association for 13 of the loci reported in Europeans. More broadly, the findings of this study are consistent with the expectation that tag SNPs identified in European ancestry GWAS show differential effects when tested in non-European ancestry populations. This pattern is most pronounced in individuals of African ancestry, in whom effects tend to be diluted toward the null (Carlson et al., 2013).
A key expectation of association studies in African ancestry populations is that their lower LD and shorter haplotypes make it possible to fine map or refine disease-associated loci. The first demonstration of this principle for T2D was at the TCF7L2 locus (Helgason et al., 2007). Several other studies have demonstrated the same phenomenon for T2D (Ng et al., 2013), as well as for glucose-related traits (Ramos et al., 2011), uric acid (Charles et al., 2011), bilirubin levels, and serum lipids in African Americans. In the present study, most loci that showed transferability were fine mapped, with neighboring SNPs showing stronger association with T2D than the reported index SNP. Together, these findings provide compelling evidence that the reduced and distinct LD patterns of African populations can facilitate trans-ethnic fine mapping of disease loci. The number of loci that can be fine mapped is therefore expected to increase as more studies are done in African ancestry populations.
Beyond replication and fine mapping, discovery studies in populations of different ancestries are needed, as they can find novel susceptibility loci that are either population-specific or cosmopolitan but more easily discovered in a specific population (McCarthy, 2008). Notable examples include the discovery of the T2D-associated genes KCNQ1 in East Asians (Yasuda et al., 2008; Unoki et al., 2008), SGCG in Punjabi Sikhs, and SLC16A11 in Mexicans (SIGMA Type 2 Diabetes Consortium et al., 2014). Given the genetic and environmental diversity represented on the African continent, such studies in African populations have the potential to discover novel loci and enrich our knowledge of the genetics of T2D on the continent. In addition, as in the European and Asian experience, more T2D loci shared across global populations are expected to be discovered as additional studies are conducted in Africans and larger sample sizes become available for meta-analysis.

FIGURE 2 | Association plots and LD patterns at regions flanking SLC30A8 and CDKAL1. The "best SNP" in the haplotype block is the SNP showing the smallest p-value that is in LD with the reported SNP.
A potential limitation in the present study is the sample size. Larger samples have the potential to identify and replicate more T2D risk loci, especially those with smaller effect sizes or with lower allele frequencies. Nonetheless, the study provides a resource for future studies of T2D in Africans for purposes of replication and meta-analysis.
In conclusion, this first large-scale replication and fine-mapping analysis of reported T2D-associated risk loci in Africans demonstrated transferability and trans-ethnic fine mapping of several loci reported in European and Asian ancestry populations. Notably, 41 reported GWAS loci for T2D were associated with disease risk in this study. These findings indicate that the genetic architecture of T2D in SSA is characterized by several risk loci shared with non-African ancestral populations and that data from African populations may facilitate fine mapping of risk loci.

AUTHOR CONTRIBUTIONS
CR, AA, GD, FC designed the study; OF, TJ, BE, KA, JA, WB, CA, AA2, JA, DC, CA, GO, JO did participant recruitment, phenotyping and field laboratory assays; AD, HH, AE, SC did molecular laboratory assays and genotyping; AA, FT, AB, JZ, GC, DS did data management and statistical analysis; AA, FT, AB drafted the manuscript; CR, DS, FC edited the manuscript; all authors reviewed and approved the manuscript.