Characterization of the genetic variation present in CYP3A4 in three South African populations

The CYP3A4 enzyme is the most abundant human cytochrome P450 (CYP) and is regarded as the most important enzyme involved in drug metabolism. Inter-individual and inter-population variability in gene expression and enzyme activity are thought to be influenced, in part, by genetic variation. Although Southern African individuals have been shown to exhibit the highest levels of genetic diversity, they have been under-represented in pharmacogenetic research to date. Therefore, the aim of this study was to identify genetic variation within CYP3A4 in three South African population groups comprising 29 Khoisan, 65 Xhosa and 65 Mixed Ancestry (MA) individuals. To identify known and novel CYP3A4 variants, 15 individuals were randomly selected from each of the population groups for bi-directional Sanger sequencing of ~600 bp of the 5′-upstream region and all thirteen exons including flanking intronic regions. Genetic variants detected were genotyped in the rest of the cohort. In total, 24 SNPs were detected, including CYP3A4*12, CYP3A4*15, and the reportedly functional CYP3A4*1B promoter polymorphism, as well as two novel non-synonymous variants. These putatively functional variants, p.R162W and p.Q200H, were present in two of the three populations and in all three populations, respectively, and in silico analysis predicted that the former would damage the protein product. Furthermore, the three populations were shown to exhibit distinct genetic profiles. These results confirm that South African populations show unique patterns of variation in the genes encoding xenobiotic metabolizing enzymes. This research suggests that population-specific genetic profiles for CYP3A4 and other drug metabolizing genes would be essential to make full use of pharmacogenetics in Southern Africa. Further investigation is needed to determine whether the identified genetic variants influence CYP3A4 metabolism phenotype in these populations.


INTRODUCTION
The human CYP3A enzymes are regarded as the most prominent Cytochrome P450 (CYP) subfamily in facilitating the elimination of drugs, other xenobiotic compounds and endogenous molecules from the body (Lamba et al., 2002). The pharmacogenetically relevant CYP3A4 is responsible for metabolizing 50-60% of all clinically prescribed drugs (Guengerich, 1999) and is listed among The Pharmacogenetics and Pharmacogenomics Knowledge Base's (PharmGKB's) "very important pharmacogenes" (http://www.pharmgkb.org/gene/PA130?tabType=tabVip). CYP3A4 can be inhibited by drugs (e.g., ketoconazole and ritonavir) and is often involved in unfavorable drug-drug interactions, due to its ability to accommodate two or more similar or dissimilar molecules in its active site (Sevrioukova and Poulos, 2013). The enzyme is predominantly expressed in the liver and small intestine (Shimada and Guengerich, 1989). Expression varies as much as 40-fold between individual human livers, and the metabolism of CYP3A4 substrates in vivo varies 10-fold (Shimada and Guengerich, 1989; Lown et al., 1995; Guengerich, 1999). While complex regulatory pathways and environmental factors are important, it is suspected that a portion of this inter-individual variation can be attributed to genetic variants located within the coding regions of the gene as well as its core regulatory regions, which affect either the expression level or the function of the protein product (Steimer and Potter, 2002; Lamba et al., 2002).
Few pharmacogenetically relevant polymorphisms have been identified in the CYP3A4 gene; however, some polymorphisms have been associated with, amongst others, immunosuppressant dose requirements (Elens et al., 2011), clopidogrel response variability (Angiolillo et al., 2006), and withdrawal symptoms and adverse reactions in patients receiving methadone treatment (Chen et al., 2011). Furthermore, a rare haplotype, CYP3A4*20, results in a complete loss of function (Westlind-Johnsson et al., 2006), while CYP3A4*1B is suspected to alter the expression levels of CYP3A4 (Westlind et al., 1999), although conflicting results have been reported. Although genetic variants in the CYP3A4 gene have been extensively studied in populations such as Caucasians, Asians, and African-Americans, little research has been conducted in present-day African populations, including those indigenous to South Africa. Not only are these research disparities observed in candidate gene studies, but they also extend to recent large scale re-sequencing projects such as the 1000 Genomes Project, which, although comprehensively examining the genomic variation present in many individuals, provides no information for South African populations (1000 Genomes Consortium, 2010).
We have therefore tried to aid in addressing the disparity of pharmacogenetic data that exists for South African populations by analyzing three of the diverse population groups, which are representative of: (1) the most ancient population group: the Khoisan, (2) the most globally-admixed population group: the South African Mixed Ancestry (MA) population, and (3) the largest language family in South Africa: the Bantu-speaking population group, represented by the Xhosa population. The ancient Khoisan population used in this study consisted of individuals from the !Kung and Khwe linguistic groups (Chen et al., 2000). These individuals are descended from people of the Later Stone Age and are believed to be some of the first lineages of Homo sapiens (Kaessmann and Pääbo, 2002). The MA population, with Xhosa, Khoisan, European, and Asian ancestral contributions, has been shown to exhibit the highest levels of admixture across the globe (Tishkoff et al., 2009) and is therefore of interest for pharmacogenetic applications, as genetic variants present in many different populations may affect these individuals, as has been reported for other admixed populations such as those from Brazil (Suarez-Kurtz, 2005, 2010; Suarez-Kurtz et al., 2012). Lastly, 9 of the 11 official South African languages are classified as Bantu languages, spoken by ∼75% of the total South African population, and therefore it is imperative that representatives of this group are included in pharmacogenetic research. In this study, we utilized the Xhosa population, which is representative of the Nguni-speaking tribes and is the biggest Bantu-speaking population in the Western Cape of South Africa, where this research was conducted.
In our experience, it is important that pharmacogenes, such as the CYP genes, are comprehensively characterized in South African populations, as we have discovered both novel alleles and unique variation profiles for the CYP2C19 and CYP2D6 genes (Wright et al., 2010). It is hoped that the comprehensive characterization of CYP3A4 in these populations will aid future CYP3A4 genotype-phenotype studies in African populations to determine whether functionally relevant CYP3A4 polymorphisms exist that have an impact on drug metabolism phenotype. We therefore screened the 5′-flanking region and thirteen exonic regions of the CYP3A4 gene in the three South African populations described above in order to determine which common allelic variants, novel or previously characterized, are present.

CLINICAL SAMPLES
Ethical approval was obtained for this study from the Human Research Ethics Committee of Stellenbosch University (S12/07/190) and informed consent was acquired from all participants. Genomic DNA (gDNA) was available for 29 Khoisan, 65 Xhosa, and 65 MA healthy individuals. The Khoisan samples used in this study were collected from !Kung and Khwe speaking individuals from the Schmidtsdrift region of the Kalahari desert in the Northern Cape Province of South Africa (Chen et al., 2000), while samples from the Xhosa and MA populations were collected from the Western Cape Province of South Africa.

POLYMERASE CHAIN REACTION (PCR) AMPLIFICATION
Primers were designed to amplify ∼600 bp of the 5′-upstream region and all 13 exons with flanking intronic regions of CYP3A4 (GenBank: AF280107.1; Ensembl ID: ENSG00000160868) (refer to Table 1 for primer sequences). PCR amplifications were carried out in a total reaction volume of 25 µl, with each reaction containing 20-30 ng of gDNA, 10 pmol of each primer, 320 µM dNTPs, 4 mM MgCl2, 0.5 U BIOTAQ™ DNA polymerase and 1X Reaction Buffer. All reagents were supplied by Bioline, London, UK. The reaction cycle conditions consisted of an initial denaturation step at 94 °C for 3 min, followed by 30 cycles of 15 s denaturation at 94 °C, 15 s annealing at varying temperatures (refer to Table 1 for specific annealing temperatures), and 30 s extension at 72 °C, with a final extension at 72 °C for 5 min.

DNA SEQUENCING
To identify common CYP3A4 genetic variation occurring in each of the three populations, 15 individuals were randomly selected from each population group for bi-directional sequence analysis, allowing for detection of alleles with a frequency of more than 10%, with 95% confidence. The PCR products from each of the 13 amplicons were purified with SureClean (Bioline) and bi-directionally sequenced using the BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, CA, USA), after which capillary electrophoresis was performed by the Central Analytical Facility of Stellenbosch University on a 3130xl Genetic Analyser (Applied Biosystems). The obtained sequences were subsequently compared to the reference sequence (GenBank: AF280107.1; Ensembl ID: ENSG00000160868) to detect the presence of variants. The generated sequence data also served to ensure that the reaction conditions used did not amplify any of the associated CYP3A4 isoforms or pseudogenes. The effect of the detected variants was predicted using the software programs Sorting Intolerant From Tolerant (SIFT), Polymorphism Phenotyping (PolyPhen) and the Alternative Splice Site Predictor (ASSP) (http://www.es.embnet.org/~mwang/assp.html; Ramensky et al., 2002; Ng and Henikoff, 2003).
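The stated detection power can be checked with a short calculation. The sketch below is illustrative only and assumes that the 30 chromosomes carried by 15 diploid individuals are independent draws from the population; the function name is hypothetical and does not come from the study.

```python
def detection_probability(allele_freq: float, n_individuals: int) -> float:
    """Probability of observing an allele at least once.

    Assumes diploid individuals and independent sampling of chromosomes,
    so 2 * n_individuals chromosomes are examined.
    """
    n_chromosomes = 2 * n_individuals
    # Chance that none of the sampled chromosomes carries the allele
    p_missed = (1.0 - allele_freq) ** n_chromosomes
    return 1.0 - p_missed


if __name__ == "__main__":
    # 15 individuals, allele frequency of 10%
    print(f"{detection_probability(0.10, 15):.3f}")  # prints 0.958
```

Under these assumptions, an allele with a frequency of 10% is missed in all 30 chromosomes with a probability of about 4%, which is consistent with the 95% confidence quoted above.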

GENOTYPING OF THE DETECTED VARIANTS
To determine the frequencies of the genetic variants detected in the 5′-flanking region and the coding regions of the CYP3A4 gene through sequence analysis, an additional 14 individuals from the Khoisan population and 50 individuals each from the Xhosa and MA populations were genotyped using a combination of restriction fragment length polymorphism (RFLP) analysis and additional sequence analyses. In cases where no restriction sites were created or destroyed by the SNPs of interest, mutagenic primers were designed to introduce artificial restriction enzyme recognition sequences (refer to Table 2 for the primer sequences used for RFLP genotyping). Amplification using the mutagenic primers was performed by means of a nested PCR, using 1 in 200 dilutions of PCR product as template, to avoid co-amplification of isoforms and pseudogenes. The nested PCR conditions were identical to those used during original PCR amplification, except that the cycle number and MgCl2 concentration were changed to 25 cycles and 2 mM, respectively (refer to Table 2 for annealing temperatures). To ensure that the RFLP assays were successful, samples with known genotypes were selected as controls for each of the individual restriction enzyme analyses. Due to the large number of variants detected in the exon 7 amplicon, all the individuals from all three of the population groups were sequenced for this region, rather than utilizing individual RFLP genotyping assays. Additionally, because the RFLP genotyping assay for rs57409622 in exon 6 would detect both the presence of this SNP and the adjacent rs4986907 (allele defining SNP of CYP3A4*15), any individuals testing positive for this assay underwent bi-directional sequencing to determine which one, or both, of the SNPs were in fact present. For genotyping specifications, including a list of the specific restriction enzymes used, refer to

VARIANT DETECTION
Sequencing of CYP3A4 in 45 individuals from three South African populations detected 24 SNPs. Three of the intronic SNPs and one SNP in the 5′-flanking region are novel. Genotyping of rs12721624 in intron 8 and rs147972695 in intron 12 could not be performed in the entire cohort, due to technical difficulties. Genotyping of the remaining 22 SNPs was successful, and all SNPs were in Hardy-Weinberg equilibrium (HWE) (refer to Table 3 for the positions and frequencies of the detected SNPs). The previously described alleles CYP3A4*1B and CYP3A4*1G were present in all three populations, while CYP3A4*12 and CYP3A4*15 were only present in the Xhosa population. Furthermore, two novel alleles, designated CYP3A4*23 and CYP3A4*24, which are characterized by the two non-synonymous SNPs resulting in p.R162W and p.Q200H, respectively, were detected. CYP3A4*24 was present in all three population groups, while CYP3A4*23 was present in the Xhosa and Khoisan populations. Of particular interest, the amino acid change p.R162W (CYP3A4*23) was predicted by both the SIFT and PolyPhen algorithms to affect the function of the protein product. None of the variants were predicted to alter any splice-sites.
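The text does not state which test was used to assess HWE. As a rough illustration only, a chi-square goodness-of-fit test with one degree of freedom could be sketched as follows; the function name and the genotype counts in the usage example are hypothetical and are not data from this study. For rare alleles in small samples, an exact test would usually be preferred over this approximation.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test for Hardy-Weinberg equilibrium at a biallelic SNP.

    Takes observed genotype counts (AA, AB, BB) and returns
    (chi-square statistic, p-value for 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts for 65 individuals (illustration only)
    chi2, p_value = hwe_chi_square(40, 22, 3)
    print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A SNP would be regarded as consistent with HWE when the p-value exceeds the chosen significance threshold (for example, 0.05).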

POPULATION VARIANT FREQUENCY COMPARISONS
When examining the successfully genotyped variants in the three South African populations, we noticed that the allele frequencies for several SNPs differed significantly between the population groups (P < 0.05) (refer to Table 4). The smallest difference was seen when the allele frequencies of the Khoisan and Xhosa populations were compared, with the allele frequencies of three SNPs differing significantly between these two population groups. With regard to the comparisons of (1) the Khoisan and MA populations, and (2) the Xhosa and MA populations, five and six SNPs showed significant allele frequency differences, respectively. The novel CYP3A4 alleles, although detected in the South African populations (refer to Browser. When examining the previously described CYP3A4 alleles, in the case of CYP3A4*12 and CYP3A4*15, the data from the 1000 Genomes Browser were not available for all populations; therefore, we utilized frequency data from the HapMap Project. Neither of these variants was detected in any of the HapMap populations; however, it should be noted that frequency data for the CYP3A4*15 SNP were not available for the YRI population.
These SNPs were also absent in the MA and Khoisan populations, but were present in the Xhosa population at frequencies of 2.3% and 2.4%, respectively. With regard to the remaining potentially functional CYP3A4 allele, namely CYP3A4*1B, the frequencies of this variant in the three South African populations as well as those reported by the 1000 Genomes Browser differed substantially, as demonstrated by Figure 1.
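The statistical test behind the pairwise allele frequency comparisons is not specified in the text. One common choice for 2 × 2 tables of allele counts is a two-sided Fisher's exact test, which the sketch below implements using only the standard library. The function name and the allele counts in the usage example are hypothetical and serve purely as an illustration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows are populations; columns are counts of the variant and the
    reference allele. Returns the p-value, obtained by summing the
    probabilities of all tables that are no more likely than the
    observed one, given fixed margins.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x variant alleles in population 1
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = 0.0
    for x in range(lowest, highest + 1):
        p_x = prob(x)
        # Small tolerance guards against floating-point ties
        if p_x <= p_observed * (1.0 + 1e-9):
            total += p_x
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical allele counts: 58 chromosomes from 29 individuals
    # versus 130 chromosomes from 65 individuals (illustration only)
    p_value = fisher_exact_two_sided(40, 18, 60, 70)
    print(f"p = {p_value:.4f}")
```

In this framing, two populations would be reported as differing significantly for a SNP when the resulting p-value falls below 0.05.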

CYP3A4 GENETIC VARIATION IN THE THREE SOUTH AFRICAN POPULATIONS
To our knowledge, this was the first study in which the entire coding region of the CYP3A4 gene was screened for common genetic variation in any Southern African population. This study identified a total of 24 variants in the three South African population groups, which included the discovery of four novel SNPs (i.e., ∼17% of the total genetic variation). Overall this study revealed the presence of the previously described CYP3A4*1B, CYP3A4*1G, CYP3A4*12, and CYP3A4*15 alleles, in addition to two novel alleles, CYP3A4*23 and CYP3A4*24. The number of novel alleles reported here is in accordance with the number of novel alleles that we have detected previously through the re-sequencing of other CYP genes in South African populations (Wright et al., 2010). Prior to this study, these novel CYP3A4 alleles had not been recorded on the CYP allele database and were present at very low frequencies in the populations described on the 1000 Genomes Browser. Both were present in the Xhosa and Khoisan populations examined in this study and CYP3A4*24 was additionally detected in the MA population. The p.R162W amino acid change in exon 6, characterizing CYP3A4*23, may have functional consequences for the CYP3A4 protein, as arginine is a positively charged and hydrophilic amino acid, while tryptophan is a polar, aromatic and hydrophobic amino acid. The likely functional consequences of this variant are in agreement with the predictions made by both the SIFT and PolyPhen algorithms. Although the p.Q200H variant, characterizing CYP3A4*24, was not predicted to change the function of the protein product, the presence of this variant has also been reported in the genome of another Khoisan individual sequenced by Schuster et al. (2010), which is consistent with the fact that the frequencies of both novel alleles were highest in the Khoisan population.
Additionally, the low frequency of these variants in the 1000 Genomes populations, in comparison to the presence of these alleles at varying frequencies in the South African population groups, highlights the unique genomic composition of South African populations. Thus, results obtained from other population groups cannot be directly extrapolated to the South African populations, and comprehensive re-sequencing studies such as this one are required to characterize South African genomes.
The recent discovery of the CYP3A4*22 allele confirmed that novel alleles may have functional relevance to the field of pharmacogenetics. This allele was initially found to influence RNA expression and statin dose requirement. These findings have subsequently been replicated with regard to statin therapy, and the allele has additionally been shown to influence the dose requirements of the immunosuppressants tacrolimus and cyclosporine (Elens et al., 2011, 2012). CYP3A4*22 is characterized by the intron 6 SNP rs35599367, which, however, was not genotyped in the current study as the aim of the study was to examine only coding regions, including the exon-intron boundaries, and the core promoter region of the gene. Furthermore, this variant does not occur in the 1000 Genomes AFR or ASN populations and occurs at very low frequencies (2-5%) in the EUR and AMR populations, and is thus unlikely to occur at pharmacogenetically relevant frequencies in the South African populations (http://browser.1000genomes.org/Homo_sapiens/Variation/Population?r=7:99365816-99366816;source=dbSNP;v=rs35599367;vdb=variation;vf=11936818). The reported functional significance of this intronic CYP3A4*22 variant does, however, highlight the importance of non-coding regions. The significance of these areas, including regions that are not in close proximity to the gene, has been further emphasized by the recent release of the ENCODE data (ENCODE Project Consortium et al., 2012). These data suggest that future analyses examining the variation present in such areas, including the functional significance of the four novel non-coding SNPs identified by this study, are warranted.
The detection of novel variants in this study highlights the fact that although large re-sequencing studies such as the 1000 Genomes Project have played an integral role in characterizing the human variome (1000 Genomes Consortium, 2010), novel variation still exists, underlining the importance of re-sequencing studies such as this one. These re-sequencing studies are specifically required in African populations, as these populations have been under-represented in genomic studies to date (Rosenberg et al., 2010; Drögemöller et al., 2011). Furthermore, it may be important to compare results obtained by next generation sequencing studies to those obtained through the use of Sanger sequencing. Although the throughput of next generation sequencing studies is beyond comparison, it may be beneficial to evaluate the accuracy of next generation sequencing for the complex and polymorphic CYP genes, whose sequences show high similarity to one another and to their corresponding pseudogenes. This may be particularly important with regard to the genotyping of CYP3A4, which shows between 76 and 88% sequence similarity to the CYP3A43, CYP3A5, and CYP3A7 genes (http://www.ensembl.org/Homo_sapiens/Gene/Compara_Paralog?g=ENSG00000160868;r=7:99354604-99381888) and is thus likely to be affected by the consequences of misalignment or co-amplification of other genes during the use of high-throughput technologies.
Of the previously identified alleles that were detected in this study, both CYP3A4*1B and CYP3A4*12 have been reported to have functional relevance for pharmacogenetic applications. The high frequency CYP3A4*1B is characterized by a 5′-upstream c.−392A>G point mutation in a regulatory element, namely the putative nifedipine-specific element, which has been linked to altered gene expression in vitro (Amirimani et al., 2003; Georgitsi et al., 2011). Furthermore, this allele has been associated with various disease states such as prostate cancer and secondary leukemias (Lamba et al., 2002). Of relevance to pharmacogenetic applications, PharmGKB lists this SNP as affecting the metabolism of a number of therapeutic drugs, although the level of evidence for variant-drug associations is currently still low (http://www.pharmgkb.org/rsid/rs2740574; Whirl-Carrillo et al., 2012). The pharmacogenetic relevance of this allele is further called into question by the results obtained by Wang et al. (2011), and the functional significance of this variant may require further examination. On the other hand, while CYP3A4*1B appears to affect the expression of CYP3A4, CYP3A4*12 (p.L373F) affects the protein product. p.L373 has been identified as one of the key residues affecting substrate binding, cooperativity and regioselection of metabolism (Sevrioukova and Poulos, 2013), and the amino acid change has been shown to result in a protein that, amongst others, displays an altered testosterone metabolite profile and a four-fold increase in the Km value for 1′-OH midazolam formation (Eiselt et al., 2001). While CYP3A4*1B occurs at a relatively high frequency, both CYP3A4*12 and CYP3A4*15 occur at low frequencies, possibly limiting the relevance that these two variants may have for pharmacogenetic applications, especially when considering their absence from the HapMap populations.
Similarly, the SNPs defining CYP3A4*3, CYP3A4*13, CYP3A4*17, and CYP3A4*18 are also unlikely to be applicable to pharmacogenetics in the South African setting, as they were not detected in this study or a previous study (Ikediobi et al., 2011). These conclusions should, however, be made with caution, as the relatively frequent occurrence of rare variants in African populations (Tishkoff et al., 2009) cannot be ignored and the effect of such variants should possibly also be taken into account when considering the implementation of pharmacogenetics in the African context.

VARIANT FREQUENCY DIFFERENCES BETWEEN THE THREE SOUTH AFRICAN POPULATIONS
When comparing the significant differences in allele frequencies between the three population groups, it was observed that the three groups differed significantly from one another for eight SNPs (refer to Table 4). These results reflect the unique genomic compositions of South African populations and indicate that the results of one South African population are not always representative of another South African population. When looking at the three populations independently, the Khoisan and Xhosa were shown to be the most similar to one another, while the differences observed between the Khoisan and MA and the Xhosa and MA were comparable. The fact that the MA population showed the greatest number of genetic differences may be explained by the large number of ancestry contributions, other than the Xhosa and Khoisan, that have been made to this population (Schlebusch et al., 2009; De Wit et al., 2010; Quintana-Murci et al., 2010; Warnich et al., 2011). The large degree of similarity observed between the Xhosa and Khoisan is to be expected and can be explained by the large ancestry contribution that the Khoisan have made to the Xhosa population (De Wit et al., 2010; Warnich et al., 2011).
The differences in allele frequencies observed for the CYP3A4*1 sub-allele, CYP3A4*1B, between the different population groups (refer to Figure 1) serve as an excellent illustration of how pharmacogenetic applications may differ between population groups. It is important to remember that drugs designed to optimally treat one population group based on the presence of a certain allele may be harmful to another population group for which the opposite allele is dominant. Figure 1 shows how the CYP3A4*1B allele is more frequent in the African populations (Khoisan, Xhosa and AFR), while in the ASN and EUR population groups the opposite allele occurs more often. Interestingly, both the MA and AMR admixed populations show allele frequencies that are intermediate between the African and EUR/ASN populations. Furthermore, the MA is more similar to the African populations, while the AMR is more similar to the non-African populations. These results are in accordance with the ancestral history of these population groups. The MA have ancestry contributions from the Xhosa, Khoisan, European, and Asian populations (Schlebusch et al., 2009; Tishkoff et al., 2009; De Wit et al., 2010; Quintana-Murci et al., 2010), which explains why, although the frequencies of the variants in this population are between the African and EUR/ASN populations, they are more similar to the African populations. On the other hand, the AMR population, which consists of Mexicans, Puerto Ricans, Colombians, and Peruvians (http://www.1000genomes.org/faq/which-populations-are-part-your-study), is more similar to the EUR/ASN population groups due to the larger ancestry contribution that these populations have made to the AMR, when compared to the ancestry contribution of Africans (Galanter et al., 2012).
The allele frequency differences observed in admixed populations, as previously reported in admixed Brazilian populations (Suarez-Kurtz, 2005, 2010; Suarez-Kurtz et al., 2012), bring to light an important consideration for the implementation of pharmacogenetics. Individuals within admixed populations are likely to exhibit different levels of ancestry contributions, as has been shown with the use of STRUCTURE analyses for both the MA (Tishkoff et al., 2009) and Brazilian populations (Suarez-Kurtz, 2010). Population based pharmacogenetic testing is thus unlikely to detect all pharmacogenetically relevant variants and it may be necessary to implement pharmacogenetics on an individualized level. In the context of South Africa with its diverse population groups, which exhibit both rare variants and variants from several different population sources, this may be especially important. However, before this goal can be realized it will be necessary to consider whether individualized treatment will be feasible in the resource limited settings of the country.

CONCLUSIONS
Although this study identified both novel and known SNPs of functional significance in all three population groups, due to the current lack of validated evidence regarding the pharmacogenetic application of CYP3A4, the relevance of these SNPs in the clinical setting remains unknown. The SNP markers detected in the current study should therefore be included in genotyping panels in future pharmacogenetic association studies involving CYP3A4 substrate medications. Nonetheless, this study provides an excellent example of how re-sequencing studies are required in African populations in order to identify variation that remains novel. Differences in allele frequencies were seen not only when comparing the South African populations to other populations, but also when comparing them to each other. These results demonstrate that a one-size-fits-all approach is not ideal when implementing therapeutic treatment regimens, even within the South African context.