Genetic variation in the 3′-UTR of CYP1A2, CYP2B6, CYP2D6, CYP3A4, NR1I2, and UGT2B7: potential effects on regulation by microRNA and pharmacogenomics relevance

Introduction: Pharmacogenomics research has concentrated on variation in genes coding for drug metabolizing enzymes, transporters and nuclear receptors. However, variation affecting microRNA could also play a role in drug response. This project set out to investigate potential microRNA target sites in 11 genes and the extent of variation in the 3′-UTR of six selected genes; CYP1A2, CYP2B6, CYP2D6, CYP3A4, NR1I2, and UGT2B7. Methods: Fifteen microRNA target prediction algorithms were used to identify microRNAs predicted to regulate 11 genes. The 3′-UTR of the 6 genes which topped the list of potential microRNA targets was sequenced in 30 black South Africans. In addition, genetic variants within these genes were investigated for interference with mRNA-microRNA interactions. Potential effects of observed variants were determined using in silico prediction tools. Results: The 11 genes coding for DMEs, transporters and nuclear receptors were predicted to be targets of microRNAs with CYP2B6, NR1I2 (PXR), CYP3A4, and CYP1A2, interacting with the most microRNAs. The majority of identified genetic variants were predicted to interfere with microRNA regulation. For example, the variant, rs1054190C in NR1I2 was predicted to result in the presence of a binding site for the microRNA miR-1250-5p, while the variant rs1054191G was predicted to result in the absence of a recognition site for miR-371b-3p, miR-4258 and miR-4707-3p. Fifteen of the seventeen, novel variants occurred within microRNA target sequences. Conclusion: The 3′-UTR harbors variation that is likely to influence regulation of specific genes by microRNA. In silico prediction followed by functional validation could aid in decoding the contribution of variation in the 3′-UTR, to some unexplained heritability that affects drug response. Understanding the specific role of each microRNA may lead to identification of markers for targeted therapy and therefore improve personalized drug treatment.


INTRODUCTION
Drug metabolizing enzymes (DMEs), drug transporters and nuclear receptors, play an important role in response outcomes to therapeutically prescribed drugs. For example, the analgesic drug codeine is dependent on CYP2D6 metabolism to the more active morphine metabolite (Armstrong and Cozza, 2003). However, CYP2D6 exhibits extensive genetic polymorphism resulting in individuals being broadly classified as poor metabolizers (PMs), extensive metabolizers (EMs), and ultrarapid metabolizers, phenotypes that vary across world populations. Another example is that of the disposition of efavirenz and nevirapine used in treatment of HIV/AIDS, both of which are metabolized by CYP2B6. The genetic variants CYP2B6 c.516T, and c.983C have been linked to high efavirenz or nevirapine plasma levels and the development of central nervous system side effects (Marzolini et al., et al., 2001) and predicted to regulate more than 30% of protein coding genes in the human genome (Yu, 2009;Breving and Esquela-Kerscher, 2010;Shomron, 2010;Rodrigues et al., 2011) which include genes with pharmacogenomics relevance. MicroRNAs regulate genes post-transcriptionally by binding to the 3 -UTR of mRNA target sequences and facilitating degradation or translational repression (Baek et al., 2008;Mishra and Bertino, 2009;Marin and Vanicek, 2012). MicroRNAs have been implicated in regulation of the following pharmacogenomically relevant genes; CYP1B1 (Tsuchiya et al., 2006), CYP2E1 (Mohri et al., 2010), CYP3A4 (Pan et al., 2009), and PXR (Takagi et al., 2008). Pan et al. (2009) showed that microRNA miR-27b regulates expression of CYP3A4 directly, while miR-148a regulates CYP3A4 indirectly through regulation of PXR (Takagi et al., 2008). SULT1A1 has also been shown to be regulated directly by miR-631 in an allele-specific manner (Yu et al., 2010). Genes coding for CYP1A2 and CYP2B6, are predicted to be regulated by microRNAs based solely on the length of their 3 -UTRs, but these predictions are yet to be confirmed through functional studies (Ingelman-Sundberg et al., 2007;Ramamoorthy and Skaar, 2011;Singh et al., 2011;Mishra, 2012).
Genetic variation that occurs either in the sequence coding for microRNAs or in the sequence targeted by microRNA on the mRNA, is referred to as miRSNPs. MiRSNPs potentially alter microRNA function or interfere with microRNA-mRNA interaction (Mishra et al., 2008;Liu et al., 2012). MiRSNPs located in the mRNA can result in the presence or absence of microRNA target sites and to date only a few pharmacogenomics studies have included genetic variation in the 3 -UTR of pharmacogenomically-relevant genes as possible contributors to interindividual variability observed in drug response. In order to evaluate the role of microRNA in drug response, prediction of likely microRNA target sequences should be followed by functional validation.
There are several algorithms to predict microRNA targets and these algorithms are designed based on a number of different criteria (Witkos et al., 2011). To increase the precision and power of microRNA target prediction as well as reduce false positives, the use of a combination of prediction algorithms is advised (Mishra and Bertino, 2009;Marin and Vanicek, 2012). Recently, a study by Ramamoorthy and Skaar (2011) used six algorithms to predict microRNAs that potentially target 12 major drug metabolizing CYPs. With more than 10 algorithms available, the challenge is which combinations or how many algorithms should be used to predict microRNA targets. The use of a combination of microRNA target prediction, using different features including; sequence complementarity, target site accessibility, binding energy, and conservation of target sites, in a complementary manner may improve sensitivity, specificity, and consistency of microRNA target prediction (Zhang and Verbeek, 2010). In this study, we set out to identify microRNAs predicted to regulate genes associated with drug metabolism and disposition using 15 in silico prediction algorithms that are available online to evaluate the spectrum of microRNA target sequences in the 3 -UTR of 11 genes. Variation in the 3 -UTR of six genes was evaluated by sequencing resulting in the identification of microRNA SNPs (miRSNPs) which were predicted to potentially affect microRNA binding.
The total number of unique microRNAs were calculated for each gene and the percentage overlap was estimated based on the number of microRNAs predicted by at least two algorithms. Analysis using these prediction algorithms was performed using the most recent version and respective default settings. Eight datasets containing microRNA expression profiles in liver tissue was obtained from mimiRNA (http://mimirna.centenary.org. au/mep/formulaire.html) (Ritchie et al., 2010) and compared to microRNAs predicted by the different algorithms and microR-NAs with targeting predicted to be affected by miRSNPs, to determine overlap between predicted and functionally relevant microRNAs.

INVESTIGATION OF GENETIC VARIANTS WITHIN 3 -UTR AND ITS POTENTIAL EFFECTS ON microRNA TARGET SITES
After prediction of likely microRNAs targeting genes coding for DMEs, the top six genes (with the longest 3 -UTR, most microRNA targets as well genes of interest in our research group) were chosen for sequencing of the 3 -UTR among 30 Bantuspeaking South Africans, and these genes included CYP1A2, CYP2B6, CYP2D6, CYP3A4, NR1I2, and UGT2B7. The assumption for selection of the six genes was based on the prediction that a gene with a longer 3 -UTR is more likely to be regulated by a larger number of microRNAs.

STUDY PARTICIPANTS
Participants were South African HIV/AIDS patients receiving efavirenz-based treatment for at least 6 months and recruited to participate in pharmacogenomics research. All subjects were of Bantu origin from Gauteng Province, South Africa. Participants gave information on their ethnicity, age, health status (including self-reported adherence to treatment or pill counts), dietary, and smoking habits. Ethics and study approval (HREC REF 103/2009) was given by the University of Cape Town, Faculty of Health Sciences Research Ethics Committee. Written informed consent was obtained from participants as part of a study focussing on pharmacogenetics of HIV therapy. Patients were mostly female (90%) and had an average age of 42 years. The research was performed in accordance with guidelines of the Helsinki Declaration of 2008.
PCR product was cleaned using FastAP (Fermentas Life Sciences, Burlington, Canada) and ExoI (Fermentas Life Sciences, Burlington, Canada). The reaction conditions were as reported previously (Swart et al., 2013). The cleaned PCR product was sequenced on a "GeneAmp® PCR System 9700 version 3.08 (Applied Biosystems, Carlsbad, CA, USA) and capillary electrophoresis was performed on an ABI3130xl Genetic Analyzer (Applied Biosystems, Carlsbad, CA, USA). Analysis of the sequences was performed using DNAstar Lasergene Sequence Alignment Editor V.10 (DNASTAR, Inc., Madison, WI, USA).

BIOINFORMATICS PREDICTION OF POTENTIAL EFFECTS OF SNPs LOCATED IN THE 3 -UTR
The PolymiRTS database 3.0 (Bao et al., 2007;Ziebarth et al., 2012;Bhattacharya et al., 2014) (http://compbio.uthsc.edu/ miRSNP/) was used to predict if SNPs located in the 3 -UTR of genes would affect mRNA-microRNA interaction. A search in the database was performed using the GeneID obtained from the NCBI database (http://www.ncbi.nlm.nih.gov/) and default settings. Pre-computed context+ scores were included in a recent update of TargetScan as a measure of predicted efficacy of microRNA targeting and down regulation of the target mRNA (log2 fold change in mRNA abundance after microRNA transfection assays) (Grimson et al., 2007) by summing up contributions made by individual sites using information on the site-type contribution, 3 -pairing contribution, local AU contribution, position contribution, target-site abundance contribution and seed-pairing stability contribution. The PolymiRTS database use the context+ scores from TargetScan to calculate the difference in context+ scores between the reference and derived alleles for each SNP. Differences in context+ scores caused by a SNP in the microRNA target site have been included in this study and a more negative difference in context+ scores indicates an increased likelihood that the target sites are either absent or present due to the variant allele, aiding in prioritizing miRSNPs with potential effect instead of showing miRSNPs that significantly alter microRNA binding (Bhattacharya et al., 2014).
To assess the potential effect of novel SNPs, the mrSNP software was used by selecting assembly hg19 of the human genome, using default cut-offs and inserting the chromosome position and two alleles. The prediction method applied is adapted from Diana-microT (Maragkakis et al., 2009;Deveci et al., 2014) (http://mrsnp.osu.edu/). Prediction is based on the ratio of binding energy and maximum binding energy of a microRNA (maximum MFE in Supplementary Table S3). The identified SNPs need to be further characterized in terms of their differential tissue expression as well as strength in achieving the predicted effects.

COMPARISON OF 3 -UTR GENETIC VARIATION IN DIFFERENT POPULATIONS
Minor allele frequencies (MAF) for SNPs in the 3 -UTR of the 6 genes sequenced among the 30 black South Africans, were compared to those of other world population groups, obtained from the dbSNP database (http://www.ncbi.nlm.nih. gov/SNP/), the HapMap project (http://hapmap.ncbi.nlm.nih. gov/), and the 1000 genomes project (http://www.1000genomes. org/). Statistical analyses were performed using the Graphpad Prism statistical program (Version 5, GraphPad Software Inc., San Diego, CA). Pearson's χ 2 -test and Fisher's exact test were used to examine differences between the population groups with regards to distribution of minor alleles. For all statistical tests significance was defined as P < 0.05. The novel SNPs identified in the 3 -UTR of the six genes could not be compared to other world population groups as they have not been reported before.

GENETIC VARIANTS IDENTIFIED BY SEQUENCING OF THE 3 -UTR OF GENES OF INTEREST
Based on the total microRNAs predicted to target the mRNA, the top four genes CYP1A2, CYP2B6, CYP3A4, and NR1I2 were selected for further analysis. CYP2D6 and UGT2B7 were included as they form part of our ongoing pharmacogenomics studies. In total six genes had their 3 -UTR sequenced among 30 individuals. A total of 52 genetic variants were identified, including 17 novel and 35 previously reported SNPs. Novel SNPs identified, despite comprehensive re-sequencing of various population groups, are potentially rare and population specific or this may be an indication of the exclusion of populations from this region in the re-sequencing projects. CYP2B6 had the most genetic variants in the 3 -UTR accounting for 26 of the 52. No SNPs were identified within the 75bp 3 -UTR of CYP2D6 ( Table 3). Forty of the 52 genetic variants were predicted to potentially affect regulation of 212 microRNAs by creating or abolishing microRNA target sites depending on which of the variants is the ancestral variant (Supplementary Table S2). Based on predictions by mrSNP, 15 of the 17 novel SNPs were located within microRNA target sites and potentially result in the creation of microRNA target sites (Supplementary Table S3). Comparison between the number of microRNAs potentially affected by miRSNPs and microR-NAs expressed in liver tissue, showed an overlap of about 3% (10 microRNAs both expressed and predicted to potentially be affected by miRSNPs/363 expressed microRNAs).

COMPARISON OF MINOR ALLELE FREQUENCIES BETWEEN DIFFERENT POPULATIONS
Of the 52 variants identified in sequencing the 3 -UTR of selected genes among South African black participants, 18 SNPs which also had information for other population groups (i.e., through the HapMap and 1000 genomes projects) were compared to other African groups, African-Americans, Asian, and European population groups ( Table 4). Statistically significant differences were observed when the South African group was compared to Asian, Caucasian and other Africa groups. For example, NR1I2 rs1054190T allele occurring at a frequency of 0.18 in the South African group, was absent among Asians, significantly lower among another African group the Yoruba (0.02), yet comparable to that reported among Caucasians (0.15). For example, the variant, rs1054190C in NR1I2 was predicted to result in the presence of a binding site for the microRNA miR-1250-5p, while the variant rs1054191G was predicted to result in the absence of a recognition site for miR-371b-3p, miR-4258, and miR-4707-3p. A second SNP in NR1I2, rs3732359A allele showed no significant differences when the South African group (0.56) was compared to Asians (0.47), but showed statistically significant differences when compared to the Caucasians (0.84) and Yoruba (0.24). As expected, differences in allele frequencies of miRSNPs were observed when comparing the South African Bantu population group to both Caucasian and Asian populations as highlighted by the CYP2B6 rs28969420T allele (20, 4, 3%, respectively).

DISCUSSION microRNAs PREDICTED TO TARGET GENES CODING FOR DRUG METABOLIZING ENZYMES
Pharmacogenomics studies have focussed largely on the effects of genetic variation in genes coding for DMEs (Ingelman-Sundberg et al., 2007) and very few have investigated the role of variation at the level of microRNA target sites. MicroRNAs affect global gene expression, thus, including miRSNPs in pharmacogenomic tests, could potentially explain some of the geneticsassociated differences in drug response (Mishra and Bertino, 2009). Computational prediction analysis carried out in this study has highlighted microRNAs that regulate some of the genes with pharmacogenomics relevance, further confirming observations by Ramamoorthy and Skaar (2011) with respect to CYP1A2, CYP2B6, and CYP3A4.
At least 10% more microRNAs were predicted to target DMEs, for example the genes CYP1A2, CYP2B6, CYP2D6, CYP3A4, and CYP3A5 were reported to be targeted by 436,515,196,429, and 176 potential microRNAs using the 15 prediction algorithms, compared to 386, 416, 40, 333, and 32 using six algorithms as reported by Ramamoorthy and Skaar (2011). In addition, the number of microRNAs predicted using TargetScan v.6.2 in the current study were more than double those reported by Ramamoorthy, using an earlier version (Ramamoorthy and Skaar, 2011). The higher number of microRNAs predicted, is    potentially a result of using newer versions of the algorithms compared to those used by Ramamoorthy and Skaar (2011). The use of more algorithms was justified by the low percentage (14-30%) overlap between microRNAs predicted by two or more algorithms. However, the question still remains as to what is the appropriate minimum number or combination of algorithms to pick most microRNAs. MicroRNA prediction using 7 of the 15 algorithms (miR2Gene-DIANA-microT v.3, miRanda-MirSVR, miRSystem, PACCMIT, PITA, TargetScan, and TargetSpy), showed on average 83% (range 43-98%) of all unique microRNAs, pointing toward the major contribution of these seven algorithms (especially TargetScan) compared to the remaining eight algorithms. The appropriate combination of algorithms will only be apparent after confirmatory functional validation of the predicted binding targets.
A few microRNAs have been experimentally shown to regulate expression of DMEs. For example, Pan et al. (2009) reported that CYP3A4 is regulated by miR-27b directly, and by miR-148a indirectly through the regulation of PXR (Takagi et al., 2008). This observation is confirmed by our prediction analysis as five algorithms predicted miR-27b to target CYP3A4, while four algorithms predicted miR-148a to target NR1I2 (coding for PXR). Two microRNAs namely; miR-590 and miR-27a have been associated with either targeting CYP2B6 (miR-590) or CYP3A4 (miR-27a) (Ramamoorthy et al., 2013). The finding of CYP2B6, CYP3A4, and UGT1A1 being targets of miR-590, miR-27a, miR-491-3p, respectively (Dluzen et al., 2014), is supported by our prediction analysis. However, our observations on SULT1A1 mRNA are contradictory to those of Yu et al. (2010), who reported targeting by miR-631, yet none of the 15 computational algorithms picked it. MiR-26b-5p is reportedly associated with CYP2D6 expression (Gennarino et al., 2009) and miR-335-5p with CYP3A5, but none of the two microRNAs was picked by the 15 algorithms used in the current prediction analysis. It appears that the prediction algorithms might have inherent weaknesses, for example, GSTP1 has been experimentally validated to be affected by miR-513a-3p, miR-133a-3p, miR-26b-5p, and miR-92a-3p, but only miR-133a-3p was picked by the use of 15 algorithms. It is important to note that computational prediction algorithms are thought to predict microRNA-mRNA targeting with a fraction of false positives estimated at 31% and it is, thus, of importance to prioritize microRNAs for further studies (Lewis et al., 2003). The overlap between microRNAs predicted by the 15 algorithms and microRNAs potentially affected by miRSNPs with microRNAs expressed in liver tissue is very low. Prediction algorithms use different criteria to rank microRNAs that potentially target mRNA of a gene of interest, and are likely to be more lenient in the prediction as it is mostly based on sequence complementarity, target site accessibility, binding energy and conservation of target sites. Also on the other hand, expression is dependent on many other factors including export and tissues involved. Further comprehensive microRNA expression profiling in liver as well as extrahepatic tissues would contribute toward identifying microR-NAs that are potentially involved in regulation of DMEs. There is also a strong need for experimental validation of predicted microRNA targets.

GENETIC VARIANTS IDENTIFIED AFTER SEQUENCING THE 3 -UTR AND COMPARISON OF MINOR ALLELE FREQUENCIES IN THE SOUTH AFRICAN GROUP TO OTHER WORLD POPULATION GROUPS
A total of 52 SNPs were identified within the 3 -UTR of CYP1A2, CYP2B6, CYP3A4, NR1I2, and UGT2B7 including 17 novel SNPs, which is an indication of the extensive degree of genetic diversity characteristic to African and specifically South African populations (Matimba et al., 2009;Warnich et al., 2011;Dandara et al., 2014). SNPs (miRSNPs) identified in the South African group and already available in the PolymiRTS database, were predicted to alter the microRNA targeting of more than 200 microRNAs (Supplementary Table S2). As expected, differences in allele frequencies of miRSNPs were observed when comparing the South African Bantu population group to both Caucasian and Asian populations as highlighted by the CYP2B6 rs28969420T  N rs11636419 rs45564134 rs17861162 rs1038376 rs1042389 rs707265 rs3181842 rs7246465 rs28969420 rs10511395 rs1054190 rs1054191 rs3732359 rs3732360 rs3814057 rs3814058 rs6600893   allele (20% compared to 4 and 3%, respectively) and the NR1I2 rs3732359A-allele (59% compared to 84 and 47%, respectively). Differences in the frequency of miRSNPs among population groups as highlighted by the NR1I2 rs37372359A allele, which results in the presence of a target site for several microRNAs including miR-362-5p, miR-500b-5p, and miR-501-5p, could translate into population differences in drug disposition and response. These qualitative and quantitative differences in allele frequencies of SNPs further support previous studies arguing against directly inferring effects of therapeutic drugs from one population to another (Swart et al., 2012a,b,c;Dandara et al., 2014). The 3 -UTR variation identified in the current study contributes to further genetic characterizing of South Africans. However, the observed variation should be functionally validated in order to deduce its contribution to the observed variability in drug response.

THERAPEUTIC POTENTIAL OF microRNAs
Response to therapeutic treatment is a complex phenotype determined by genetic variation in DMEs, nuclear receptors, drug transporters, miRSNPs, and pharmacoepigenetics . MicroRNAs have the potential to be used as therapeutic targets through their ability to regulate expression of pharmacogenomically relevant genes (Singh et al., 2011;Van Rooij et al., 2012). Creating a microRNA recognition site leads to down regulation of mRNA expression of specific genes, while disruption of a site causes loss of down regulation of mRNA expression (Mishra and Bertino, 2009).Caution should however be exercised when developing microRNA-based therapeutics because of unpredictable off-target effects due to microRNAs having broad mRNA targets (Singh et al., 2011). A study by Mishra et al. (2007) demonstrated the effect of a SNP near the miR-24 binding site in the 3 -UTR of the dihydrofolate reductase (DHFR) gene, that interferes with miR-24 binding and resulting in DHFR overexpression and methotrexate resistance. Resistance to the cancer therapeutics cisplatin and 5-fluorouracil has been linked to induced miR-148a expression and targeting of the gene KIT (Hummel et al., 2011). MiR-148a has been shown by Takagi et al. (2008) to regulate PXR and affect downstream expression of CYP3A4, resulting in reduced drug metabolism. Presence of the NR1I2 rs1054190T allele results in the absence of a miR-148a target site, potentially leading to increased CYP3A4 expression. It is important that future studies investigate the functional significance of microRNAs and miRSNPs. The recently developed Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) technology could be used to create knockout cell lines for specific microRNAs to validate their role in targeting of genes with pharmacogenomics relevance (Dandara et al., 2014).

CONCLUSION
The integration of microRNA variation and epigenetics into pharmacogenomics studies may provide additional insights into the mechanisms of drug response and advance individualized therapy (Welsh et al., 2009;Zhang and Dolan, 2010). Use of in silico methods should be complimented by functional validation to confirm the predicted effects. It is therefore possible that some of the observed non-reproducibility of pharmacogenomics studies could partially be due to the unexplained variation in the 3 -UTR, thus, functional validation of microRNA targets should be another area of research to advance pharmacogenomics.

AUTHOR CONTRIBUTIONS
Marelize Swart carried out the in silico prediction and sequencing analysis and drafted the manuscript. Collet Dandara conceived of the study, designed, coordinated the study, and also assisted with statistical data analysis, helped to draft the manuscript and approved the final version. Both authors read and approved the final manuscript.