Distribution of a novel CYP2C haplotype in Native American populations

The CYP2C19 gene, located in the CYP2C cluster, encodes the major drug-metabolizing enzyme CYP2C19. This gene is highly polymorphic, and no-function (CYP2C19*2 and CYP2C19*3), reduced-function (CYP2C19*9) and increased-function (CYP2C19*17) star alleles (haplotypes) are commonly used to predict CYP2C19 metabolic phenotypes. CYP2C19*17 and the genotype-predicted rapid (RM) and ultrarapid (UM) CYP2C19 metabolic phenotypes are absent or rare in several Native American populations. However, discordance between genotype-predicted and pharmacokinetically determined CYP2C19 phenotypes has been reported in Native American cohorts. Recently, a haplotype in the CYP2C cluster defined by the rs2860840T and rs11188059G alleles (CYP2C:TG) was shown to be associated with an increased rate of metabolism of the CYP2C19 substrate escitalopram, to an extent similar to that of CYP2C19*17. We investigated the distribution of the CYP2C:TG haplotype and explored its potential impact on CYP2C19 metabolic activity in Native American populations. The study cohorts included individuals from the One Thousand Genomes Project AMR superpopulation (1KG_AMR), the Human Genome Diversity Project (HGDP), and indigenous populations living in Brazil (Kaingang and Guarani). The frequency of the CYP2C:TG haplotype in the study cohorts (range: 0.469 to 0.598) is considerably higher than in all 1KG superpopulations (range: 0.014 to 0.340). We suggest that the high frequency of the CYP2C:TG haplotype might contribute to the reported discordance between CYP2C19 genotype-predicted and pharmacokinetically verified CYP2C19 metabolic phenotypes in Native American cohorts. However, functional studies correlating genotypes with pharmacokinetic parameters are warranted to ascertain the importance of the CYP2C:TG haplotype.


Introduction
The human CYP2C subfamily comprises four members, namely CYP2C18, CYP2C19, CYP2C9 and CYP2C8, whose encoding genes are located in tandem on chromosome 10q23-24. CYP2C19 provides the major pathway for biotransformation of a variety of drugs from different therapeutic classes, including antidepressants, both tricyclics (e.g., imipramine) and selective serotonin reuptake inhibitors (e.g., escitalopram), as well as antifungal (voriconazole), antimalarial (proguanil) and antiplatelet (clopidogrel) drugs, and proton pump inhibitors (omeprazole). CYP2C19-mediated metabolism may inactivate drugs (e.g., omeprazole and voriconazole) or generate active metabolites, which account for the clinical effects of prodrugs such as proguanil and clopidogrel (Botton et al., 2021). The clinical relevance of the CYP2C19 pathway is reflected in the CPIC (Clinical Pharmacogenetics Implementation Consortium) guidelines: five of the 26 guidelines currently available include dosing recommendations based on CYP2C19 metabolic phenotypes predicted from CYP2C19 genotypes (Hicks et al., 2015, 2017; Moriyama et al., 2017; Lima et al., 2021; Lee et al., 2022).
The present study investigates the distribution of the CYP2C:TG haplotype, defined by the rs2860840T and rs11188059G alleles (Bråten et al., 2021), and its potential impact on prediction of CYP2C19 metabolic phenotypes in Native American populations. Previous studies have shown that CYP2C19*17 and, consequently, the genotype-assigned rapid (RM) and ultrarapid (UM) CYP2C19 metabolic phenotypes are absent or rare in Native Americans (Vargens et al., 2012; Bonifaz-Peña et al., 2014; de Andrés et al., 2017, 2021; Naranjo et al., 2018; Rodrigues-Soares et al., 2020). However, discordance between genotype-predicted and pharmacokinetically determined CYP2C19 phenotypes has been reported in Native American cohorts: individuals genotyped as CYP2C19*1/*1 and assigned the normal metabolizer (NM) phenotype showed greater CYP2C19 activity than UMs (de Andrés et al., 2017).

Study populations
Four cohorts were investigated, namely 1KG_NAT, a subcohort of the One Thousand Genomes Project Admixed American superpopulation (denoted 1KG_AMR; Auton et al., 2015); HGDP Native Americans (Cavalli-Sforza, 2005); and Kaingang and Guarani living in Brazil (Petzl-Erler et al., 1993; Tsuneto et al., 2003). The 1KG_AMR comprises individuals from the South American countries Colombia (denoted CLM) and Peru (PEL), from Puerto Rico (PUR), as well as people of Mexican ancestry (MXL) living in Los Angeles, United States of America. The 1KG_NAT comprises the 68 1KG_AMR individuals (58 PEL and 10 MXL) with the highest proportions of Native ancestry: median 94.5%, IQR 85.7%-100% (Suarez-Kurtz et al., 2020). The HGDP cohort (n = 61) comprises samples from Native American groups from Brazil (Surui and Karitiana), Mexico (Maya and Pima) and Colombia. Kaingang (KRC) and Guarani (GRC and GKW) are represented by adults enrolled in a population genetics study of Brazilian Amerindians, approved by the Brazilian National Ethics Committee (CONEP123/98). Kaingang and Guarani, the two major Amerindian tribes of southern Brazil, are culturally quite distinct from each other: the Guarani belong to the Tupi linguistic group, whereas the Kaingang are Gê-speaking. The KRC and GRC live in different villages within the Rio das Cobras reservation (25°18′S, 52°32′W), whereas GKW are from the Amambai and Limão Verde reservations (23°06′S, 55°12′W and 23°12′S, 55°06′W, respectively).
Individual haplotypes and diplotypes were inferred using the Haplo.Stats package for the R platform. This software assigns each individual a posterior probability for its diplotype configuration on the basis of estimated haplotype frequencies. The minimum posterior probability for inclusion of an individual in these analyses was set at 0.95. We adopted the labelling used by Bråten et al. (2021) to denote CYP2C haplotypes and diplotypes comprising the CYP2C18 rs2860840 and rs11188059 SNPs.
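How posterior diplotype probabilities arise for two biallelic SNPs can be illustrated with a minimal sketch, assuming a plain expectation-maximization (EM) algorithm under Hardy-Weinberg proportions. This is not Haplo.Stats' own implementation; the function names and the genotype coding (counts of the rs2860840 T allele and of the rs11188059 G allele) are hypothetical. With only two SNPs, the sole phase-ambiguous genotype is the double heterozygote, which can be either CG/TA or TG/CA.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Hypothetical two-SNP haplotype labels: rs2860840 allele + rs11188059 allele.
HAPLOTYPES = ["TG", "TA", "CG", "CA"]


def compatible_diplotypes(n_t, n_g):
    """Unordered haplotype pairs consistent with a genotype coded as the number
    of rs2860840 T alleles (n_t) and rs11188059 G alleles (n_g), each 0-2."""
    return [
        (h1, h2)
        for h1, h2 in combinations_with_replacement(HAPLOTYPES, 2)
        if (h1[0] == "T") + (h2[0] == "T") == n_t
        and (h1[1] == "G") + (h2[1] == "G") == n_g
    ]


def diplotype_posteriors(n_t, n_g, freqs):
    """Posterior probability of each compatible diplotype, assuming HWE."""
    pairs = compatible_diplotypes(n_t, n_g)
    # A heterozygous diplotype can arise in two ways, hence the factor of 2.
    weights = [freqs[a] * freqs[b] * (1 if a == b else 2) for a, b in pairs]
    total = sum(weights)
    return {f"{a}/{b}": w / total for (a, b), w in zip(pairs, weights)}


def em_haplotype_freqs(genotypes, max_iter=500, tol=1e-10):
    """Estimate haplotype frequencies by EM, starting from equal frequencies."""
    freqs = {h: 1 / len(HAPLOTYPES) for h in HAPLOTYPES}
    for _ in range(max_iter):
        counts = Counter()
        for n_t, n_g in genotypes:  # E-step: expected haplotype counts
            for dip, post in diplotype_posteriors(n_t, n_g, freqs).items():
                a, b = dip.split("/")
                counts[a] += post
                counts[b] += post
        # M-step: each individual contributes two haplotypes.
        new = {h: counts[h] / (2 * len(genotypes)) for h in HAPLOTYPES}
        done = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES) < tol
        freqs = new
        if done:
            break
    return freqs


def confidently_phased(genotypes, freqs, threshold=0.95):
    """Keep individuals whose most probable diplotype reaches the threshold."""
    kept = []
    for n_t, n_g in genotypes:
        post = diplotype_posteriors(n_t, n_g, freqs)
        best = max(post, key=post.get)
        if post[best] >= threshold:
            kept.append(((n_t, n_g), best))
    return kept


# Hypothetical toy cohort: CG/CG and TA/TA homozygotes plus double heterozygotes.
toy = [(0, 2)] * 10 + [(2, 0)] * 10 + [(1, 1)] * 4
freqs = em_haplotype_freqs(toy)
```

In this toy cohort no individual unambiguously carries CYP2C:CA, so EM drives its frequency towards zero and the double heterozygotes are resolved as TA/CG with a posterior well above the 0.95 inclusion threshold.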

Statistical analyses
Deviation of genotype distributions from Hardy-Weinberg equilibrium was assessed by the goodness-of-fit χ² test. Chi-square tests were applied to compare the distributions of CYP2C haplotypes and predicted CYP2C19 metabolic phenotypes across cohorts. The significance level was set at p < 0.05.
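The Hardy-Weinberg goodness-of-fit test can be sketched for a single biallelic SNP as follows, assuming genotype counts as input; the function name is hypothetical. The test has 1 degree of freedom (three genotype classes, minus one, minus one estimated allele frequency), so the p-value can be obtained from the complementary error function without any statistics library.

```python
from math import erfc, sqrt


def hwe_chi2(n_hom_ref, n_het, n_hom_alt):
    """Goodness-of-fit chi-square test of Hardy-Weinberg proportions for one
    biallelic SNP. Returns (statistic, p_value) with 1 degree of freedom."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    if p in (0.0, 1.0):  # monomorphic locus: the test is uninformative
        return 0.0, 1.0
    q = 1 - p
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))
```

With small expected genotype counts the chi-square approximation becomes unreliable, and an exact HWE test is usually preferred in that situation.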

Results and discussion
There were no significant deviations from Hardy-Weinberg equilibrium at the CYP2C18 and CYP2C19 loci interrogated. The two CYP2C18 SNPs were common in all study cohorts (Table 1): the minor allele frequency (MAF) of rs11188059G>A ranged from 0.107 (HGDP) to 0.481 (Kaingang), whereas the frequency of the variant rs2860840 T allele reached 0.963 in Kaingang and ranged from 0.705 to 0.757 in the other cohorts. These results are consistent with data for North and South American Native populations in the Allele Frequency Database (https://alfred.med.yale.edu/). Three haplotypes comprising rs2860840C>T and rs11188059G>A (CYP2C haplotypes) were identified in the study cohorts (Table 1): CYP2C:TG was the most common, with frequencies ranging from 0.469 to 0.598, while the frequencies of CYP2C:CG and CYP2C:TA ranged from 0.038 to 0.295 and from 0.107 to 0.471, respectively. The frequencies of these CYP2C haplotypes in the study cohorts differed markedly from those in the African (1KG_AFR), European (1KG_EUR), East Asian (1KG_EAS) and South Asian (1KG_SAS) 1KG superpopulations (Supplementary Table S1): CYP2C:TG and, to a lesser extent, CYP2C:TA were more common in the study cohorts than in the 1KG superpopulations, whereas the opposite was observed for CYP2C:CG. The CYP2C:CA haplotype, absent or extremely rare in the 1KG superpopulations, was absent in the Native American cohorts of our study. Pairwise comparisons of the frequencies of the CYP2C:TG, TA and CG haplotypes in any study cohort versus any 1KG superpopulation revealed highly significant differences (chi-square p < 0.0001).
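The exact layout of the pairwise tests is not stated. One plausible reading, assumed here, is a 2×2 table of chromosome counts (CYP2C:TG versus all other haplotypes) in a study cohort versus a 1KG superpopulation, tested with Pearson's chi-square without continuity correction. The function name and the counts below are hypothetical, not values from Table 1 or Supplementary Table S1.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Rows: study cohort, reference population; columns:
    TG chromosomes, other chromosomes. Returns (statistic, p_value), 1 df."""
    table = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


# Hypothetical counts: TG on 60 of 100 cohort chromosomes vs 20 of 200 reference chromosomes.
stat, p = chi2_2x2(60, 40, 20, 180)
```

When any expected cell count is small (conventionally below 5), Fisher's exact test is the usual alternative to this approximation.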
The high frequency of CYP2C:TG in the study cohorts was of particular interest to us in view of the reported discordance, in Native Americans, between CYP2C19 metabolic phenotypes predicted from CYP2C19 diplotypes and phenotypes determined by pharmacokinetic measurements (de Andrés et al., 2017; Naranjo et al., 2018). For example, de Andrés et al. (2017) observed that several Mexican Amerindians genotyped as CYP2C19*1/*1 and assigned the normal metabolizer (NM) phenotype had higher CYP2C19 activity than genotype-predicted ultrarapid metabolizers (UMs), i.e., carriers of the CYP2C19*17/*17 diplotype. We hypothesized that this discrepancy might result from linkage of CYP2C19*1 with the CYP2C:TG haplotype, as reported in the pivotal study of Bråten et al. (2021).
To explore this hypothesis, we genotyped the CYP2C19 star alleles *2, *3, *9 and *17, which are used in the CPIC guidelines to predict CYP2C19 metabolic phenotypes. As shown in Table 2, CYP2C19*2 was absent in Kaingang, and its MAF ranged from 0.057 to 0.109 in the other three cohorts. CYP2C19*3 and CYP2C19*9 were not detected in 1KG_NAT and HGDP Native Americans and could not be interrogated in Kaingang and Guarani samples due to the limited amount of DNA available. CYP2C19*17 was absent in HGDP and Kaingang, and had a MAF of 0.022 in 1KG_NAT and Guarani. These data are consistent with previous studies in other Native American groups (Vargens et al., 2012; Bonifaz-Peña et al., 2014; de Andrés et al., 2017, 2021; Naranjo et al., 2018; Rodrigues-Soares et al., 2020). For example, Rodrigues-Soares et al. (2020) showed that CYP2C19*2 is absent in Guaymi from Costa Rica and Tzeltal from Mexico, CYP2C19*17 is absent in various Mayan and Uto-Aztecan groups from Mexico, as well as in Arawak and Quechuamara from Peru, while CYP2C19*3 is not detected in the vast majority of Native American populations, including a Guarani cohort previously studied by our group (Vargens et al., 2012).
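The CYP2C19-only genotype-to-phenotype translation can be sketched as below, restricted to *1, *2, *3 and *17. The allele-function labels follow the CPIC categories for these alleles (*1 normal, *2 and *3 no function, *17 increased function); *9 and all other alleles are deliberately left out, and the function name is hypothetical.

```python
# CPIC allele-function categories for the CYP2C19 alleles handled in this sketch.
ALLELE_FUNCTION = {"*1": "normal", "*2": "none", "*3": "none", "*17": "increased"}


def cyp2c19_phenotype(allele1: str, allele2: str) -> str:
    """Map a CYP2C19 diplotype to a predicted metabolizer phenotype."""
    functions = [ALLELE_FUNCTION[allele1], ALLELE_FUNCTION[allele2]]
    n_none = functions.count("none")
    n_increased = functions.count("increased")
    if n_none == 2:
        return "PM"  # e.g. *2/*2, *2/*3
    if n_none == 1:
        return "IM"  # e.g. *1/*2, *2/*17
    if n_increased == 2:
        return "UM"  # *17/*17
    if n_increased == 1:
        return "RM"  # *1/*17
    return "NM"      # *1/*1
```

Because *1 is the default call when none of the interrogated variants is detected, a cohort lacking both *2 and *17, such as Kaingang, maps entirely to *1/*1 and hence to NM under this scheme.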
Frontiers in Genetics, frontiersin.org
Next, we applied the Haplo.Stats software to infer the individual haplotypes and diplotypes formed by the CYP2C19 star alleles and the CYP2C haplotypes (Table 3). Five haplotypes were identified: three had the wild-type CYP2C19*1 allele linked to CYP2C:CG (denoted *1CG), CYP2C:TA (*1TA) or CYP2C:TG (*1TG), and the other two were formed by CYP2C:CG linked to either CYP2C19*2 (*2CG) or CYP2C19*17 (*17CG). The *1TG haplotype was the most frequent in all cohorts (range 0.470 to 0.598), while the frequencies of *1CG and *1TA ranged from 0.037 to 0.238 and from 0.107 to 0.481, respectively. Of note, the CYP2C:TG haplotype was never linked to either the rs4244285 A (CYP2C19*2) or the rs12248560 T (CYP2C19*17) variant allele, in agreement with the observation of Bråten et al. (2021) that CYP2C:TG is in "complete linkage disequilibrium with the c.991A>G (I331V; CYP2C19*1.002) variant, like the majority of CYP2C19*1-alleles". Also following these authors' approach, the CYP2C:CG and CYP2C:TA haplotypes were merged for statistical analyses of the distribution of CYP2C19-CYP2C diplotypes and assigned metabolic phenotypes (Table 3). Eight diplotypes were observed, leading to assignment of NM, IM, RM and UM phenotypes. For visual comparison, the distributions of CYP2C19 metabolic phenotypes predicted from CYP2C19 diplotypes (Table 2) or from CYP2C19-CYP2C diplotypes (Table 3) are plotted in Figure 1. The two assignment procedures resulted in highly significant differences in phenotype distribution in all cohorts (p < 0.0001). Overall, when CYP2C19-CYP2C diplotypes are used for phenotype assignment, UMs are unveiled and RMs increase markedly in frequency at the expense of NMs, with no impact on IM frequency or on the absence of PMs.
This was observed in all cohorts and was most striking in Kaingang: because neither CYP2C19*2 nor CYP2C19*17 was detected in this cohort, all Kaingang individuals are assigned the NM phenotype on the basis of CYP2C19 diplotypes. By contrast, when phenotype assignment is based on CYP2C19-CYP2C diplotypes, NMs represent only 24% of the cohort, while RMs and UMs account for 56% and 20%, respectively.
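The Kaingang percentages can be checked against a quick Hardy-Weinberg expectation. This rests on four assumptions: all Kaingang carry CYP2C19*1/*1; the CYP2C19-CYP2C phenotype is set by the number of *1TG copies (none NM, one RM, two UM), which the percentages imply but the text does not spell out; CYP2C:CA is absent, so the TG frequency equals the rs2860840 T frequency (0.963) minus the rs11188059 A frequency (0.481); and the cohort is in Hardy-Weinberg equilibrium. The function name is hypothetical.

```python
def expected_tg_phenotypes(freq_t, freq_a):
    """Expected NM/RM/UM proportions among CYP2C19*1/*1 carriers under HWE,
    assuming the phenotype is set by the number of CYP2C:TG copies."""
    p_tg = freq_t - freq_a  # TG frequency; valid only if CYP2C:CA is absent
    q = 1 - p_tg            # combined CG + TA frequency
    return {"NM": q * q, "RM": 2 * p_tg * q, "UM": p_tg * p_tg}


kaingang = expected_tg_phenotypes(freq_t=0.963, freq_a=0.481)
print({k: round(v, 3) for k, v in kaingang.items()})
# {'NM': 0.268, 'RM': 0.499, 'UM': 0.232}
```

These expectations (about 27% NM, 50% RM and 23% UM) are broadly in line with the observed 24%, 56% and 20%; the comparison is illustrative only and is not a formal test.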
The fact that all study individuals with CYP2C19-CYP2C-predicted RM or UM phenotypes carry the CYP2C19*1/*1 diplotype might offer an explanation for the reported discordance, alluded to above, between CYP2C19-predicted and pharmacokinetically verified phenotypes in Native Americans (de Andrés et al., 2017; Naranjo et al., 2018). However, we are fully aware that functional studies involving genotypic correlations with pharmacokinetic parameters are warranted to validate this suggestion. Modulation of CYP2C19 activity by the CYP2C:TG haplotype was first observed in relation to escitalopram disposition in a cohort of predominantly "White origin" (Bråten et al., 2021), and the haplotype was subsequently associated with failure of omeprazole treatment in New Zealand European patients with GERD (gastroesophageal reflux disease) (Kee et al., 2022). No comparable data are available for Native American populations. The mechanism whereby the CYP2C:TG haplotype may increase CYP2C19-dependent metabolism remains to be elucidated. Bråten et al. (2021) tentatively suggested that the rs2860840 T allele "has a functional role as increasing the enhancer function and CYP2C19 expression", whereas the rs11188059 A variant abolishes this effect, such that the CYP2C:TG haplotype, but not the CYP2C:TA haplotype, is associated with increased CYP2C19 activity. There is prior evidence for long-range haplotypes across the CYP2C cluster that may form functional units, notably one defined by rs12777823, an intergenic polymorphism reported to be strongly associated with lower warfarin dose requirements among African Americans and black Africans (Perera et al., 2013; Ndadza et al., 2019).
We acknowledge the small number of individuals from distinct groups in the HGDP and Guarani cohorts as a limitation of our study. Practical and ethical difficulties are commonly encountered in recruiting participants from Native American populations; indeed, in a recent overview of the distribution of CYP2C19 variants and predicted phenotypes among Native American groups, 9 of the 19 cohorts studied had fewer than 50 individuals (Rodrigues-Soares et al., 2020). In addition, we caution that the present data should not be interpreted as representative of all extant Amerindian populations, given their high level of (pharmaco)genetic diversity (Gaspar et al., 2002; Suarez-Kurtz et al., 2019; Fernandes et al., 2022).

FIGURE 1
Distribution of predicted CYP2C19 metabolic phenotypes in the study cohorts. In each pair of columns, the left column shows phenotype prediction based on CYP2C19 diplotypes (Table 2) and the right column shows phenotype prediction based on CYP2C19-CYP2C diplotypes (Table 3). The two assignment procedures resulted in highly significant differences in phenotype distribution in all cohorts (chi-square p < 0.0001). NM, normal metabolizer; IM, intermediate metabolizer; RM, rapid metabolizer; UM, ultrarapid metabolizer.