Genetic associations with 25-hydroxyvitamin D deficiency in HIV-1-infected youth: fine-mapping for the GC/DBP gene that encodes the vitamin D-binding protein

Serum 25-hydroxyvitamin D [25(OH)D] is often deficient (<12 ng/ml) or insufficient (<20 ng/ml) in youth living with human immunodeficiency virus type 1 infection (YLH). Based on evidence from multiple genome-wide association studies, we hypothesized that genetic factors associated with 25(OH)D deficiency should be readily detectable in YLH even when controlling for other known factors, including use of the antiretroviral drug efavirenz (EFV). Genotyping by bi-directional sequencing targeted 15 single nucleotide polymorphisms (SNPs) at the GC/DBP locus, with a focus on coding and regulatory variants, as well as those repeatedly reported in the literature. Three intronic SNPs (rs222016, rs222020, and rs222029) in a conserved haplotype block had unequivocal association signals (false discovery rate ≤ 0.006). In particular, the minor allele G for rs222020 was highly unfavorable among 192 YLH (99 African-Americans and 93 others), as gauged by relatively low likelihood for 25(OH)D sufficiency at enrollment (odds ratio = 0.31, p = 9.0 × 10⁻⁴). In a reduced multivariable model, race, season, latitude, body mass index, exposure to EFV, and rs222020-G were independent factors that collectively accounted for 38% of variance in the log10-transformed 25(OH)D concentration (p < 0.0001). Interaction terms were evident for rs222020-G × season (p < 0.001), latitude × season (especially fall and winter; p < 0.01), and race × EFV use (p = 0.024). Overall, variance in serum 25(OH)D is substantially attributable to multiple factors, but the exact contribution of genetic and non-genetic factors can be obscured by partial overlaps and frequent interactions.

Suboptimal serum 25(OH)D is seen in 54% of youth living with human immunodeficiency virus type 1 (HIV) infection (YLH; Havens et al., 2012a,b). Insufficiency (<20 ng/ml) or deficiency (<12 ng/ml) of 25(OH)D can be exacerbated by long-term use of antiretroviral drugs, especially efavirenz (EFV), which is known to interfere with 25(OH)D metabolism (Childs et al., 2012; Panayiotopoulos et al., 2013). Longitudinal data from YLH with and without vitamin D supplementation can provide an important platform for dissecting multifactorial influences on the vitamin D pathway, including pre-vitamin D transport mediated by the vitamin D-binding protein (VDBP; Schlingmann et al., 2011).

The GC/DBP gene (http://www.ncbi.nlm.nih.gov/gene/2638) encoding VDBP maps to chromosome 4q12-q13 and has hundreds of known single nucleotide polymorphisms (SNPs). When 25(OH)D concentration is analyzed as a trait for vitamin D status, both genome-wide association studies and candidate gene approaches (Bu et al., 2010) have consistently pointed to the potential importance of GC SNP variants. In an attempt to confirm the GC genotypes associated with 25(OH)D deficiency, our work here provides further evidence to justify fine-mapping of the GC locus in YLH populations.

STUDY POPULATION
YLH (18-25 years old) represented two self-identified racial groups (African-Americans (AAs) and others) participating in a randomized, double-blind, placebo-controlled, multicenter trial (NCT00490412; http://clinicaltrials.gov/ct2/show/NCT00490412) within the Adolescent Medicine Trials Network for HIV/AIDS Interventions (ATN; Havens et al., 2012a,b). The research protocols, including procedures for written informed consent, were approved by the Institutional Review Board (IRB) at 16 ATN clinics and 19 International Maternal Pediatric Adolescent AIDS Clinical Trials (IMPAACT) sites in the United States and Puerto Rico. Ancillary studies summarized here were further approved by the IRB at the University of Alabama at Birmingham (UAB).

INTERVENTION AND OUTCOME MEASURES
All participants were treated with ≥3 antiretrovirals (ARVs) for ≥90 days and with plasma HIV-1 RNA (viral load) <5,000 copies/mL within 60 days. After screening, subjects free of renal disease, pregnancy, and medicines that may affect bone mineral density, interfere with vitamin D absorption, or cause renal toxicity were enrolled into two relatively equal groups based on their ARV regimens (with or without tenofovir disoproxil fumarate, TDF). Within each group, eligible participants were randomized to receive vitamin D supplementation or placebo every 4 weeks for three doses. Serum 25(OH)D concentration was measured at baseline (week 0) and at study week 12 as the primary outcomes for analyses here.

CANDIDATE LOCI AND GENOTYPING
Earlier reports on phenotypes related to vitamin D (Bu et al., 2010; Wang et al., 2010; Levin et al., 2012), including bone mineral density and fracture (Cho et al., 2009; Richards et al., 2009; Rivadeneira et al., 2009), have revealed various loci with modest associations (as judged by effect sizes instead of p values). For this study, SNP selection focused on the most promising GC/DBP locus that encodes vitamin D-binding protein. SNPs reported repeatedly in the literature were considered first, followed by flanking SNPs (to facilitate analysis of linkage disequilibrium, LD) and SNPs found in coding and regulatory sequences. Using DNA extracted from Isohelix buccal swabs (Cell Projects Ltd., Kent, UK), all SNP genotypes were resolved by bi-directional DNA sequencing using the gold-standard Sanger chemistry (Polymorphic DNA Technologies, Inc., Alameda, CA, USA). For SNPs with minor allele frequencies (MAF) exceeding 0.05, the pairwise LD patterns were tested using the HaploView program (Barrett et al., 2005).
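As an illustration of the pairwise LD statistics mentioned above, the following minimal sketch computes D, |D'|, and r² for two biallelic SNPs. It assumes that counts of the four phased two-SNP haplotypes are already available; the function name, argument names, and example counts are hypothetical, and the sketch is not a description of how HaploView derives haplotype frequencies from unphased genotypes.

```python
def pairwise_ld(n_ab, n_aB, n_Ab, n_AB):
    """Return (D, |D'|, r2) from counts of the four two-SNP haplotypes.

    Lowercase/uppercase letters denote the two alleles at SNP 1 (a/A)
    and SNP 2 (b/B). Counts are hypothetical phased haplotype counts.
    """
    total = n_ab + n_aB + n_Ab + n_AB
    p_ab = n_ab / total
    p_a = (n_ab + n_aB) / total  # frequency of allele a at SNP 1
    p_b = (n_ab + n_Ab) / total  # frequency of allele b at SNP 2

    # D: deviation of the observed haplotype frequency from independence
    d = p_ab - p_a * p_b

    # D' rescales D by its maximum attainable magnitude given allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r2: squared correlation between the alleles at the two SNPs
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Only two haplotypes observed: the SNPs are in absolute LD (r2 = 1)
print(pairwise_ld(60, 0, 0, 40))
# All four haplotypes equally common: no LD (r2 = 0)
print(pairwise_ld(25, 25, 25, 25))
```

Absolute LD (r² = 1.00) arises only when two of the four possible haplotypes are observed, so that the genotype at one SNP fully predicts the genotype at the other.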

STATISTICAL ANALYSES
The study population was first grouped by race (AAs vs. others) for comparison of baseline (week 0) characteristics, with the Wilcoxon test, Student's t-test, and Chi-squared test applied to appropriate measurements. Subsequent analyses focused on three specific aims. Aim 1 was to demonstrate that serum 25(OH)D concentration is a relatively stable phenotype in YLH. Measurements at baseline and at week 12 were compared in participants in the placebo group (who did not receive vitamin D supplementation), using Spearman's method (rho) and Pearson's correlation coefficient (r), before and after log10-transformation ("normalization"), respectively. Aim 2 was to identify individual SNP genotypes associated with three clinically relevant 25(OH)D categories at baseline: <12 ng/ml (deficiency), ≥12-<20 ng/ml (insufficiency), and >20 ng/ml (sufficiency), using ordinal logistic regression models adjusted for non-genetic factors (age, sex, and race). All relationships with statistical significance (p < 0.05) and low false discovery rate (FDR; q < 0.05) were included in multivariable models. Aim 3 was to quantify multifactorial influences on serum 25(OH)D, with log10-transformed serum 25(OH)D analyzed as a continuous outcome in generalized linear models (GLMs). The summary statistics focused on relative effect sizes (regression beta and R² values) attributable to genetic factors (SNP genotypes), demographic features (age, sex, and race), body mass index (BMI), environmental factors (season and latitude), and exposure to EFV. Similar approaches have been applied earlier to analyses of quantitative traits related to HIV infection (Yue et al., 2013). Whenever possible, secondary (exploratory) models were evaluated for AAs and other races separately.
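The FDR screening step in Aim 2 relies on q values computed across all SNPs tested. The text does not name the FDR procedure, so the sketch below assumes the widely used Benjamini-Hochberg step-up method; the function name and the example p values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values (q values), in input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p values for four SNPs (illustration only)
example_p = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(example_p, benjamini_hochberg(example_p)):
    print(f"p = {p:.3f}, q = {q:.3f}")
```

Under this assumption, a SNP would pass the screening step when both its nominal p value and its q value fall below 0.05.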

STABILITY OF SERUM 25(OH)D CONCENTRATION OVER A 12-WEEK PERIOD
In a subset of subjects (46 AAs and 42 others) who were randomized to the placebo group, log10-transformed 25(OH)D concentrations were moderately stable between the two visits regardless of race (Figure 1), with Pearson r values ranging from 0.73 in AAs (p < 0.0001) to 0.77 in others (p < 0.0001; p > 0.50 between the two r values). Statistical adjustments for other factors [...]

Frontiers in Genetics | Pharmacogenetics and Pharmacogenomics

[...] Table 2). All but one SNP (rs114282916) showed differential distribution between the two racial groups (AAs and others). Most SNPs had weak pairwise LD in both racial groups, but three intronic SNPs (rs222016, rs222020, and rs222029) were within a conserved haplotype block (Figure 2). Additional SNPs dismissed based on rarity of minor alleles (singleton to MAF < 0.05) included rs9016, rs3737553, rs80324156, rs114737000, rs6843222, and 10 polymorphisms not captured in the dbSNP database (last accessed in April 2013).
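The comparison of the two Pearson r values reported for the placebo group (0.73 in 46 AAs and 0.77 in 42 others) can be checked with a short calculation. The text does not state which test was used, so the sketch below assumes Fisher's z-transformation for two independent correlation coefficients; the function name is hypothetical.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Two-sided test for a difference between two independent Pearson r values.

    Uses Fisher's z-transformation (an assumed method, not confirmed by the text).
    Returns the z statistic and its two-sided p value.
    """
    z1 = math.atanh(r1)  # Fisher z for group 1
    z2 = math.atanh(r2)  # Fisher z for group 2
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


z, p = compare_correlations(0.73, 46, 0.77, 42)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

With these inputs the two-sided p value is well above 0.50, which agrees with the reported lack of difference between the two racial groups.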
In univariable models testing three clinically relevant 25(OH)D levels at baseline: <12 ng/ml (deficiency), ≥12-<20 ng/ml (insufficiency), and >20 ng/ml (sufficiency), seven SNPs showed a promising trend (p < 0.05 and q ≤ 0.10) for associations in dominant models, with proportional odds ratios (pOR) ranging from 0.17 (rs222016 and rs222020, unfavorable) to 3.41 (rs7041, favorable) and q values from 0.002 (rs222016 and rs222020) to 0.10 (rs35096193, favorable; Table 3). Among the top four SNPs with q < 0.05, rs7041 has known associations with vitamin D status and related outcomes (Fang et al., 2009; Wood et al., 2011). Three other SNPs in strong LD (Figure 2) could be represented by rs222020, which has been associated with vitamin D status and related outcomes as well (Bu et al., 2010; Xu et al., 2010; Jung et al., 2011; Zhang et al., 2012). Further genotyping in the rest of the study cohort focused on rs7041 (a coding SNP) and rs222020 (an intronic SNP).

FIGURE 1 | Measurements at baseline (week 0) and at study week 12 are shown for 46 African-Americans (AAs) and 42 non-AA subjects (others). The predicted slope and its 95% confidence intervals in each subgroup are represented by solid and dotted lines, respectively. Six subjects (four AAs and two others) with missing data at week 12 are excluded.

UNIVARIABLE AND MULTIVARIABLE MODELS FOR TWO GC SNPs (rs222020 and rs7041) IN THE ENTIRE COHORT
Both rs7041-G and rs222020-G were associated with baseline serum 25(OH)D categories in univariable models (pOR = 2.32 and 0.31, p = 0.008 and 9.0 × 10⁻⁴, respectively). After statistical adjustment for demographic features (sex, age, and race) and exposure to EFV, rs7041 allele G was no longer a predictor (adjusted pOR = 1.08 and p = 0.827), while rs222020 allele G remained predictive of serum 25(OH)D categories (pOR = 0.45 and p = 0.014). However, further adjustments for BMI and environmental factors (latitude and enrollment season) diminished the association of rs222020-G (adjusted p = 0.069; Table 4). The strong independent predictors included race (p < 0.0001), enrollment season (p < 0.0001), latitude of residence (p < 0.001), BMI (p = 0.002), and use of EFV (p = 0.006). Summer had the most dramatic impact on seasonal fluctuation in serum 25(OH)D (pOR = 8.23, p < 0.0001), while fall and spring were also quite favorable against winter (pOR = 4.90 and 3.55, respectively).

FIGURE 2 | [...] Similar results are obtained from analyses of SNP genotypes in AAs and other races (excluding SNPs rs112205706 and rs115617005). By default, absolute LD (r² = 1.00) is indicated by a dark square.

VARIANCE IN log10-TRANSFORMED SERUM 25(OH)D CONCENTRATION EXPLAINED BY GC GENOTYPES AND OTHER PERTINENT FACTORS
At least six factors independently contributed to the variability in log10-transformed serum 25(OH)D concentration (Table 5).
Ranking of the six individual (independent) predictors of log10-transformed serum 25(OH)D concentration was often complicated by issues with partial overlap. For example, variance explained by the rs222020-G allele varied substantially (from 2.9 to 10.7%) according to the order in which three partially overlapping factors (race, use of EFV, and rs222020-G) were added to the model. In addition, the variance attributable to rs222020-G differed somewhat between AAs (4.5%, adjusted p = 0.029) and others (3.3%, adjusted p = 0.080) when conditioned on the effect of EFV (Table 7). In contrast, after accounting for the effect of rs222020-G, the impact of EFV use on 25(OH)D was only apparent in AAs (adjusted R² = 8.8%, p = 0.003) and not in others (adjusted R² = 0.1%, p = 0.812; Table 7).
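The order dependence described above follows directly from how variance is partitioned when predictors are correlated. The sketch below uses the closed-form R² for a linear model with two standardized predictors; the function name and the three correlation values are hypothetical illustrations and are not estimates from this cohort.

```python
def sequential_r2(r_y1, r_y2, r_12):
    """Variance in y attributed to predictor 2 under two entry orders.

    r_y1, r_y2: correlations of the outcome with predictors 1 and 2.
    r_12: correlation between the two predictors (partial overlap).
    """
    # R2 of the full two-predictor linear model
    r2_full = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return {
        "full_model": r2_full,
        # Predictor 2 entered first takes credit for any shared variance
        "x2_entered_first": r_y2 ** 2,
        # Predictor 2 entered last is credited only with its unique variance
        "x2_entered_last": r2_full - r_y1 ** 2,
    }


# Hypothetical values: predictor 1 could stand for race, predictor 2 for a SNP allele
result = sequential_r2(r_y1=-0.40, r_y2=-0.30, r_12=0.50)
for label, value in result.items():
    print(f"{label}: {100 * value:.1f}% of variance")
```

In this hypothetical case the same predictor accounts for 9.0% of the variance when entered first but only about 1.3% when entered after its correlated partner, which mirrors the kind of spread reported for rs222020-G.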

INTERACTION TERMS
Multivariable models further revealed several pairwise interactions, i.e., rs222020-G × season (p < 0.001), latitude × season (especially fall and winter; p < 0.01), and race × EFV use (p = 0.024). Seasonality of the rs222020-G effect on 25(OH)D was apparently restricted to spring, as genotype-specific differences were not detected in other seasons (Figure 3).

[...] Figure 2). c False discovery rate (FDR) is based on p values from analyses of all 15 SNPs shown in Table 2.

OBSERVATION OF DIFFERENCES BETWEEN RACIAL GROUPS
At least two racial differences were noted in separate analyses of AAs (n = 99) and other races (n = 93). First, the negative association of rs222020-G with 25(OH)D reached statistical significance in AAs (R² = 4.5%, adjusted p = 0.029) but not in other subjects (R² = 3.3%, adjusted p = 0.080) when conditioned on the effect of EFV (Table 7). Second, the deleterious impact of EFV use on 25(OH)D was seen in AAs (adjusted R² = 8.8%, p = 0.003) and not in other subjects (adjusted R² = 0.1%, p = 0.812) after accounting for the contribution of rs222020-G.

DISCUSSION
Despite a modest sample size, our analyses here reveal five major findings concerning vitamin D metabolism in youth living with HIV-1 infection. First, serum 25(OH)D concentration is relatively stable over a 12-week period regardless of race. Second, at least one GC SNP variant, the rs222020-G allele, is independently predictive of suboptimal serum 25(OH)D, especially during the spring season. Third, use of EFV is associated with low serum 25(OH)D in the combined cohort based on univariable models, but the EFV effect is restricted to AAs when the rs222020-G allele is added to multivariable models. Fourth, the exact contribution of genetic and non-genetic factors (latitude, season, BMI, and race) can be obscured by partial overlaps and frequent interactions. Fifth, statistical models are not uniformly applicable to racial groups. Most of these observations are novel and highly relevant to public health.

As a main focus of this study, the GC gene (http://www.ncbi.nlm.nih.gov/gene/2638) consists of 13 exons and has hundreds of known SNPs, but neither genome-wide association studies nor candidate gene approaches reported in the literature have covered this locus sufficiently to allow fine-mapping. To avoid heavy penalty for multiple testing of randomly selected GC SNPs, we chose to examine coding and regulatory (promoter) sequences at both ends of several SNPs with relatively consistent associations. For example, the minor allele G (or C in the complementary strand) for rs222020 has been highlighted recently in the context of compression strength index of the femoral neck (Xu et al., 2010), peripheral arthritis in ankylosing spondylitis (Jung et al., 2011), and plasma 25(OH)D concentration (Zhang et al., 2012). By analyzing rs222020 and multiple neighboring SNPs, it was evident that rs222020-G is able to tag several intronic variants within a single haplotype block. However, rs222020-G did not seem to tag other functionally relevant variants. Mechanisms underlying its independent association with suboptimal 25(OH)D concentration remain elusive, and the search for further clues may need to consider less obvious pathways (DNA-DNA and DNA-protein interactions) being actively pursued by the ENCODE project (Dunham et al., 2012; Harrow et al., 2012; Rosenbloom et al., 2012; Wang et al., 2013).

FIGURE 3 | Season-dependent association of GC genotypes with serum 25-hydroxyvitamin D [25(OH)D] concentration.
The log10-transformed 25(OH)D (ng/ml) values in 192 HIV-1-infected youth are plotted according to four enrollment seasons and three genotypes defined by the GC SNP, rs222020 (major allele A and minor allele G). For each stratum, the horizontal bars connected by a vertical line correspond to the mean ± standard deviation (SD). The nominal p values are based on Student's t-test, assuming a dominant effect of allele G (see Table 6 for full analyses of interactions between rs222020-G and enrollment season).
Two other prominent GC SNPs, rs7041 and rs4588, do cause amino acid substitutions at codon 416 (D/E) and codon 420 (T/K), respectively, in exon 11. Three haplotypes involving these non-synonymous SNPs correspond to different protein isoforms known as GC1F, GC1S, and GC2. Earlier studies have demonstrated the potential importance of rs7041 variants alone (Wood et al., 2011) or in conjunction with rs4588 variants (Fang et al., 2009). Although rs7041-G appeared to be highly favorable in our initial screening (univariable models only), it was subsequently dismissed by multivariable models in which race and other prominent factors were treated as covariates. The distribution of rs7041-G differs between AAs (low) and other races (high; Table 2), so definitive analyses may require a third population with intermediate allele frequency. Nonetheless, rs7041-G may serve as a useful biomarker for disparity in serum 25(OH)D concentration, especially since its biological relevance is so obvious. Additional GC SNPs of interest, including rs2070741 (Wood et al., 2011) and rs2282679 (Vimaleswaran et al., 2013), are not part of our study design. Judging by their reported effect sizes, it is unlikely that inclusion of these SNPs will alter our main conclusions.