Association of maternal CNVs in GSTT1/GSTT2 with smoking, preterm delivery, and low birth weight

Preterm delivery (PTD) is an adverse birth outcome associated with increased infant mortality and negative lifelong health consequences. PTD may result from interactions between genetics and maternal/fetal environmental factors, including smoking exposure (SMK). A common deletion in the GSTT1 gene was previously reported to affect birth outcomes in smokers. In this study, we dissect the associations among SMK, birth outcomes, and copy number variations (CNVs) in the GSTT1/GSTT2 region. A preterm birth case-control dataset of 1937 mothers was part of the GENEVA preterm birth study, which included genome-wide genotyping used to identify CNVs. We examined the association of SMK with birth outcomes, detected CNVs within the GSTT1/GSTT2 region using PennCNV, and examined associations of the identified CNVs with preterm birth and with birth weight (BW) in full term birth controls, including interactions with SMK. Finally, we tested the association of CNVs in GSTT1/GSTT2 with SMK. We confirmed the association of smoking with low BW and PTD. We identified 2 CNVs in GSTT2 (GSTT2a and GSTT2b), 1 CNV in GSTTP1, and 2 CNVs in GSTT1 (GSTT1a and GSTT1b). The GSTT2a deletion was associated with reduced BW (−284 g, p = 2.50E-7) in smokers and was more common in smokers [odds ratio (OR) = 1.30, p = 0.04]. We found that the reported common deletion CNV in GSTT1 is larger than previously shown. The GSTTP1 and GSTT1b null genotypes were in high linkage disequilibrium (LD) (D′ = 0.89) and less common in smokers (OR = 0.68, p = 0.019 and OR = 0.73, p = 0.055, respectively). These two deletions were in partial LD with the GSTT2a and GSTT2b duplications. All 5 CNVs seem to be associated with increased risk of preterm birth before 35 completed weeks. CNVs in the GSTT1/GSTT2 region appear to be associated with low BW and PTD, but LD among these CNVs complicates the interpretation of their individual effects. 
In genetic association studies of BW, multiple CNVs in this region need to be investigated instead of a single polymorphism.


INTRODUCTION
Low birth weight (LBW) refers to a birth weight of less than 2500 g (Kramer, 1987) and occurs in 16% of all live births worldwide (de Onis et al., 1998). Birth weight is determined by two major processes: duration of gestation and intrauterine growth rate. Preterm delivery (PTD), defined as birth before 37 weeks of gestation, is responsible for one-third to two-thirds of infants with LBW (Arifeen et al., 2000; Martin et al., 2007). PTD and LBW are independent risk factors for fetal and infant mortality. However, the causes of PTD and LBW are not clear. Multiple factors may contribute to LBW and/or PTD, including genetic and environmental factors and other maternal-fetal characteristics (e.g., demographic, obstetric, and nutritional factors, and maternal morbidity during pregnancy) (Kramer, 1987). Based on twin studies, the heritabilities of low birth weight (<2500 g) and PTD (<37 gestational weeks) are estimated to be 37-42% and 36%, respectively (Clausson et al., 2000).
An important environmental factor affecting birth outcomes is maternal tobacco smoking. Smoking exposure may reduce mean birth weight and increase the risk of PTD and intrauterine growth restriction (Asmussen and Kjeldsen, 1975; Asmussen, 1977; Nilsen et al., 1984; Ronco et al., 2005; Goldenberg and Culhane, 2007; Haram et al., 2007). Part of the effect of smoking on birth outcomes may be due to the metabolism of polycyclic aromatic hydrocarbons (PAHs) in tobacco smoke (Perera et al., 2005; Tsui et al., 2008; Wu et al., 2010). PAHs are carcinogenic compounds that are detoxified in a two-stage process: they are first converted into reactive intermediates, which are then conjugated into excretable metabolites. The conjugation is catalyzed by the product of the GSTT1 gene (glutathione S-transferase theta 1), a member of the theta class of GSTs. The theta-class members GSTT1 and GSTT2 are located on human chromosome 22, approximately 50 kb apart, with a GSTT pseudogene (GSTTP1) located between them. GSTT1 and GSTT2 share 55% amino acid sequence identity and both have detoxification roles (Coggan et al., 1998), which makes the GSTT1/GSTT2 region an interesting candidate for explaining smoking-induced adverse effects.
A small deletion in GSTT1 is the only common copy number variant (CNV) reported in the region. A null genotype (homozygous deletion) of GSTT1 modifies the effect of maternal smoking on birth outcomes: smokers have infants with lower mean birth weight than non-smokers, but the reduction varies according to GSTT1 copy number. Mean birth weight decreases substantially in smokers with the GSTT1 null genotype compared with smokers with the GSTT1 wild-type genotype (Wang et al., 2002; Wu et al., 2007; Grazuleviciene et al., 2009; Aagaard-Tillery et al., 2010). It is unclear whether other CNVs exist in the GSTT1/GSTT2 region.
The goal of this study was to explore whether additional CNVs exist in the GSTT1/GSTT2 region and to investigate their associations with smoking and birth outcomes. First, we detected CNVs in the GSTT1/GSTT2 region. Next, we examined the association of GSTT1/GSTT2 CNVs with birth outcomes, stratifying by smoking status. Finally, we investigated the relationship of GSTT1/GSTT2 CNVs with smoking.

STUDY POPULATIONS
Data were generated as part of the GENEVA (Gene Environment Association Studies) preterm birth study (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000103.v1.p1), which is part of dbGaP (http://www.ncbi.nlm.nih.gov/gap) (Mailman et al., 2007). A case-control study of preterm birth identified ∼1000 mother-child case pairs (spontaneous births at <37 weeks of gestation) and 1000 mother-child control pairs (spontaneous births at 40 weeks of gestation) from the Danish National Birth Cohort (DNBC) (Olsen et al., 2001). A total of 1937 mothers had genotype information generated from the Illumina Human660W-Quad chip. Of these, 893 had a preterm birth, 978 delivered at term, and 66 were categorized as neither case nor control (births between 37 and 39 weeks of gestation); the latter were excluded from the current study. Smoking data came from DNBC prenatal interviews conducted at several points during gestation. For the purposes of this study, we categorized smoking as "any smoking" or "no smoking" during pregnancy.
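The group definitions and the smoking variable described above can be expressed as two small functions. This is a sketch under stated assumptions: gestational age is taken as completed weeks, each prenatal interview is reduced to a yes/no smoking report, and births after 40 completed weeks (not described in the text) are treated as excluded. The record layout is hypothetical and is not the DNBC data format.

```python
from typing import Iterable, Optional


def classify_birth(completed_weeks: int) -> Optional[str]:
    """Map completed gestational weeks to the study groups.

    <37 weeks -> "case"; 40 weeks -> "control"; 37-39 weeks -> None (excluded).
    Births after 40 weeks are not described in the text and are also
    returned as None here (an assumption of this sketch).
    """
    if completed_weeks < 37:
        return "case"
    if completed_weeks == 40:
        return "control"
    return None


def any_smoking(interview_reports: Iterable[bool]) -> bool:
    """'Any smoking' if smoking was reported at any prenatal interview."""
    return any(interview_reports)


# Hypothetical records: (completed weeks, smoking reported at each interview)
mothers = [(34, [False, True]), (38, [False, False]), (40, [False, False])]
for weeks, reports in mothers:
    print(weeks, classify_birth(weeks), any_smoking(reports))
```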

GENOTYPING AND QUALITY CONTROL
Complete genotyping and data cleaning reports for the GENEVA preterm birth study are available in dbGaP (https://www.genevastudy.org/sites/www/content/files/datacleaning/data_cleaning_reports/Preterm_Birth_DN_Murray_DCR_9-3-2010.pdf). The genotyping quality was extremely high. For this CNV study, SNPs were not filtered by minor allele frequency or Hardy-Weinberg equilibrium testing, as those filters tend to remove precisely the SNPs in CNV regions.

CNV CALLS BY PennCNV AND IDENTIFICATION OF CNV REGIONS
CNV calls for mothers were generated using the PennCNV software (2009Aug27 version) following published guidelines (Wang et al., 2007), with the GC wave adjustment option applied. We removed all samples that had been whole-genome amplified, as CNV calling in such samples is generally poor (Zheng et al., 2012). Samples with a log R ratio standard deviation (LRRSD) >0.3 were then removed. After these filters, 1617 of the 1937 mothers were eligible for further analysis. All analyses were restricted to autosomes, and human genome build 36 was used throughout. CNVs with copy number >2 were defined as duplications, and those with copy number <2 as deletions. Each CNV contained at least three consecutive markers. For CNVs with a high deletion allele frequency, we also applied the PennCNV validation algorithm.
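The sample and call filters described above can be sketched as follows. The `CnvCall` record, its field names, and the example values are hypothetical; the sketch does not parse PennCNV output files and assumes each call has already been reduced to an integer copy number and a marker count.

```python
from dataclasses import dataclass

LRRSD_MAX = 0.3   # samples with log R ratio SD above this are dropped
MIN_MARKERS = 3   # each retained call spans at least three consecutive markers
AUTOSOMES = {str(i) for i in range(1, 23)}


@dataclass
class CnvCall:
    sample_id: str
    chrom: str        # e.g. "22" or "chr22"
    start: int        # build 36 coordinates
    end: int
    n_markers: int
    copy_number: int  # integer copy number derived from the caller's state


def passing_samples(lrr_sd: dict[str, float], wga_samples: set[str]) -> set[str]:
    """Drop whole-genome-amplified samples, then samples with LRRSD > 0.3."""
    return {s for s, sd in lrr_sd.items() if s not in wga_samples and sd <= LRRSD_MAX}


def cnv_type(copy_number: int) -> str:
    """CN < 2 is a deletion, CN > 2 a duplication, CN == 2 is normal."""
    if copy_number < 2:
        return "deletion"
    if copy_number > 2:
        return "duplication"
    return "normal"


def filter_calls(calls: list[CnvCall], keep: set[str]) -> list[CnvCall]:
    """Keep autosomal, non-normal calls with >= 3 markers from QC-passing samples."""
    return [
        c for c in calls
        if c.sample_id in keep
        and c.chrom.removeprefix("chr") in AUTOSOMES
        and c.n_markers >= MIN_MARKERS
        and cnv_type(c.copy_number) != "normal"
    ]


# Hypothetical example: m2 fails the LRRSD filter, m3 was whole-genome amplified.
keep = passing_samples({"m1": 0.21, "m2": 0.34, "m3": 0.18}, wga_samples={"m3"})
calls = [
    CnvCall("m1", "chr22", 1_000, 2_000, 5, 0),  # kept
    CnvCall("m1", "22", 1_000, 2_000, 2, 1),     # too few markers
    CnvCall("m2", "22", 1_000, 2_000, 6, 3),     # sample failed QC
    CnvCall("m1", "X", 1_000, 2_000, 5, 3),      # not autosomal
]
print(filter_calls(calls, keep))
```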
To identify CNV regions between GSTT1 and GSTT2, we first merged CNVs detected by PennCNV if the length of overlap between any two CNVs was greater than 50% of the size of at least one of them. The start position of the resulting CNV region was defined as the minimum base pair position of the overlapping CNVs, and the end position as the maximum base pair position of the overlapping CNVs.
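The merging rule can be sketched as below, assuming inclusive base-pair coordinates and assuming that merging is transitive (if A merges with B and B with C, all three form one region), which the text does not state explicitly. Note that an overlap "greater than 50% of the size of at least one of the CNVs" is the same as an overlap exceeding half the length of the shorter CNV. A union-find structure is one convenient way to collect the merged groups.

```python
def length(iv: tuple[int, int]) -> int:
    """Length of an inclusive (start, end) interval in bp."""
    return iv[1] - iv[0] + 1


def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of base pairs shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def should_merge(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Overlap > 50% of at least one CNV, i.e. > 50% of the shorter one."""
    return overlap(a, b) > 0.5 * min(length(a), length(b))


def merge_cnv_regions(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Group CNV calls by the 50% rule; each region spans min start to max end."""
    parent = list(range(len(intervals)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(intervals)):
        for j in range(i + 1, len(intervals)):
            if should_merge(intervals[i], intervals[j]):
                parent[find(i)] = find(j)

    regions: dict[int, tuple[int, int]] = {}
    for i, (start, end) in enumerate(intervals):
        root = find(i)
        lo, hi = regions.get(root, (start, end))
        regions[root] = (min(lo, start), max(hi, end))
    return sorted(regions.values())


# Hypothetical calls: the first pair overlaps by 51 bp (> 50.5), the last pair by only 11 bp.
print(merge_cnv_regions([(100, 200), (150, 260), (1000, 1100), (1090, 1300)]))
```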

STATISTICAL ANALYSIS
We examined the association of smoking with PTD and birth weight in our dataset using logistic and linear regression, respectively. The association of CNVs with birth weight was evaluated using linear regression. Because preterm birth is strongly correlated with birth weight, analyses of birth weight were performed in the controls (full term births) only. The relationship between CNVs and PTD, stratified by smoking status, was examined using logistic regression, as was the association between CNVs and smoking, adjusted for preterm birth status.
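The study fit its models in R. As a language-neutral illustration only, the sketch below fits a logistic regression by Newton-Raphson in plain Python; it is not the authors' code, and the data are hypothetical. Adjustment covariates would simply be added as extra columns of `X`. With a single binary predictor, the fitted odds ratio equals the 2 × 2 cross-product ratio, here (30 × 80)/(20 × 70) ≈ 1.714, which makes a convenient check.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Each row of X starts with 1.0 (intercept); further columns are predictors
    such as smoking (0/1), infant sex, and maternal BMI. Returns log-odds coefficients.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)  # Newton step: (X'WX)^-1 X'(y - mu)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical data: 30 of 50 exposed and 70 of 150 unexposed mothers are cases.
X = [[1.0, 1.0]] * 50 + [[1.0, 0.0]] * 150
y = [1] * 30 + [0] * 20 + [1] * 70 + [0] * 80
beta = fit_logistic(X, y)
print(f"odds ratio for exposure: {math.exp(beta[1]):.3f}")
```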
In regression analyses, deletion CNVs [copy number (CN) < 2] and duplication CNVs (CN > 2) were coded as dummy variables and each compared with normal copy number (CN = 2), giving two separate comparisons. For common deletions (deletion allele frequency >35%) in GSTT1 and GSTTP1, we instead compared homozygous deletions (CN = 0) with CN > 0 as a single dichotomous comparison. All regression models were adjusted for infant sex and maternal BMI. All calculations were performed in R (version 2.15.1) (R Development Core Team, 2009). Associations with p ≤ 0.05 were considered of interest and are reported without correction for multiple testing. Linkage disequilibrium (LD) between two CNVs was measured by D′.
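The genotype coding described above can be written as a small function that returns the dummy columns for one mother, plus a design-matrix row carrying the two adjustment covariates. The function names and column order are illustrative assumptions; the study's R code is not given.

```python
def code_cnv(copy_number: int, common_deletion: bool = False) -> dict[str, int]:
    """Dummy-code one CNV genotype.

    Common deletions (GSTT1, GSTTP1): a single indicator, CN = 0 vs CN > 0.
    Other CNVs: deletion (CN < 2) and duplication (CN > 2) indicators,
    each contrasted with the reference CN = 2 (both indicators 0).
    """
    if common_deletion:
        return {"null": int(copy_number == 0)}
    return {"deletion": int(copy_number < 2), "duplication": int(copy_number > 2)}


def design_row(copy_number: int, infant_male: bool, maternal_bmi: float,
               common_deletion: bool = False) -> list[float]:
    """Intercept, CNV dummies, then the adjustment covariates (infant sex, maternal BMI)."""
    dummies = code_cnv(copy_number, common_deletion)
    return [1.0, *map(float, dummies.values()), float(infant_male), maternal_bmi]


print(design_row(3, infant_male=True, maternal_bmi=22.5))
print(design_row(0, infant_male=False, maternal_bmi=27.0, common_deletion=True))
```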

ASSOCIATION BETWEEN SMOKING AND BIRTH OUTCOMES
The characteristics of the DNBC/GENEVA mothers by case and control status are described in Table 1. Average maternal BMI was similar between cases and controls (p = 0.70). As expected, the average birth weight of infants of cases was 2463 g, significantly lower than that of controls (3719 g; p < 2.20E-16). The prevalence of smoking during pregnancy was significantly higher in cases than in controls (31.4 vs. 26.2%, p = 0.029). We conducted the association analysis of birth weight in controls (40 weeks of gestation) only. Smoking was a significant predictor of birth weight (p = 2.01E-10): the mean birth weight of infants of mothers who smoked was 277.87 g lower than that of the non-smoking group.

IDENTIFICATION OF CNVs
Using the methods described above, we found five potential CNV regions located in three genes in the area between GSTT1 and GSTT2 (Figure 1). Two CNVs were in GSTT2 (GSTT2a and GSTT2b), one was in GSTTP1, and the remaining two were in GSTT1 (GSTT1a and GSTT1b). These five CNVs were not independent: the duplications in GSTT2 (GSTT2a and GSTT2b) are correlated with two deletions, located in GSTTP1 and GSTT1 (GSTT1b). The relationships among these CNVs are summarized in Table A1. None of our array markers were located within the GSTT1 null genotype region previously reported in the literature, but our GSTT1b region spans that location (see Figure 1), which suggests that the deletion region is larger than previously reported.

ASSOCIATION BETWEEN CNVs IN GSTT1/GSTT2 AND BIRTH WEIGHT
The five CNVs were assessed for association with birth weight in controls (full term births) (Tables 2, 3A,B). The GSTT2b deletion was the only CNV associated with birth weight in full term controls (Table 3B; 122.0 g, p = 0.05). The GSTT2a deletion was borderline associated with birth weight regardless of smoking status and when examined in smokers only (Table 3B; all subjects, p = 0.060; smokers only, p = 0.07). The GSTT2b deletion was also borderline associated with birth weight in non-smokers only (Table 2; p = 0.09 among full-term births only). None of the interactions between these five CNVs and smoking approached statistical significance in a model of birth weight regressed on maternal BMI, infant sex, genotype, and genotype × smoking status (Table 2, P_int).
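A minimal sketch of such an interaction model, assuming ordinary least squares via the normal equations in plain Python (the authors used R). Genotype and smoking are 0/1 indicators. The sketch also includes a smoking main effect, which the text does not list explicitly; treat that as an assumption. The data are synthetic and noise-free, constructed so that the true coefficients are known and can be recovered exactly.

```python
from itertools import product


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return solve(xtx, xty)


TERMS = ["intercept", "bmi", "male", "genotype", "smoker", "genotype_x_smoker"]


def design_row(bmi, male, genotype, smoker):
    """One row of the design matrix; the last column is the interaction term."""
    return [1.0, bmi, float(male), float(genotype), float(smoker), float(genotype * smoker)]


# Synthetic birth weights (g) generated from known coefficients, no noise.
true_coef = [3000.0, 10.0, 100.0, -50.0, -200.0, -150.0]
X, y = [], []
for bmi, male, g, s in product([20.0, 25.0, 30.0], [0, 1], [0, 1], [0, 1]):
    row = design_row(bmi, male, g, s)
    X.append(row)
    y.append(sum(b * x for b, x in zip(true_coef, row)))

coef = ols(X, y)
for name, c in zip(TERMS, coef):
    print(f"{name}: {c:.2f}")
```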

FIGURE 1 | CNV calls in a sample of 34 subjects.
Each dot represents a marker, and each line represents a CNV called by PennCNV; red indicates duplications and blue indicates deletions. No marker was located in the region of the previously reported GSTT1 null genotype.

ASSOCIATION BETWEEN CNVs IN GSTT1/GSTT2 AND PRETERM BIRTH
We then analyzed the association of CNVs with preterm birth (<37 gestational weeks) and with birth before 35 gestational weeks (Table 2). None of the CNVs were significantly associated with the risk of preterm birth among smoking and non-smoking mothers considered separately (Table 4). However, the GSTT2a duplication showed a borderline significant association with an increased risk of birth before 35 weeks in non-smokers (odds ratio = 1.44, p = 0.08; Table 2). Among smokers, the duplication did not significantly influence the risk of birth before 35 weeks (Table 4; odds ratio = 1.06, p = 0.88). The GSTT2a deletion was associated with reduced birth weight in full term controls with borderline significance when smoking status was not considered (Table 3B; −65.3 g, p = 0.06), but was not associated with preterm birth or birth before 35 weeks (Table 2). Three CNVs, the GSTT2b duplication (odds ratio = 1.48, p = 0.05), the GSTTP1 deletion (odds ratio = 1.38, p = 0.09), and the GSTT1b deletion (odds ratio = 1.42, p = 0.06), were each associated or borderline associated with birth before 35 weeks in non-smokers (Table 2). These three CNVs were also in partial LD with each other. The deletions in GSTTP1 and GSTT1b are common, with homozygous deletion (null genotype) frequencies of 15.5 and 15.8%, respectively. The effect of the GSTT2b duplication combined with smoking was similar to its effect among non-smokers (Table 2; odds ratio = 1.86, p = 0.05). The GSTT1b deletion was associated with increased risk of birth before 35 weeks (odds ratio = 1.42, p = 0.06) in non-smokers and, when combined with smoking, had a comparable effect but was
www.frontiersin.org October 2013 | Volume 4 | Article 196 | 3

ASSOCIATION BETWEEN GSTT1/GSTT2 AND SMOKING
To better understand the relationship among birth outcomes, smoking, and the CNVs we examined, we also tested for association between the CNVs and smoking. Our examination of the relationship between smoking and GSTT1/GSTT2 CNVs while adjusting for birth weight indicated that the duplication CNVs were not significantly associated with smoking. However, the GSTT2a deletion was significantly associated with smoking (Table 5; odds ratio of smoking vs. non-smoking in carriers of this deletion = 1.30, p = 0.04). This deletion and smoking were jointly associated with decreased birth weight (Table 2; −284 g, p = 2.50E-7), but the deletion alone accounted for less than half of this decrease in smokers (Table 3B; −116.8 g, p = 0.07). Smokers were also less likely to carry the deletions in GSTTP1 and GSTT1b (Table 5; odds ratios = 0.68, p = 0.02, and 0.73, p = 0.06, respectively). These deletions tend to increase birth weight in controls, although not significantly (Table 2).

"Change of mean BW" is the change in birth weight (in grams) attributed to the CNV. This value is the regression coefficient estimate for the CNV in a model adjusting for maternal body mass index and infant sex.
"P-value" is the p-value associated with the change of mean BW.
−: no estimate obtained.
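The "less than half" statement for GSTT2a in smokers can be checked directly from the two estimates reported above: the deletion-only change in birth weight (−116.8 g, Table 3B) relative to the joint deletion-plus-smoking change (−284 g, Table 2).

```latex
\frac{116.8\ \text{g}}{284\ \text{g}} \approx 0.41 < 0.5
```

That is, the deletion alone corresponds to roughly 41% of the joint decrease.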

DISCUSSION
Linkage and SNP association studies have identified some genetic factors that are associated with adverse birth outcomes, including GSTT1. However, few CNV studies have been conducted for LBW and PTD outcomes. We focused on examining CNVs in the region from GSTT1 to GSTT2 and investigating their association with adverse birth outcomes and smoking.
Our finding that smoking may be associated with both preterm birth and birth weight is consistent with other reports (Horta et al., 1997; Chan et al., 2001).
After identifying five CNV regions in GSTT2, GSTTP1, and GSTT1, we noted complex patterns of association with preterm birth and birth weight, which may be related to LD among the CNVs. The GSTT2 duplications (GSTT2a and GSTT2b), the GSTTP1 deletion, and the GSTT1b deletion are in partial LD (see Table A1). The GSTT2a deletion decreased mean birth weight in infants of smoking mothers and was more likely to be present in smokers, but the weight change was of borderline statistical significance. In contrast, the GSTT2b deletion increased birth weight (Table 3B; 120.2 g, p = 0.08) in infants of non-smoking mothers and was more common in non-smokers (Table 5; odds ratio of smoking vs. non-smoking = 0.68, p = 0.20), but again the weight change was of borderline statistical significance. Interestingly, neither of the GSTT2 deletions was associated with preterm birth or "very preterm" birth (<35 weeks) (Windham et al., 2000; Danileviciute et al., 2012), but both GSTT2 duplications appear to be associated with increased risk of birth before 35 weeks in non-smokers, and with an even higher risk when smoking and the duplications were considered jointly. These findings suggest that GSTT2 may have a role in the detoxification of tobacco compounds, although it is also possible that these results are explained by LD with other polymorphisms or by multiple testing. CNVs in GSTT2 may influence the metabolism of nicotine and exacerbate the toxic effects of smoking.
The deletion in GSTTP1 was in high LD with GSTT1b (22,695,725,333 bp) (D′ = 0.89) and in moderate LD with the two duplications in GSTT2 (D′ = 0.62). We did not observe significant effects of the GSTTP1 and GSTT1b deletions on birth weight in full term controls, but they were associated with increased risk of birth before 35 weeks in both non-smokers and smokers, with the odds ratio slightly higher when the deletions were considered jointly with smoking. This suggests that the GSTTP1 and GSTT1b deletions may be associated with reduced birth weight through very preterm birth, an effect that would not be detected in analyses restricted to full term infants. We also found that the GSTTP1 and GSTT1b deletions were less likely to occur in smokers. Smoking could be a proxy for some other linked yet unknown behavior or may be independently associated with these deletion CNVs.
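For reference, D′ as usually defined (Lewontin's normalized D) for two biallelic loci can be computed from a haplotype frequency and the two allele frequencies, as sketched below. How haplotype frequencies were obtained for the unphased CNV genotypes in this study is not stated; the sketch assumes they are already available (for example from an EM estimate, not shown), and the frequencies used are hypothetical.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Lewontin's D' from the frequency of haplotype AB and the allele frequencies of A and B.

    For two CNVs, A and B can be taken as the deletion (or duplication) alleles.
    The sign of D' shows whether the alleles co-occur more (+) or less (-) than expected.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d / d_max if d_max > 0 else 0.0


# Hypothetical frequencies: two deletions, each with allele frequency 0.4.
print(d_prime(0.40, 0.40, 0.40))  # deletions always carried on the same haplotype
print(d_prime(0.16, 0.40, 0.40))  # deletions independent of each other
print(d_prime(0.00, 0.40, 0.40))  # deletions never on the same haplotype
```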
We compared our CNV results in GSTT1 and GSTT2 with published findings for the previously described GSTT1 CNV, which was reported to modify the effect of smoking on birth weight (Wang et al., 2002; Wu et al., 2007; Grazuleviciene et al., 2009; Aagaard-Tillery et al., 2010). Smokers with the GSTT1 null genotype had infants with lower mean birth weight than those with the wild-type GSTT1 genotype. The deletion's physical genomic position (22,698,706,887 bp on chromosome 22) was not covered by the HumanHap660 chip, resulting in a marker gap between 22,697,104 and 22,713,954 bp in our arrays. However, our GSTT1b deletion encompasses the reported GSTT1 null genotype region. Notably, the frequency of the homozygous GSTT1b deletion in our data was highly consistent with the previously reported frequency of the GSTT1 null genotype in Denmark (15%) (Saadat et al., 2001). In contrast to most previous studies, we were able to examine the influence of the GSTT1a and GSTT1b deletions on fetal growth in women with full term births, which allowed us to examine effects on birth weight largely independently of effects on gestational age. We did not find a significant association of GSTT1 deletions with birth weight in full term births, nor did we observe interactions between genotype and smoking status on birth weight in controls. This may be due to the relatively small number of mothers who smoked during pregnancy or to smoking being measured as a dichotomous variable. The lack of evidence for an interaction, in contrast to earlier reports of a GSTT1-smoking interaction affecting birth outcomes, may also reflect differences in the confounders included in models, in the definition of smokers, and in this study's classification of mothers as non-smokers based on the pregnancy period only.
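As a rough consistency check, and assuming Hardy-Weinberg equilibrium (an assumption not tested in the text), a null-genotype frequency f corresponds to a deletion-allele frequency of sqrt(f). Applied to the null frequencies reported in the Results (15.5% for GSTTP1, 15.8% for GSTT1b) and the Danish GSTT1 null frequency (15%), this gives values of roughly 0.39-0.40, above the 35% threshold the Methods use to treat a deletion as common.

```python
import math

COMMON_DELETION_THRESHOLD = 0.35  # deletion allele frequency threshold from the Methods


def deletion_allele_freq(null_genotype_freq: float) -> float:
    """Under Hardy-Weinberg equilibrium P(CN = 0) = q**2, so q = sqrt(P(CN = 0))."""
    if not 0.0 <= null_genotype_freq <= 1.0:
        raise ValueError("frequency must be in [0, 1]")
    return math.sqrt(null_genotype_freq)


for name, f in [("GSTTP1", 0.155), ("GSTT1b", 0.158), ("GSTT1 null, Denmark", 0.15)]:
    q = deletion_allele_freq(f)
    print(f"{name}: null {f:.1%} -> deletion allele ~{q:.3f}, "
          f"common: {q > COMMON_DELETION_THRESHOLD}")
```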
A single SNP (rs1622002) in GSTT2 has been associated with metabolism of the major tobacco carcinogens, PAHs, but this SNP was not included in the present study's genotype panel.
There are several epidemiologic limitations in our study. First, smoking was recorded as any smoking during pregnancy, so the influence of smoking in different trimesters was not considered. Second, regressions were adjusted for some demographic factors but not for psychosocial factors. Third, statistical evaluations were not corrected for multiple testing, which tempers the conclusions that can be drawn about associations, although the uncorrected p-values do suggest how findings might be prioritized in future studies.
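The study reports uncorrected p-values. To gauge how a correction could change the picture, the sketch below implements the Benjamini-Hochberg false discovery rate adjustment and the Bonferroni adjustment. Neither was applied in this study, and the p-values in the example are hypothetical, not taken from the tables.

```python
def bonferroni(pvals: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (step-up FDR control), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical p-values
print(benjamini_hochberg(raw))
print(bonferroni(raw))
```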
In addition to these epidemiological issues, CNV calls are notoriously error-prone, and it is not feasible to validate specific CNV calls within the context of a large-scale association study. However, for any given CNV whose calling error rates are independent of phenotype, those errors should mainly reduce power rather than create spurious associations (Zheng et al., 2012). We have no reason to expect phenotype-related CNV calling biases in the current study, so we expect our results to be reasonably robust. Moreover, it is reassuring that our frequency for the GSTT1b homozygous deletion almost exactly matches the frequency previously reported in this population for the known deletion. We did not compare our PennCNV calls with any other algorithm in this study, although we did so previously in an explicit algorithm comparison (Zheng et al., 2012).
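The argument about phenotype-independent calling error can be illustrated with a standard calculation: if a binary CNV status is miscalled with the same sensitivity and specificity in cases and controls (and sensitivity + specificity > 1), the observed odds ratio is pulled toward 1, and a true null association stays null. The prevalences, sensitivity, and specificity below are hypothetical.

```python
def odds(p: float) -> float:
    return p / (1.0 - p)


def observed_prevalence(p_true: float, sensitivity: float, specificity: float) -> float:
    """Probability that a mother is *called* a carrier, given true carrier prevalence p_true."""
    return p_true * sensitivity + (1.0 - p_true) * (1.0 - specificity)


def observed_odds_ratio(p_cases: float, p_controls: float,
                        sensitivity: float, specificity: float) -> float:
    """Odds ratio after nondifferential miscalling (same error rates in cases and controls)."""
    return (odds(observed_prevalence(p_cases, sensitivity, specificity))
            / odds(observed_prevalence(p_controls, sensitivity, specificity)))


true_or = odds(0.30) / odds(0.20)
obs_or = observed_odds_ratio(0.30, 0.20, sensitivity=0.90, specificity=0.95)
null_or = observed_odds_ratio(0.25, 0.25, sensitivity=0.90, specificity=0.95)
print(f"true OR {true_or:.3f}, observed OR {obs_or:.3f}, null OR stays {null_or:.3f}")
```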
Despite these limitations, our study sheds light on the role of CNVs in smoking-associated adverse birth outcomes. Our candidate gene analysis of GSTT1/GSTT2 and the identification of five CNVs that appear to be associated with birth weight and/or birth before 35 weeks are novel. The distribution and LD pattern of these CNVs are more complicated than expected, which may affect our understanding of the relationship between the GSTT1 deletion and adverse birth outcomes. Follow-up genetic association studies of birth weight need to include multiple CNVs in this region rather than single polymorphisms. Additional studies including molecular evidence will be needed to validate the detected CNVs and replicate these results.
GEI grant "JH/CIDR Genotyping for Genome-Wide Association Studies," National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA. This study was also supported by funding through NIH HD-004423-01 "Genome-wide Association Studies of Prematurity and Its Complications" and NIH HD-052953 "Identification of Maternal and Fetal Genetic Factors in Preterm Birth," National Institutes of Health, Bethesda, MD, USA. The work of XZ was supported by T32MH095169.