Genome-wide Interaction Study Implicates VGLL2 and Alcohol Exposure and PRL and Smoking in Orofacial Cleft Risk

Non-syndromic cleft lip with or without cleft palate (NSCL/P) is a common birth defect, affecting approximately 1 in 700 births. NSCL/P has complex etiology including several known genes and environmental factors; however, known genetic risk variants only account for a small fraction of the heritability of NSCL/P. It is commonly suggested that gene-by-environment (G×E) interactions may help explain some of the “missing” heritability of NSCL/P. We conducted a genome-wide G×E interaction study in cases and controls of European ancestry with three common maternal exposures during pregnancy: alcohol, smoking, and vitamin use using a two-stage design. After selecting 127 loci with suggestive 2df tests for gene and G x E effects, 40 loci showed significant G x E effects after correcting for multiple tests. Notable interactions included SNPs of 6q22 near VGLL2 with alcohol and 6p22.3 near PRL with smoking. These interactions could provide new insights into the etiology of CL/P and new opportunities to modify risk through behavioral changes.


INTRODUCTION
Interest in identifying the causal factors for birth defects, including orofacial clefts (OFCs) can be traced back centuries (Marazita, 2012) and has often involved a debate as to the contribution of genetic versus environmental risk factors (Beames and Lipinski, 2020). Evidence for a multifactorial model for OFC etiology originated with Fraser's early studies exposing pregnant mice to cortisone showing that the incidence of corticosteroid induced cleft palate varied by strain. The multifactorial model is favored in human nonsyndromic OFCs, where twin and family studies provide evidence for a strong, but incomplete, genetic component. Many environmental factors have been postulated to modify risk of OFCs including maternal medications, smoking or alcohol consumption (Romitti et al., 2007;Grewal et al., 2008), nutrition (Kelly et al., 2012), obesity (Blomberg and Källén, 2009), gestational diabetes (Figueiredo et al., 2015), and occupational exposures. Of these, cigarette smoking, alcohol consumption, and folic acid supplementation are the most widely studied, but, except for smoking (Hackshaw et al., 2011), these exposures have inconsistent results in epidemiological studies. Maternal smoking, on the other hand, has been shown to increase risk consistently across many studies with similar effect sizes (Little et al., 2004).
Over the last 10 years, 20 independent genome-wide association studies (GWAS) or meta-analyses have identified at least 50 loci associated with OFCs (Beaty et al., 2016). Cumulatively, these loci are estimated to account for 25-30% of the heritable risk attributed to additive genetic effects. Other approaches are needed to identify the "missing" heritability, some of which may be attributed to interaction effects. Leveraging published data from GWAS studies, gene-environment interaction (GxE) studies have become a popular approach to further elucidate OFC risk. Following one such GWAS study in European and Asian case-parent trios (Beaty 2010), candidate gene and pathway-based GxE analyses have suggested interactions between smoking and RUNX2 (Wu et al., 2012), and between multiple exposures and BMP4 (Chen et al., 2014). Genome-wide GxE using a variety of statistical approaches have identified interactions for maternal smoking, alcohol, or folate supplementation in cleft palate Wu et al., 2014;Haaland et al., 2017) and cleft lip with or without cleft palate (Haaland et al., 2018;Haaland et al., 2019).
In this study, we performed genome-wide GxE analyses in a case-control sample of European ancestry from the Pittsburgh Orofacial Cleft Study to identify interactions between genetic variants and three exposures (maternal smoking, alcohol consumption, or vitamin supplementation) during the periconceptional period that influence risk of OFCs.

Study Samples and Genotyping
The sample for these analyses was derived from a larger multiethnic OFC cohort, which has been previously described . Briefly, participants were recruited from 18 sites worldwide as part of ongoing genetic studies conducted by the University of Iowa and the University of Pittsburgh Center for Craniofacial and Dental Genetics. All sites obtained Institutional Review Board approval both locally and at the University of Iowa or the University of Pittsburgh. All participants gave informed consent. These data are available through dbGaP (accession number: phs000774. v2. p1).
This large multiethnic cohort contains OFC-affected probands and their unaffected relatives in addition to controls without a family history of OFC or other craniofacial anomalies. Additionally, data were obtained on three maternal periconceptional exposures: alcohol use, smoking, and vitamin use. Periconceptional exposures were gathered from maternal self-report of alcohol use, personal smoking, or vitamin use during the 3 months prior to pregnancy and during the first trimester for each child as were used as binary indicator variables in analyses.
From this larger cohort, a subset of unrelated cases and controls of European ancestry and with complete data for the three environmental exposure measures was drawn. It consisted of 344 NSCL/P cases and 194 controls of European descent from Denmark, Hungary, and the United States (Supplemental Table S1).
The genotyping, quality control procedures, imputation, and generation of principal components of ancestry (PCAs) have been previously described . Briefly, samples were genotyped using the Illumina HumanCore + Exome chip (Illumina Inc., San Diego, CA), with approximately 97% of the genotyped SNPs passing quality control filters, resulting in a total of 539,473 genotyped SNPs. Genotype imputation for an additional 34,985,077 unobserved polymorphisms was performed using IMPUTE2 software with the multiethnic 1,000 Genomes Project Phase 3 reference panel (Howie et al., 2009). Imputed genotype probabilities were converted to mostlikely genotypes using GTOOL; only most-likely genotypes with probabilities >0.9 were retained for statistical analysis. Imputed SNPs with INFO scores <0.5 or those deviating from Hardy−Weinberg equilibrium (p < 1 × 10 -4 in a set of unrelated European controls) were also excluded from the analysis. The University of Washington, Genetics Coordinating Center released a full online report on the data cleaning, quality assurance, ancestry analyses, and imputation for this study (http://www.ccdg.pitt.edu/docs/Marazita_ofc_QC_report_ feb2015.pdf, last accessed April 25, 2016). Additionally, we excluded variants with a minor allele frequency <10%, as the power to detect interaction effects among low-frequency variants is very limited and analysis of such variants is prone to type 1 errors. After these filtering steps, 5,165,675 variants were included in the genome-wide interaction analyses. Population structure was measured through generation of principal components of ancestry using the R package SNPRelate following the approach introduced by Patterson, Price, and Reich (Patterson et al., 2006).

Statistical Analyses
To assess the association between genetic variants and NSCL/P that may or may not be modified by an environmental factor, we used a two-stage approach for each of the three environmental exposure. First, we employed the strategy laid out in Kraft et al., 2007 a 2 degree of freedom (2df) joint test of the gene (G) and gene-environment interaction (GE) effects was employed genome-wide using logistic regression, adjusting for five principal components of ancestry (Kraft et al., 2007). This provides a sensitive test for two scenarios simultaneously: (Marazita, 2012) where there is a genetic effect that is similar across all environmental strata, and (Beames and Lipinski, 2020) where there is a genetic effect that is specific to a specific environmental stratum. Then, the GE effect was interrogated for variants demonstrating at least suggestive association from the G-GE joint test (i.e., p < 0.05) to identify regions in which there is a genetic effect differing across environmental stratum, not just a genetic effect alone. To estimate the dependence between the joint G-GE and GE alone tests and derive a significance threshold to use which accounts for this dependence, we performed simulation (details available in Supplemental Note). In simulation, using a first-stage threshold of p < 0.05, with a second-stage threshold of p < 0.00275, yielded a procedure with a properly controlled type-1 error rate of 0.05. To Frontiers in Cell and Developmental Biology | www.frontiersin.org February 2022 | Volume 10 | Article 621261 account for multiple testing, we then Bonferroni-adjusted this second-stage significance level for interrogating the GE effect, adjusting for the number of regions with suggestive results from the G-GE joint test (i.e., 0.00275/572 = 4.8 × 10 -6 ). All analyses were conducted in PLINK v1.9 assuming an additive genetic model (Chang et al., 2015).

RESULTS
As a follow-up study to previous GWAS identifying marginal gene effects (G) increasing risk for NSCL/P, this study focused on identifying G x E interactions. We carried out a two-stage analysis, where we first examined the 2df joint test of G and G x E (GE). We observed suggestive evidence of association in the joint G-GE test for 572 loci (Figure 1). Several loci had multiple SNPs with p < 0.05 in each of the three analyses. These included 8q24, one of the most reliably associated loci with NSCL/P in European populations (Beaty et al., 2016). As expected, this locus showed strong G effects, but did not have a significant GE effect (p > 0.05). A locus on 2p23 with similarly strong G effects but no GE effects was identified in all three analyses but to our knowledge has not been reported before in GWAS of NSCL/P. To identify gene-environment interactions, we next examined the GE effect alone for each variant with p < 0.05 in the joint test. Three regions (5 SNPs) had a significant GE p-value less than Frontiers in Cell and Developmental Biology | www.frontiersin.org February 2022 | Volume 10 | Article 621261 3 4.8 × 10 -6 (i.e., Bonferroni correction for 572 loci) and several more had suggestive results (GE p-value less than 5 × 10 -4 ) (Supplementary Table S2). Among these regions, we observed a significant interaction effect between rs706954, a variant <5 kb downstream of VGLL2 on 6q22.1, and periconceptional alcohol exposure (p GE = 4.62 × 10 -6 ). The minor allele (G) of rs706954 was associated with higher odds of NSCL/P within individuals with periconceptional alcohol exposure, but lower odds with individuals without periconceptional alcohol exposure ( Figure 2). We also observed a significant GE effect with periconceptional alcohol exposure for a SNP on chrX (rs5912923, p GE = 3.71 × 10 -6 ).
Two additional loci had several SNPs per locus with suggestive GE p-values for alcohol exposures. These included 6q21 (lead SNP rs6929494, p GE = 1.18 × 10 -5 ) and 11q21 (lead SNP rs7116990, p GE = 1.80 × 10 -5 ). At the 6q21 locus, which included genes REVL3 and TRAF3IP2, the major allele (A) of rs6929494 was associated with higher odds of NSCL/P within individuals with periconceptional alcohol exposure, but had an opposite (protective) effect for individuals without periconceptional alcohol exposure. The associated SNPs at the 11q21 locus, are located in an intergenic region between CNTN5 and JRKL; for the lead SNP, rs7116990, the major allele (A) was associated with greater odds of NSCL/P in the individuals with periconceptional alcohol exposure and lower odds of NSCL/P for individuals without periconceptional alcohol exposure.
In the vitamin exposure analysis, one locus on 16p21.1 harbored a variant with a significant GE effect (rs9930171, p GE = 4.78 × 10 -6 ). While not statistically significant, several SNPs on 3q27.3 showed a suggestive interaction with vitamin exposure (lead SNP rs61069053, p GE = 1.05 × 10 -5 ). The minor allele (G) of rs61069053, located in an intergenic region between CRYGS and DGKG, had higher estimated odds of NSCL/P in the individuals without periconceptional vitamin exposure, but had lower odds of NSCL/P for the periconceptional vitamin exposure group. However, this result may be confounded by the distribution of vitamin use and allele frequency; only one individual was homozygous for the minor allele and did not have vitamin exposure.
In the periconceptional smoking analysis, no variants achieved statistical significance. However, at 6p22.3, several SNPs showed suggestive GE p-values for smoking (lead SNP rs35097027, p GE = 1.96 × 10 -5 ). These variants are upstream of PRL; the minor allele (C) of rs35097027 conferred higher odds of CL/P for individuals with maternal smoking exposure, but lower odds of NSCL/P for individuals without maternal smoking exposure (Figure 2).

DISCUSSION
The primary goal of this paper was to identify additional geneenvironment interactions for NSCL/P and build on our previous work in this dataset . These analyses were motivated by multiple studies reporting associations between OFCs and periconceptional smoking, alcohol use, and vitamin supplementation (Dixon et al., 2011;Beaty et al., 2016). Genome-wide interaction scans, however, have been limited for NSCL/P (Wu et al., 2014;Haaland et al., 2018). Our analyses add to a growing list of loci for which interactions between SNPs and maternal environmental exposures influence risk of OFCs.
While procedures for identifying GxE interactions have low power in general, the 2df joint G-GE test is nearly always more powerful than the marginal G and/or GE tests, even with a relatively small sample size, making it a useful screening tool, especially for common variants with strong effects (Kraft et al., 2007). Consistent with this, the loci in these analyses with statistically significant G-GE associations that were driven by the GE effect had markedly strong effect estimates. Notably, the estimated odds of NSCL/P were 2.73 times higher with each copy of the minor allele G of rs117083 (in linkage disequilibrium with rs706954, R 2 = 0.88) for individuals with maternal periconceptional alcohol exposure and was 35% lower for individuals without periconceptional alcohol exposure. For the only locus with more than one SNP with a significant GE p-value [i.e., 6q22.1 (alcohol)], the genotypic odds ratio ranged from 40% lower odds of CL/P to 178% higher odds. For alcohol and smoking exposures, we primarily identified scenarios described as "cross-over" interactions where the genotype group at lowest risk for NSCL/P without the exposure had high risk for NSCL/P with the exposure. In each of these interactions, the influence of genotype entirely depends on the exposure. The method we used to detect interactions is better suited for these "cross-over" interactions. However, it is limited in its ability to detect associations that are only present in one environmental context. For example, it is quite plausible that a genetic variant may only modify risk in the presence of a specific environment. This method has lower statistical power to detect such an association; additional methods are needed to explicitly search for these association signals.
We identified three loci with significant interactions and several loci with suggestive interactions. Of these, two loci have notable biological plausibility which we discuss in detail. The only locus with several variants demonstrating significant GE effects with periconceptional alcohol exposure was on 6q22.1, approximately 5 kb upstream of VGLL2, encoding vestigial-like 2. Vgll2 is one of several Vestigial-like factors that plays a role in muscle fiber differentiation and the distribution of skeletal muscle fibers (Honda et al., 2017). Although its expression in adult tissues is restricted to skeletal muscle, during development, Vgll2 is more broadly expressed. In mice, its expression is enriched in the mandibular and maxillary prominences at embryonic day 10.5 and is localized to the epithelia of these structures (Johnson et al., 2011). In the zebrafish, Vgll2 is expressed in the pharyngeal pouches and somites (Maeda et al., 2002;Mielcarek et al., 2002). A role for VGLL2 in craniofacial development is further supported by vgll2a zebrafish morphants, which induce death of neural crest cells in the pharyngeal arches causing hypoplasia of the Meckel's and palatoquadrate cartilages and truncation of the ethmoid plate (Johnson et al., 2011). Models for fetal alcohol syndrome indicate that ethanol exposure enhances cell death of cranial neural crest cells (Cartwright and Smith, 1995;Smith et al., 2014). Although a specific interaction between VGLL2 and ethanol has not been described in vitro or in vivo, these data, combined with our statistical interaction suggest this may be a fruitful area for future studies.
At the 6p22.3 locus, which was suggestively associated with exposure to maternal smoking, the lead SNP was located 60 kb upsteam of PRL, which encodes prolactin. Prolactin is a reproductive hormone primarily known for its ability to stimulate mammary gland development and lactation, but it also has a variety of functions in reproduction, growth, and development (Bole-Feysot et al., 1998) transcripts and proteins have been detected in non-lactogenic tissues notably including the facial cartilage and olfactory epithelium in mammals (Freemark et al., 1997). Prolactin levels are known to be altered by both active smoking and exposure to secondhand tobacco smoke (Muraki et al., 1979;Ng et al., 2006;Xue et al., 2010;Benedict et al., 2012). Although smoking is associated with lower prolactin levels in pregnant women, the same is not true for fetal prolactin levels (Andersen et al., 1984). A role for prolactin in craniofacial development and OFCs is unclear as knockouts of mouse PRL receptors are viable and lack gross morphological defects (Ormandy et al., 1997). However, in amphibian models, prolactin signaling has been suggested as the pathway underlying the ability of premetamorphic Xenopus laevis tadpoles to correct craniofacial defects induced by thioridazine (Pinet et al., 2019). However, the same group also found that increased prolactin signaling during development alone does not cause craniofacial defects. Nonetheless, there is evidence linking smoking to prolactin levels and for prolactin signaling in craniofacial development; additional research will be needed to connect these results and determine a role for PRL and smoking in OFCs.
Our previous studies using this dataset have made use of the ancestral diversity that resulted from the 13 different recruitment sites worldwide. However, this study was limited to individuals of European descent as the number of unrelated participants with complete data from the other ancestry groups was quite small. In previous GWASs for gene-environment interactions, a lack of observations precluded the analysis of smoking and alcohol exposures in Asian trios (Haaland et al., 2018). The authors speculated that this was consistent with general trends of low alcohol and cigarette use among Asian women (Ng et al., 2014). This may also be true for other populations recruited into our study, but the incomplete data may also be due to changing study designs over the recruitment period. We also note that the recruitment of these participants was not population-based and thus does not reflect the population prevalence of NSCL/P nor the three environmental exposures we considered. In addition, a binary exposure variable does not capture or represent the range of possible environmental exposures that may influence development of NSCL/P. Therefore, the conclusions drawn here are limited to the study sample used in these analyses and cannot be used to draw conclusions at a broader level.
To summarize, we performed a genome-wide analysis to detect gene-environment interactions influencing risk of NSCL/P in individuals of European ancestry. We identified two notable interactions: near VGLL2 and PRL with maternal periconceptional alcohol use and smoking, respectively that are plausibly associated with risk of NSCL/P. These interaction effects are novel and warrant further investigations. If confirmed, these interactions provide new insights into the etiology of NSCL/P and could provide opportunities to modify risk through behavioral changes.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Institutional Review Boards at the University of Pittsburgh and Emory University. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.