Analysis of Parent-of-Origin Effects on the X Chromosome in Asian and European Orofacial Cleft Triads Identifies Associations with DMD, FGF13, EGFL6, and Additional Loci at Xp22.2

Background: Although both the mother's and father's alleles are present in the offspring, they may not operate at the same level. These parent-of-origin (PoO) effects have not yet been explored on the X chromosome, which motivated us to develop new methods for detecting such effects. Orofacial clefts (OFCs) exhibit sex-specific differences in prevalence and are examples of traits where a search for various types of effects on the X chromosome might be relevant. Materials and Methods: We upgraded our R-package Haplin to enable genome-wide analyses of PoO effects, as well as power simulations for different statistical models. 14,486 X-chromosome SNPs in 1,291 Asian and 1,118 European case-parent triads of isolated OFCs were available from a previous GWAS. For each ethnicity, cleft lip with or without cleft palate (CL/P) and cleft palate only (CPO) were analyzed separately using two X-inactivation models and a sliding-window approach to haplotype analysis. In addition, we performed analyses restricted to female offspring. Results: Associations were identified in “Dystrophin” (DMD, Xp21.2-p21.1), “Fibroblast growth factor 13” (FGF13, Xq26.3-q27.1) and “EGF-like domain multiple 6” (EGFL6, Xp22.2), with biologically plausible links to OFCs. Unlike EGFL6, the other associations on chromosomal region Xp22.2 had no apparent connections to OFCs. However, the Xp22.2 region itself is of potential interest because it contains genes for clefting syndromes [for example, “Oral-facial-digital syndrome 1” (OFD1) and “Midline 1” (MID1)]. Overall, the identified associations were highly specific for ethnicity, cleft subtype and X-inactivation model, except for DMD in which associations were identified in both CPO and CL/P, in the model with X-inactivation and in Europeans only. Discussion/Conclusion: The specificity of the associations for ethnicity, cleft subtype and X-inactivation model underscores the utility of conducting subanalyses, despite the ensuing need to adjust for additional multiple testing. Further investigations are needed to confirm the associations with DMD, EGF16, and FGF13. Furthermore, chromosomal region Xp22.2 appears to be a hotspot for genes implicated in clefting syndromes and thus constitutes an exciting direction to pursue in future OFCs research. More generally, the new methods presented here are readily adaptable to the study of X-linked PoO effects in other outcomes that use a family-based design.


INTRODUCTION
Orofacial clefts (OFCs) are the most common craniofacial birth defects in humans, affecting approximately 1-2/1000 live births worldwide (Mossey et al., 2009). They are broadly categorized according to whether the affected region includes the primary palate, the secondary palate, or both, and whether they occur with or without additional congenital anomalies (Dixon et al., 2011;Marazita, 2012;Beaty et al., 2016). OFCs pose a substantial public health burden in terms of the medical costs and sequelae associated with their treatment, which may persist from infancy to childhood and throughout life (Wehby and Cassell, 2010;Berg et al., 2016a,b). OFCs have also been linked to higher risk of specific types of cancer in later life (Zhu et al., 2002;Bille et al., 2005), increased overall mortality well into adulthood (Christensen et al., 2004), and lower academic achievement (Wehby et al., 2014). Despite major progress in surgery and other medical interventions aimed at repairing and managing the cleft itself, current understanding of the biological underpinnings of these relatively common birth defects is still incomplete.
A PoO effect describes the situation where the effect of an allele in the offspring differs according to the parental origin of the allele (Guilmatre and Sharp, 2012;Gjerdevik et al., 2017). PoO effects are particularly relevant for birth defects because the mother influences the development of the fetus through the action of her own genes and through providing the prenatal environment for the fetus. To estimate PoO effects, one contrasts the frequency of alleles transmitted to an affected offspring from the mother versus the father, and if transmission distortion to the affected offspring is stronger for mothers than fathers (or vice versa), there is evidence of a PoO effect (Weinberg, 1999;Jugessur et al., 2012b;Connolly and Heron, 2015). PoO effects might also occur on the X chromosome. Given the consistently observed excess of females with cleft palate only (CPO) and excess of males with cleft lip with or without cleft palate (CL/P), OFCs are good examples of traits that might be caused by various types of effects on the X chromosome (including PoO effects). Moreover, several genes on the X chromosome are known to cause syndromic forms of clefts, and there is growing evidence that X-linked genes might also contribute to isolated clefts (Jugessur et al., 2012a;Patel et al., 2013;Fonseca et al., 2015;Wise et al., 2016;Skare et al., 2017).
A major shortcoming of most previous GWAS has been the systematic exclusion of SNPs on the X chromosome prior to analysis, even though this chromosome comprises approximately 5% of the human genome and many genotyping platforms do include X-linked SNPs. This has led to only a few analyses of Xlinked markers for complex traits in general (Wise et al., 2013), and for OFCs in particular, only two studies have explored PoO effects at the genome-wide level (Shi et al., 2012;Garg et al., 2014), and neither investigated PoO effects on the X chromosome.
With these gaps in mind, we upgraded our R-package Haplin (Gjessing and Lie, 2006) with new functionalities to enable PoO analyses at the genome-wide level, as well as an assessment of statistical power for different statistical models. The current analyses are based on the largest collection of case-parent triads of OFCs to date (Beaty et al., 2010). We implemented a sliding-window approach to haplotype analysis and used two Xinactivation models, one with and one without the assumption of X-inactivation in females , to explore PoO effects on the X chromosome. In addition, we performed separate analyses on female offspring alone to allow for possible sexspecific differences.

Study Populations
Characteristics of the study populations, the genotyping platform, and the quality control criteria used for data cleaning have been detailed elsewhere (Beaty et al., 2010;Shi et al., 2012;Patel et al., 2013;Skare et al., 2017). Briefly, genotyping was performed on an Illumina Human610-Quad R platform and genotypes for 589,945 SNPs (99.56% of the attempted SNPs) were released and later deposited in the Database of Genotypes and Phenotypes (dbGaP; http://www.ncbi.nlm.nih.gov/gap) under study accession ID phs000094.v1.p1. Genotypes for 14,486 Xchromosome SNPs in 1,291 Asian and 1,118 European caseparent triads of isolated clefts were available for the current analyses. For additional data cleaning, we used PLINK (Purcell et al., 2007) to exclude individuals with more than 10% missing genotypes, SNPs with more than 1% missing genotypes, and SNPs with a minor allele frequency (MAF) less than 0.01. Mendelian errors were not assessed during data cleaning; however, SNPs with more than 30 Mendelian errors were excluded at the analysis stage. After data cleaning, 13,180 X-chromosome SNPs were available for the current PoO analyses. Table 1 shows the number of triads according to ethnicity, cleft subtype and child's sex. For each ethnicity, isolated CL/P and isolated CPO triads were analyzed separately.

Statistical Methods
We recently published a new approach to chromosome-wide analysis of X-linked SNPs using the same dataset as here (Skare et al., 2017). In the current paper, we extend the approach to also cover PoO effects on the X chromosome, as implemented in our statistical software Haplin (Gjessing and Lie, 2006). Two types of analyses were performed on the Asian and European samples: (i) Single-marker analyses, where SNPs were analyzed individually, and (ii) haplotype analyses, where up to four SNPs per sliding window were analyzed together. Haplin fits a log-linear model to genotype data from case-parent triads. It reconstructs haplotypes from multiple-SNP data and estimates the relative risk and confidence interval for one or two copies of a target allele or haplotype. Since the expectation-maximization algorithm is implemented in Haplin, incomplete triads can also be used in the analyses after accounting for missing parental genotypes in the maximum likelihood estimation.
For haplotype analysis, we used the HaplinSlide function in Haplin. For additional information on data formats and the HaplinSlide function, see our recent chromosome-wide analysis of X-linked SNPs (Skare et al., 2017), our website at https://people.uib.no/gjessing/genetics/software/haplin/, or the R help pages for Haplin at https://CRAN.R-project.org/package= Haplin. Estimating PoO Effects on the X Chromosome Using Haplin Within the maximum-likelihood framework of Haplin, autosomal PoO effects are derived essentially by contrasting the relative frequency of the variant allele when transmitted from the mother to the affected child versus the relative frequency of the variant allele when transmitted from the father; there is evidence of a PoO effect if the frequencies differ (Weinberg, 1999;Gjerdevik et al., 2017). Haplin provides estimates of RR m and RR f , which are the relative risk increase (or decrease) associated with inheriting the variant allele from the mother and from the father, respectively. The measure of PoO effect is then the ratio of relative risks, RRR = RR m /RR f , i.e., the PoO effect is a comparison of the relative risks derived from the maternally and paternally inherited alleles. A value of RRR = 1 would be obtained from an allele with no PoO effect, i.e., RR m = RR f . Note that even when RRR = 1, RR m and RR f could themselves still be different from 1, meaning that there could be an effect of the allele carried by the fetus even in the absence of a PoO effect. Under a multiplicative model, the risk is assumed to be RR m * RR f when the allele is inherited both from the mother and the father (i.e., a double dose of the variant allele).
Similarly, on the X chromosome, the estimates of RR m and RR f can be obtained by restricting the analyses to girls only. However, since the X chromosome in boys is maternally derived, Haplin allows an increase in power by combining the relative risk estimate RR B from boys with the estimates from girls. The two options for doing so assume either X-inactivation among girls, where RR m * RR f = RR B , or no X-inactivation, where RR m = RR B . Note that in the model assuming X-inactivation, equating the estimate RR B from the boys with the product of RR m and RR f may influence the RR m and RR f estimates themselves, but has relatively little influence on the ratio RRR. When studying the significance of PoO specifically, the hits obtained from investigating girls only will thus be similar to those obtained from the model assuming X-inactivation. The combined model assuming no X-inactivation, however, may well produce other hits. Both combined models allow different baselines risks for boys and girls to be fitted. Figure 1 provides a detailed explanation of the Haplin model for PoO effects on the X chromosome.

Post-processing of Results
For each cleft subtype, -log 10 (p) were plotted against chromosomal position in a Manhattan plot, before all the Manhattan plots were collated and displayed in a single figure.
To control for the proportion of falsely rejected hypotheses, we applied a false discovery rate (FDR) method where the original p-values were replaced by "q-values" (Storey and Tibshirani, 2003). For example, among SNPs with a q-value ≤ 0.2, one would expect an FDR of less than 20%.

Electronic Database Information
Haplin version 6.2.1 is implemented as a package in the R statistical software (R Development Core Team, 2014) and can be installed from the CRAN library. More information can be found FIGURE 1 | An illustration of the Haplin model for parent-of-origin (PoO) effects on the X chromosome. The red arrows show the relative risks associated with girls inheriting the risk allele "a" from the mother (RR m ) or from the father (RR f ). Under the multiplicative risk model illustrated here, the relative risk increase for "aa" girls, i.e., girls inheriting allele "a" from both the mother and the father, is RRaa = RR m *RR f . The ratio RR m /RR f = 1.4 is a measure of the PoO effect (blue arrow). The risk increase for boys when inheriting the "a" allele is RR B = 2.0. Under the assumption of X-inactivation (A), the risk increase for girls inheriting "aa" is the same as that for a single "a" in boys, i.e., RRaa = RR B . Under the assumption of no X-inactivation (B), the risk increase for girls inheriting "a" from the mother is the same as that for boys, i.e., RR m = RR B . In the model without X-inactivation, when a girl inherits "a" from the father as well, this may lead to a higher total risk increase for girls than for boys inheriting the one "a" from the mother. In this illustration, RRaa = RR m *RR f = 2.86 > RR B . The model allows different baseline risks for girls and for boys, here 0.4 and 0.5%, respectively. at our web site (https://people.uib.no/gjessing/genetics/software/ haplin/).

Ethics Approval
Ethics approvals for the International Cleft Consortium were obtained from the respective institutional review boards of the participating sites. The consortium was formed in 2007 and each participating institution approved research protocols for the recruitment of case-parent triads from 13 individual sites. All participants have granted their written informed consents. The participating sites included institutions in the US (Johns Hopkins University; University of Iowa; Utah State University; National Institute of Environmental Health Sciences (NIEHS); University of Pittsburgh), Denmark (University of Southern Denmark), Norway (University of Bergen), China (Peking University Health Science Center; Wuhan University; Peking Union Medical College; West China School of Stomatology, Sichuan University; School of Stomatology, Beijing University), Korea (Yonsei University), Taiwan (Chang Gung Memorial Hospital), and Singapore (KK Women's & Children's Hospital; National University of Singapore). For more details on the recruitment sites, the research approvals and protocols, see the online "Supplementary Note" of the original publication (Beaty et al., 2010), as well as the study outline at dbGAP (https:// www.ncbi.nlm.nih.gov/gap), under study accession number phs000094.v1.p1.

RESULTS
The results are organized in two main parts: (Part A) presents the results of the analyses of the boys and girls together (Figure 2), and (Part B) presents the results of the analyses of the girls only ( Figure 3). For each of the X-inactivation models depicted in Figure 2, the results of the single-marker analyses are presented first, followed by the results of the haplotype analyses. Table 2 provides additional information on the SNPs and haplotypes, along with their relative risks (RRs) and 95% confidence intervals (CIs). Note that the Manhattan plots for the single-marker and haplotype analyses (Figures 2, 3) only show the lead SNPs and haplotypes. A more complete list of SNPs and haplotypes lying above the p-value cutoff of 10 −4 in Figures 2, 3 is provided in Table 2. Figure 4 shows the results of power simulations for different statistical models (girls only, without X-inactivation, with Xinactivation), based on sample sizes reflecting those available in the current GWAS dataset. Table 3 provides a synopsis of all the genes in which associations were identified in the current analyses. To determine whether a given SNP or haplotype was located in or near a gene within 20 kb, we used the 1,000 Genomes browser (https://www.ncbi.nlm.nih.gov/ variation/tools/1000genomes).

(i) Asian Sample
There were no associations in the single-marker analyses of isolated CPO or isolated CL/P in the model without Xinactivation. By contrast, three SNPs stood out in the model with X-inactivation: rs5935623 in CPO; and both rs12690207 and rs2405829 in CL/P. rs5935623 is located in the gene for "EGFlike domain multiple 6" (EGFL6 at Xp22.2), rs12690207 is located in "Fibroblast growth factor 13" (FGF13 at Xq26.3-q27.1), and rs2405829 is located in "Dystrophin" (DMD at Xp21.2-p21.1).
In haplotype analyses, haplotype rs6527993-rs4969754-rs5990881-rs4969757 was associated with CL/P in the model without X-inactivation. Although the q-value for this haplotype was 0.008 ( Table 2), none of the SNPs is located in or near a gene within 20 kb. In the model with X-inactivation, we found associations with two haplotypes: (i) rs4366220-rs5928243-rs988530-rs5928245 in CPO and (ii) rs5979748-rs2162-rs5935409-rs5935412 in CL/P. The first SNP in haplotype (i) lies ∼9.7 kb from DMD, and the first SNP in haplotype (ii) lies ∼5.3 kb from "Phosphoribosyl pyrophosphate synthetase 2" (PRPS2 at Xp22.2).

(ii) European Sample
In the model without X-inactivation, we found associations with rs4559361, rs5933262, and rs4128285 in CPO (all three SNPs had the same q-value of 0.155; Figure 2, Table 2). rs4559361 lies ∼4.4 kb from the gene for "SPANX family, member N4" (SPANXN4 at Xq27.3), rs5933262 is located in "Ubiquitin specific peptidase 26" (USP26 at Xq26.2), and rs4128285 is not located near any gene within 20 kb. In the model with X-inactivation, rs6529005 was associated with CPO. This SNP is not located near any gene within 20 kb. In haplotype analyses, rs2024759-rs6609469-rs743046-rs2498183 was associated with CPO in the model without X-inactivation. The last SNP in this haplotype is located ∼24.2 kb from "Zinc finger protein 157" (ZNF157 at Xp11.3). In the model with X-inactivation, four haplotypes were associated with CPO, but only rs1139916-rs2266856-rs5925077-rs5925079 (q = 0.169) has SNPs located in a specific gene-in "Gammaaminobutyric acid type A receptor epsilon subunit" (GABRE at Xq28).

(i) Asian Sample
In single-marker analyses, we identified associations with the same three SNPs (rs5935623 in CPO, and both rs12690207 and rs2405829 in CL/P) as in the single-marker analyses based on the model with X-inactivation (Figure 2). In the haplotype analyses, two of the haplotypes that were associated in the analyses based on the model with X-inactivation (Figure 2) were also identified here and in the same cleft categories; notably, rs4366220-rs5928243-rs988530-rs5928245 in CPO and rs5979748-rs2162-rs5935409-rs5935412 in CL/P. There were two additional haplotypes associated with CPO: (i) rs982598-rs12843271-rs4826609-rs5961136 and (ii) rs6639176-rs5978529-rs1874039-rs5978530. The last SNP in haplotype (i) is located in "Inter-alpha-trypsin inhibitor heavy chain family member 6"  (ITIH6 at Xp11.22), while the SNPs in haplotype (ii) are located in "FERM and PDZ domain containing 4" (FRMPD4 at Xp22.2).

(ii) European Sample
In single-marker analyses, rs6529005 (q = 0.161) was associated with CPO, which was also identified in the single-marker analyses in the same cleft category based on the model with Xinactivation (Figure 2). There were no associations with CL/P. In the haplotype analyses, the same SNP identified in the single-marker analysis above, rs6529005, appeared to be driving the association here between rs2869525-rs6529005-rs2180212-rs5954651 and CPO. One more haplotype was associated with CPO: rs10521639-rs12115935-rs16979405-rs12009996. Both of  Fatemi et al. (2013) reported increased expression of GABRE in the lateral cerebella of subjects with schizophrenia, bipolar disorder and major depressive disorder. Fatemi et al., 2013 Asian; Girls only Singlemarker EGFL6 (Xp22.2) EGF like domain multiple 6 As above. As above.
Asian; Girls only Haplotype ITIH6 (Xp11.22) Inter-alpha-trypsin inhibitor heavy chain family member 6 ITIH6 encodes a protein belonging to the inter-alpha trypsin inhibitor heavy chain ( Frontiers in Genetics | www.frontiersin.org FRMPD4 encodes a multi-domain (PDZ and FERM) containing protein. It modulates the activity of key postsynaptic scaffold proteins that are involved in cognitive processes. Variants in FRMPD4 have been associated with X-linked intellectual disability and schizophrenia. Hu et al., 2016;Matosin et al., 2016;Trujillano et al., 2017 PRPS2 (Xp22.2) Phosphoribosyl pyrophosphate synthetase 2 As above. As above.
a Genes located within the same chromosomal band Xp22.2 are emboldened. b Information on these genes was collated from the NCBI Entrez gene database (https://www.ncbi.nlm.nih.gov/gene) and the references shown.
these haplotypes were already identified in the haplotype analyses based on the model with X-inactivation (Figure 2), but none of the SNPs in these haplotypes is located near any known gene within 20 kb.

Power Simulations
Haplin includes a complete framework for power simulations . Figure 4 displays the a priori power calculations for a single SNP to detect PoO effects on the X chromosome. All calculations were based on 1,000 simulated datasets with a 0.05 nominal significance level (note that a baseline risk of one was used throughout). The power of a single SNP to detect PoO effects on the X chromosome depends on several factors, including the minor allele frequency (MAF), effect size, sample size and family design. The power of the model focusing on girls only is similar to that of the model with Xinactivation in females (Figures 4A,C), which is consistent with the fact that those models provide similar estimates of RRR (as also mentioned in the Materials and Methods section). Overall, the power is sufficient for RRRs > 2, with the range of MAFs and sample sizes presented in our analyses (RRRs ≥ 2.5 for the smallest sample sizes).

DISCUSSION
Our analyses detected possible PoO effects with several SNPs and haplotypes on the X chromosome, some of which were located in or close to genes (summarized in Table 3). Perhaps the most prominent gene on the list is "Dystrophin" (DMD), the second largest gene in humans according to the size of the transcript and protein product (Richards and Hawley, 2011). Dystrophin forms part of the dystrophin-glycoprotein complex whose function is to provide mechanical stability to the plasma membrane of striated muscle cells (Michele and Campbell, 2003;Gao and McNally, 2015). Several lines of evidence point to a link between muscular dystrophy and OFCs. For example, both congenital muscular dystrophy and CL/P have been observed in the rare autosomalrecessive Walker-Warburg syndrome (Moore et al., 1988;Dobyns et al., 1989;Pratap et al., 2007;Vajsar et al., 2008). More recently, we and others have reported associations between OFCs and variants in DMD in several chromosome-wide studies of Xlinked SNPs (Patel et al., 2013;Fonseca et al., 2015;Wise et al., 2016;Skare et al., 2017). Even though only the Fonseca et al. (2015) study was based on a different GWAS dataset and study population compared to the other three studies, the repeated identification of DMD nevertheless constitutes an exciting new direction to pursue in future OFCs research.
Besides DMD, we identified PoO effects with several other genes, two of which-EGFL6 and FGF13-offer biologically plausible links to OFCs. EGFL6 belongs to the EGF-repeat containing superfamily of genes known to be involved in cell cycle regulation and proliferation (Buchner et al., 2000a). Expression analyses of murine Egf16 show strong expression in the mesenchymal components of both the hard and soft palate from embryonic stage E12.5 (Buchner et al., 2000b). This spatiotemporal pattern of gene expression supports a role for EGFL6 in the development of the orofacial complex, but as with DMD, the associations with EGFL6 will need to be validated in other OFCs samples.
FGF13 belongs to the FGF family of signaling molecules known to play key roles in embryonic development (Nie et al., 2006;Rahimov et al., 2012;Du et al., 2016). Several FGF ligands and receptors are expressed in the developing facial primordia, and mutations in specific FGF receptors have been reported to cause clefting syndromes (Rahimov et al., 2012). For example, Kallmann syndrome is caused by mutations in FGFR1 (Dodé et al., 2003) whereas Apert syndrome is caused by mutations in FGFR2 (Wilkie et al., 1995). FGF13 is thus a promising gene to pursue in other independent cleft samples.
As opposed to the above genes, there are no obvious connections between OFCs and the remaining genes in Table 3. It is nonetheless noteworthy that both PRPS2 and FRMPD4 are located in chromosomal region Xp22.2, which is also the location of EGFL6 whose link to OFCs was discussed above. Figure 5 is a collage of screenshots showing all the genes in chromosomal region Xp22.2 as they appear in the genome browser of The Ensembl Project (Kersey et al., 2016;Yates et al., 2016). This chromosomal region contains two genes that have been linked to clefting syndromes. The first is "Oral-facial-digital syndrome 1" (OFD1), a gene in which we had previously identified associations FIGURE 5 | Genes located in chromosomal region Xp22.2. This collage of screenshots of chromosomal region Xp22.2 was generated using the Ensembl genome browser (http://www.ensembl.org/) (Kersey et al., 2016;Yates et al., 2016). The three genes identified in this study are shown in boxes, in addition to the "Oral-facial-digital syndrome 1" (OFD1) gene, in which we had identified associations in our previous study on candidate genes on the X chromosome (Jugessur et al., 2012a), and the "Midline 1" (MID1) gene.
with OFCs in a candidate-gene analysis of X-linked markers (Jugessur et al., 2012a). Mutations in OFD1 underlie the Xlinked dominant oral-facial digital syndrome type 1, which is characterized by malformations of the face, oral cavity and digits, and lethality in most affected males (Ferrante et al., 2001). The orofacial abnormalities in Oral-facial-digital syndrome 1 include median cleft lip, clefts of the alveolar ridge, and cleft palate. The second gene in chromosomal region Xp22.2 region is "Midline 1" (MID1). Mutations in this gene cause the X-linked Opitz GBBB syndrome, a congenital midline malformation syndrome featuring clefting of the lip and palate as part of the overall clinical picture (Quaderi et al., 1997). Furthermore, associations between isolated CL/P and specific haplotypes in MID1 have been reported in an Italian population (Scapoli et al., 2008).
As seen in Figure 5, MID1, FRMPD4, PRPS2, EGFL6, and OFD1 all lie relatively close to one another on chromosome Xp22.2. Specifically, MID1 is ∼1.3 Mb from FRMPD4, FRMPD4 is ∼66.8 kb from PRPS2, PRPS2 is ∼745.3 kb from EGFL6, and EGFL6 is ∼81.8 kb from OFD1. Aside from the potential contributions of the genes themselves, their shared location on Xp22.2 suggests that there might be a specific locus (or loci) in that region driving the associations detected by our PoO analyses-a hypothesis worth pursuing in other OFCs samples. In addition, the apparent excess of genes on the X chromosome that are associated with sexually dimorphic traits (Saifi and Chandra, 1999;Vawter et al., 2004;Berletch et al., 2011) makes genes that escape X-inactivation Deng et al., 2014;Balaton and Brown, 2016) particularly attractive candidates in explaining some of the observed male-female differences in development and physiology (Ober et al., 2008;Berletch et al., 2011). Such a scenario would be captured by our model assuming no X-inactivation ( Figure 1B). Since both CL/P and CPO consistently exhibit a skewed sex ratio in prevalence, it may be worth exploring this hypothesis in future studies of PoO effects in OFCs. In addition, scenarios involving variable escape from Xchromosome inactivation (Peeters et al., 2014) are also amenable to testing using the X-inactivation models in Haplin.
Identifying associations between isolated OFCs and genes for clefting syndromes is not surprising, as it has long been recognized that Mendelian forms of clefting that phenocopy isolated clefts may provide important clues in identifying genes involved in isolated forms of clefts (Jugessur and Murray, 2005). One of the best examples to date is Van der Woude syndrome (VWS), in which hypodontia and lip pits are the only additional features distinguishing patients with VWS from those with isolated clefts. Mutations in interferon regulatory factor 6 (IRF6) were reported to cause VWS (Kondo et al., 2002), and, subsequently, we and others reported associations between IRF6 variants and isolated clefts (Zucchero et al., 2004;Jugessur et al., 2008Jugessur et al., , 2009bBeaty et al., 2013;Leslie et al., 2016b;Moreno Uribe et al., 2017). Our current identification of several genes associated with clefting syndromes, while studying isolated OFCs per se, supports the notion that there may be a wider spectrum of subclinical features beside the overt cleft itself, blurring the distinction between isolated and non-isolated clefts.
Indeed, a handful of subclinical features have been characterized in patients who were initially classified as "isolated" cleft cases, as well as in their immediate family members (Weinberg et al., 2006;Jugessur et al., 2009a;Marazita, 2012). The inclusion of one such subclinical feature-breaks in the orbicularis oris muscle in the mouth-revealed a pattern of familial segregation more akin to that of a Mendelian trait (Marazita, 2007;Neiswanger et al., 2007). This further supports the existence of a wider array of subphenotypes that are associated with the overt cleft, and the identification of genes underlying syndromic clefts while analyzing isolated OFCs samples, like we do in this paper, is consistent with previous descriptions of an "extended cleft phenotype" (Weinberg et al., 2006;Marazita, 2007Marazita, , 2012Jugessur et al., 2009a;Rahimov et al., 2012).
Another important factor to consider in genetic association analyses is whether associations identified in one ethnicity are generalizable to other ethnicities. DMD was the only gene in which associations were identified in both CPO and CL/P, in the model with X-inactivation, in both single-marker and haplotype analyses, and in Europeans only. The remaining associations were highly specific for ethnicity, cleft subtype and X-inactivation model, with none of the associations in the Asian sample overlapping with those in the European sample. For example, the associations with EGFL6, FGF13, DMD, and PRPS2 were only detected in the Asian sample and in the model with X-inactivation, whereas the associations with SPANXN4, ZNF157, and GABRE were only observed in the European sample. Furthermore, associations with SPANXN4 and ZNF157 were only detected in the model assuming no X-inactivation, whereas associations with GABRE were only detected in the model with X-inactivation. These findings challenge the common assumption that the remarkable phenotypic consistency of OFCs across different ethnicities is because at least a subset of the causal variants is shared across different ethnicities. The apparent lack of overlap between Asians and Europeans was also observed in our recent chromosome-wide analysis of Xlinked variants in the same GWAS dataset as here (Skare et al., 2017). Collectively, these findings underscore the importance of examining different ethnicities, X-inactivation models and cleft subtypes, even though this means adding more multiplicity to the analyses.
We observed a substantial overlap in the associations identified in the analyses based on the model with X-inactivation (Figure 2) and those identified in the analyses restricted to girls only (Figure 3). This was as expected, for the reasons provided in the Materials and Methods section. Regarding statistical power, our simulations indicated that there would be sufficient power for single-SNP analyses based on relative risk ratios (RRR) ≥ 2 (RRR ≥ 2.5 for the smaller sample of CPO triads) (Figure 4). It could be argued that the assumption of a RRR ≥ 2 or RRR ≥ 2.5 is optimistic in the context of OFCs, or most other complex traits for that matter. Our current analyses of X-linked PoO effects are novel and should be regarded as exploratory at this stage. With continuously accruing sample sizes of OFCs through multiple international collaborations, we envisage our new methods to gain wider currency in future analyses of X-linked PoO-effects in the larger pooled datasets.
This study benefited from being based on the largest collection of case-parent triads of OFCs to date, which enabled two major isolated cleft subtypes-CPO and CL/P-to be analyzed separately and without biases due to population stratification. In addition, having access to genotype data on two major ethnicities (Asian and European) enabled a more in-depth exploration of ethnicity-and sex-specific differences for each of the identified associations. The new methods presented here enabled an investigation of the hitherto untested possibility of PoO effects on the X chromosome, through the use of models with or without the assumption of X-inactivation in females. Limitations of the study include a lack of control triads to verify potential population transmission ratio distortions that could affect the PoO estimates. This could be the case if a genetic variant can cause early abortions that are not even registered as births. However, since the PoO RRR estimates are a ratio of two transmission distortions (from the mother and from the father), distortions unrelated to clefting, such as early abortions, would at least need to exhibit a parent-of-origin dependent pattern to substantially bias the RRR value. Furthermore, the lack of a comparable replication cohort precluded a formal validation of the current findings, which currently makes it difficult to interpret their relevance to clinical research.
To summarize, we developed new methods for a robust investigation of PoO-effects on the entire X chromosome and present the first results of such a screening in the largest available collection of OFCs triads to date. Our analyses identified associations with several genes, in particular with DMD, FGF13, and EFG16 that offer biologically plausible links to OFCs. The additional identification of associations with other genes on chromosomal region Xp22.2 deserves further scrutiny because several genes for clefting syndromes appear to cluster in this region. Except for DMD, in which associations were identified in both CPO and CL/P, the rest of the associations were highly specific for ethnicity, cleft subtype and X-inactivation model, highlighting the importance of performing such subanalyses, despite the need to adjust for additional multiple testing. More generally, the new methodology presented here is easily adaptable to the study of X-linked PoO effects in other outcomes that use a family-based study design.

AUTHOR CONTRIBUTIONS
Conception of the work, study design, data analysis and interpretation, manuscript preparation and final manuscript approval: AJ, ØS, HG, RL, ØH, MG, and JR. Statistical modeling and software design: HG, ØS, MG, JR, and ØH. Funding acquisition: AJ, HG, and RL.

ACKNOWLEDGMENTS
We are grateful to all the participating families, medical staff and field workers at all the 13 recruitment sites for helping to make this study possible.