DNA methylation differences in monozygotic twins with Van der Woude syndrome

Introduction: Van der Woude Syndrome (VWS) is an autosomal dominant disorder responsible for 2% of all syndromic orofacial clefts (OFCs) with IRF6 being the primary causal gene (70%). Cases may present with lip pits and either cleft lip, cleft lip with cleft palate, or cleft palate, with marked phenotypic discordance even among individuals carrying the same mutation. This suggests that genetic or epigenetic modifiers may play additional roles in the syndrome’s etiology and variability in expression. We report the first DNA methylation profiling of 2 pairs of monozygotic twins with VWS. Our goal is to explore epigenetic contributions to VWS etiology and variable phenotypic expressivity by comparing DNAm profiles in both twin pairs. While the mutations that cause VWS in these twins are known, the additional mechanism behind their phenotypic risk and variability in expression remains unclear. Methods: We generated whole genome DNAm data for both twin pairs. Differentially methylated positions (DMPs) were selected based on: (1) a coefficient of variation in DNAm levels in unaffected individuals < 20%, and (2) intra-twin pair absolute difference in DNAm levels >5% (delta beta > | 0.05|). We then divided the DMPs in two subgroups for each twin pair for further analysis: (1) higher methylation levels in twin A (Twin A > Twin B); and (2) higher methylation levels in twin B (Twin B >Twin A). Results and Discussion: Gene ontology analysis revealed a list of enriched genes that showed significant differential DNAm, including clef-associated genes. Among the cleft-associated genes, TP63 was the most significant hit (p=7.82E-12). Both twin pairs presented differential DNAm levels in CpG sites in/near TP63 (Twin 1A > Twin 1B and Twin 2A < Twin 2B). The genes TP63 and IRF6 function in a biological regulatory loop to coordinate epithelial proliferation and differentiation in a process that is critical for palatal fusion. The effects of the causal mutations in IRF6 can be further impacted by epigenetic dysregulation of IRF6 itself, or genes in its pathway. Our data shows evidence that changes in DNAm is a plausible mechanism that can lead to markedly distinct phenotypes, even among individuals carrying the same mutation.


Introduction
While most of the cases of cleft lip and palate are nonsyndromic (1 in 700-2,500 births), orofacial clefts have currently been associated with over 460 syndromes with known molecular basis [OMIM (https://www.omim.org)]. Van der Woude Syndrome (VWS) is among the common syndromic form of clefts, affecting 1/35,000 individuals (1). It is an autosomal dominant condition that accounts for 2% of all cases with orofacial clefts (2,3). About 44% of the cases with VWS display lip pits as their sole phenotype (1,4,5), but cases with VWS can also show cleft lip (CL), cleft lip and cleft palate (CLP) or cleft palate (CP), all with or without lip pits demonstrating the marked variable expression. Mutations in the gene IRF6 account for 70% of VWS cases, while mutations in the gene GRHL3 account for another 5%. However, in approximately 25% of VWS cases, the causal mutation and gene remain unknown.
Mixed clefting types is a common observation in different relatives with VWS within the same family (6)(7)(8)(9), further elucidating the variable phenotype observed even among cases carrying the same mutations. Other VWS features include greater prevalence of midfacial hypoplasia, and hypodontia (10)(11)(12). Moreover, it has been shown that patients with VWS are more likely to have wound complications following cleft repair, including fistulae recurrence (13), and are more likely to require pharyngeal flap surgery (10)(11)(12).
IRF6 is a transcription factor with a highly conserved helix-turnhelix DNA binding domain and a less conserved SMIR/IAD protein-binding domain (14). IRF6 is the only member of the IRF gene family involved in craniofacial development and the mutations that cause VWS are non-randomly distributed, with most occurring in the DNA-binding domain (exons 3 and 4) and the protein-binding domain (exons7-9) (15). Since the discovery of IRF6 as the first causal gene for VWS (8), more than 300 mutations have been identified in cases with VWS and PPS (15)(16)(17)(18).
Although most of the causal mutations are classified as missense mutations, nonsense and frameshift mutations have also been reported. The VWS mutations characterized so far exert their effect on the phenotype via haploinsufficiency or a dominant negative effect.
Animal studies have shown that disruption of IRF6 in mice leads to cleft lip and/or palate (CL/P) in addition to oral epithelial adhesions, poor epithelial barrier functions, and improper skin stratification, which suggests that oral epithelium plays an important role in directing palate development (19,20). In addition, molecular and histologic analyses showed that IRF6 mutated mice embryos lack periderm cells at the sites of oral adhesions (21), which leads to abnormal epithelial adhesions between the palatal shelves and the lingual, mandibular, and maxillary surfaces, preventing the proper elevation and fusion of the palatal shelves. Besides the abnormal adhesions, the medial edge epithelium (MEE) located at the medial edge of the palatal shelves failed to dissolve for proper palatal fusion also leading to a cleft palate. Similar abnormal bilateral adhesions leading to a cleft palate were observed in mice that were heterozygous for a mutated allele of GRHL3, the second VWS locus (22). Thus, both IRF6 and GRHL3 are essential to develop a normal oral periderm, necessary for palatogenesis (22).
While various established mutations in IRF6 account for the presence of the syndrome, they do not explain the variability and different levels of phenotypic severity. Since the first publication of IRF6 being the causal gene, authors have discussed the possible causes of the phenotypic variability observed even among individuals with the same mutation. Studies have suggested the action of stochastic factors or modifier genes on IRF6 function, but epigenetic factors that could play a role remain unexplored. Another important observation is that the mixed clefting phenotype that is common to VWS families is rare in nonsyndromic cleft families. However, this phenotypic variability is also observed in TP63 (23,24).
Despite decades of intriguing observations of phenotypic discordance, the first etiological genetic variant for VWS was only discovered in 2002 when Kondo et al. (8) reported a unique, and genetically confirmed, pair of monozygotic twins in which one twin is affected with bilateral CLP and lip pits and the other is unaffected. With the use of genetic mapping and DNA sequencing, an IRF6 mutation was identified in the affected twin and was absent in the unaffected co-twin which confirmed IRF6 as the first causal gene for VWS.
Later in 2011, another pair of monozygotic twins with VWS (25) was reported. In this case, both twins are affected with VWS, but despite carrying the same genetic mutation in IRF6, their phenotypes are markedly different, with one twin presenting bilateral cleft lip, cleft palate, and lower lip pits, while her twin sister has only lower lip pits. The variable expressivity of VWS phenotype and the phenotypic discordance even among monozygotic twins who carry the same mutation suggests the role of modifiers factors, which can be genetic or epigenetic.
While MZ twins share identical DNA sequence, the fact that they can be phenotypically distinct offers important insights into the role of environmental factors. With the increased number of epigenetic studies, evidence suggests that epigenetic mechanisms may be potential mediators between environment and phenotypic expression.
DNA methylation of cytosines at CpG dinucleotides was first proposed as a mechanism of mammalian gene regulation in 1975 (3,4), and has been since then, the most broadly studied epigenetic mark. It occurs predominantly at the carbon-5 position of symmetrical CpG (cytosine and guanine separated by a phosphate) dinucleotides (5 mC). The state of DNA methylation is mitotically heritable through the activity of DNA methyltransferases (DNMTs), and it is essential for control of gene expression. Epigenetic marks like DNA methylation are essential for cell differentiation and preservation of tissue homogeneity. During development and throughout life, parent-cells use epigenetic marks to ensure that their daughter-cells will differentiate properly and function appropriately, and this message may persist through thousands of cell divisions for the lifetime of the organism, unless they are actively erased (by demethylase enzymes) or lost through epimutations. Typically, methylation of CpG sites in promoter regions of genes inhibits gene expression either due to the inability of specific transcription factors to bind methylated CpGs or the recruitment of methyl-CpG-binding proteins with transcription repression activity (26-28).
Aberrant DNA methylation patterns are universally recognized as playing an important role in human diseases, including monogenic and complex disorders. Recent studies indicate that abnormal methylation levels of key genes and/or regulatory elements are involved in heart disorders (29-34), depression and anxiety (35-42), several craniofacial syndromes (43-47), as well as nonsyndromic cleft lip and palate (48-53). In addition, several recent twin studies have shown additional evidence that DNA methylation may play an important role in phenotypic discordance (35, 48, 54-58).
In this article, we report the first DNA methylation profiles of monozygotic twins who present with discordant affection status and phenotypic expression for VWS. Twin pair 1 was previously reported in the study that discovered IRF6 as the first causal gene for VWS (8). Even though they have been confirmed as monozygotic twins, one of them has VWS while the other is unaffected. This discordance in affection status is most likely a result of an early post-twinning mutation in the affected twin. Twin pair 2 was also previously reported (25), but both females are affected by VWS and carry the same mutation in IRF6. Despite being monozygotic and having the same mutation, their phenotype is markedly different.
Our goal is to explore the epigenetic contributions to VWS etiology by comparing the DNA methylation profiles of these two pairs of monozygotic discordant twins. While the mutation that causes VWS in these two twin pairs are known, additional mechanisms behind phenotypic risk and expression variability, especially in the second twin pair, remain underexplored.

Participants
We utilized DNA samples extracted from whole blood of 2 pairs of monozygotic twins with VWS. For the purposes of this study, we will refer to these subjects as "twin pair 1" (8) (no photo available) and "twin pair 2" (25) (Figure 1).
In addition to the twins with VWS, we utilized existing epigenetic data from unaffected individuals as controls (n = 13 male controls for twin pair 1, and n = 12 female controls for twin pair 2), also obtained from whole blood samples. All samples have been collected as part of previous studies after approval by their respective IRBs and signing of informed consent from parents or guardians.

DNA sequencing
DNA sequencing data were obtained from the respective previous studies (8,25). For twin pair 1. Sanger sequencing of IRF6 coding regions revealed a heterozygous de novo E92X nonsense mutation in exon 4 present in the affected twin. The mutation was absent in the unaffected twin and parents ( Figure 2). For twin pair 2, Sanger sequencing of IRF6 coding regions revealed that both twins and their affected father (lower lip pits) shared the same IRF6 mutation, Y97C, also located in exon 4 ( Figure 2).

Sample quality and bisulfite conversion
DNA quality was assessed and quantified with DropSense96 ™ and Qubit ™ dsDNA High Sensitivity Range Assay Kit (ThermoFisher Scientific). After quantification of each specimen, 500 ng of genomic DNA was submitted to bisulfite conversion using the EZ DNA Methylation ™ Kit (Zymo Research) according to manufacturer's protocol.

DNA methylation data
Genome-wide DNAm profiles were generated using Illumina's Infinium Methylation EPIC BeadChip assay (EPIC array) (Illumina, San Diego, CA, United States). The assay determines DNAm levels in more than 850,000 CpG sites and provides coverage of CpG islands, RefSeq genes, ENCODE open chromatin, ENCODE transcription factor-binding sites, and FANTOM5 enhancers. The assay was performed according to the manufacturer's instructions and scanned with the Illumina iScan System. To avoid batch effects, both members of each twin pair were assayed on the same array and inter batch duplicate samples were used as internal controls. As expected, duplicate samples showed high degrees of correlation (r 2 > 0.99).
many missing values or zero variability of their methylation values were eliminated next. Low-quality probes were removed using the Greedycut algorithm, based on a detection P-value threshold of 0.05, as implemented in the RnBeads package. Probes with less than three beads and probes with a missing value in at least 5% of the samples were also removed. Finally, probes that overlapped with known single nucleotide polymorphism (SNPs) as assigned by the ChAMP per the version of dbSNP derived from Genome Reference Consortium Human Build 37 patch release 10 (GRCh37.p10), or that are located on sex chromosomes were also removed (59). The methylation level for each probe was measured as a beta value, calculated from the ratio of the methylated signals vs. the total sum of unmethylated and methylated signals, ranging between 0 (no methylation) and 1 (full methylation). This value was used for biological interpretation, visualization, and calculation of the absolute methylation difference (Δβ = |Twin1 β-Twin2 β|) of each pair separately. For Twin pair A, which is a male pair, we used beta values of 13 male unaffected individuals to calculate the coefficient of variation in methylation levels of each CpG site after QC; as for Twin pair B, we used beta values of 12 female unaffected individuals.

Cell-type heterogeneity correction
We used the reference-based algorithm, EpiDISH (62), to performed an in-silico deconvolution of the DNA methylation data. The package allows for dissection of intrasample heterogeneity in EWAS and to infer the proportion of a priori known cell subtypes present in a mixture of cell types such as in blood. The estimated cell type composition was used in a logistic regression to correct for cell type heterogeneity.

Identification of differentially methylated positions (DMPs)
We selected differentially methylated positions (DMPs) based on the following criteria: (1) a coefficient of variation in methylation levels of unaffected individuals of less than 20%, and (2) intra-twin pair difference in methylation levels of at least 5% (Δβ > |0.05|). This strategy assumes that sites that exhibit larger inter-individual variation in methylation levels among the unaffected individuals are less likely to contribute to clefting. We used the DMPs coordinates to annotate them to nearby genes and potential regulatory elements, such as known craniofacial enhancers that are active during early embryonic development (63). We then divided the DMPs in two subgroups for each pair for further analysis: (1) DMPs with higher methylation levels in twin A of each pair (Twin A > Twin B); and (2) DMPs with higher methylation levels in twin B of each pair (Twin B > Twin A). After annotating the DMPs to nearby genes based on genomic coordinates, we compared the list of genes to which the DMPs were annotated to a list of genes that contain transcription factors binding sites for IRF6 (obtained from GeneCards https://www.genecards.org/). In addition, we compared them to a list of genes previously associated with any type of OFCs (Supplementary Table S1).

Gene ontology and pathways
We performed GO and enrichment analysis for the subgroup Twin A > Twin B and Twin B > Twin A groups separately using GREAT (http://great.stanford.edu) (64). We used a.bed file containing the genomic coordinates of all CpG sites interrogated that passed the initial QC (n = 735,653) as background.

Results
Since the two twin pairs were of opposite gender and distinct ages, we analyzed both pairs separately with the same analytical pipeline and compared their results. Twin Pair 1 is a male monozygotic twin pair; the affected twin (twin 1A) was shown to carry a nonsense mutation (E92X) in the exon 4 of IRF6 gene, which was absent in the unaffected co-twin (twin 1B) and parents. The affected twin in this pair presented with cleft lip and cleft palate and bilateral lower lip pits. Twin Pair 2 is a female monozygotic twin pair; both girls and their affected father carry the same missense mutation (Y97C) in the exon 4 of IRF6 gene.
Although all three individuals share the same mutation, twin 2A presents with cleft lip and cleft palate and bilateral lower lip pits, while twin 2B and their father present only bilateral lower lip pits (no epigenetic data was generated for the father).

Results for twin pair 1 (Twin 1a = CLP + lip pits; Twin 1b = unaffected)
The results for twin pair 1 are shown below and summarized in Figure 3.

Differentially methylated positions (DMPs)
We identified a total of 19,196 DMPs according to the inclusion criteria listed above, the DMPs were annotated to a total of 10,811 genes. Of all DMPs, 15,709 showed higher methylation levels (>5% difference) in twin 1A (Twin 1A > Twin 1B) and 3,487 showed higher methylation levels (>5% difference) in twin 1B (Twin 1B > Twin 1A); these were annotated to 9,935 and 3,702 genes, respectively. We found that 250 of the genes that contained DMPs in the Twin 1A > Twin 1B group and 126 genes in the Twin 1B > Twin 1A have been associated with some type of OFC. Moreover, 78 out of the 9,935 genes that contained DMPs in the Twin 1A > Twin 1B group also contained a TFBS for IRF6, while, for the Twin 1B > Twin 1A group, 30 out of the 3,702 annotated genes were IRF6 targets.

Gene ontology analysis using GREAT
Gene ontology (GO) analysis for twin pair 1 returned a set of 170 genes enriched for the DMPs with >5% difference in methylation for Twin 1A > Twin 1B (Table 1). From these, 22 genes have been previously associated with some type of orofacial cleft (genes in bold on Table 1), with TP63 gene being the top cleft-associated hit (P = 7.82E −12 ). We repeated the same process with the DMPs from the Twin 1B > Twin 1A group and observed 91 genes enriched (Table 2), with 11 of them previously associated with orofacial clefts (genes in bold on Table 2). For this group, TNF (P = 8.69E-09) and PAX7 (2.82E-03) were among the top hits.

Results for twin pair 2 (Twin 2a = CLP + lip pits; Twin 2b = lip pits)
The results for twin pair 2 are explained below and summarized in Figure 4.

Differentially methylated positions (DMPs)
We identified a total of 56,367 DMPs according to the inclusion criteria listed above (Δβ > |0.05|), the DMPs were annotated to a total of 15,424 genes. Of all DMPs, 16,962 showed higher methylation levels in twin 2A (Twin 2A > Twin 2B) and 39,405 showed higher methylation levels in twin 2B (Twin 2B > Twin 2A); these were annotated to 10,081 and 14.466 genes, respectively. We found that 257 of the genes that contained DMPs in the Twin 2A > Twin 2B group and 87 genes in the Twin 2B > Twin 2A have been associated with some type of OFC. Moreover, 303 out of the 10,081 genes that contained DMPs in the Twin 2A > Twin 2B group also contained a TFBS for IRF6, while, for the Twin 2B > Twin 2A group, 118 out of the 14,466 annotated genes that contained DMPs were IRF6 targets.

Gene ontology analysis using GREAT
Gene ontology (GO) analysis for twin pair 2 returned a set of 230 genes enriched for the DMPs with >5% difference in methylation for Twin 2A > Twin 2B (Table 3). From these, 6 genes have been previously associated with some type of orofacial cleft (genes in bold on Table 3). We repeated the same process with the DMPs from the Twin 2B > Twin 2A group and observed 169 genes enriched (Table 4), with 4 of them previously associated with orofacial clefts (genes in bold on Table 4). For twin pair 2, cleft-associated genes like TNF (P = 2.81E-11) and ARID5B (1.73E-07) were among the top hits.

Discussion
Monogenic diseases, like VWS, often feature highly variable phenotypes, despite a usually well-defined genetic cause for the disease. Modifier genes and epigenetic mechanisms are believed to influence phenotypic risk and varaible expressivity. In this article, we see the rare event of monozygotic twins carrying the same mutation (twin pair 2), who display extremely divergent phenotypes for VWS.
Monozygotic twins arise from the same fertilized oocyte and are, therefore, believed to have the same DNA sequence. Postzygotic mutations are responsible for a substantial proportion of de novo mutations in humans and have been shown to contribute to disease phenotypic variability, including among twins (65)(66)(67)(68)(69)(70)(71). In twins, the timing of postzygotic mutations determines whether they are present in both twins (pre-twinning mutation) or in only one twin (post-twinning mutation) (68). Our twin pair 1 is an example of post-twinning mutation, where only twin 1A became a carrier of the IRF6 mutation, while twin 1B does not have the causal mutation and is, therefore, unaffected. In twin pair 2, both females inherited the IRF6 mutation from their affected father ( Figure 2).
In addition to playing an essential role in estimating phenotypic variability, twin studies offer an opportunity to study epigenetic variation as a quantitative trait. The monozygotic twin discordance rate observed in OFCs (≥50%) suggests that unexplained environmental and epigenetic factors play an etiological role. Epigenetic studies have shown that changes in DNA methylation play a role in nonsyndromic (48-51, 53, 72, 73) and syndromic forms of OFCs. Among the cleft syndromes most commonly associated with differences in DNA methylation are Kabuki and Charge syndromes (43, 44, 46, 47, 74, 75), both of which have specific DNA methylation signatures. We describe the first whole genome methylation profiling of MZ twins discordant for VWS and discuss the possible roles that differences in DNAm may have in phenotypic discordance.
Gene ontology and enrichment analysis of the thousands of CpG sites with differential methylation identified for each pair revealed a list of enriched genes that included, but were not limited to, known cleft-associated genes. The top genes that showed different levels of DNAm between each twin pair include genes like HOXA5, LEPREL1, ARHGEF10,  OR4E2, DAD1, AZU1, TP63, among others (please refer to Tables 1-4). From these, TP63 is known to cause syndromic forms of OFCs and is associated with nonsyndromic clefts; in addition, TP63 is directly involved in the activation of IRF6.
Mutations in the gene encoding the transcription factor interferon regulatory factor 6 (IRF6), cause VWS and PPS, both characterized by ectodermal anomalies and CL/P or CP; mutations in the transcription factor p63 cause autosomal dominant ectodermal dysplasia syndromes such as ectrodactyly ectodermal dysplasia-clefting (EEC), which is also characterized by CL/P or CP and defects in ectoderm-derived tissues, such as the epidermis, hair, teeth, and glands (76). Moreover, mutation in both genes have been implicated in nonsyndromic OFCs (77).
Studies have shown that IRF6 is transcriptionally activated by TP63 and, in turn, induces the TP63 proteasome-mediated downregulation, thereby limiting epithelial cells proliferative potential (78), including in the medial epithelial seam around the time of palate closure. Such process is critical for normal palatal fusion (79)(80)(81)(82).
Among the cleft-associated genes in our dataset, TP63 was the most significant hit (P = 7.82E-12). Both twin pairs presented differential DNAm levels in CpG sites in/near TP63 (Twin 1A > Twin 1B and Twin 2A < Twin 2B). Interestingly, in the twin pair 2, the twin with only lip pits (twin 2B) presents higher levels of methylation in the promoter region of the TP63 gene. It has been shown that TP63 downregulation necessitates normal IRF6 function, and that this task is absent in irf6 mutant mice, who remain p63 positive in the palatal medial edge epithelial (83). Given that TP63 and IRF6 work together in a regulatory loop to coordinate epithelial proliferations, it is plausible that DNAm can modify the effects of the etiological variant and lead to markedly distinct phenotypes, even among individuals carrying the same mutation. Since methylation of promoters is often associated with gene silencing, it is possible that this epigenetic mechanism compensates for the inability of the mutated IRF6 to downregulate TP63, therefore contributing to a less severe phenotype in twin 2B.
There are some limitations in our study. First, our study includes two pairs of MZ twins discordant for affection status and phenotype, but given the rarity of the condition, we consider the presented data of importance to better understand the additional factors that may contribute to phenotypic discordance. We are aware that the control groups are relatively small and the age among the controls individuals varied; and we also acknowledge the fact that the DNAm profiles were obtained from blood DNA and that epigenetic marks can be cell type specific. We did our best to correct for cell type heterogeneity by using wellestablished bioinformatic tools. Finally, it is known that the cause of the VWS in the affected twins are the mutation reported by Kondo et al. and Jobling et al. (2011); however, this is the first study reporting a genome-wide epigenetic profiling of the syndrome, especially using the powerful discordant twin design.
Despite its limitations, the in silico functional analysis methods that we used, help us gain insight into the biology underlying the regions in which we detected differential methylation and allow us to explore whether pathways or processes are enriched among our best hits. We used over-representation analysis methods to identify potential pathways that could be affected by the observed changes in DNA methylation.
Clinical divergence between patients complicates diagnosis and genetic counseling, this is especially true when syndromic and nonsyndromic cases have overlapping clinical features, like VWS and nonsyndromic clefts. In addition to clinical implications and translational potential, studies of etiological factors of syndromic forms of clefting have greatly contributed to the understanding of the much more complex etiology of nonsyndromic cases. Therefore, our study emphasizes the need to understand the molecular mechanisms underlying phenotypic variability and the role of epigenetic factors in disease etiology.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Funding
This work was supported by grants from the National Institute of Dental and Craniofacial Research (grant no. K01DE027995, ALP and R01DE08559, JCM).

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.  Pedigrees of the two monozygotic twin pairs in this study. Twin 1A is affected with CLP + LP and carries a nonsense mutation E92X in exon 4 of IRF6. The mutation is absent in Twin 1B. Twin 2A is affected with CLP + LP and carries a missense mutation Y97C, also located in exon 4 of IRF6. Twin 2B carries the same mutation but shows LP only (same as father).